1. Introduction
Preventing gas outbursts and deflagration disasters is the primary objective of the coal mine safety production system. With the depth of coal mining in China increasing at an average annual rate of 8~12 m, total gas emissions are increasing exponentially, making dynamic gas monitoring and precise disaster prevention and control essential components of modern mine safety management. Statistics indicate that there were 14 significant gas accidents in Chinese coal mines from 2020 to 2024, resulting in 171 deaths and direct economic losses amounting to CNY 196 million [
1]. In mines, particularly those with a high likelihood of gas outburst, the time–space contradiction between high-intensity mining and complex gas occurrence conditions can increase the frequency of gas overrun in the upper corners of the working face, creating a major safety hazard. In this context, constructing a multi-scale prediction system for gas concentration in intelligent, fully mechanized mining operations has become a critical technical obstacle to overcome in mine safety engineering. By improving the predictive accuracy and real-time detection of gas concentration, the response time for early disaster warnings can be significantly reduced, facilitating a critical window for underground personnel evacuation and equipment protection [
2,
3].
As a core topic in the field of coal mine safety research, methods of predicting gas concentration have evolved from traditional statistical models to advanced intelligent algorithms. Utilizing gray relational analysis (GRA) and fuzzy set pair analysis (F-SPA), Ni et al. [
4] established four primary geological control indexes for coal and gas outbursts. To enhance the parameter sensitivity of the gray prediction model GM(1,1), the authors of [5] introduced a buffer-weakening operator to strengthen the correlation of the sensitive parameters. Utilizing the percolation factor theory, Ma et al. [6] established a multi-physical field coupling model encompassing stress, porosity, and seepage fields (R² = 0.91) to reveal the dynamic evolution of gas migration under mining stress disturbances. Dreger et al. [7] determined the probability of coal seam collapse from the numerical ranges of the methane content, coal hardness, desorption strength, effective diffusion coefficient, and methane adsorption capacity. Although the above methods have achieved remarkable results in univariate time series prediction, limitations in multi-source heterogeneous data fusion modeling (multiple regression model R² < 0.78) remain.
A machine learning algorithm can improve the engineering adaptability of a model through parameter optimization. Anani et al. [
8] used a hybrid method to review the application of machine learning (ML) in predicting coal and gas outbursts in coal mines and pointed out that the most important parameter of gas emission is the initial velocity of gas. Tutak et al. [
9] introduced a method for predicting methane concentration at specific points in a coal mine using artificial neural networks, whereby by selecting an appropriate neural network based on ventilation measurements, it is possible to predict methane concentrations at selected excavation points at an acceptable level. Liu et al. [
10] relied on the SPCA algorithm to establish an Informer prediction model by combining several factors closely related to the gas concentration in the working face and realized a multi-step prediction of the gas concentration in the working face. Liang [
11,
12] proposed the ISSA-MCMC-optimized SVM model and the PHHO-KELM hybrid model, effectively solving the problems of high-dimensional parameter optimization and missing data. The authors of [13] combined WOA-ELM with case-based reasoning (CBR) to construct a gas outburst risk index system encompassing the prevention and control processes. Cai et al. [
14] approached the gas concentration warning as a binary classification problem. Based on the concentration threshold, the gas concentration data were categorized into early warning and non-alarm classes, and a probability density machine (PDM) algorithm with good adaptability to data distribution imbalances was proposed. However, traditional machine learning methods still have theoretical limitations in modeling the periodic and random composite characteristics of gas concentration.
Deep learning technology provides a new paradigm for solving the above problems. Demirkan et al. [
15] used the optimized long-term and short-term memory method to detect the formation of explosive methane–air mixtures on long-wall working faces and identify possible explosive gas accumulation before it becomes dangerous. The LASSO-RNN model developed by Song et al. [
16] achieves multi-parameter fusion prediction through feature selection. The Pearson–LSTM model constructed by Lin et al. [
17] incorporated Pearson coefficient features to screen key features and adopted the adaptive moment estimation (Adam) optimization strategy to significantly improve the stability of time series prediction. Kumari et al. [
18] proposed a Uniform Flow Approximation and Projection (UMAP) and Long Short-Term Memory (LSTM) deep learning model to provide miners with early warnings about impending mine hazards. Nguyen et al. [
19] utilized the enhanced particle swarm optimization (EPSO) algorithm for hyperparameter adjustment and the CNN-LSTM model for feature extraction and time pattern learning, enabling indoor air quality (IAQ) prediction in intelligent buildings. Based on the recursive feature elimination cross-validation RFECV-BiLSTM architecture, Liu et al. [
20] established a framework for multifactor correlation prediction under complex geological conditions. Prasanjit et al. [
21] proposed t-SNE_VAE_BiLSTM, which minimizes the dimensionality of recorded gas concentrations and explores the intrinsic features of low-dimensional gas concentrations. The Attention–TCN model proposed by Xue et al. [
22] was verified via nitrogen injection replacement experiments, demonstrating the gradient advantage of long sequence processing. Jia et al. [
23] used adaptive moment estimation (Adam) as an optimization algorithm to determine the learning parameters of the GRU model for predicting gas concentration values. It is worth noting that the introduction of the Transformer architecture [
24] can further expand the prediction dimension. Based on the multi-sensor optimal attention (MOA) mechanism, Yang et al. [
25] realized a coordinated early warning system for temperature, air pressure, and other indicators. Yan et al. [
26] introduced the adaptive normalization (AN) model to standardize gas sequence data and verified the prediction performance of the standardization technology. Wang et al. [
27] analyzed the Pearson correlation coefficients of different sensor data to determine the optimal parameters for gas concentration prediction and proposed an LSTM-LightGBM model based on the residual assignment of a variable weight combination approach. Dong et al. [
28] optimized the sensor calibration confidence interval using Gaussian process regression, improving data reliability during the monitoring failure period. Zhang et al. [
29] proposed a two-stage feature extraction method for power transformer fault prediction by combining feature ranking and genetic programming (GP), effectively facilitating the early warning of transformer faults.
Although the existing research has made significant progress in gas concentration prediction, the following technical obstacles remain. First, generalization performance degrades significantly under cross-time-scale migration, making it difficult to adapt to the dynamic evolution of deep mining conditions. Second, over-reliance on historical data results in an insufficient representation of the potential evolutionary rules. Finally, deterministic point prediction struggles to quantify the uncertainty of the predictions, posing a risk of underreporting in extreme conditions. In view of this, this study integrates “kernel method–hybrid architecture–optimization algorithm–probability statistics” methods, offering a solution with both theoretical rigor and engineering practicability for predicting gas concentration. First, sparse kernel principal component analysis (SKPCA) is used to eliminate multi-index redundancy. Then, a TCN–Transformer hybrid model is constructed, in which local fluctuations are captured via causal dilated convolution, the self-attention mechanism models the global dependency features, and the network hyperparameters are optimized with the Flood Optimization Algorithm (FLA). Finally, a locally weighted regression kernel density estimation algorithm with adaptive bandwidth (LWD-KDE) is designed to determine probabilistic safety intervals and establish a mechanism for providing early warnings of coal mine gas concentration fluctuation risks, reducing the risk of extreme-value underreporting, providing more reliable technical support for coal mine gas outburst prevention and control, and contributing to a new stage of intelligent perception and dynamic decision-making in gas concentration prediction.
2. Materials and Methods
In this study, an intelligent prediction and dynamic risk assessment system for gas concentration driven by multi-source data is constructed. Its core technical framework integrates time series feature learning and uncertainty quantification methods to form a whole-process solution of “data preprocessing–hybrid modeling–risk early warning”. Based on the mine safety monitoring system, a standardized data-processing flow is constructed by integrating multi-dimensional time series parameters such as coal seam thickness, gas content, mining height, daily footage, ventilation volume, temperature, and humidity. First, Tukey's fences criterion is used to detect outliers and eliminate interference factors such as sensor drift. Then, the parameters are normalized using Z-score standardization. SKPCA is used to construct the feature space dimensionality reduction framework, in which the radial basis kernel function and an L1 regularization constraint realize the feature dimension reduction.
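The preprocessing flow described above can be sketched as follows. This is a minimal illustration (not the authors' code), assuming the monitoring records are held in a hypothetical pandas DataFrame `df` of numeric channels and using the conventional fence coefficient k = 1.5.

```python
# Illustrative sketch: Tukey's fences outlier screening followed by Z-score standardization.
import numpy as np
import pandas as pd

def tukey_fences_mask(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask that is True for values inside Tukey's fences."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return series.between(q1 - k * iqr, q3 + k * iqr)

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    # Keep only rows that pass the fence test in every monitored channel.
    mask = np.logical_and.reduce([tukey_fences_mask(df[c]) for c in df.columns])
    clean = df[mask]
    # Z-score standardization: zero mean, unit variance per parameter.
    return (clean - clean.mean()) / clean.std(ddof=0)
```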
2.1. Sparse Kernel Principal Component Analysis
With the deep integration of artificial intelligence and automation technology in the field of coal mine safety monitoring, the limitations of traditional principal component analysis (PCA) in processing dynamic, nonlinear mine gas data have become increasingly prominent. PCA, based on linear covariance matrix decomposition, struggles to effectively capture the nonlinear coupling relationships between gas emission parameters. Although the improved sparse principal component analysis (SPCA) achieves variable selection through L1 regularization, its linear framework limits its ability to represent complex data structures. Although kernel principal component analysis (KPCA) uses kernel function mapping to achieve nonlinear dimensionality reduction, the computational cost of decomposing the high-dimensional kernel matrix grows steeply with the number of samples, making it difficult to satisfy the requirements of real-time mine monitoring.
In view of the above technical challenges, SKPCA [
30] achieves the collaborative optimization of nonlinear feature extraction and computational efficiency by combining kernel techniques and the elastic net regularization mechanism. Elastic net regularization is introduced into the kernel space feature decomposition process, and the original data are nonlinearly mapped to the reproducing kernel Hilbert space (RKHS) by the radial basis kernel function (RBF Kernel), which effectively decouples the nonlinear interaction between gas parameters. On this basis, the sparse reduction in the feature space is realized via the L1 and L2 mixed regularization constraints, and redundant variables with a contribution rate to the principal component lower than the threshold are automatically eliminated, improving the noise suppression rate [
31]. At the same time, further combination with the Nyström approximation method reduces the computational complexity of the kernel matrix and improves its engineering applicability. This technology provides a solution that takes both accuracy and efficiency into account for the feature extraction of gas data in complex mine environments through the cooperative mechanism of kernel space nonlinear representation and sparse constraints [
32].
In this study, the SKPCA algorithm was used to construct a framework for preprocessing gas data. The implementation was based on strict mathematical derivation and industrial scene adaptation optimization. The specific steps are as follows:
- (1)
Data standardization and kernel matrix construction
Z-score standardization is performed on the original monitoring data matrix to eliminate the dimensional differences between heterogeneous parameters, such as coal seam thickness and gas content, and to control the standardization error. The Gaussian radial basis kernel function is selected to construct the kernel matrix K, and systematic sensor errors are eliminated by centering the kernel matrix.
In Equation (1), σ is the parameter of the Gaussian kernel function, which controls the smoothness of the data distribution after mapping to the high-dimensional space. If σ is too small, the noise is over-fitted; if σ is too large, local characteristics such as gas outburst peaks are ignored. In Equation (2), the centering matrix is an n × n matrix whose elements are all 1/n. After centering, the row and column means of the kernel matrix are zero, which eliminates the interference of sensor dimensional differences with principal component extraction.
- (2)
Principal component optimization and feature selection
Based on the cumulative variance contribution rate criterion, the leading principal components are extracted and the feature subspace is constructed. Elastic net regularization is then introduced to establish the optimization objective function in Equation (3).
In Equation (3), the regularization coefficients balance the reconstruction error against sparsity and smoothness. The first term on the right side of the equation measures the reconstruction error of the data after dimensionality reduction, which the optimization attempts to make as small as possible. The second and third terms are the L1 and L2 regularization terms, respectively: the L1 term makes the weight coefficient vector sparse, while the L2 term prevents over-fitting and improves the robustness of the model to sensor noise, thereby achieving automatic feature selection.
The two matrices in Equation (3) are updated alternately: for a fixed sparse weight matrix, a singular value decomposition is performed and the orthogonal loading matrix is updated; the sparse weight matrix is then re-optimized, and the iterations continue until the objective function stabilizes.
- (3)
Weight standardization and feature interpretation
The final principal component loading matrix is standardized, and an interpretable weight vector is generated using Equation (4); each element of this vector represents the contribution of the corresponding original feature to the jth principal component. A simplified illustration of the overall procedure is sketched below.
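The sketch below illustrates the spirit of the procedure under simplifying assumptions: an RBF kernel with a hypothetical parameter `gamma`, double-centering of the kernel matrix, eigendecomposition, and a soft-thresholding step standing in for the full elastic-net optimization. It is not the authors' exact SKPCA implementation.

```python
# Simplified sparse kernel PCA sketch (illustrative, not the authors' implementation).
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.preprocessing import KernelCenterer

def sparse_kpca(X, n_components=5, gamma=0.1, l1_threshold=0.05):
    K = rbf_kernel(X, gamma=gamma)                   # Gaussian kernel matrix (cf. Equation (1))
    K_centered = KernelCenterer().fit_transform(K)   # remove row/column means (cf. Equation (2))
    eigvals, eigvecs = np.linalg.eigh(K_centered)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    alphas, lambdas = eigvecs[:, order], eigvals[order]
    # Soft-threshold small loadings to mimic the L1 sparsity constraint.
    alphas = np.sign(alphas) * np.maximum(np.abs(alphas) - l1_threshold, 0.0)
    # Re-normalize non-zero components so that projections keep a consistent scale.
    norms = np.linalg.norm(alphas, axis=0)
    alphas = alphas / np.where(norms > 0, norms, 1.0)
    scores = K_centered @ alphas                     # low-dimensional representation
    explained = lambdas / eigvals.sum()              # approximate variance contribution rates
    return scores, explained
```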
2.2. Construction of Gas Prediction Model Based on TCN–Transformer
A high-performance gas concentration prediction model must be capable of dynamic feature decoupling, that is, of accurately separating the potential evolution mode from noise interference and normal-working-condition data. Gas time series data exhibit significant long-range dependence and nonlinear dynamic characteristics, placing dual requirements on the spatio-temporal modeling ability of the model: it must capture local, high-frequency characteristics such as sensor noise and equipment vibration, and it must also resolve the global daily and weekly trends driven by mining disturbances.
A traditional convolutional neural network (CNN) is limited by its local receptive field (typically < 32 time steps) and fixed sliding-window mechanism and struggles to effectively capture the periodic fluctuations in concentration caused by mining operations. The TCN avoids future information leakage through causal convolution and uses an exponentially growing dilation coefficient d to expand the receptive field layer by layer to 256 time steps, an eight-fold improvement in coverage over a CNN of the same depth, significantly enhancing its adaptability to the complex working conditions of a mine. Although recurrent neural networks (RNNs) and their derivative models (LSTM, BiLSTM, and GRU) alleviate gradient vanishing through gating mechanisms, their serial computing characteristics lead to low training efficiency and insufficient modeling of global dependence. The Transformer architecture breaks through the sequence-processing bottleneck with its self-attention mechanism: multi-head attention can capture the spatio-temporal correlations of an entire sequence, such as the cross-period coupling effect between the gas concentrations in the upper corner and the return air trough, in parallel. Combined with learnable positional encoding, it retains sequential order information and shows better gradient propagation stability and computational efficiency in long-sequence tasks.
This study proposes a TCN–Transformer hybrid architecture, which achieves the collaborative optimization of local–global modeling through a multi-scale spatio-temporal feature fusion mechanism. As shown in
Figure 1, the architecture uses a three-level parallel processing structure. After the initial feature mapping of the input layer is completed via 1D causal convolution, timing information is processed synchronously by three parallel branches with the same structure. Each branch is composed of TCN–Transformer series modules. The TCN module constructs a multi-scale feature extraction channel through four-level dilated causal convolution. The underlying convolution captures the transient fluctuations caused by sensor noise. The high-level convolution expands the receptive field to 256 time steps to cover the periodic concentration fluctuations caused by the mining operation and combines batch normalization and residual connections to ensure gradient stability. The Transformer module injects absolute time series information through learnable position coding and uses the multi-head self-attention mechanism to analyze the nonlinear coupling relationship between gas concentration and environmental parameters, such as coal seam thickness and mining intensity, and the feedforward network enhances the nonlinear expression ability of features. The innovation of this design is reflected in the complementary functions of the TCN and the Transformer. The TCN accurately captures the short-term fluctuations caused by equipment noise and instantaneous emissions with local perception ability, while the Transformer models the long-term spatial–temporal correlation between geological structure activation and coal mining technology through a global attention weight matrix. Finally, nonlinear feature fusion is achieved via layer normalization and a feedforward network, and the prediction results are output through the full connection layer. By enhancing the feature diversity of parallel branches, this architecture can not only suppress the interference of local data anomalies in the overall prediction but also mine the deep association rules in multi-source heterogeneous data and provide a robust and interpretable prediction framework for high-noise and strong non-stationary gas concentration sequences. The core mechanism and structural innovation of the TCN and Transformer encoders are described below.
2.2.1. Temporal Convolutional Network
A TCN is a convolutional neural network dedicated to processing time series. Its main structure contains multiple residual blocks, and each residual block contains dilated and causal convolution (
Figure 2). Causal convolution ensures that the output depends only on previous input information, which effectively avoids the leakage of future information. Dilated convolution expands the receptive field of the convolutional layer without adding extra parameters through an interval sampling mechanism, thus effectively addressing the extraction of multivariate time series features. This design enables the top-level output to integrate a wider range of historical input information, significantly enhancing the model's ability to capture long-distance gas emission dependencies while avoiding the efficiency losses caused by increased network depth. In addition, the temporal convolutional network adopts a one-dimensional fully convolutional architecture and strictly maintains the temporal length consistency between each hidden layer and the input layer through zero padding (ZP), ensuring that the model can process sequence data end-to-end and providing a structural basis for multi-scale feature extraction.
Suppose that an input time series is given and the corresponding output is expected; then, under the action of the filter f, the dilated causal convolution is defined by Equation (5).
In Equation (5), the quantities denote, respectively, the output of the convolution operation, the dilation factor, the weights of the convolution kernel, the size of the filter, and the corresponding element of the input sequence. As the network depth increases, the dilation factor grows, so each successive layer samples the previous layer at progressively larger intervals, which enables the filter to capture the characteristics of the gas emission sequence over a wider receptive field while continuing to filter the data.
To avoid performance degradation with an increase in network depth, a residual module is added to the TCN to ensure the learning performance of the deep network through the “jump connection” operation. By ensuring information transmission and effective learning, the problem of gradient disappearance in deep network training is avoided. The residual module is calculated as follows:
In Equation (6), Activation(·) denotes the activation function, and γ(·) represents the residual function.
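As an illustration of the two mechanisms above, the following is a minimal sketch of one dilated causal residual block, assuming PyTorch; the channel count, kernel size, and dilation values are placeholders, and the full TCN in this study stacks several such blocks with growing dilation.

```python
# Minimal sketch of a dilated causal residual block (cf. Equations (5) and (6)).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalResidualBlock(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3, dilation: int = 1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation          # left padding keeps the output causal
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
        self.norm = nn.BatchNorm1d(channels)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, channels, time)
        out = F.pad(x, (self.pad, 0))                    # pad only on the "past" side
        out = self.act(self.norm(self.conv(out)))        # dilated causal convolution, Eq. (5)
        return self.act(x + out)                         # residual "jump connection", Eq. (6)
```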
2.2.2. Transformer Encoder
The Transformer is a new type of deep neural network based entirely on an attention mechanism, without recursion and convolution operations. Its architecture is mainly composed of an encoder and decoder, which are each composed of stacked self-attention layers and point-by-point full connection layers. In this study, the Transformer encoder was used to achieve global correlation learning. The Transformer encoder is generally stacked with multiple encoder layers, each including an attention sublayer and a feedforward neural network sublayer. The attention sublayer includes a multi-head self-attention mechanism, residual connection, and layer normalization. The feedforward neural network sublayer includes a feedforward neural network, residual connection, and layer normalization.
To enable the model to perceive the position information of the gas concentration within the sequence, the gas vector matrix produced by feature engineering is first flattened to one dimension. Sine and cosine functions of different frequencies are then introduced to encode the absolute position. The position encoding has the same dimension as the input gas vector matrix, so it can be added element-wise to the corresponding positions of the input vector, as specified in Equation (7).
In Equation (7), the quantities denote, in turn, the position encoding, the sequence length, the parity index that distinguishes the sine and cosine components, the absolute position of the input vector in the sequence, the dimension of the input vector, and the encoded gas vector matrix.
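For reference, the standard sinusoidal positional encoding consistent with this description has the form below; the symbols follow the original Transformer paper and may differ from the notation used in Equation (7).

```latex
PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_{\mathrm{model}}}}\right), \qquad
PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{\mathrm{model}}}}\right)
```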
The multi-head self-attention mechanism is the core of the Transformer encoder.
Figure 3 shows the interaction process of query, key, and value matrix in the multi-head self-attention mechanism. The output of multiple attention heads is linearly spliced to obtain the final multi-head attention output.
In Equation (8), Q, K, and V represent the query matrix, key matrix, and value matrix, respectively; the corresponding parameter matrices define the linear transformations of the query, key, and value; head_i represents the i-th attention head; the aggregation function concatenates the attention heads; and the final parameter matrix performs the output linear transformation of the multi-head attention mechanism.
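For reference, the standard scaled dot-product attention and multi-head aggregation consistent with the description of Equation (8) are (symbols follow the original Transformer paper and may differ from the authors' notation):

```latex
\mathrm{Attention}(Q,K,V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V, \qquad
\mathrm{head}_i = \mathrm{Attention}\!\left(QW_i^{Q},\, KW_i^{K},\, VW_i^{V}\right), \qquad
\mathrm{MultiHead}(Q,K,V) = \mathrm{Concat}(\mathrm{head}_1,\ldots,\mathrm{head}_h)\,W^{O}
```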
Figure 3. Multi-head attention architecture.
2.3. Hyperparameter Optimization Framework
2.3.1. Flood Optimization Algorithm
The Flood Optimization Algorithm (FLA) is a new meta-heuristic algorithm proposed by Ghasemi et al. [
33] in 2024. Its design inspiration comes from the dynamic interaction between a water body and the surface environment during a flood. The algorithm constructs an intelligent optimization framework with both global exploration and local development capabilities by mathematically modeling the multi-scale physical phenomena of flood movement, including slope runoff movement, seepage–diffusion balance, periodic water level oscillation, and population variation caused by turbulence. Differing from the traditional heuristic algorithm, the FLA introduces the infiltration and diffusion equations from hydrological dynamics into the optimization field, forming a unique “gradient drive-phase conversion” coordination mechanism. The specific strategy is as follows:
- (1)
Slope runoff movement strategy: Slope runoff movement is abstracted as a gradient-driven directional search process. The individual population updates its position along the negative gradient direction of the fitness function, and its step size is dynamically adjusted by the permeability coefficient, enabling it to not only expand the exploration range in the flat area but also finely converge in the steep area. Its moving equation is as follows:
In Equation (9), the terms denote, respectively, the current individual, the global optimal solution, the dimension of the problem to be optimized, a random vector whose elements lie between 0 and 1, and the position of the individual water mass after moving.
- (2)
Water depletion strategy. The water depletion coefficient
reflects the phenomenon of water depletion, which has a random effect on the movement and search behavior of the water mass. The calculation equation is as follows:
In Equation (10), the two quantities denote the maximum number of iterations of the algorithm and the current iteration number, respectively.
- (3)
Flood event strategy: A flood event occurs with a given probability, during which the individual water mass moves randomly, enhancing the global search ability of the algorithm and preventing it from falling into a local optimum. Its movement equation is as follows:
- (4)
Penetration and diffusion strategy: The weighting between local exploitation and global exploration is realized via the coupled penetration and diffusion coefficient of an individual from the water mass population, in which the penetration term dominates the deep mining of high-potential areas and the diffusion term maintains population diversity to ensure that the algorithm can explore more potential areas. The lower an individual's fitness function value is, the more pronounced the penetration and diffusion of the water mass are. The equation is as follows:
In Equation (12), the two quantities represent the current best and worst fitness function values, respectively.
- (5)
Water cycle simulation and individual elimination strategy: A sinusoidal disturbance factor is introduced to simulate seasonal flood fluctuations, so that some inferior solutions are periodically reset to escape local optima, with the oscillation period adaptively matched to the problem dimension. To ensure the quality of the population, some underperforming individuals are eliminated according to an elimination probability, and new individuals are introduced to enhance population diversity. The calculation and movement equations are as follows:
In Equation (14), the quantity denotes the number of eliminated individuals. A simplified, flood-inspired skeleton illustrating these strategies is sketched below.
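The sketch below is a deliberately simplified, flood-inspired optimizer skeleton that mirrors only the qualitative mechanisms listed above (runoff drift toward the best water mass, random flood events with a fixed probability, and periodic elimination of the worst individuals); it is not the exact FLA of Ghasemi et al. [33], whose update rules are given in Equations (9)–(14).

```python
# Illustrative flood-inspired optimizer skeleton (not the exact FLA update equations).
import numpy as np

def flood_like_optimizer(obj, dim, bounds, pop=20, iters=50, p_flood=0.3, n_elim=5):
    lo, hi = bounds
    X = np.random.uniform(lo, hi, size=(pop, dim))          # initial water masses
    fit = np.apply_along_axis(obj, 1, X)
    for t in range(iters):
        best = X[np.argmin(fit)]
        decay = 1.0 - t / iters                              # water-depletion-style coefficient
        for i in range(pop):
            if np.random.rand() < p_flood:                   # flood event: random exploration
                cand = np.random.uniform(lo, hi, size=dim)
            else:                                            # runoff: drift toward the best solution
                cand = X[i] + decay * np.random.rand(dim) * (best - X[i])
            cand = np.clip(cand, lo, hi)
            f = obj(cand)
            if f < fit[i]:
                X[i], fit[i] = cand, f
        worst = np.argsort(fit)[-n_elim:]                    # eliminate under-performing individuals
        X[worst] = np.random.uniform(lo, hi, size=(n_elim, dim))
        fit[worst] = np.apply_along_axis(obj, 1, X[worst])
    return X[np.argmin(fit)], fit.min()
```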
2.3.2. FLA Optimizes TCN–Transformer Network Hyperparameters
The hyperparameter optimization framework of this study uses the Flood Optimization Algorithm (FLA) to construct a closed-loop iterative mechanism. The core process is shown in
Figure 4. First, the hyperparameter set of the TCN–Transformer model is initialized based on prior knowledge, including the learning rate, convolution kernel size, number of attention heads, and expansion factor sequence. In the training phase, the model calculates the root mean square error (RMSE) between the predicted value and the real value through forward propagation and uses back propagation to update the network weight parameters. The FLA achieves efficient optimization by dynamically adjusting the hyperparameter search space, and each individual corresponds to a set of hyperparameter configurations.
In the optimization process, the TCN–Transformer model uses five-fold cross-validation to avoid over-fitting, and the batch size is fixed at 64 to balance memory usage and gradient stability. The final Pareto optimal solution is regarded as the optimal hyperparameter configuration, and the final prediction is performed on the test set. This step aims to evaluate the generalization ability of the model, that is, the performance of the model when dealing with unseen data. The optimization framework can significantly enhance the generalization ability of the model in a complex mine environment through the collaborative mechanism of hyperparameter space exploration and model performance feedback.
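As an illustration of how this closed loop can be wired, the sketch below defines a five-fold cross-validated RMSE objective over the five hyperparameters being searched; `build_tcn_transformer` is a hypothetical constructor assumed to return a Keras-style model, and the resulting objective can be handed to any population-based optimizer such as the flood-inspired skeleton sketched in Section 2.3.1.

```python
# Illustrative cross-validated RMSE objective for hyperparameter search (assumptions noted above).
import numpy as np
from sklearn.model_selection import KFold

def cv_rmse_objective(params, X, y, build_tcn_transformer, epochs=30):
    """params = [n_filters, kernel_size, dropout, n_heads, hidden_units] (floats from the optimizer)."""
    n_filters, kernel_size, dropout, n_heads, hidden = (
        int(params[0]), int(params[1]) | 1, float(params[2]), int(params[3]), int(params[4])
    )  # "| 1" forces an odd kernel size, matching the {3, 5, 7} search set
    errors = []
    for train_idx, val_idx in KFold(n_splits=5, shuffle=False).split(X):
        model = build_tcn_transformer(n_filters, kernel_size, dropout, n_heads, hidden)
        model.fit(X[train_idx], y[train_idx], epochs=epochs, batch_size=64, verbose=0)
        pred = model.predict(X[val_idx]).ravel()
        errors.append(np.sqrt(np.mean((y[val_idx] - pred) ** 2)))  # fold RMSE
    return float(np.mean(errors))  # value minimized by the hyperparameter optimizer
```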
2.4. Local Weighted Regression Kernel Function Density Estimation
While the traditional point prediction method has significant limitations in the prediction of gas concentration, interval prediction technology has received increasing attention due to its probabilistic risk assessment ability [
34]. Compared with deterministic point prediction, interval prediction provides statistical significance support for mine safety decision-making by constructing confidence intervals to quantify the uncertainty of prediction results to optimize risk management and control strategies and reduce false alarm rates. In the field of coal mine gas prediction, it is difficult to adapt the traditional kernel density estimation (KDE) [
35] method to the non-stationary characteristics of gas concentration data because of its global fixed bandwidth. In the peak gas concentration area, the fixed bandwidth can easily cause the probability density function to be too smooth and mask the key risk signal. In the area with stable working conditions, an insufficient bandwidth will amplify the sensor’s noise interference. Although adaptive bandwidth kernel density estimation (ABKDE) partially alleviates the above problems by dynamically adjusting the bandwidth, its bandwidth adjustment mechanism usually relies only on the prior assumption of sample density (such as the k-nearest neighbor criterion) and fails to effectively mine the dynamic correlation characteristics of the local neighborhood of the data, so a theoretical bottleneck in the representation of complex nonlinear patterns remains.
This study proposes a local weighted regression kernel function density estimation (LWD-KDE) algorithm based on local weighted regression (LWR), whose core is to construct a data-driven adaptive bandwidth adjustment mechanism. The algorithm models the bandwidth as a dynamic function of the neighborhood distribution characteristics and introduces a distance-based exponential attenuation weight allocation strategy to ensure that adjacent samples have a differentiated impact on the bandwidth calculation. Specifically, for the local neighborhood of each data point, its bandwidth is dynamically optimized according to the sample distribution density and gradient change rate so that the algorithm is capable of “adapting to local conditions”. The bandwidth is reduced in the area of gas concentration fluctuation to retain the peak details, and the bandwidth is expanded in the area with stable working conditions to suppress noise interference. Compared with the traditional fixed-bandwidth kernel density estimation and adaptive bandwidth method, LWD-KDE can significantly improve prediction reliability by incorporating the neighborhood correlation analysis ability of local weighted regression, especially in processing complex data, such as in the context of coal mine gas prediction.
The specific steps to the implementation of the LWD-KDE method proposed in this study are shown below:
- (1)
Data preprocessing and standardization: The original gas concentration data to be predicted are standardized, eliminating dimensional differences and ensuring the scale invariance of the model with respect to the gas concentration data. The standardized data set, with its feature dimension recorded, is then constructed, and the most recent time step is reserved as the sample point to be estimated.
- (2)
Kernel function selection and bandwidth initialization: The Gaussian kernel function is used to construct the probability density mapping framework; its infinite differentiability supports the subsequent gradient-based optimization, establishing a nonlinear probability density mapping and laying the foundation for adaptive optimization. The initial bandwidth is obtained from a dynamic initialization strategy based on Silverman's empirical rule (Equation (15)), which balances global statistical characteristics with the local data distribution and provides a robust starting point for adaptive optimization.
In Equation (15), the quantities denote the sample mean, the normalization factor, and the number of samples, respectively.
- (3)
Adaptive neighborhood density estimation and dynamic bandwidth adjustment: A neighborhood data set is constructed around each sample point based on a quantile threshold ε, where ε is adaptively determined from the data quantiles so that the neighborhood coverage matches the spatial heterogeneity of the gas concentration. To characterize the unsteady behavior of local gas emission, Equation (16) is applied to calculate the neighborhood fluctuation intensity coefficient, from which the adaptively adjusted bandwidth parameter is calculated. Equation (17) establishes a negative-correlation mapping between the fluctuation intensity and the bandwidth parameter through an exponential attenuation mechanism, so that the bandwidth shrinks in high-fluctuation regions (coefficient > 0.1) to enhance the resolution of abrupt events, while it expands in low-fluctuation regions (coefficient < 0.01) to enhance noise suppression. Finally, the density estimator in the local neighborhood is calculated using Equation (18) to reflect the distribution characteristics of the local data, capture the gas concentration fluctuation characteristics, and construct a local probability density field.
In Equation (16), the two quantities represent the span of the dynamic time window and the average concentration within the window, respectively.
- (4)
Loss function definition, gradient derivation, and second-order convergence bandwidth optimization: The deviation between the estimated density
and the true density
is measured using the integral square error
. A Monte Carlo approximation (Equation (19)) is used to transform the integral into a discrete summation, and the objective function is minimized by driving the bandwidth parameter toward the optimal solution; the Newton iterative search of Equation (20) is then used to update the bandwidth parameter quickly and accurately.
In Equation (20), the step-size term represents the learning rate, and the second-order term denotes the Hessian matrix.
The convergence criterion of the iteration is
In Equation (21), the first quantity represents the error threshold, and the second represents the maximum number of iterations, which is usually set to 100.
- (5)
Dynamic bandwidth matrix analysis and application: The optimized bandwidth matrix contains the optimal smoothing parameter of each sample point, characterizing the local smoothing strength at that point. A small bandwidth indicates a region of gas concentration fluctuation, where fine detail must be captured; a large bandwidth indicates a stable working-condition region, where stronger smoothing suppresses noise. The matrix output provides an adaptive parameter configuration matched to local features for predicting gas concentration.
LWD-KDE has the advantages of local and global collaborative optimization, improved computational efficiency, and the ability to provide important support for safety decision-making. It has broad application prospects in the fields of gas concentration monitoring and risk assessment.
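A condensed sketch of the adaptive-bandwidth idea is given below: Silverman's rule supplies the initial bandwidth, a rolling fluctuation coefficient shrinks or widens it per sample, and a variable-bandwidth Gaussian mixture yields the density. It is a simplified stand-in for LWD-KDE (the locally weighted regression and Newton refinement steps of Equations (16)–(21) are omitted), and the window length and grid size are arbitrary choices.

```python
# Simplified adaptive-bandwidth KDE sketch (illustrative stand-in for LWD-KDE).
import numpy as np

def adaptive_bandwidth_kde(errors, window=10, grid=None):
    """Gaussian KDE whose bandwidth shrinks in volatile regions and widens in calm ones."""
    x = np.asarray(errors, dtype=float)
    n = x.size
    h0 = 1.06 * x.std(ddof=1) * n ** (-1 / 5)            # Silverman rule-of-thumb start
    # Local fluctuation coefficient: rolling coefficient of variation over a short window.
    windows = [x[max(0, i - window):i + 1] for i in range(n)]
    cv = np.array([w.std() / (abs(w.mean()) + 1e-8) for w in windows])
    h = h0 * np.exp(-cv)                                  # high fluctuation -> smaller bandwidth
    if grid is None:
        grid = np.linspace(x.min() - 3 * h0, x.max() + 3 * h0, 400)
    # Variable-bandwidth mixture of Gaussian kernels centered on each sample.
    diffs = (grid[None, :] - x[:, None]) / h[:, None]
    dens = np.mean(np.exp(-0.5 * diffs ** 2) / (h[:, None] * np.sqrt(2 * np.pi)), axis=0)
    return grid, dens
```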
2.5. Model Performance Evaluation Index
2.5.1. Evaluation Index of Point Prediction
In order to quantify the generalization ability of the model's point predictions, a four-dimensional evaluation system was constructed, comprising the coefficient of determination (R²), mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean square error (RMSE). The corresponding expressions are as follows:
In Equation (22), the quantities represent, respectively, the sample size of the test set, the real gas concentration observed for the i-th sample, the average of the actual observations, and the value provided by the prediction model.
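For completeness, the standard definitions of these four indexes, consistent with the description of Equation (22) (with n test samples, observations y_i, their mean ȳ, and predictions ŷ_i; the symbols are generic and may differ from the authors' notation), are:

```latex
R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}, \qquad
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|, \qquad
\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|, \qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
```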
2.5.2. Evaluation Index of Interval Prediction
To systematically evaluate the effectiveness of the probabilistic interval prediction model, this study also constructed a four-dimensional quantitative evaluation framework to comprehensively quantify the quality of interval prediction and provide a multi-dimensional benchmark for the early warning of coal mine gas concentration risk. Four interval prediction evaluation indexes were selected for comparative analysis: the prediction interval coverage probability (PICP), normalized average interval width (PINAW), coverage width criterion (CWC), and continuous ranked probability score (CRPS).
The PICP represents the probability that the actual observed value falls within the prediction interval at the specified confidence level. The larger the PICP value, the greater the probability that the actual observed value falls between the upper and lower bounds of the prediction. The equation is as follows:
In Equation (23), a Boolean variable is used: if the observed value of the i-th sample falls within its prediction interval, the variable equals 1; otherwise, it equals 0.
The PINAW is used to measure the average width of the prediction interval, standardized to eliminate the influence of scale differences between datasets. For the same PICP, the model with the smaller PINAW value has the better interval prediction accuracy. The equation is as follows:
In Equation (24), the normalization term represents the variation range of the predicted values.
The CWC is an indicator that comprehensively considers both the coverage and the width of the prediction interval.
In Equation (25), the two parameters represent the given confidence level and the penalty factor, which controls the degree of punishment applied when the expected coverage rate is not reached.
The CRPS is an index used to evaluate the quality of a probabilistic prediction. It measures the difference between the predicted distribution and the actual observations; the smaller the CRPS value, the smaller this difference. Its calculation equation is
In Equation (26), the predicted cumulative distribution function gives the probability that the predicted value is less than or equal to a given level, and the Heaviside step function encodes the position of the actual observation.
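For reference, commonly used forms of the PICP, PINAW, and CRPS are given below (the CWC appears in several variants in the literature, so its penalized form is not reproduced here). Here U_i and L_i denote the interval bounds, R the range of the target variable, F_i the predicted cumulative distribution function, and H the Heaviside step at the observation y_i; the symbols are generic and may differ from the authors' notation.

```latex
\mathrm{PICP} = \frac{1}{n}\sum_{i=1}^{n} c_i \;\;(c_i \in \{0,1\}), \qquad
\mathrm{PINAW} = \frac{1}{nR}\sum_{i=1}^{n}\left(U_i - L_i\right), \qquad
\mathrm{CRPS} = \frac{1}{n}\sum_{i=1}^{n}\int_{-\infty}^{+\infty}\left[F_i(z) - H\!\left(z - y_i\right)\right]^{2}\mathrm{d}z
```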
3. Results and Discussion
3.1. Analysis of the Data on Factors Influencing Gas Emission Quantity
The temporal and spatial evolution of gas emission in a fully mechanized mining face is driven by multi-field coupling. The key influencing factors can be divided into three categories: geological structure characteristics, mining-process parameters, and environmental monitoring data. Based on industrial monitoring data from Huangling No. 2 Coal Mine from early January 2025, this study constructed a multi-source dataset containing 16-dimensional features. The geological structure characteristics include the buried depth of the coal seam, the thickness of the coal seam, the gas content of the coal seam, the gas content of the adjacent layer, the dip angle of the coal seam, and the interlayer spacing. The mining-process parameters include the daily advance, daily output (million tons), mining height, recovery rate, firmness coefficient, gas pressure, and working face air intake. The environmental monitoring data comprise the gas concentration of the air flow 10 m outside the working face in the intake air crossheading, the gas concentration of the air flow 10 m outside the working face in the return air crossheading, and the gas concentration of the air flow 10–15 m from the return air crossheading. The target variable to be predicted is the gas concentration at the return air corner of the fully mechanized mining face. By placing gas concentration monitoring points at different positions on the working face, the gas concentration distribution can be monitored more completely. The data collection cycle was 10 min per sample; the total number of original sample groups was 628, and 620 groups of effective samples were retained after the outliers were removed. Some of the data are shown in
Table 1. By integrating geological occurrence conditions, mining disturbance intensity, and real-time monitoring information, the dataset provides a complete basis for verifying the multi-factor coupling model of gas emission against a real scenario.
3.2. Reliability and Effect Analysis of SKPCA Dimensionality Reduction
3.2.1. Feature Dimensionality Reduction and Nonlinear Correlation Analysis
Pearson’s correlation coefficient was used to measure the degree of linear correlation between variables, and the
p value was calculated using a two-tailed
t-test.
Figure 5 shows the correlation heat map of the indexes influencing gas concentration. At the significance level p < 0.01, strong correlations are marked in red: the daily advance has a significant positive correlation with the daily output (r = 0.99) and a strong linear dependence on the working face recovery rate (r = 0.96 and 0.99). The thickness of the coal seam is positively correlated with the mining height of the working face (r = 0.94), confirming that multicollinearity exists among the mining-process parameters and needs to be eliminated via dimensionality reduction. The target variable, the return air corner concentration, has a significant positive correlation with the two return air roadway concentrations (r = 0.87 and 0.75) and only a weak correlation with the intake air roadway concentration (r = 0.29), which verifies that gas accumulation in the goaf is the main control mechanism responsible for the risk of overrun. The above analysis provides a theoretical basis for feature engineering optimization: nonlinear principal component fusion can be performed on highly correlated variable groups, such as the daily advance–daily output–recovery rate group and the coal seam thickness–mining height group, to eliminate feature redundancy, while weakly correlated parameters such as the coal seam dip angle (r = −0.013) are eliminated to improve data reliability.
Table 2 shows a performance comparison of PCA, SPCA, KPCA, and SKPCA. Among them, the sparse kernel principal component analysis (SKPCA) shows significant advantages in nonlinear feature compression and noise suppression. The experimental data show that the cumulative variance contribution rate of the first five principal components of the SKPCA was 79.18%, which was 13.25% and 10.1% higher than that of the SPCA (65.93%) and PCA (69.08%), respectively. When seven principal components were extracted, the cumulative contribution rate of SKPCA increased to 88.93%, indicating that it could retain 90% of the effective information in the original 16-dimensional data in the low-dimensional feature space. This verifies the applicability of the kernel space sparse strategy in the nonlinear evolution modeling of gas concentration. The cumulative contribution rate curve in
Figure 6 shows that the SKPCA is always better than the comparison algorithm under the same number of principal components, and the contribution rate of the 6th–10th principal components increases slowly (from 84.91% to 97.03%), indicating that the algorithm effectively filters out sensor noise and redundant working condition parameters through elastic network regularization and demonstrates excellent noise robustness.
Further analysis shows that SKPCA achieves a balance between the strength of feature interpretation and computational efficiency. The variance contribution rate of the first principal component is 10.628, which is 1.3 times that of the KPCA first principal component (8.156), highlighting the ability to focus on key cause characteristics. In the 16-dimensional → 5-dimensional dimensionality reduction process, SKPCA takes only 56 ms, which is 30.4% faster than KPCA (73 ms) and further meets the real-time and accurate data analysis requirements of the mine-monitoring system.
3.2.2. Influence of Different Principal Component Numbers on Prediction Effect
Based on the long short-term memory (LSTM) architecture, this study compared the prediction performance of the four dimensionality reduction methods with 5 and with 7 retained principal components (Table 3). The experimental results show that as the number of principal components increases from 5 to 7, the prediction accuracy of the model rises systematically. With 5 principal components, the LSTM models corresponding to the different dimensionality reduction methods show only basic prediction performance (average R² = 0.7756), whereas the prediction indexes improve markedly with 7 principal components (average R² = 0.8857). In particular, the SK7-LSTM model based on SKPCA demonstrates the best performance with 7 principal components, achieving the lowest mean square error (MSE = 0.000432), root mean square error (RMSE = 0.020784), mean absolute error (MAE = 0.011356), and relative mean square error (0.315281).
The comparison of the prediction curves in Figure 7 and Figure 8 further reveals that, with 7 principal components, SK7-LSTM captures instantaneous changes in gas concentration with only a small error, verifying that increasing the number of principal components strengthens the characterization of complex nonlinear modes. It is worth noting that, for the same number of principal components, the model based on SKPCA dimensionality reduction is significantly better than the traditional PCA, SPCA, and KPCA methods on all indicators, owing to the synergy of its elastic net regularization and kernel techniques. The comprehensive analysis shows that when the cumulative variance contribution rate exceeds 85% (7 principal components), SK7-LSTM provides the most reliable prediction results, and the LSTM model driven by SKPCA-reduced data balances computational efficiency and prediction reliability. This conclusion was confirmed to be statistically robust through five-fold cross-validation (standard deviation < 0.015).
3.3. Performance Verification of FLA Based on CEC 2022 Test Set
3.3.1. FLA Performance Test on CEC 2022 Test Set
In order to evaluate the global optimization ability of the FLA, based on the IEEE CEC 2022 standard test set [
26,
27], a Wilcoxon signed-rank test (significance level α = 0.05) was used to compare the performance differences between the FLA and mainstream methods, namely particle swarm optimization (PSO), gray wolf optimization (GWO), the sparrow search algorithm (SSA), a genetic algorithm (GA), dung beetle optimization (DBO), the whale optimization algorithm (WOA), and the escape algorithm (ESC).
Table 4 shows the significance test results (p values) of each algorithm run 30 times on the 12 benchmark functions under 10-dimensional conditions. The optimization accuracy of the FLA on the unimodal test functions achieves rankings of 1, 3, 4, and 1, respectively, showing a strong ability to solve convex problems. In the tests of the multimodal functions, the FLA ranks 1, 4, and 2, respectively, and its fitness variance is 2–3 orders of magnitude lower than that of the comparison algorithms (e.g., the FLA's variance on one multimodal function is 1.11 × 10−3, whereas that of the PSO is 4.05 × 10−1), verifying its strong robustness to noise interference.
A convergence curve analysis of Figure 9 shows that, when dealing with high-dimensional ill-conditioned problems, the convergence speed of the FLA lags behind that of the SSA, the ESC (an optimization method based on crowd evacuation behaviors), and the PSO in the initial stage (the convergence rate is 15–28% lower within the first 50 iterations); however, its global search mechanism shows significant advantages in the later iterations (>150), and its final fitness value is, on average, 37.6% better than that of the suboptimal algorithm. For example, in the optimization of one test function, the fitness of the FLA is reduced to 1.73 × 10−6 after 200 iterations, while the SSA and PSO remain at 2.22 × 10−4 and 3.59 × 10−4, respectively.
Figure 10 shows the average fitness rankings of the different algorithms across the test functions; the height of each column represents the algorithm's average ranking, and the shorter the column (the lower the ranking), the better the performance. The FLA attains the lowest average ranking (1.83), significantly better than the SSA (3.42), PSO (4.15), and GWO (4.76), and its advantages are particularly prominent on multimodal and asymmetric functions. Specifically, the average fitness of the FLA on the shifted-and-rotated functions is 42.8% lower than that of the suboptimal algorithm, proving that its adaptive step-size adjustment mechanism can effectively balance exploration and exploitation. By introducing the dynamic flood diffusion operator and an elite retention strategy, the algorithm improves both computational efficiency and optimization accuracy on the CEC 2022 test set and provides a reliable tool for the hyperparameter optimization of the gas prediction model in a complex mine environment.
3.3.2. TCN–Transformer Hyperparameter Optimization Results
The FLA is used to systematically search the hyperparameter space of the TCN–Transformer model. According to the literature on selecting FLA core parameters [
33], this study selected a population size N = 20, a maximum number of iterations Maxiter = 50, an elimination probability Pt = 0.2, the number of eliminated individuals Ne = 5, and a five-dimensional optimization space (covering the number of filters, the size of the convolution kernels, the Dropout rate, the number of attention heads, and the number of hidden layer nodes). The search space definition of each hyperparameter was based on prior experimental verification, and the number of filters was set to an integer domain of 16–52 to meet the extraction requirements of time series features of different scales. The size of the convolution kernel was limited to {3,5,7} odd sets, ensuring the effective modeling of temporal causality. The Dropout rate was optimized in the 0–0.5 continuous domain to balance the model’s complexity and generalization ability. The number of attention heads (1~12) and the number of hidden layer nodes (32~128) regulated the global dependency parsing depth and feature representation capacity, respectively.
As shown in
Figure 11, the FLA reached the convergence threshold after 34 iterations and obtained the optimal hyperparameter combination: 23 filters, a convolution kernel size of 5, a Dropout rate of 0.0023, 8 attention heads, and 56 hidden layer nodes. This configuration improves model performance through multi-mechanism coordination: the 23 filters realize the parallel extraction of transient characteristics and of the periodic evolution modes of mining disturbances, the extremely low Dropout rate suppresses the risk of overfitting while retaining model capacity, and the eight-head attention mechanism effectively captures the correlation characteristics of the mining-process cycle.
3.4. TCN–Transformer Performance Test
To verify the performance-improving effect of the FLA on the TCN–Transformer model, a comparative experimental system including CNN-LSTM, CNN-GRU, TCN-LSTM, and TCN-GRU was constructed. The experimental results (
Table 5 and
Figure 12) show that the TCN–Transformer model optimized using the FLA performs significantly better than the comparison models on all evaluation indexes. Its mean square error (MSE = 0.000114) is 50.6% lower than that of the sub-optimal model, TCN-GRU; its root mean square error (RMSE = 0.010701) and mean absolute error (MAE = 0.008215) are reduced by 9.8% and 12.3%, respectively; and its determination coefficient (R² = 0.9809) shows that the model explains 98.09% of the variation in gas concentration, verifying the effectiveness of its spatio-temporal feature fusion mechanism.
To systematically evaluate the prediction performance of the TCN–Transformer model, this study constructed a multi-dimensional visual analysis system.
Figure 13 shows the comparison between the prediction curves of each model and the real gas concentration time series; the prediction curve of the TCN–Transformer is the closest to the real data. The average absolute deviation between its predicted trajectory and the real values (MAE = 0.008215) is significantly lower than that of TCN-LSTM (0.010227) and TCN-GRU (0.009364), and the peak capture error in the concentration change event (e.g., at index 135) is 2.33% lower than that of the sub-optimal model. The fluctuation range of its error curve is narrower than that of the comparison models, and its root mean square error (0.010701) is the best, verifying the stability of the prediction results.
The diagonal error plot in
Figure 14 reveals the performance differences between the models. For example, the prediction error points of the TCN–Transformer are densely distributed near the diagonal (the mean Euclidean distance is 0.000865), while CNN-LSTM (0.001219) and TCN-GRU (0.00155) show a significant divergence trend. The joint residual distribution of
Figure 15 further verifies the reliability of the model, and its residuals are concentrated in the [−0.02,0.02] interval (accounting for 94.1%). The TCN–Transformer model shows the best performance in both quantitative indicators and visualization results. It is a more reliable prediction model, and it is worth further exploring its applicability and robustness in different application scenarios.
3.5. LWD-KDE Interval Prediction Analysis
As shown in
Figure 16, the gas concentration interval prediction model based on LWD-KDE shows significant advantages in dynamic tracking and uncertainty quantification. The prediction system constructed via the locally weighted density estimation method achieves high-precision time series matching. The visualization of the confidence intervals shows that the 5–95% probability interval, illustrated as a region shading from light to dark green, adapts dynamically while covering the real concentration sequence (red asterisk marks), and the 95% confidence band achieves 96.7% actual coverage over the complete test cycle. It is worth noting that a stable envelope is maintained in the sensitive area in which the concentration gradient changes significantly (index 40–60), verifying the strong adaptability of the model to nonlinear dynamics.
The multi-dimensional performance evaluation indicators further reveal the technical advantages of the model. For the 95% confidence interval, the normalized average interval width is only 16.2%, achieving interval compactness while ensuring coverage. The CRPS is 0.132, below the benchmark threshold of 0.15, indicating that the predicted probability distribution is statistically consistent with the real data-generation mechanism. The comprehensive coverage-width index is better than the reference value of 1.0, reflecting the balanced optimization of the model in uncertainty quantification. These results highlight the dual advantages of the LWD-KDE method in predicting gas concentration: on the one hand, local weighted regression dynamically captures peak concentration characteristics (such as the sudden rise at index 68); on the other hand, non-parametric density estimation generates a probabilistic safety boundary, providing a precise and interpretable decision support tool for coal mine safety monitoring.
The comparative analysis results of the prediction error distribution based on kernel density estimation are shown in
Figure 17. Among them, the optimized bandwidth LWD-KDE method (red solid line), the traditional fixed-bandwidth KDE method (solid purple line), and its corresponding 95% confidence interval (yellow area) form a visual representation system. Quantitative analysis shows that LWD-KDE has a significant advantage in density estimation within the error core distribution interval [−0.02,0.02]. Its probability density peak reaches 40.64, which is a relative improvement of 11.3% compared to the traditional KDE method (peak 36.53). This reveals that the adaptive bandwidth mechanism effectively enhances the characterization accuracy of the error concentration trend by dynamically adjusting the kernel function smoothing parameters.
The statistical verification shows that the optimized kernel density curve and the 95% confidence interval envelope are highly consistent in the core error domain (Kolmogorov–Smirnov test statistic = 0.032), confirming that the LWD-KDE method can accurately reconstruct the true probability structure of the error distribution. In addition, in the tail region of the error distribution (|error| > 0.02), the bandwidth optimization strategy improves the decay rate of the probability density function compared with KDE, significantly improving the characterization of extreme error events. Through the adaptive bandwidth initialized by the Silverman criterion, a Pareto improvement in the variance–bias trade-off is realized, and the integrated mean square error (0.087) is reduced by 23.8% compared with the fixed-bandwidth method, demonstrating the statistical superiority of the method in non-parametric estimation.
Figure 18a–d shows a comparison of the kernel density estimation curves for four sampling points (30, 60, 90, and 120) and reveals the time evolution law of error distribution prediction. The kernel density curve of each sampling point shows a significant unimodal characteristic, and its shape is visually consistent with the Gaussian distribution, indicating that the model can maintain a stable error distribution pattern at different stages of operation. Specifically, the peak density of sampling point 120 reaches 31.2 (corresponding to the mean error region), while the peak density of sampling point 90 decreases to 17.3. This difference in magnitude directly reflects the mean drift phenomenon of gas emission dynamics while the mine working face is being advanced.
A quantitative comparison between the observed data and the predicted distributions shows that the actual concentration value (red solid line) has a systematic positive offset relative to the density peak at sampling points 30, 60, and 120, consistent with the low-frequency, high-amplitude character of sudden increases in gas concentration and indicating that the model's ability to capture such abnormal events can be further optimized. It is worth noting that the actual concentration value at sampling point 90 lies in the right tail of the predicted distribution, with a probability density of only 66% of the peak value, suggesting that the model's sensitivity to extreme conditions still needs to be improved.
3.6. Early Warning Mechanism for Gas Concentration Mutation Risk
The early warning mechanism for the risk of sudden changes in coal mine gas concentration proposed in this study is based on two criteria: the dynamic confidence interval and the morphology of the kernel density distribution. By combining time series volatility quantification, probability distribution morphology analysis, and risk evolution trend prediction, an unsupervised intelligent early warning mechanism is realized. When the normalized average width (PINAW) of the prediction interval exceeds the historical mean by more than two standard deviations (for example, when the PINAW value at index 68 surges from a baseline of 0.16 to 0.28, an increase of 75%), the kernel density curve shows multi-peak distribution characteristics, and the evolution rate of the upper bound of the confidence interval breaks through the specified acceleration threshold, the fluctuation in gas concentration has deviated significantly from the normal range, indicating an area with an abnormal risk of gas emission.
In Equations (27)–(29), the quantities denote, respectively, the historical average interval width and its standard deviation (characterizing the statistical mean and the degree of dispersion), the gas time series period, the sequence of extreme points of the kernel density estimation curve, and the upper-bound function of the confidence interval.
The mechanism further introduces the evolution rate of the upper bound of the confidence interval as a trend index to monitor the evolution of risk. When this rate exceeds its threshold (as in the index 40–60 region), it indicates an accelerating accumulation of gas concentration risk, and emergency ventilation must be started in advance. This early warning method, based on the combination of statistical distribution pattern analysis and dynamic trend quantification, overcomes the delay inherent in traditional threshold alarms. By upgrading risk identification from discrete threshold judgment to continuous probability-space analysis, the monitoring system is advanced from passive alarm to active defense.