Article

Coal Mine Roof Water Inrush Prediction Based on Machine Learning Research

1 College of Energy and Mining Engineering, Shandong University of Science and Technology, Qingdao 266590, China
2 State Key Laboratory of Coal Resource Efficient Mining and Clean Utilization, Beijing 100013, China
3 State Key Laboratory of Hydroscience and Hydraulic Engineering, Tsinghua University, Beijing 100084, China
4 Northwest Research Institute of Shandong Energy Group, Xi’an 710026, China
5 Shaanxi Zhongtai Energy Investment Co., Ltd., Yulin 719100, China
6 School of Resources and Civil Engineering, Northeastern University, Shenyang 110819, China
7 Geotechnical and Structural Engineering Research Center, Shandong University, Jinan 250061, China
* Author to whom correspondence should be addressed.
Water 2026, 18(9), 1036; https://doi.org/10.3390/w18091036
Submission received: 23 March 2026 / Revised: 20 April 2026 / Accepted: 22 April 2026 / Published: 27 April 2026
(This article belongs to the Section Hydrogeology)

Abstract

This study develops an intelligent multi-indicator collaborative approach to improve coal seam roof water inrush warnings. A multidimensional dataset is constructed using microseismic data, borehole water levels, electrical measurements, and daily water inflow. A VMD-LSTM algorithm is proposed to predict roof rupture height, while regression analysis handles remaining indicators. Results show that during water-conducting channel development, microseismic activity, electrical data, and water inflow increase synchronously, whereas borehole water levels decline significantly—trends that reverse post-development. Compared to traditional LSTM, the VMD-LSTM model reduces MAE by 15.38%, RMSE by 20.00%, MAPE by 17.39%, HH by 9.52%, GPI by 10.76%, and improves NSE by 6.90%, demonstrating high accuracy. The central tendency prediction errors for the remaining indicators range from 0.63% to 5.73%. This integration of intelligent algorithms and multi-indicator analysis enables precise prediction of water inrush precursors, offering a new technical framework for roof water hazard prevention.

1. Introduction

As a primary energy source in China, coal occupies a pivotal position in the development of the national economy. However, with the continuous increase in mining depth, the hydrogeological conditions of coal mines have become increasingly complex, and the problem of roof water inrush in coal seams has grown progressively more severe. Roof water inrush refers to the phenomenon where water from the aquifer above the coal seam suddenly surges into the working face under the effect of water pressure through water-conducting channels during mining operations. Characterized by strong suddenness, wide-ranging hazards, and significant destructive power, roof water inrush imposes a heavy burden on coal mining enterprises and society, making it one of the major disasters threatening the safe production of coal mines [1,2,3,4].
In the research of coal mine roof water inrush monitoring technologies, the introduction of multi-source data fusion and deep learning methods has significantly improved the accuracy and interpretability of precursor identification. Along this direction, a series of advances have been achieved based on different technical approaches. From a machine learning perspective, microseismic event classification is evaluated, and multi-source data including microseismic signals, water level, and stress are fused to achieve precise classification of precursor events of coal mine roof water inrush [5]. An unsupervised clustering method based on a hybrid model can directly distinguish the source mechanisms of roof fracturing, providing key discriminative criteria for early warning of water inrush [6]. By comparing different filtering methods and incorporating a multi-source data fusion strategy, the preprocessing workflow of coal mine roof water inrush monitoring data is optimized [7]. Combining migration imaging with neural networks enables rapid localization of roof fracturing events from surface monitoring data, supporting real-time assessment of water inrush risk [8]. A real-time wave velocity prediction method based on a discrete physical laboratory model and explainable artificial intelligence effectively improves the localization accuracy of coal mine roof water inrush [9]. The design of roof bolt support integrating multi-source sensing information provides an important basis for water inrush risk assessment [10]. Based on three-dimensional monitoring analysis and deep learning temporal modeling, the long-term deformation behavior of the coal mine roof and its stability state with respect to water inrush are clarified [11]. The controlling effect of multiple factors on fault activation in high-permeability sandstone laboratory injection tests provides an experimental basis for the mechanism of water inrush induced by roof water injection in coal mines [12]. Combining numerical simulation with machine learning, the seepage parameters of heterogeneous media are inverted from fluid-induced signals, achieving seepage field inversion of coal mine roof water inrush channels [13]. A method that deeply integrates feature extraction with an explainable coupled neural network utilizes multi-source monitoring data to achieve accurate and interpretable short-term prediction of coal mine roof water inrush [14]. A hybrid model based on random forest and the whale optimization algorithm, integrating microseismic time series and hydrogeological features, also achieves high-precision prediction of coal mine roof water inrush [15]. Parametric machine learning models have advantages on small-sample, normally distributed datasets, providing theoretical guidance for model selection in coal mine roof water inrush monitoring [16].
Thanks to the extensive research conducted by scholars, the application of microseismic monitoring technology in the field of coal mine water inrush disaster control has become increasingly sophisticated [17,18,19,20]. However, technical bottlenecks still exist in the current application of microseismic monitoring technology for coal mine water inrush control. To effectively address coal seam roof water inrush disasters, improve the accuracy and timeliness of early warnings, and ensure the safety of coal mine production, this study innovatively integrates microseismic monitoring, geological and hydrological monitoring, and electrical monitoring. It systematically analyzes the intrinsic relationships among microseismic signals, current variations, observation hole water levels, and water inflow during the formation of water-conducting channels, examining groundwater migration and water-conducting channel development from different perspectives. On this basis, intelligent prediction algorithms are introduced to construct a multi-source data-driven prediction model for roof fracture height, achieving technological innovation from multi-field mechanism analysis to intelligent disaster prediction, thereby providing strong support for the safe production of coal mines.

2. Methods

2.1. Selection of Early Warning Indicators for Roof Water Inrush

Targeting the three key stages of roof water inrush—roof strata fracturing, fracture connection with aquifers, and water seepage along channels—four categories of key indicators that characterize the mechanical behavior of the roof and the disturbance patterns of the hydrological field are selected: microseismic events, monitoring borehole water levels, self-potential, and working face water inflow. These indicators provide support for roof water inrush control.

2.1.1. Microseismic Events

Microseismic events are triggered by micro-fracturing events within the rock mass and represent a dynamic response to internal stress adjustments within the rock mass. During the micro-fracturing process, the released strain energy propagates outward in the form of elastic waves [21,22,23]. By capturing elastic wave signals through sensors deployed in the monitoring area, and utilizing the time differences in signal arrival at multiple sensors in conjunction with a velocity model of the monitoring area, the spatial coordinates of microseismic events can be determined through geometric localization algorithms.

2.1.2. Monitoring Borehole Water Level

By deploying boreholes within the coal mine field through drilling and other engineering methods, changes in water level and pressure within the boreholes are monitored, and the dynamic characteristics of aquifers are analyzed. This enables timely detection of anomalous changes in groundwater, providing critical evidence for the prediction, forecasting, and prevention of coal mine water inrush [24,25,26]. When a water-conducting channel is fully formed, the groundwater seepage system undergoes a sudden change, and various parameters of the monitoring boreholes exhibit specific responses.

2.1.3. Self-Potential

Electrical monitoring is a geophysical technique based on differences in the electrical properties of subsurface media. Through the excitation of an electric field and the acquisition of response signals, it detects the occurrence state of water in rock layers and structural zones, as well as the development characteristics of water-conducting channels [27,28,29,30]. When a water-conducting channel suddenly forms, the drastic changes in the groundwater seepage field induce specific responses in electrical parameters. The resistivity of the medium surrounding the water-filled channel drops sharply, forming a distinct anomalous spike, and the anomalous value further decreases as the water inflow increases.

2.1.4. Working Face Water Inflow

Sources of water inflow include the roof aquifer, fissure water in the coal seam roof and floor, residual water in goafs, and water from structural zones. When a water-conducting channel connects with the main roof aquifer, water inflow exhibits a sudden increase, surging sharply from the normal range to a high value within a short period. Moreover, due to the continuous recharge capacity of the aquifer, the water inflow remains stable after the increase without significant attenuation [31,32,33]. Conversely, if the water-conducting channel does not connect with the main roof aquifer, changes in water inflow are gradual and limited, without substantial surges over a short period, and the overall water volume remains at a consistently low level.

2.2. Selection of Primary Indicators and Response Relationships Among Multiple Monitoring Indicators

2.2.1. Engineering Background

This study selects a working face of the No. 5 coal seam in a coal mine in East China as the research object. The stratigraphic sequence in this mining area is well developed, consisting, from bottom to top, of the Ordovician, Carboniferous, Permian, Paleogene, Neogene, and Quaternary systems. The coal-bearing strata are primarily the Taiyuan Formation of the Upper Carboniferous, the Shanxi Formation of the Lower Permian, the Lower Shihezi Formation of the Lower Permian, and the Upper Shihezi Formation of the Upper Permian. The primarily mined No. 5 coal seam is hosted within the Lower Shihezi Formation of the Permian, characterized by stable occurrence and a simple structure, representing the core coal seam currently exploited in the mine. Above the roof of the No. 5 coal seam, multiple aquifers with water inrush potential are developed from top to bottom, including the Quaternary fourth aquifer of the Cenozoic, the weathered oxidized zone aquifer, the roof sandstone aquifer of the No. 9 coal seam, and the roof sandstone aquifer of the No. 5 coal seam. These aquifers are spatially distributed in layered forms and, under the influence of faults, concealed outcrops, and mining-induced fractures, can form vertical and lateral hydraulic connections, collectively constituting the roof water source system of the working face.
A complete microseismic-electrical coupled monitoring system has been installed in the working face, enabling real-time monitoring of microseismic events, resistivity data, and other parameters during the mining process. Based on the actual monitoring sensitivity of long-term hydrological observation boreholes, Observation Borehole 1 was selected as the core monitoring borehole for the Quaternary fourth aquifer, and Observation Borehole 2 was used for simultaneous and continuous monitoring of the roof sandstone aquifer of the No. 9 coal seam.

2.2.2. Selection of Primary Indicators

To quantitatively evaluate the stability of the data, the coefficient of variation for each type of data is calculated. This indicator, by eliminating the influence of dimensions, objectively characterizes the relative degree of dispersion of the data. A smaller coefficient of variation indicates smaller data fluctuations and stronger stability.
The calculation formula is shown in Equation (1):
$$\mathrm{CV} = \frac{\sigma}{\mu} \times 100\%, \tag{1}$$
where CV is the coefficient of variation, %; μ is the mean; σ is the standard deviation.
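As a minimal sketch of Equation (1), the snippet below computes the coefficient of variation for each monitoring series and ranks the series from most to least stable. The readings and series names are hypothetical and are not taken from the monitoring dataset; the use of the population standard deviation is also an assumption, since the text does not state which estimator was used.

```python
import statistics


def coefficient_of_variation(values):
    """Return CV = sigma / mu * 100 (in percent) for a sequence of readings."""
    mu = statistics.fmean(values)
    if mu == 0:
        raise ValueError("CV is undefined when the mean is zero")
    sigma = statistics.pstdev(values)  # population std: an assumption
    return sigma / mu * 100.0


# Hypothetical readings, for illustration only.
series = {
    "borehole_1_level_m": [30.10, 30.08, 30.05, 30.02, 29.98],
    "daily_inflow_m3": [410.0, 455.0, 520.0, 610.0, 662.0],
}

# Rank indicators from lowest CV (most stable) to highest CV.
ranking = sorted(series, key=lambda name: coefficient_of_variation(series[name]))
```

Because the CV is dimensionless, series measured in metres, millivolts, and cubic metres per day can be ranked on the same scale.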
As shown in Table 1, the indicators, ranked from lowest to highest coefficient of variation, are as follows: water level in Observation Borehole 1, water level in Observation Borehole 2, roof fracture height, self-potential, microseismic frequency, and daily water inflow. Among these, water levels in Observation Borehole 1 and Observation Borehole 2 exhibit relatively high stability, indicating that water levels are less affected by mining disturbances. Roof fracture height and self-potential show moderate stability, suggesting that the extent of roof fracturing and the hydrological conditions of the strata are relatively stable. Microseismic frequency and daily water inflow demonstrate poor stability, indicating significant variations in rock mass fracturing activity and formation permeability conditions.
To analyze the correlation characteristics among the indicators, the Pearson correlation coefficient is used to construct a linear correlation matrix for the six types of data. The strength and direction of the correlation between variables are determined by the r value: the closer the absolute value of r is to 1, the stronger the linear correlation between the two variables.
The calculation formula is shown in Equation (2):
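$$r = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}, \tag{2}$$
where r is the correlation coefficient; Cov(X, Y) is the covariance between X and Y; $\sigma_X$ and $\sigma_Y$ are the standard deviations of X and Y, respectively.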
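The correlation matrix described above can be sketched as follows. This is an illustrative implementation of Equation (2) in plain Python; the function names and the dictionary layout of the input are assumptions made for the example and do not describe the software used in the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: Cov(X, Y) / (sigma_X * sigma_Y)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    # The 1/n factors in the covariance and both standard deviations cancel.
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Build a nested dict {name_a: {name_b: r}} from named series."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}
```

The matrix is symmetric with ones on the diagonal, so only the upper triangle needs to be reported in a table.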
As shown in Table 2, based on the correlation analysis, the following strongly correlated indicator pairs can be identified: roof fracture height and water level in Observation Borehole 2 (r = 0.84), indicating that roof fracturing may lead to the formation of permeable pathways in the aquifer monitored by Observation Borehole 2; water level in Observation Borehole 2 and daily water inflow (r = 0.84), suggesting that the aquifer monitored by Observation Borehole 2 is a major source of water inflow; water level in Observation Borehole 1 and daily water inflow (r = 0.81), indicating that the aquifer monitored by Observation Borehole 1 may contribute to water inrush; roof fracture height and daily water inflow (r = 0.80), indicating that roof fracturing is a key factor driving changes in water inflow; and roof fracture height and self-potential (r = 0.79), indicating that damage to the roof structure significantly affects the electrochemical environment of the strata.
Meanwhile, the mutual information method was introduced for verification. The results show that: the roof fracture height and the water level in Observation Borehole 2 exhibit strong linear synergy (NMI = 0.3745), but their nonlinear responses differ during intense mining stages; the water level in Observation Borehole 2 and the daily water inflow (NMI = 0.4599) demonstrate both linear synchrony and nonlinear information feedback, making this pair a sensitive indicator for water hazard monitoring; the water level in Observation Borehole 1 and the daily water inflow (NMI = 0.5798) show a strong linear-nonlinear dual coupling, where the aquifer monitored by Borehole 1 is comprehensively influenced by regional hydrology and mining activities, exhibiting significant nonlinear responses; the roof fracture height and the daily water inflow (NMI = 0.4050) combine linear trends with nonlinear characteristics, representing a core pair for characterizing the mechanism of overburden-controlled water inflow and suitable for mechanism analysis and predictive modeling; the roof fracture height and the self-potential (NMI = 0.2940) indicate that self-potential can only qualitatively characterize macroscopic roof fracturing and is suitable as an auxiliary indicator for linear monitoring.
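The text does not state how the normalized mutual information (NMI) was estimated. One common approach, sketched here under stated assumptions, discretizes each series into equal-width bins and normalizes the mutual information by the geometric mean of the two marginal entropies. The bin count, the normalization, and all function names are illustrative choices, so the values it produces need not match those reported above.

```python
import math
from collections import Counter


def _bin(values, bins):
    """Map each value to an equal-width bin index in [0, bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def _entropy(labels):
    """Shannon entropy (natural log) of a list of discrete labels."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(x, y, bins=4):
    """NMI = I(X; Y) / sqrt(H(X) * H(Y)), with I = H(X) + H(Y) - H(X, Y)."""
    bx, by = _bin(x, bins), _bin(y, bins)
    hx, hy = _entropy(bx), _entropy(by)
    hxy = _entropy(list(zip(bx, by)))
    denom = math.sqrt(hx * hy)
    return (hx + hy - hxy) / denom if denom > 0 else 0.0
```

Unlike the Pearson coefficient, this estimate responds to any statistical dependence between the binned series, which is why it is useful as a cross-check for nonlinear coupling.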
Based on the above results, the coefficient of variation for roof fracture height lies in the middle range among the six indicators, achieving a good balance between sensitivity and accuracy in anomaly detection. Meanwhile, it exhibits the highest correlation coefficients with the other five indicators, indicating that this indicator best captures the relationships with other influencing factors. As a core parameter characterizing the extent of roof strata failure and the degree of damage, roof fracture height is determined as the primary indicator for this study, in conjunction with the statistical results.

2.2.3. Analysis of Response Relationships Among Multiple Monitoring Indicators

Monitoring data from 8 December to 17 December show the following. The frequency of microseismic events increased significantly, rising from 20 events/day to 38 events/day, an overall increase of 90%. Roof fracture height fluctuated upward from 35 m to 49 m, indicating intense fracturing activity within the rock mass. The water level in Observation Borehole 1 showed a slight overall decreasing trend, declining by only 0.19 m over the 10-day period, while the water level in Observation Borehole 2 exhibited a significant and sustained downward trend, decreasing by 13.1 m over the same period, suggesting gradual development of water-conducting channels. The amplitude of self-potential variations rose continuously and rapidly, increasing by 123 mV (183.6%) over the 10-day period. Daily water inflow increased rapidly at first and then fluctuated slightly, reaching a peak on 14 December and subsequently remaining at a high level, with an overall increase of 61.4% from the initial value, indicating continuously increasing groundwater activity.
Based on the monitoring data from 9 January to 23 January, most indicators stabilized or declined from their peak levels, and observation borehole water levels shifted from decreasing to increasing. The frequency of microseismic events showed a fluctuating downward trend, with overall activity intensity weakening. It decreased from 61 events/day initially to 39 events/day at the end, a reduction of 36.1%. Roof fracture height gradually decreased overall while remaining at a relatively high level, indicating that rock fracturing had largely reached a stable state and microseismic activity intensity had significantly diminished. The water level in Observation Borehole 1 rose by 0.08 m over the 15-day period, while the water level in Observation Borehole 2 exhibited a significant and sustained recovery trend, rising by a total of 36.39 m over the 15-day period. The amplitude of self-potential variations showed a stable fluctuating trend at a high level, with no continuous increase or decrease, generally maintained around a high value of 240 mV. Daily water inflow exhibited an overall downward trend, stabilizing in the later period, with a decrease of 31.4% over the 15-day period.
Based on the variation characteristics of the multi-monitoring indicator data, a clear response relationship among the indicators can be identified: the increase in microseismic frequency and roof fracture height directly reflects the intensification of roof fracturing extent and damage degree, providing the precondition for the formation of water-conducting channels. The expansion of the fracturing zone enhances the permeability of the rock strata, which in turn leads to a decrease in monitoring borehole water levels and an increase in water inflow. The variation in self-potential directly reflects ion migration during groundwater seepage. The above response relationships reflect the complete process of roof water inrush, from roof strata fracturing to fracture connection with aquifers and subsequent water seepage along the channels.

2.3. Prediction of Roof Fracture Height and Multiple Monitoring Indicators

2.3.1. Data Preprocessing

To ensure the accuracy and reliability of the input data for the model, a systematic preprocessing of the raw monitoring data was carried out. For missing data, an adjacent data interpolation method was adopted for imputation: for consecutive missing data points of no more than three time steps, linear interpolation using the measured values at the adjacent preceding and succeeding time steps was applied; for consecutive missing data exceeding three time steps, the sliding window mean method was used for imputation. For data anomalies, the boxplot method was employed for identification and removal: using the interquartile range (IQR) as the criterion, data points falling below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR were identified as outliers. Furthermore, to address the issue of substantial differences in dimensions and numerical ranges among different monitoring indicators, the Min-Max normalization method was applied to map the data into the interval [0, 1], thereby eliminating the effect of different dimensions.
The calculation formula is shown in Equation (3):
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}, \tag{3}$$
where $x'$ is the normalized data; $x$ is the original data; $x_{\max}$ is the maximum value in the dataset; $x_{\min}$ is the minimum value in the dataset.
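The three preprocessing steps can be sketched in plain Python as follows. The text does not define the sliding window used for long gaps or the quartile convention, so the trailing-window mean (`window=3`) and the inclusive quartile method below are assumptions, and missing readings are represented as `None` purely for illustration.

```python
import statistics


def fill_gaps(values, max_linear_gap=3, window=3):
    """Impute None entries: linear interpolation for short gaps, trailing mean otherwise."""
    out = list(values)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        start = i
        while i < n and out[i] is None:
            i += 1
        gap = i - start  # i is now the first valid index after the gap (or n)
        if gap <= max_linear_gap and start > 0 and i < n:
            left, right = out[start - 1], out[i]
            for k in range(gap):
                out[start + k] = left + (right - left) * (k + 1) / (gap + 1)
        else:
            # Assumed "sliding window mean": average of the preceding `window`
            # values, including ones just imputed. Leading gaps stay None.
            for k in range(start, i):
                history = [v for v in out[max(0, k - window):k] if v is not None]
                if history:
                    out[k] = sum(history) / len(history)
    return out


def remove_outliers_iqr(values):
    """Drop points below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if lo <= v <= hi]


def min_max_normalize(values):
    """Map values into [0, 1] following Equation (3)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("a constant series cannot be min-max normalized")
    return [(v - lo) / (hi - lo) for v in values]
```

In practice the minimum and maximum would be taken from the training portion only and reused for the test portion, so that the normalization does not leak information from later observations.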

2.3.2. VMD Decomposition

Variational Mode Decomposition (VMD) is an adaptive signal decomposition method capable of decomposing complex signals into multiple modal components with different frequency characteristics. Compared with traditional algorithms such as Empirical Mode Decomposition (EMD), its core advantage lies in constructing a variational optimization model that, under a preset number of decomposition modes, decomposes complex time series into multiple mutually independent intrinsic mode functions (IMFs) with distinct frequency characteristics, offering stronger noise suppression capability [34,35].
For the time series of roof fracture height, three parameter combinations were employed in comparative experiments: Combination 1 (K = 6, α = 1200), Combination 2 (K = 8, α = 1500), and Combination 3 (K = 10, α = 1800), covering the commonly used ranges of mode numbers from 6 to 10 and penalty factors from 1200 to 1800. Feature analysis indicates that increasing K leads to more complete decomposition, with energy being distributed across more modes, thereby facilitating the capture of small-scale fluctuation characteristics. Increasing α enhances modal constraints, allowing effective information to be more adequately allocated to each mode. The results show that Combination 1 exhibits the highest energy proportion in the dominant mode (99.50%) and a concentrated trend feature, but it struggles to capture small-scale fluctuations, resulting in limited accuracy for short-term anomaly detection scenarios. Combination 3 achieves a high degree of mode subdivision and strong capability in capturing fluctuations; however, the energy proportion of the dominant mode is low (97.86%), and the features are dispersed. Combination 2 preserves a clear trend while extracting key fluctuations, with moderate computational cost, making it the optimal choice for comprehensive modeling. The decomposition results of Combination 2 are shown in Figure 1. The high-frequency components (IMF1–IMF3) mainly correspond to instantaneous perturbations from microseismic events and monitoring noise, with their amplitude fluctuations temporally coinciding with sudden peaks in daily microseismic frequency. The intermediate-frequency components (IMF4–IMF6) reflect the staged upward trend of fracture height caused by periodic roof weighting. The low-frequency components (IMF7–IMF8) characterize the long-term evolution trend and residual deformation of the fracture zone, controlled by regional hydrogeological conditions.
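The energy proportion used to compare the three parameter combinations is not defined explicitly in the text. A plausible reading, assumed here, is the share of each mode's sum of squared samples in the total over all modes. The sketch takes already-decomposed modes as input and does not perform the VMD itself.

```python
def energy_proportions(modes):
    """Return each mode's share of the total energy (sum of squared samples)."""
    energies = [sum(v * v for v in mode) for mode in modes]
    total = sum(energies)
    if total == 0:
        raise ValueError("all modes are identically zero")
    return [e / total for e in energies]


# Hypothetical two-mode example: a large trend and a small oscillation.
shares = energy_proportions([[10.0, 10.0, 10.0, 10.0], [1.0, -1.0, 1.0, -1.0]])
dominant = max(range(len(shares)), key=shares.__getitem__)
```

A dominant share close to 1 means that nearly all of the signal energy sits in the trend, which matches the trade-off described above: strong trend concentration leaves little energy in the modes that carry small-scale fluctuations.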
Additionally, three other signal decomposition methods—EMD, EEMD, and CEEMDAN—were employed for comparison. The results were analyzed in terms of separation accuracy, reconstruction accuracy, and engineering applicability. The results are shown in Table 3.
Considering the engineering characteristics of roof fracture height data, which are non-stationary and contain random noise, VMD decomposition demonstrates the best performance. VMD achieves the highest modal separation quality, with clear frequency boundaries between components. Its stability and noise immunity are superior to the other three methods, thereby enhancing the accuracy of subsequent analyses.

2.3.3. LSTM Algorithm

Long Short-Term Memory (LSTM) is a special type of recurrent neural network. By introducing three gating mechanisms—the input gate, forget gate, and output gate—along with a memory cell, it effectively addresses the vanishing gradient problem, enabling it to capture long-term dependencies and effectively model dynamic patterns in time series data [36,37,38].
The forget gate determines which historical information to discard from the memory cell. It takes the current input $x_t$ and the previous hidden state $h_{t-1}$ as inputs, and outputs a vector $f_t$ with values between 0 and 1 through a sigmoid function:
$$f_t = s(W_f [h_{t-1}, x_t] + b_f),$$
where $s$ is the sigmoid activation function; $W_f$ is the weight matrix of the forget gate; $b_f$ is the bias term of the forget gate.
The input gate determines the new information to be stored in the memory cell. First, it generates $i_t$, which controls the update intensity, through a sigmoid function, and then generates candidate update information $\tilde{C}_t$ through a tanh function:
$$i_t = s(W_i [h_{t-1}, x_t] + b_i),$$
$$\tilde{C}_t = \tanh(W_C [h_{t-1}, x_t] + b_C),$$
where $W_i$ is the weight matrix of the input gate; $b_i$ is the bias term of the input gate; $W_C$ is the weight matrix of the candidate information; $b_C$ is the bias term of the candidate information.
Subsequently, the state of the memory cell $C_t$ is updated based on the forget gate and the input gate:
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t,$$
where $\odot$ denotes element-wise multiplication.
The output gate determines what information to output at the current time step. It generates $o_t$ through a sigmoid function, and then combines it with the current state of the memory cell to generate the current hidden state $h_t$, which serves as the output of the model or the input for the next time step:
$$o_t = s(W_o [h_{t-1}, x_t] + b_o),$$
$$h_t = o_t \odot \tanh(C_t),$$
where $W_o$ is the weight matrix of the output gate; $b_o$ is the bias term of the output gate.
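To make the gate equations concrete, the following sketch performs a single forward step of an LSTM cell using plain Python lists. It is meant only to mirror the equations above and is not the training code used in the study; the parameter dictionary, its key names, and the helper functions are assumptions introduced for the example.

```python
import math


def sigmoid(z):
    """Logistic function s(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Compute W v + b for a matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; returns the new hidden state and cell state."""
    z = h_prev + x_t  # list concatenation, i.e. [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(params["W_f"], z, params["b_f"])]
    i_t = [sigmoid(a) for a in affine(params["W_i"], z, params["b_i"])]
    c_hat = [math.tanh(a) for a in affine(params["W_C"], z, params["b_C"])]
    o_t = [sigmoid(a) for a in affine(params["W_o"], z, params["b_o"])]
    # Element-wise cell update and hidden-state output.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t
```

With all weights and biases set to zero, every gate evaluates to 0.5 and the candidate to 0, so the cell state is simply halved at each step. This gives a quick sanity check on the implementation before real weights are loaded.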
The results are shown in Figure 2.
Additionally, three prediction models—BP neural network, GRU, and SVR—were employed for comparison. The results were analyzed in terms of prediction error and goodness of fit. The results are shown in Table 4.
The model comparison results indicate that the GRU model performs slightly better than LSTM in terms of RMSE and MAE, but exhibits the highest MAPE and the lowest R2, suggesting insufficient predictive stability for sensitive fluctuations preceding water inrush. Although the BP neural network and SVR models achieve relatively low local errors, their coefficients of determination (R2) are only 0.91, indicating a weaker capability in fitting the long-term evolution pattern of roof fracture height compared to LSTM. The LSTM model achieves the highest coefficient of determination among all models, demonstrating the strongest global fitting and dynamic pattern capture capabilities. It can effectively handle the non-stationary, high-noise, and long-lag time series data typical of coal mine sites, and is therefore better suited to the temporal characteristics of roof water inrush evolution and the requirements of engineering early warning. Consequently, LSTM is selected as the baseline comparison model in this study.

2.3.4. VMD-LSTM Prediction

Combining the VMD decomposition results, the eight IMF components obtained from decomposition are treated as independent subsequences. LSTM sub-models are constructed separately for each component based on its frequency characteristics. The historical time series data of each component serve as the core input features, while monitoring indicators significantly correlated with roof fracture height are introduced as auxiliary inputs. The input time step is set to 5, and the output is the predicted value of the corresponding component at the next time step. The model has one hidden layer with 24 neurons.
For each component, the data are divided into training and testing sets in chronological order at a ratio of 7:3. After training, the predicted values of all components are summed to obtain the final prediction result for the original target variable sequence. The final results are shown in Figure 3.
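The data handling around each sub-model can be sketched as below: building supervised samples with an input time step of 5, splitting them chronologically at 7:3, and summing the per-component predictions. The LSTM sub-models and the auxiliary inputs are omitted, and the function names are hypothetical.

```python
def make_windows(series, time_steps=5):
    """Turn a series into (window, next value) pairs for one-step-ahead prediction."""
    X, y = [], []
    for i in range(len(series) - time_steps):
        X.append(series[i:i + time_steps])
        y.append(series[i + time_steps])
    return X, y


def chronological_split(samples, ratio=(7, 3)):
    """Split without shuffling so that the test set lies strictly after the training set."""
    cut = len(samples) * ratio[0] // (ratio[0] + ratio[1])
    return samples[:cut], samples[cut:]


def recombine(component_predictions):
    """Sum the aligned predictions of all IMF components into the final series."""
    return [sum(values) for values in zip(*component_predictions)]
```

Keeping the split chronological matters for time series: a shuffled split would let each sub-model see values from the period it is later evaluated on.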

2.3.5. Calculation of Correlated Indicators

Based on the roof fracture height data predicted by VMD-LSTM, a regression model is used to estimate the mean shifts of three indicators that are strongly correlated with it. On this basis, the t-distribution method is applied to statistically infer the 95% confidence intervals of these three indicators.
The regression coefficient is calculated using the following formula:
$$\beta = r \times \frac{\sigma_{ta,tr}}{\sigma_{tr}},$$
where $\beta$ is the linear regression coefficient; $r$ is the correlation coefficient between roof fracture height and the associated monitoring indicator; $\sigma_{ta,tr}$ is the standard deviation of the measured values of the associated monitoring indicator in the training set; $\sigma_{tr}$ is the standard deviation of the predicted values of roof fracture height in the training set.
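The regression coefficient above is the ordinary least-squares slope of the associated indicator on roof fracture height. The sketch below computes it from the correlation coefficient and the two standard deviations; the intercept through the training-set means is an assumption added so that a predicted height can be mapped to an indicator value, and the function names are hypothetical.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def regression_slope(height, indicator):
    """beta = r * sigma_indicator / sigma_height, computed on the training set."""
    r = pearson_r(height, indicator)
    return r * statistics.pstdev(indicator) / statistics.pstdev(height)


def predict_indicator(height_value, height, indicator):
    """Map a predicted fracture height to an indicator value (assumed intercept)."""
    beta = regression_slope(height, indicator)
    intercept = statistics.fmean(indicator) - beta * statistics.fmean(height)
    return intercept + beta * height_value
```

Because the slope is fixed by the training-set statistics, any error in the predicted fracture height propagates linearly into the estimated indicator.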
On this basis, the Bootstrap method based on resampling is employed for interval estimation. This approach repeatedly draws a large number of samples with replacement from the original dataset to simulate the sampling distribution, thereby constructing interval estimates without relying on the conventional large-sample normal distribution assumption. The results are shown in Table 5.
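A percentile bootstrap interval for the mean can be sketched as follows. The text does not specify which bootstrap variant or how many resamples were used, so the percentile method, the 2000 resamples, and the fixed seed are assumptions chosen for a reproducible illustration.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    alpha = (1.0 - confidence) / 2.0
    lower = means[int(alpha * n_resamples)]
    upper = means[min(n_resamples - 1, int((1.0 - alpha) * n_resamples))]
    return lower, upper
```

The interval width shrinks as the number of observations grows, so short monitoring windows will naturally yield wider intervals.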

3. Results

3.1. Analysis of VMD-LSTM Prediction Results

For the results of the VMD-LSTM algorithm, the following five metrics are selected for evaluation: Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Nash–Sutcliffe Efficiency (NSE), Hanna–Heinold Indicator (HH), and Global Performance Indicator (GPI). The calculation formulas are as follows:
$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{y}_i - y_i \right|,$$
$$\mathrm{MAPE} = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100\%,$$
$$\mathrm{HH} = \sqrt{\frac{\sum_{i=1}^{N} (\hat{y}_i - y_i)^2}{\sum_{i=1}^{N} y_i \times \hat{y}_i}},$$
$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{N} (\hat{y}_i - y_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2},$$
$$\mathrm{GPI} = \mathrm{MBE} \times \mathrm{RMSE} \times U_{95} \times \mathrm{TS} \times (1 - R^2),$$
where $N$ is the total number of samples; $\hat{y}_i$ is the predicted value of the model for the i-th sample; $y_i$ is the measured value of the i-th sample; $\bar{y}$ is the mean of the measured values; MBE is the Mean Bias Error; RMSE is the Root Mean Square Error; $U_{95}$ is the 95% uncertainty; TS is the t-statistic; $R^2$ is the coefficient of determination [39,40].
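The error metrics can be computed directly from paired measured and predicted values, as in the sketch below. HH is written in its commonly used square-root form, and GPI is left out because the text does not define $U_{95}$ or TS in enough detail to reproduce it; both choices, and the function names, are assumptions of this example.

```python
import math


def mae(y, y_hat):
    """Mean Absolute Error."""
    return sum(abs(p - o) for o, p in zip(y, y_hat)) / len(y)


def mape(y, y_hat):
    """Mean Absolute Percentage Error, in percent (measured values must be nonzero)."""
    return sum(abs((o - p) / o) for o, p in zip(y, y_hat)) / len(y) * 100.0


def rmse(y, y_hat):
    """Root Mean Square Error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(y, y_hat)) / len(y))


def nse(y, y_hat):
    """Nash-Sutcliffe Efficiency: 1 means a perfect fit."""
    mean_y = sum(y) / len(y)
    num = sum((p - o) ** 2 for o, p in zip(y, y_hat))
    den = sum((o - mean_y) ** 2 for o in y)
    return 1.0 - num / den


def hh(y, y_hat):
    """Hanna-Heinold indicator (square-root form, assumed here)."""
    num = sum((p - o) ** 2 for o, p in zip(y, y_hat))
    den = sum(o * p for o, p in zip(y, y_hat))
    return math.sqrt(num / den)
```

MAE and RMSE carry the unit of the target (metres for roof fracture height), whereas MAPE, NSE, and HH are dimensionless, which is why the two groups are read differently when models are compared.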
Meanwhile, to verify the superiority of the VMD-LSTM algorithm model, its prediction results are compared with those of the LSTM algorithm model. The results are shown in Table 6.
The results indicate that VMD-LSTM outperforms the traditional LSTM on every reported metric. In terms of error control, MAE decreases from 0.13 m to 0.11 m, and MAPE decreases from 0.23% to 0.19%, representing reductions of 15.38% and 17.39%, respectively. Regarding goodness of fit, NSE increases from 0.87 to 0.93, an improvement of 6.90%, and prediction reliability improves by 7.50%. In terms of stability, HH decreases from 0.23 to 0.21, and GPI decreases from 1.58 to 1.41, indicating enhancements in error distribution concentration and overall robustness of 9.52% and 10.76%, respectively. These differences suggest that VMD decomposition converts the original complex, nonlinear, and non-stationary series into band-limited modal components that are closer to stationary, which reduces the complexity each LSTM sub-model must learn and separates noise from effective features. The comparison supports the conclusion that applying VMD preprocessing before the LSTM model improves predictive performance for nonlinear data.

3.2. Analysis of Correlated Indicator Calculation Results

For the remaining three indicators, which are calculated from the predicted roof fracture height, two statistics are compared: the mean value, which reflects the central tendency of the data, and the fluctuation range of the 95% confidence interval relative to the mean, which reflects the stability of the data. The results are shown in Table 7.
As shown in Figure 4, the three correlated indicators calculated from roof fracture height exhibit small overall errors, indicating that the method is reliable. The mean error varies across indicators. The mean error for the water level in Observation Borehole 2 is only 0.63%, the lowest of the three, indicating that this water level mean is calculated very accurately from roof fracture height. The mean errors for self-potential and daily water inflow are higher, at 4.67% and 5.73%, respectively, suggesting that these two indicators are more strongly affected by other factors, which leads to slightly larger deviations in the calculated mean. For the fluctuation range of the 95% confidence interval, the error for daily water inflow is 4.85%, the lowest of the three, while the errors for the water level in Observation Borehole 2 and self-potential are 7.22% and 7.53%, respectively. All three remain below 8%, which supports the reliability of the confidence interval calculations.
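The two error measures discussed above can be illustrated with a short sketch. The exact definitions used in the study are not stated in this section, so the code assumes that the mean error is the relative difference between calculated and observed means, and that the fluctuation range error is the relative difference between the half-widths of the two 95% confidence intervals. The normal-approximation factor 1.96, the function names, and the sample values are all assumptions made for illustration.

```python
import math
import statistics


def mean_and_ci95(values):
    """Return the sample mean and the half-width of a normal-approximation 95% CI."""
    mean = statistics.mean(values)
    half_width = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return mean, half_width


def percent_error(estimated, reference):
    """Relative difference between an estimate and a reference value, in percent."""
    return abs(estimated - reference) / abs(reference) * 100.0


# Hypothetical borehole water levels in metres (illustration only)
observed = [157.2, 158.1, 158.9, 159.6, 160.3]
calculated = [157.9, 158.6, 159.8, 160.4, 161.5]

obs_mean, obs_half = mean_and_ci95(observed)
calc_mean, calc_half = mean_and_ci95(calculated)

print(f"Mean error: {percent_error(calc_mean, obs_mean):.2f}%")
print(f"95% CI fluctuation range error: {percent_error(calc_half, obs_half):.2f}%")
```

A small mean error with a large interval error would indicate that the regression captures the level of an indicator but not its variability, which is why the two measures are reported separately.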

4. Discussion

(1) There is a significant correlation between the development state of water-conducting channels in the coal seam roof and microseismic events, self-potential, observation borehole water levels, and daily water inflow. During the development of water-conducting channels, microseismic frequency, roof fracture height, electrical method data, and daily water inflow exhibit an increasing trend, while observation borehole water levels decrease. Conversely, when the water-conducting channels close, all indicators show opposite trends.
(2) The VMD-LSTM prediction model constructed based on microseismic data, electrical method data, observation borehole water level data, and working face water inflow data demonstrates good performance, achieving high prediction accuracy for roof fracture height (MAE = 0.11 m, MAPE = 0.19%, NSE = 0.93, HH = 0.21, GPI = 1.41). Compared with the traditional LSTM algorithm model, the VMD-LSTM algorithm model exhibits clear advantages in prediction accuracy: MAE decreases by 15.38%, MAPE decreases by 17.39%, NSE increases by 6.90%, HH decreases by 9.52%, and GPI decreases by 10.76%. Regression calculations were performed for water level in Observation Borehole 2, self-potential, and daily water inflow. The results show that the prediction errors for future central tendency are 0.63%, 4.67%, and 5.73%, respectively; the prediction errors for future confidence intervals are 7.22%, 7.53%, and 4.85%, respectively.
(3) The multi-indicator early warning technical system constructed in this study, by introducing the Variational Mode Decomposition (VMD) algorithm to output high-quality components, improves the performance of the Long Short-Term Memory (LSTM) model, thereby providing a quantitative basis and technical support for early warning of coal seam roof water inrush. This system holds practical application value for enhancing roof disaster prevention capability and ensuring safe production in coal mines.
(4) This study is applicable to near-horizontal coal seam working faces with online monitoring conditions, but its generalizability under complex geological conditions remains to be verified, and the medium- to long-term prediction accuracy still requires further optimization. Future work may involve constructing a unified dataset across multiple mines and under various operating conditions, and integrating explainable artificial intelligence methods to provide new insights for intelligent disaster prevention and control in mines.
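The trend pattern described in point (1) can be expressed as a simple consistency check. The sketch below is an illustration only, not the warning logic used in the study: it fits a least-squares slope to each indicator over a monitoring window and flags channel development when microseismic frequency, self-potential, and daily water inflow all rise while the borehole water level falls. The function names, the use of a plain slope sign as the criterion, and the sample values are assumptions.

```python
def slope(values):
    """Least-squares slope of `values` against their sample index (needs >= 2 points)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def channel_developing(microseismic, self_potential, inflow, water_level):
    """True when the three rising indicators increase and the water level decreases."""
    rising = all(slope(s) > 0 for s in (microseismic, self_potential, inflow))
    return rising and slope(water_level) < 0


# Hypothetical monitoring window (illustration only)
microseismic = [38, 41, 45, 52]             # events per day
self_potential = [228.0, 231.5, 236.0, 241.2]
inflow = [1490.0, 1530.0, 1610.0, 1700.0]   # cubic metres per day
water_level = [159.4, 158.8, 157.9, 156.5]  # metres

print(channel_developing(microseismic, self_potential, inflow, water_level))
```

In practice a threshold on slope magnitude, rather than its sign alone, would probably be needed to avoid flagging ordinary measurement noise.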

Author Contributions

Conceptualization, J.C.; methodology, L.L.; validation, J.B.; formal analysis, J.C.; investigation, Z.Q.; resources, W.T.; data curation, L.L.; writing—original draft preparation, H.Z.; writing—review and editing, W.T.; supervision, W.M.; project administration, F.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (grant numbers U25B20137 and 52104203), the Shandong Academy of Sciences (grant number ZR2022ME140), and the State Key Laboratory of Efficient Mining and Clean Utilization of Coal Resources (grant number 2021-CMCU-KF015).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

Author Wenfeng Tan was employed by the Northwest Research Institute of Shandong Energy Group. Author Zhu Qu was employed by the Shaanxi Zhongtai Energy Investment Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Zeng, Y.F.; Wu, Q.; Zhao, S.Q.; Miao, Y.W.; Zhang, Y.; Mei, A.S.; Meng, S.H.; Liu, X.X. Characteristics, Causes, and Prevention Measures of Coal Mine Water Hazard Accidents in China. Coal Sci. Technol. 2023, 51, 1–14.
  2. Peng, S.P. Current Status and Prospects of Research on Geological Assurance System for Coal Mine Safe and High Efficient Mining. J. China Coal Soc. 2020, 45, 2331–2345.
  3. Qiao, W.; Wang, Z.W.; Li, W.P.; Lv, Y.G.; Li, L.G.; Huang, Y.; He, J.H.; Li, X.Q.; Zhao, S.L.; Liu, M.N. Formation Mechanism, Disaster-Causing Mechanism and Prevention Technology of Roof Bed Separation Water Disaster in Coal Mines. J. China Coal Soc. 2021, 46, 507–522.
  4. Yin, S.X.; Ding, Y.Y.; Lian, H.Q.; Dong, D.L.; Du, T.; Yin, H.C.; Zhao, P.; Zhang, Y.A.; Wang, X. Research on Monitoring and Early Warning Systems for Mine Water: Progress and Prospects. Coal Geol. Explor. 2025, 53, 68–79.
  5. Anjom, K.F.; Toro, D.L.; Colombero, C. Fast and effective classification of landslide microseismicity: A machine learning perspective. Eng. Geol. 2026, 367, 108698.
  6. Barthwal, H.; Shcherbakov, R. Unsupervised clustering of mining-induced microseismicity provides insights into source mechanisms. Int. J. Rock Mech. Min. Sci. 2024, 183, 105905.
  7. Chtouki, T.; Petružálek, M.; Staněk, F.; Eisner, L.; Jechumtálová, Z.; Iqbal, N.; Bin Waheed, U. Effect of data filtering on source mechanisms inverted from surface microseismic monitoring array. J. Appl. Geophys. 2026, 245, 106070.
  8. Konyukhov, G.; Yaskevich, S. Neural networks for source mechanism inversion from surface microseismic data. Comput. Geosci. 2024, 28, 1413–1424.
  9. Samadi, H.; Suorineni, F. Development of a procedure for predicting real-time seismic wave velocity in underground mines using discrete physical laboratory modelling and explainable artificial intelligence (XAI). Int. J. Rock Mech. Min. Sci. 2026, 198, 106391.
  10. Langet, N.; Wuestefeld, A.; Grindheim, B.; Li, C.C. Microseismic Monitoring of the Rock Mass During Full-Scale Uplift Tests of Rock Anchors. Rock Mech. Rock Eng. 2025, 58, 7691–7702.
  11. Kumar, V.; Balasubramaniam, V.R. Long-term time-dependent deformation and stability analysis of the machine hall of an underground powerhouse cavern using microseismic monitoring. Near Surf. Geophys. 2025, 23, 69–86.
  12. Naderloo, M.; Veltmeijer, A.; Pluymakers, A.; Jansen, J.D.; Barnhoorn, A. The Effect of Pressurization Rate and Pattern on Injection-Induced Seismicity in Highly Permeable Sandstone: An Experimental Study. J. Geophys. Res. Solid Earth 2025, 130, e2024JB029469.
  13. Novikova, V.E.; Barishnikov, A.N.; Turuntaev, S.B.; Trimonova, M.A. Reconstruction of the Spatial Distribution of Filtration Properties of Heterogeneous Geological Media Based on Variations of Microseismicity Resulting from Fluid Injection. Izv. Phys. Solid Earth 2025, 61, 251–262.
  14. Wang, S.; Xie, L.; Song, Y.; Liu, P.; Gao, Y.; Zhang, G.; Yuan, Y.; Jin, S.; Wang, Z. Approach for Microseismic Monitoring Data-Driven Rockburst Short-Term Prediction Using Deep Feature Extraction and Interpretable Coupling Neural Networks. Appl. Sci. 2025, 15, 11358.
  15. Kadkhodaei, H.M.; Ghasemi, E. Interpretable real-time monitoring of short-term rockbursts in underground spaces based on microseismic activities. Sci. Rep. 2025, 15, 911.
  16. Basnet, S.M.P.; Jin, A.; Mahtab, S. Applying machine learning approach in predicting short-term rockburst risks using microseismic information: A comparison of parametric and non-parametric models. Nat. Hazards 2024, 121, 731–758.
  17. Yu, G.F.; Yuan, L.; Ren, B.; Li, L.C.; Cheng, G.W.; Han, Y.C.; Mu, W.Q.; Wang, S.X. Big Data Prediction and Early Warning Platform for Floor Water Inrush Disaster. J. China Coal Soc. 2021, 46, 3502–3514.
  18. Li, L.P.; Jia, C.; Sun, Z.Z.; Liu, H.L.; Cheng, S. Research Status and Development Trend of Major Engineering Disaster Prevention and Control Technology in Deep Underground. J. Cent. South Univ. (Sci. Technol.) 2021, 52, 2539–2556.
  19. Liu, Z.B.; Yan, J.S.; Wang, J.H.; Gao, Y.Q.; Yang, H.; Bai, B.J.; Lu, J.J.; Wang, H.W.; Wang, G. Transparent Prevention and Control System for Water Hazards in Mine Floors Under Empowerment Based on Spatiotemporal Information Fusion. Coal Geol. Explor. 2025, 53, 130–141.
  20. Chen, F.B.; Sun, X.D.; Wang, Y.J.; Feng, Y.J.; Sun, X.B.; Ma, Z.R.; Li, Y.; Liu, N. Research Status and Prospect of Microseismic Monitoring Technology in Coal Mines. Coal Eng. 2025, 57, 133–141.
  21. Darisma, D.; Mukuhira, Y.; Okamoto, K.; Aoyogi, N.; Uchide, T.; Ishibashi, T.; Asanuma, H.; Ito, T. Building the fracture network model for the Okuaizu geothermal field based on microseismic data analysis. Earth Planets Space 2024, 76, 107.
  22. Kneipp, W.F.; Novotny, A.A.; Guzina, B.B. Full-waveform reconstruction of microseismic events via observations of acoustic pressure in the surrounding fluid. Inverse Probl. 2025, 41, 085011.
  23. Pandey, A.; Tkalčić, H.; Ma, X. Investigating the Characteristics of Microseisms Using the Australian Seismic Arrays. J. Geophys. Res. Solid Earth 2025, 130, e2024JB031032.
  24. Mohammadi, M.A. Dynamic Simulation and Conceptual Interaction Between Groundwater Level, Groundwater Pumping Costs, and Land Uplift in Abandoned Coal Mines in Germany. Min. Metall. Explor. 2025, 42, 2839–2862.
  25. Wu, Q.; Wang, X.; Zhao, Y.; Zhang, X.; Jia, M. Simulation and Application of Mine Water Inrush Spreading Based on Water Balance in the Control Area of Coal Seam Floor Elevation. J. China Univ. Min. Technol. 2025, 54, 545–560.
  26. Cao, Y.C.; Fan, W.Q. Performance Analysis and Research of Mine Water Level Scale Recognition Based on Different Depth Recognition Algorithms. J. China Coal Soc. 2019, 44, 3529–3538.
  27. Liu, S.D.; Yang, C.; Zhang, J.; Li, C.Y.; Ren, C. Mine Microseismic and Electrical Coupling Monitoring Technology. J. China Coal Soc. 2024, 49, 586–600.
  28. Prakash, A.; Bharati, A.K. Implication of Electrical Resistivity Tomography for Precise Demarcation of Pothole Subsidence Potential Zone Over Shallow Depth Coal Mine Workings. J. Geol. Soc. India 2022, 98, 600–606.
  29. Kulikov, V.A.; Yakovlev, A.G.; Polikarpova, V.A. Some Problems of Electrical Geophysical Prospecting Methods Used for Exploration of Ore Deposits. Geodyn. Tectonophys. 2021, 12, 731–747.
  30. Mohand-Said, A.; Marquis, G.; Sambolian, S.; Girard, J.F.; Harrison, G.; Williard, E. Joint Inversion of Electromagnetic and Direct Current Resistivity Data Using Trust Regions. Application to Uranium Exploration in the Athabasca Basin. Geophys. Prospect. 2025, 73, e70079.
  31. Vianney, J.M.; Hoth, N.; Moro, K.; Wardani, D.N.W.; Drebenstedt, C. Modeling of Water Inflow Zones in a Swedish Open-Pit Mine with ModelMuse and MODFLOW. Sustainability 2025, 17, 2466.
  32. Ovchinnikov, N.P.; Zyryanov, I.V. Impact of Increased Water Inflow on Main Drainage System Efficiency in Mine. J. Min. Sci. 2023, 59, 342–347.
  33. Yang, J.; Liang, X.Y.; Ding, X. Variation Characteristics of Mine Inflow During Mining of Deep Buried Coal Seams in Shaanxi and Inner Mongolia Contiguous Area. Coal Geol. Explor. 2017, 45, 97–101.
  34. Ahmadi, F.; Tohidi, M.; Sadrianzade, M. Streamflow prediction using a hybrid methodology based on variational mode decomposition (VMD) and machine learning approaches. Appl. Water Sci. 2023, 13, 135.
  35. Ramezani, R.; Gheiby, A.; Malakooti, H.; Bazrafshan, O. An intelligent VMD-WTC-GRU hybrid framework with uncertainty quantification for forecasting extreme flood events in semi-arid regions. Ain Shams Eng. J. 2026, 17, 103960.
  36. Kofi, A.S.; Yuan, Y.; Chen, Z.; Li, B.; Qin, Z.; Apolline, D.A. Study on Intelligent Early Prediction Method of Ground Pressure in Fully Mechanized Mining Face of Coal Mine Based on Random Forest and LSTM. Min. Metall. Explor. 2026, 43, 1099–1118.
  37. Kumari, K.; Dey, P.; Kumar, C.; Pandit, D.; Mishra, S.S.; Kisku, V.; Chaulya, S.K.; Ray, S.K.; Prasad, G.M. UMAP and LSTM based fire status and explosibility prediction for sealed-off area in underground coal mine. Process Saf. Environ. Prot. 2021, 146, 837–852.
  38. Aydin, M.C.; Gelberi, G.; Ulu, A.E. Investigation of recent level changes in Lake Van using water balance, LSTM and ANN approaches. Appl. Water Sci. 2024, 14, 41.
  39. Mehdinejadiani, B. A novel inverse model insensitive to initial guesses for estimating parameters of continuous time random walk-truncated power law model. J. Hydrol. 2025, 658, 133206.
  40. Maroufi, H.; Mehdinejadiani, B. A comparative study on using metaheuristic algorithms for simultaneously estimating parameters of space fractional advection-dispersion equation. J. Hydrol. 2021, 602, 126757.
Figure 1. VMD Decomposition Results.
Figure 2. LSTM Algorithm Model Prediction Results.
Figure 3. VMD-LSTM Algorithm Model Prediction Results.
Figure 4. Regression Results Error Bar Chart.
Table 1. Statistical data of 6 indicators.
| Indicator | Mean (μ) | Standard Deviation (σ) | Coefficient of Variation (CV/%) |
|---|---|---|---|
| Roof Fracture Height | 55.65 m | 6.08 m | 10.93 |
| Microseismic Frequency | 43.63 times | 11.22 times | 25.72 |
| Water Level in Observation Borehole 1 | 5.71 m | 0.13 m | 2.28 |
| Water Level in Observation Borehole 2 | 157.85 m | 9.05 m | 5.73 |
| Self-Potential | 234.12 mA | 36.95 mA | 15.78 |
| Daily Water Inflow | 1574.59 m³ | 643.61 m³ | 40.87 |
Table 2. Correlation statistics of 6 indicators.
| Indicator | Roof Fracture Height | Microseismic Frequency | Water Level in Observation Borehole 1 | Water Level in Observation Borehole 2 | Self-Potential | Daily Water Inflow |
|---|---|---|---|---|---|---|
| Roof Fracture Height | 1 | 0.47 | 0.42 | 0.84 | 0.79 | 0.8 |
| Microseismic Frequency | 0.47 | 1 | 0.2 | 0.34 | 0.32 | 0.38 |
| Water Level in Observation Borehole 1 | 0.42 | 0.2 | 1 | 0.56 | 0.13 | 0.81 |
| Water Level in Observation Borehole 2 | 0.84 | 0.34 | 0.56 | 1 | 0.56 | 0.84 |
| Self-Potential | 0.79 | 0.32 | 0.13 | 0.56 | 1 | 0.46 |
| Daily Water Inflow | 0.8 | 0.38 | 0.81 | 0.84 | 0.46 | 1 |
Table 3. Comparison of decomposition models.
| Evaluation Dimension | Key Indicator | VMD | EMD | EEMD | CEEMDAN |
|---|---|---|---|---|---|
| Frequency Separation Accuracy | Modal Spectrum Overlap | 0.1% | 5% | 2% | 0.5% |
| | Center Frequency Deviation Rate | 0.2% | 5.80% | 2.30% | 1.50% |
| | Full Width at Half Maximum | 0.02 Hz | 0.1 Hz | 0.05 Hz | 0.03 Hz |
| Reconstruction Accuracy | RMSE | 0.12 m | 0.18 m | 0.12 m | 0.08 m |
| Engineering Applicability | | Suitable for complex high-noise engineering data | Not suitable for complex noise environments | Not suitable for batch processing | Suitable for conventional engineering data |
Table 4. Comparison of prediction models.
| Model | RMSE (m) | MAE (m) | MAPE (%) | R² |
|---|---|---|---|---|
| LSTM | 1.76 | 1.09 | 2.13 | 0.93 |
| BP neural network | 1.48 | 0.91 | 1.74 | 0.91 |
| GRU | 1.73 | 1.29 | 2.44 | 0.87 |
| SVR | 1.46 | 1.02 | 1.89 | 0.91 |
Table 5. Regression Calculation of Various Indicators Based on Correlated Indicators.
| Indicator | Mean | 95% Confidence Interval |
|---|---|---|
| Water Level in Observation Borehole 2 | 158.94 m | ±1.28 m |
| Self-Potential | 238.30 mA | ±5.83 mA |
| Daily Water Inflow | 1648.27 m³ | ±106.40 m³ |
Table 6. Indicator Comparison of Prediction Models.
| Metric | VMD-LSTM | LSTM |
|---|---|---|
| MAE | 0.11 m | 0.13 m |
| MAPE | 0.19% | 0.23% |
| NSE | 0.93 | 0.87 |
| HH | 0.21 | 0.23 |
| GPI | 1.41 | 1.58 |
Table 7. Error of Regression Calculation Based on Correlated Indicators.
| Indicator | Mean Error (%) | Fluctuation Range Error of 95% Confidence Interval (%) |
|---|---|---|
| Water Level in Observation Borehole 2 | 0.63 | 7.22 |
| Self-Potential | 4.67 | 7.53 |
| Daily Water Inflow | 5.73 | 4.85 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Chen, J.; Li, L.; Tan, W.; Qu, Z.; Mu, W.; Zhou, H.; Bai, J.; Wu, F. Coal Mine Roof Water Inrush Prediction Based on Machine Learning Research. Water 2026, 18, 1036. https://doi.org/10.3390/w18091036
