A Prediction Model of Coal Seam Roof Water Abundance Based on PSO-GA-BP Neural Network

Abstract: With the gradual increase in coal production capacity, water hazards in coal seam roofs are becoming increasingly prominent. Accurate and effective prediction of the water abundance of the roof aquifer from limited hydrogeological data is critical for identifying the key areas for preventing and controlling coal seam roof water damage and for reducing the incidence of such accidents in coal mines. In this paper, we establish a prediction model for the water abundance of the roof aquifer using a PSO-GA-BP neural network. The model is based on five key factors: aquifer thickness, permeability coefficient, core recovery, number of sandstone and mudstone interbedded layers, and fold fluctuation. It integrates the genetic algorithm (GA) into the particle swarm optimization (PSO) algorithm, with PSO serving as the primary approach, and uses an adaptive inertia weight and a second round of optimization of the weights and thresholds of the backpropagation (BP) neural network to minimize the output error. The prediction model is applied to hydrogeology and coal mine production for the first time. The model is trained using 100 data samples collected with the Surfer 13 software and is used to predict the unit water inflow. It is then compared with the traditional FAHP forecasting method and with the BP and GA-BP neural network models. The results show that the PSO-GA-BP neural network model predicts aquifer water abundance with higher precision: the root mean square error (RMSE) of the test set is 8.7 × 10⁻⁴ and the fitting result is 0.9999, indicating minimal error relative to the actual sample values. Based on the prediction results of the test set, the water abundance of the No. 7 coal mine in Hami Danan Lake is zoned, and the overall difference between the results and the actual values is small, which verifies the reliability of the model. According to the water abundance zoning, strong water abundance areas are mainly concentrated in the third-partition area. This study provides a new method for predicting aquifer water abundance, improves prediction accuracy, reduces the cost of coal mine production, and provides a scientific evaluation method and a theoretical basis for the prevention and control of water disasters in coal seam roofs.


Introduction
Water 2023, 15, 4117

The Cretaceous and Jurassic coal seams in western China are the main sources of coal production in China. These Cretaceous and Jurassic strata exhibit obvious weak cementation characteristics, including low strength, easy weathering, and mud disintegration upon encountering water [1–3]. With increasing coal production capacity, water damage in coal seam roofs has become increasingly significant, posing a severe safety risk. Roof water damage is strongly linked to the water abundance of the roof aquifer, and the unit water inflow of a borehole is the most intuitive hydrogeological parameter for determining water abundance. Therefore, accurate and efficient prediction of the unit water inflow of the roof aquifer can identify the principal areas for preventing and controlling roof water damage in coal seams and support timely, scientific, and effective prevention and remedial measures that significantly reduce the incidence of roof water damage accidents in coal mines [4,5]. Numerous studies have investigated the prediction of aquifer water abundance, including the use of conventional methods [5–13], as well as combining these methods with GIS data management and spatial analysis to classify areas of varying water abundance [14–19]. Conventional prediction methods, along with other linear mathematical methods, are strongly affected by site data and human factors; the calculated index weights are subjective and the errors are large, so it is difficult to represent the aquifer's water abundance accurately. Therefore, some scholars have suggested employing artificial neural networks to forecast the water abundance of aquifers [20–23]. Neural networks possess self-adaptive, self-learning, and fault-tolerant capabilities, which effectively address this issue. Among these, the BP neural network is the most widely used. However, the BP neural network suffers from several inadequacies, including a tendency to fall into local optima, slow convergence speed, and poor prediction accuracy, so the aquifer water abundance it predicts may contain errors. The genetic algorithm (GA) has been used to optimize the BP network to overcome the limitation of local optima and enhance the prediction accuracy of aquifer water abundance [24–26]. The genetic algorithm has a strong global optimization ability. Nonetheless, it lacks memory, which may cause optimal points to be missed and ultimately lead to suboptimal water abundance predictions.
To improve water abundance prediction, we introduce the particle swarm optimization (PSO) algorithm into the GA-BP neural network. The PSO algorithm converges slowly in its later stage and may converge to a local extreme point, but it has memory, which compensates for the shortcomings of the GA and retains the individual and global optimal solutions. In the PSO-GA algorithm, the population is first optimized by the PSO algorithm; the GA then carries out crossover and mutation operations on the particles during the particle swarm iteration. By combining the advantages of both algorithms, the weights and thresholds of the BP neural network are optimized. The solution obtained is then substituted into the BP neural network for subsequent calculations to minimize the output error.
The PSO-GA-BP neural network prediction model is applied to hydrogeology and mine safety for the first time. It addresses the low prediction accuracy of the BP neural network and the lack of memory of the GA-BP neural network, which can cause good solutions to be lost. It also overcomes the limitations of traditional prediction methods, such as their strong dependence on site data and human factors and their large errors, thereby improving prediction accuracy and reducing production cost. This is of great significance for safe coal mine production.

Study Area
The Dannanhu No. 7 coal mine is part of the Dannanhu mining area in Hami, Xinjiang. It is situated in the southern section of Hami City, Xinjiang (Figure 1a,c), within the Nanhu Gobi region. The terrain is high in the north and south, with a low center. The surface is mostly covered by the Quaternary and Neoproterozoic, with the Xishanyao Formation of the Middle Jurassic occasionally visible in the southwestern area of the mine.
Based on the drill hole data, the strata in the area, from oldest to youngest, are the Upper Carboniferous Wutongwuzi Formation (C₂wt), the Lower Jurassic Sangonghe Formation (J₁s), the Middle Jurassic Xishanyao Formation (J₂x) and Toutunhe Formation (J₂t), the Neogene Pliocene Putaogou Formation (N₂p), and the Quaternary (Q₄). The coal-bearing stratum is the Middle Jurassic Xishanyao Formation (J₂x).
The aquifers are the Quaternary permeable layer, the Neogene Pliocene Putaogou Formation fracture pore weak aquifer, the Middle Jurassic Toutunhe Formation fracture pore aquifer, and the Middle Jurassic Xishanyao Formation fracture pore aquifer. The Xishanyao Formation fracture pore aquifer is further subdivided into three coal mine roof aquifers and 3~7 coal mine fracture pore aquifers. The study mainly analyzes the water abundance of the three coal mine roof aquifers, with the third coal mine roof aquifer thickness being the largest (Figure 1d).
First, a correlation analysis was carried out on the main control factors of aquifer water abundance, and factors that were highly correlated with one another were removed. The remaining main control factors were then used to calculate and predict the unit water inflow. Finally, the water abundance prediction zones were determined. The research route of this paper is shown in Figure 2.

Analysis of the Main Controlling Factors of Water Abundance
To effectively assess aquifer water yield, the impact of the main controlling factors on water abundance was analyzed in six aspects, drawing on previous scholarship: aquifer thickness, sand (gravel)-mud ratio, permeability coefficient, core recovery rate, sand-mudstone interlayer number, and fold fluctuation degree [5,14,27].
The primary control factors require data collected via drilling and pumping tests, and these data are scattered and not easy to obtain. To acquire more consistent and dependable information from the limited data, spatial interpolation was applied, specifically Kriging interpolation in this paper. Subsequently, contour maps of each control factor were created.
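As an illustration of the interpolation step, the following is a minimal sketch of ordinary kriging at a single target location. It is not the Surfer workflow used in the study: the exponential variogram, its sill and range, and the borehole coordinates and thickness values are all hypothetical and chosen only to make the example run.

```python
import numpy as np

def exp_variogram(h, sill=1.0, rng=500.0):
    """Exponential variogram without nugget (parameters are assumed, not fitted)."""
    return sill * (1.0 - np.exp(-3.0 * h / rng))

def ordinary_kriging(xy, z, target, variogram=exp_variogram):
    """Estimate z at `target` from scattered samples (xy, z) by ordinary kriging."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    n = len(z)
    # Pairwise distances between the sample points
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    # Kriging system: variogram block bordered by the unbiasedness constraint
    a = np.ones((n + 1, n + 1))
    a[:n, :n] = variogram(d)
    a[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = variogram(np.linalg.norm(xy - np.asarray(target, dtype=float), axis=1))
    solution = np.linalg.solve(a, b)
    weights = solution[:n]  # the last entry is the Lagrange multiplier
    return float(weights @ z), weights

# Hypothetical borehole coordinates (m) and aquifer thickness values (m)
boreholes = [(0, 0), (400, 0), (0, 300), (400, 300), (200, 150)]
thickness = [80.0, 95.0, 70.0, 90.0, 85.0]
estimate, w = ordinary_kriging(boreholes, thickness, target=(250, 100))
```

The kriging weights sum to one by construction, and the estimator reproduces the measured value at a borehole location, which is why it is a natural choice for building contour maps from sparse drilling data.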

1. Aquifer thickness;
The size of an aquifer's water-storage space is represented by its thickness. The thickness of the aquifer directly influences the amount of water present, with thicker aquifers indicating higher water content. The northeast of the study area has a large aquifer thickness, averaging about 95 m (Figure 3a).

2. Sand (gravel)-mud ratio;
Alternating layers of sandstone and mudstone frequently appear in the roof strata surrounding coal seams. The sand-mud ratio is the thickness ratio of sand (gravel) to mudstone (siltstone) in the aquifer. If the aquifer thickness is relatively constant and the strata are little affected by tectonism, then the greater the thickness of sandstone in the stratum, the more abundant the aquifer's water content. Analysis of hydrogeological borehole data showed that the higher the sand (gravel)-mud ratio, the greater the aquifer's permeability coefficient, that is, the stronger its permeability. Furthermore, the thickness of sandstone is greater than that of mudstone in the northwestern section of the study area, resulting in a larger effective aquifer thickness in that region and indicating good water abundance. For instance, in hole S1, the sand (gravel)-mud ratio was 0.9172 and the permeability coefficient was 0.276 m/d. The rock interval consisting of sand (gravel) is extensive and holds a considerable quantity of water, as depicted in Figure 3b.

3. Permeability coefficient;
The permeability coefficient reflects the aquifer's permeability and, in turn, the conditions of replenishment, runoff, and discharge. The greater the permeability coefficient, the stronger the aquifer's permeability. The permeability coefficient is predominantly associated with the level of rock fissure development: the more developed the fissures, the greater the connectivity between them and the lower the degree of filling within them. As a result, the rock's permeability is stronger and the permeability coefficient is larger. The roof aquifer of the third coal layer at the Dannanhu No. 7 Coal Mine consists of fine sandstone, coarse sandstone, conglomerate-bearing coarse sandstone, mudstone, and sandy mudstone. Data from hydrogeological borehole pumping tests gave permeability coefficients between 0.015 and 0.276 m/d, corresponding to weak to medium permeability (Figure 3c).

4. Core recovery;
Core recovery reflects the degree of fissure development in the strata and is measured by the ratio of the length of core recovered to the drilling footage. The lower the core recovery rate, the higher the degree of fracture development and the more broken the rock. This suggests a larger water storage space per unit volume of aquifer, better groundwater runoff conditions, and a higher water content. Core recovery in the study area ranges from 0.66 to 0.96 (Figure 3d).

5. Sand-mudstone interlayer number;
The number of sandstone and mudstone interlayers reflects how aquifers and aquifuges overlap, and hence the strength of the hydraulic connection. Mudstone (siltstone) weakens the water abundance of the aquifer: for an aquifer of a given thickness, more sandstone-mudstone interlayers mean more aquifuge layers and a weaker water abundance. In the northwest of the study area, the number of sandstone and mudstone interlayers is the smallest and the sand-gravel ratios are all greater than 0.5, indicating that the effective aquifer thickness is larger and the water abundance is richer there (Figure 3e).

6. Fold fluctuation degree;
The folds in the well field, especially the core and both sides of the syncline, have developed fissures. Surface water from the two wings converges toward the middle and seeps down into the groundwater. The groundwater flows along the layers and slopes to create a catchment space in the syncline, so the syncline structure favors groundwater recharge. Generally speaking, increased fold undulation enhances water catchment and increases water abundance (Figure 3f).
In the formula for the fold fluctuation degree, $F_{ud}$ denotes the fold fluctuation degree, $H$ denotes the elevation of the bottom surface of the aquifer, and $H_{min}$ denotes the minimum elevation of the bottom surface of the aquifer.

Correlation Analysis of Factors Controlling Water Abundance
Correlation analysis is a statistical method used to judge the degree of correlation between two variables. The corresponding indicator is the correlation coefficient (r), whose value lies in the range −1 ≤ r ≤ 1; the greater the absolute value of the correlation coefficient, the higher the degree of correlation between the two control factors. A negative value indicates a negative correlation, while a positive value indicates a positive correlation [28,29].
In this study, the correlation between each pair of factors controlling water abundance was assessed using the Pearson correlation coefficient:

$$r = \frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)}{\sqrt{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}\,\sqrt{\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2}} \quad (2)$$

where n is the number of samples, $X_i$ and $Y_i$ refer to the ith sample values of X and Y, and $\bar{X}$ and $\bar{Y}$ are the mean values of the samples. When |r| ≤ 0. This paper uses 0.7 as the designated threshold. Once the correlation between two factors exceeds 0.7, one of the two control factors is excluded and the other is retained for further analysis. Because the correlation coefficient between the permeability coefficient and the sand (gravel)-mud ratio is greater than 0.7 (Table 1), one of these two controlling factors must be eliminated. The remaining four control factors correlate less strongly with the permeability coefficient than with the sand (gravel)-mud ratio, so the sand (gravel)-mud ratio is excluded.
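The screening rule can be sketched in a few lines of Python. The factor values below are hypothetical and serve only to exercise the code; the function names `pearson_r` and `highly_correlated_pairs` are illustrative and do not come from the paper.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def highly_correlated_pairs(factors, threshold=0.7):
    """Return (name_a, name_b, r) for every factor pair with |r| above the threshold."""
    names = list(factors)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(factors[a], factors[b])
            if abs(r) > threshold:
                pairs.append((a, b, r))
    return pairs

# Hypothetical borehole values, not the study data
factors = {
    "permeability": [0.015, 0.08, 0.15, 0.21, 0.276],
    "sand_mud_ratio": [0.30, 0.45, 0.60, 0.75, 0.92],
    "core_recovery": [0.90, 0.66, 0.96, 0.70, 0.85],
}
flagged = highly_correlated_pairs(factors)
```

With these made-up values only the permeability and sand-mud pair is flagged, mirroring the situation described in the text, where one factor of such a pair is then dropped.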

Principle of PSO-GA-BP Neural Network
The PSO algorithm originates from the study of how birds forage in flocks, which allows for optimal adaptation to the search behavior of particles in complex systems. In the PSO algorithm, particles possess only two attributes: velocity and position. Their position and velocity are updated based on two extreme values: the individual extreme value ($P_{best}$) and the group extreme value ($G_{best}$). During the search process, particles are interconnected and share information with each other. The best solution that each particle has found individually is referred to as the individual extreme value, and the best individual extreme value within the particle swarm serves as the current global optimal solution. The particle swarm continues to iterate, updating velocity and position until a solution that meets the termination condition is achieved [30,31].
Assume that N particles form a swarm in an M-dimensional target search space, so that each particle attribute can be regarded as an M-dimensional vector. The position of the ith particle can be expressed as:

$$X_i = (x_{i1}, x_{i2}, \ldots, x_{iM}), \quad i = 1, 2, \ldots, N$$

The velocity of the ith particle can be expressed as:

$$V_i = (v_{i1}, v_{i2}, \ldots, v_{iM}), \quad i = 1, 2, \ldots, N$$

The optimal position found by the ith particle is the individual extreme value, which is expressed as:

$$P_{best} = (P_{i1}, P_{i2}, \ldots, P_{iM}), \quad i = 1, 2, \ldots, N$$

The global optimal position found by the particle swarm is the group extreme value, which is recorded as:

$$G_{best} = (P_{g1}, P_{g2}, \ldots, P_{gM})$$

After the individual and group extremes have been found, the particles update their velocity and position according to Formulas (3) and (4):

$$v_{i+1} = \omega v_i + c_1 r_1 \left(P_{best} - x_i\right) + c_2 r_2 \left(G_{best} - x_i\right) \quad (3)$$

$$x_{i+1} = x_i + v_{i+1} \quad (4)$$

where $\omega$ is the inertia weight, which provides a balance between local and global search; $v_i$ and $v_{i+1}$ are the current and post-update velocities of each iteration, respectively; $c_1$ and $c_2$ are the learning factors, which control the movement of particles in each iteration and are both positive; $r_1$ and $r_2$ are random numbers in the range [0, 1]; $x_i$ and $x_{i+1}$ are the current and post-update displacements of each iteration, respectively; and N is the number of particles.
The main steps of particle swarm optimization are as follows:
1. Initialize the particle swarm. Initialize the parameters of the particle swarm, including the population size, particle displacement, velocity, individual extreme value, and swarm extreme value;
2. Calculate the particle fitness values. Select a fitness function appropriate to the problem to be solved and use it to calculate the individual fitness values of the initial particle swarm;
3. Update the individual and group extreme values. Compare the fitness value calculated in the previous step with the particle's individual extreme value; if the new fitness value is better, the particle's current position becomes its individual optimal position, otherwise the original individual extreme value is kept until a better one appears. Then compare the individual extreme value with the group extreme value; if the individual extreme value is better, it becomes the global optimal position of the particle swarm, otherwise the original group extreme value is kept until a better individual extreme value appears;
4. Update the position and velocity of the particles according to Formulas (3) and (4);
5. Check the termination condition. If the algorithm does not meet its set termination condition, return to step 2; if it does, proceed to the next step;
6. Output the population extremum. The extreme value is regarded as the global optimal solution of the particle swarm optimization.
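The six steps above can be sketched as a short, self-contained routine. This is a minimal illustration of the standard PSO scheme, not the authors' implementation: the sphere test function, the search bounds, the velocity clamp, and the default values of `w`, `c1`, and `c2` are all assumptions made for the example.

```python
import random

def pso(fitness, dim, n_particles=20, n_iter=100, w=0.7, c1=1.5, c2=1.5,
        bounds=(-5.0, 5.0), v_max=1.0, seed=0):
    """Minimise `fitness` with a basic particle swarm (all defaults are assumed)."""
    rnd = random.Random(seed)
    lo, hi = bounds
    # Step 1: initialise positions, velocities and extreme values
    x = [[rnd.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[rnd.uniform(-v_max, v_max) for _ in range(dim)] for _ in range(n_particles)]
    p_best = [p[:] for p in x]
    # Step 2: fitness of the initial swarm
    p_val = [fitness(p) for p in x]
    g = min(range(n_particles), key=p_val.__getitem__)
    g_best, g_val = p_best[g][:], p_val[g]
    for _ in range(n_iter):  # Step 5: here the stop rule is a fixed iteration count
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rnd.random(), rnd.random()
                # Step 4: velocity update in the standard form, then position update
                vel = (w * v[i][d]
                       + c1 * r1 * (p_best[i][d] - x[i][d])
                       + c2 * r2 * (g_best[d] - x[i][d]))
                v[i][d] = max(-v_max, min(v_max, vel))
                x[i][d] = max(lo, min(hi, x[i][d] + v[i][d]))
            # Step 3: update the individual and group extreme values
            val = fitness(x[i])
            if val < p_val[i]:
                p_best[i], p_val[i] = x[i][:], val
                if val < g_val:
                    g_best, g_val = x[i][:], val
    # Step 6: output the swarm extremum
    return g_best, g_val

def sphere(p):
    """Simple convex test function with its minimum at the origin."""
    return sum(t * t for t in p)

best, best_val = pso(sphere, dim=3)
```

Because the individual and group bests are only replaced by strictly better values, the returned fitness can never be worse than the best particle of the initial swarm.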
The PSO-GA-BP prediction model integrates a GA into the PSO algorithm, with the PSO algorithm serving as the primary component, and uses the combined algorithm to optimize the weights and thresholds of the BP neural network so as to reduce its output error (Figure 4).
Firstly, initialize the structure parameters of the BP neural network. The number of nodes in the input, hidden, and output layers must be determined according to the sample data. The weights and thresholds of the BP neural network must also be established, with careful encoding of each parameter.
Particle population initialization. Initialize the parameters necessary for the PSO algorithm and the genetic algorithm, including the crossover probability ($P_{cross}$), mutation probability ($P_{mutation}$), maximum particle velocity ($V_{max}$), inertia weight $\omega$, learning factors $c_1$ and $c_2$, and iteration number N. Encode the particle velocity and position using the same rules as mentioned above. Set the particle population size to U. Then calculate each particle's fitness value with the fitness function, and find and update the individual and population extreme values.
Population iteration. Initially, the particle swarm optimization algorithm is used for the optimization iteration. The algorithm updates the particle velocities and positions, arranges the fitness values in ascending order, and then divides the particles into three equal parts: $U_1$ denotes the well-adapted particles, $U_2$ the average ones, and $U_3$ the poorly adapted ones. Following the genetic algorithm, the well-adapted subpopulation $U_1$ is copied directly to the next generation, while the average subpopulation $U_2$ undergoes particle crossover with the crossover probability, using a velocity crossover operator and a position crossover operator. The fitness values of the particles before and after crossover are compared, and the better-adapted particles are copied to the next iteration. The poorly adapted subpopulation $U_3$ is randomly re-initialized in velocity and position with the mutation probability, and the mutated particles are put back into the particle population. The fitness value of each particle in the population is then recalculated and compared with its previous value to update $P_{best}$ and $G_{best}$. These steps are repeated until the best fitness value reaches the convergence accuracy or the maximum number of iterations is reached, after which the globally optimal solution is output.
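The three-way split and the crossover and mutation treatment described above can be sketched as follows. The arithmetic (blend) crossover, the pairing of neighbouring particles, and all numeric defaults are assumptions for illustration; the text does not specify the exact operators.

```python
import random

def ga_step(positions, velocities, fitness, p_cross=0.8, p_mut=0.1,
            bounds=(-1.0, 1.0), v_max=0.5, rnd=None):
    """One crossover/mutation pass over a swarm (minimisation; operators are assumed)."""
    rnd = rnd or random.Random(0)
    # Rank particles from best (lowest fitness) to worst and split into thirds
    order = sorted(range(len(positions)), key=lambda i: fitness(positions[i]))
    third = len(order) // 3
    u2, u3 = order[third:2 * third], order[2 * third:]
    new_x = [p[:] for p in positions]
    new_v = [v[:] for v in velocities]
    # U1 (order[:third]) is copied to the next generation unchanged.
    # U2: blend positions and velocities of paired particles; a child replaces
    # its parent only when its fitness is better.
    for a, b in zip(u2[::2], u2[1::2]):
        if rnd.random() >= p_cross:
            continue
        alpha = rnd.random()
        for parent, other in ((a, b), (b, a)):
            child_x = [alpha * s + (1 - alpha) * t
                       for s, t in zip(positions[parent], positions[other])]
            child_v = [alpha * s + (1 - alpha) * t
                       for s, t in zip(velocities[parent], velocities[other])]
            if fitness(child_x) < fitness(positions[parent]):
                new_x[parent], new_v[parent] = child_x, child_v
    # U3: random re-initialisation with probability p_mut
    lo, hi = bounds
    for i in u3:
        if rnd.random() < p_mut:
            new_x[i] = [rnd.uniform(lo, hi) for _ in positions[i]]
            new_v[i] = [rnd.uniform(-v_max, v_max) for _ in velocities[i]]
    return new_x, new_v

def sphere(p):
    return sum(t * t for t in p)

rnd = random.Random(1)
swarm_x = [[rnd.uniform(-1, 1) for _ in range(2)] for _ in range(9)]
swarm_v = [[rnd.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(9)]
next_x, next_v = ga_step(swarm_x, swarm_v, sphere, rnd=rnd)
```

Keeping the best third untouched means the swarm's best fitness cannot deteriorate in this pass, while the re-initialised worst third injects diversity.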
The primary parameter of the PSO algorithm is the inertia weight $\omega$, which significantly affects the algorithm's convergence performance. A larger inertia weight gives the particle swarm stronger global search capabilities and reduces the chance of falling into a local extreme point. Conversely, a smaller inertia weight allows faster convergence because of the swarm's stronger local convergence capabilities. This paper uses adaptive inertia weights that monitor the real-time motion state of the particle population during each iteration of the PSO algorithm. Each particle's inertia weight is dynamically adjusted according to this motion state, which reduces the number of unproductive iterations and improves the convergence performance of the PSO algorithm [32–34].
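The text does not state the exact adaptation rule. As one possible illustration, the sketch below uses a commonly seen fitness-based scheme for minimisation problems: particles better than the swarm average receive a smaller inertia weight (local refinement), and the others keep the maximum weight (global exploration). The rule itself, the bounds `w_min` and `w_max`, and the sample fitness values are assumptions, not the authors' formula.

```python
def adaptive_inertia(f_i, f_min, f_avg, w_min=0.4, w_max=0.9):
    """Assumed fitness-based inertia weight for one particle (minimisation)."""
    if f_i <= f_avg and f_avg > f_min:
        # Scale linearly from w_min (best particle) to w_max (average particle)
        return w_min + (w_max - w_min) * (f_i - f_min) / (f_avg - f_min)
    # Worse-than-average particles, or a fully converged swarm, use w_max
    return w_max

# Hypothetical fitness values of a four-particle swarm
fitness_values = [0.8, 1.5, 3.0, 4.7]
f_min = min(fitness_values)
f_avg = sum(fitness_values) / len(fitness_values)
weights = [adaptive_inertia(f, f_min, f_avg) for f in fitness_values]
```

In a full PSO loop, `weights[i]` would replace the fixed inertia weight in the velocity update of particle i.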
Run the BP neural network. Substitute the global optimal solution obtained in the preceding step into the BP neural network as its initial weights and thresholds. Subsequently, execute the BP neural network to obtain the final output of the PSO-GA-BP neural network.
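To make the substitution step concrete, the sketch below decodes a flat solution vector into the weights and thresholds of a single-hidden-layer network and evaluates its mean squared error, which could serve as the particle fitness. The ordering of the encoded parameters, the tanh hidden activation, the linear output, the 5-7-1 layout, and the random demo data are all assumptions for illustration.

```python
import numpy as np

def decode(vector, n_in, n_hidden, n_out):
    """Split a flat vector into (w1, b1, w2, b2); the ordering is an assumption."""
    vector = np.asarray(vector, dtype=float)
    i = 0
    w1 = vector[i:i + n_in * n_hidden].reshape(n_in, n_hidden)
    i += n_in * n_hidden
    w2 = vector[i:i + n_hidden * n_out].reshape(n_hidden, n_out)
    i += n_hidden * n_out
    b1 = vector[i:i + n_hidden]
    i += n_hidden
    b2 = vector[i:i + n_out]
    return w1, b1, w2, b2

def forward(x, params):
    """Forward pass with a tanh hidden layer and a linear output layer (assumed)."""
    w1, b1, w2, b2 = params
    hidden = np.tanh(x @ w1 + b1)
    return hidden @ w2 + b2

def fitness(vector, x, y, shape):
    """Mean squared error of the network encoded by `vector`."""
    pred = forward(x, decode(vector, *shape))
    return float(np.mean((pred - y) ** 2))

# Hypothetical 5-7-1 layout and random demo data
shape = (5, 7, 1)
rng = np.random.default_rng(0)
particle = rng.uniform(-1.0, 1.0, size=5 * 7 + 7 * 1 + 7 + 1)
x_demo = rng.random((4, 5))
y_demo = rng.random((4, 1))
error = fitness(particle, x_demo, y_demo, shape)
```

After the swarm search finishes, the decoded parameters of the best particle would be used as the starting point for ordinary backpropagation training.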

Case Analysis
Following the aforementioned analysis, five control parameters were employed to evaluate the water abundance of the No. 3 coal roof aquifer, so the input layer comprised five neurons and the unit water inflow was designated as the output layer. The initial number of neurons in the hidden layer was determined using empirical Formula (5):

$$h = \sqrt{n + m} + a \quad (5)$$

where h is the number of hidden-layer neurons, n represents the number of input units, m represents the number of output units, and a is a constant within the range [1, 10]. In this case, a is taken as 1.
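As a small worked illustration, and assuming the empirical rule takes the common form of the square root of (n + m) plus a, the candidate hidden-layer sizes for five inputs and one output can be listed as follows. Rounding to the nearest integer is an assumption; the text does not say how fractional values are handled.

```python
import math

def hidden_candidates(n_inputs, n_outputs, a_values=range(1, 11)):
    """Candidate hidden-layer sizes from sqrt(n + m) + a, rounded to integers."""
    base = math.sqrt(n_inputs + n_outputs)
    return [round(base + a) for a in a_values]

candidates = hidden_candidates(5, 1)
initial_size = candidates[0]  # corresponds to a = 1
```

Each candidate size can then be trained in turn and the one with the smallest error retained.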
The PSO-GA algorithm was applied to optimize the weights and thresholds of the BP neural network. First, the genetic algorithm was used to search for suitable weights and thresholds for the hidden layers generated by the PSO algorithm at each step, with the following parameters: a population size of 20, a maximum of 20 iterations, and lower and upper bounds of the iterative objective function of 1 × 10⁻⁹ and 1 × 10⁻⁷. The PSO algorithm was then used to continue the optimization for better classification and prediction, and circle mapping was introduced for further optimization. Experience showed that the PSO algorithm converged in the second iteration. Therefore, the number of PSO iterations was set to 2, the number of particles to 5, the maximum velocity to 6, and the learning factors $c_1$ and $c_2$ both to 2.
To assess the model's credibility, a data set of 100 samples of the control factors, including measured data, was obtained from the contour plots drawn in the previous section. The first 91 samples were used as the training set, while the remaining 9 samples were used as the test set to predict the unit water inflow. After training, the PSO-GA-BP neural network model achieved a root mean square error (RMSE) of 1.9 × 10⁻⁴ on the training data, 8.7 × 10⁻⁴ on the test data, and 5.1 × 10⁻⁴ overall. Furthermore, the model exhibited a training set fit of 0.9999.
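For reference, the two reported quality measures can be computed as sketched below. The RMSE definition is standard; interpreting the "fit" as the coefficient of determination R² is an assumption, since the text does not define it. The sample values are hypothetical and are not the study data.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between measured and predicted values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def r_squared(actual, predicted):
    """Coefficient of determination (assumed meaning of the reported fit)."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical unit inflow values (L/(s*m)) for a small test set
measured = [0.012, 0.034, 0.051, 0.078, 0.095]
predicted = [0.013, 0.033, 0.052, 0.077, 0.096]
test_rmse = rmse(measured, predicted)
test_fit = r_squared(measured, predicted)
```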

Discussion
To verify the superior performance of the PSO-GA-BP neural network model for water abundance prediction, the study conducted a comparative analysis with the traditional prediction method FAHP and other neural network models, including the BP and GA-BP neural network.

FAHP
Because AHP disregards the imprecision of human subjective judgement, it frequently encounters inconsistencies between judgement consistency and matrix consistency, as well as difficulties in conducting consistency tests. Hence, FAHP is adopted as the conventional prediction approach for the comparison. To keep the forecast results comparable, we apply FAHP to predict and examine the test set data. We use the five controlling factors for determining water abundance: aquifer thickness, permeability coefficient, core recovery, sand (gravel)-mudstone interlayer number, and fold fluctuation degree. These factors are compared in pairs on a scale from 0.1 to 0.9 to generate the fuzzy complementary judgement matrix A.
Applying matrix A and Formula (6), we achieve row sum normalization to determine the weights of each evaluation index w i and acquire the weight vector W, W = (0.21, 0.22, 0.22, 0.19, 0.18) T .To evaluate the plausibility of the matrix A and weight value W, we must also perform a consistency test.According to Formulas ( 7) and ( 8), the characteristic matrix W * of the fuzzy complementary judgement matrix A can be expressed as: 0.50 0.49 0.49 0.52 0.54 0.51 0.50 0.50 0.53 0.55 0.51 0.50 0.50 0.53 0.55 0.48 0.47 0.47 0.50 0.52 0.46 0.45 0.45 0.48 0.50 According to Equation ( 9), the compatibility index I is calculated as 0.1, and the fuzzy complementary judgment matrix A satisfies the consistency requirement V = w i y i (10) where y i is the dimensionless value of the ith indicator.Dimensionless treatment of each assessment index, combined with the calculated weights of each assessment index, are used to derive the corresponding unit inflow value using Equation (10).After comparison with the actual values, the predicted water abundance classes of sample numbers 6, 7, and 9 do not match the actual values, and the accuracy of the model prediction is 67% (Table 2).
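As a rough check on the numbers above, the sketch below rebuilds the characteristic matrix from the reported weight vector and evaluates the weighted sum of Equation (10). Formulas (7) and (8) are not reproduced in this section, so the form $w^*_{ij} = w_i / (w_i + w_j)$ is an assumption, taken from the common FAHP definition; the function names are hypothetical.

```python
def characteristic_matrix(weights):
    # Assumed form of the characteristic matrix: w*_ij = w_i / (w_i + w_j).
    return [[wi / (wi + wj) for wj in weights] for wi in weights]

def weighted_score(weights, y):
    # Weighted sum of dimensionless indicator values, as in Equation (10).
    return sum(w * yi for w, yi in zip(weights, y))

W = [0.21, 0.22, 0.22, 0.19, 0.18]  # weight vector reported in the text
W_star = characteristic_matrix(W)

# Characteristic matrix as printed in the text, for comparison.
reported = [
    [0.50, 0.49, 0.49, 0.52, 0.54],
    [0.51, 0.50, 0.50, 0.53, 0.55],
    [0.51, 0.50, 0.50, 0.53, 0.55],
    [0.48, 0.47, 0.47, 0.50, 0.52],
    [0.46, 0.45, 0.45, 0.48, 0.50],
]
max_dev = max(abs(W_star[i][j] - reported[i][j])
              for i in range(5) for j in range(5))
```

Under this assumed formula, the recomputed entries agree with the printed matrix to within rounding of the two-decimal weights (`max_dev` stays below 0.01). This suggests, but does not prove, that the paper uses the same definition.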

Other Neural Network Prediction Models
To compare the prediction accuracy of the PSO-GA-BP neural network prediction model with that of other neural network prediction models, the same sample data set was used: the first 91 samples formed the training set and the remaining 9 samples (measured data) formed the test set for the BP and GA-BP neural networks.
First, the number of hidden-layer nodes of the BP and GA-BP neural networks was selected using an empirical formula and repeated training; with 7 hidden nodes, the mean square error reached its minimum of 2.43 × 10⁻⁵. The structure of the BP neural network was therefore 5-7-1, and the length of the genetic algorithm coding was calculated according to Equations (11)-(13):

$$S_1 = A \times B + B \times C \qquad (11)$$

$$S_2 = B + C \qquad (12)$$

$$S_3 = S_1 + S_2 \qquad (13)$$

where A, B, and C are the number of neurons in the input, hidden, and output layers, respectively; $S_1$ is the number of neural network weights, which is 42 in this paper; $S_2$ is the number of thresholds, which is 8 in this paper; and $S_3$ is the length of the genetic algorithm coding, which is 50 in this paper.
The BP neural network was constructed with the optimal number of hidden-layer nodes obtained above; the maximum number of training epochs was 1000, the learning rate was 0.01, and the training error target was 1 × 10⁻⁵. For the GA-BP model, the weights and thresholds of the BP neural network were optimized using a genetic algorithm, whose parameters were initialized with a population size of 30, a maximum of 50 iterations, a crossover probability of 0.8, and a mutation probability of 0.1.
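The counts quoted above (42 weights, 8 thresholds, coding length 50) are what a fully connected 5-7-1 network gives. A short check, assuming one threshold per hidden and output neuron (the function name is hypothetical):

```python
def ga_encoding_length(n_input, n_hidden, n_output):
    """Return (weights, thresholds, chromosome length) for a three-layer network."""
    n_weights = n_input * n_hidden + n_hidden * n_output  # S1
    n_thresholds = n_hidden + n_output                    # S2
    return n_weights, n_thresholds, n_weights + n_thresholds  # S3 = S1 + S2

s1, s2, s3 = ga_encoding_length(5, 7, 1)
```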
According to the training and test results, the RMSE of the BP neural network was 5.95 × 10⁻³, and the RMSE of the GA-BP neural network was 1.53 × 10⁻³. The water abundance classes assigned from the predictions of the BP and GA-BP neural networks matched the actual classes with 100% accuracy, but both models had large errors, which are very likely to cause misclassification of water abundance grade. The prediction accuracy of unit water inflow plays an important role in the prevention and control of roof water hazards, especially for weakly cemented strata, which have low strength and readily turn to mud and disintegrate on contact with water; only by minimizing the prediction error can the hazard to coal seam mining be minimized. The actual values of the test set samples, the predicted values of each method, and the prediction errors are compared in Figure 5, and the prediction results and errors of the three prediction models on the test set are detailed in Table 3.
Based on the comparison results, the PSO-GA-BP neural network model exhibited the highest prediction accuracy, with a maximum error of 1.14 × 10⁻³ and a minimum error of 6.66 × 10⁻⁸. The model fits the measured sample values well, as demonstrated by a relatively straight error curve and a narrow range of errors. In second place is the GA-BP model, with a maximum error of 5.26 × 10⁻³ and a minimum error of 1.03 × 10⁻⁴. The BP neural network prediction model achieved the lowest accuracy, with a maximum error of 1.65 × 10⁻² and a minimum error of 6.25 × 10⁻⁴, exhibiting a wider range of errors.

Prediction Zoning of Water Abundance
In this paper, the unit water inflow values predicted by the PSO-GA-BP neural network model for the test set and the natural breaks classification method in ArcGIS were used to zone the water abundance of the Dannanhu No. 7 Coal Mine in Hami (Figure 6). The limit values were 0.048, 0.087, 0.117, and 0.146. On this basis, the areas of stronger and strongest water abundance are mainly distributed in the north of the study area, that is, in the third partition, with small parts at the northern end of the first and second partitions and at the intersection of the fourth and fifth partitions. The distribution of strong water abundance is consistent with aquifer thickness, similar to the areas of maximum permeability and fold fluctuation, and close to the areas with higher core recovery, indicating that water abundance is positively correlated with these four main influencing factors. It is also close to the areas with the minimum number of sand-mudstone interlayers, indicating that water abundance is negatively correlated with this factor.
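The four limit values imply five water abundance classes. The sketch below assigns a class index to a unit water inflow value. Two things are assumptions: a value equal to a limit is placed in the lower class, and the classes are identified by the indices 1 (lowest) to 5 (highest) rather than by the paper's class names. The sample values are hypothetical.

```python
from bisect import bisect_left

# Limit values reported in the text, in L/(s·m).
BREAKS = [0.048, 0.087, 0.117, 0.146]

def abundance_class(q):
    """Return a class index from 1 (lowest) to 5 (highest) for unit inflow q.

    Assumption: a value exactly equal to a limit belongs to the lower class.
    """
    return bisect_left(BREAKS, q) + 1

# Hypothetical unit water inflow values, for illustration only.
classes = [abundance_class(q) for q in (0.030, 0.048, 0.100, 0.150)]
```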
To ensure the model's reliability, we selected the actual unit water inflow values of the same samples as the test set. We then used the boundary values from the model's predictions to zone the mine area's water abundance. The water abundance zoning map based on the actual unit water inflow values, shown in Figure 6, exhibits slightly larger areas of strong, stronger, weak, and weaker water abundance than the map based on the predicted values. However, the overall difference was minimal, supporting the model's reliability.

Conclusions and Forecast
To address the issue of roof water disasters, a prediction model for the water abundance of a coal seam roof aquifer was developed using a PSO-GA-BP neural network. The model takes into account the five primary factors that influence the water abundance of the aquifer, and integrates the genetic algorithm with the PSO algorithm. The genetic algorithm continually seeks the optimal weights and thresholds for the hidden layer generated by the PSO algorithm. The particle swarm optimization algorithm serves as the main body, and an adaptive inertia weight is adopted to optimize the weights and thresholds of the BP neural network, with the goal of minimizing output error.
Compared with the traditional FAHP method and the BP and GA-BP neural network prediction models, this model removes the need for subjective evaluation of each control factor, and compensates for the tendency of the BP neural network model to fall into local optima and of the GA-BP neural network model to miss the optimum. The inclusion of an adaptive inertia weight in the PSO algorithm results in higher prediction accuracy. Furthermore, the main control factors in the prediction model can be obtained from every construction borehole without geophysical surveying, which significantly reduces cost.
According to the test set data, the predicted values closely match the actual values: the maximum error is only 1.14 × 10⁻³, and the prediction accuracy is as high as 99.99%. Using the predicted unit water inflow values, the water abundance of the 3-coal roof aquifer was partitioned, revealing that areas with greater water abundance were concentrated in the third partition. Comparing the zoning based on predicted values with the zoning based on actual unit water inflow values showed that the prediction accuracy on the test set was high and that the water abundance zoning was consistent with the actual situation.
The prediction model still has shortcomings. Because it uses a small number of main control factors, it remains to be determined whether more control factors would give better predictions and whether a different combination of main control factors would be more suitable. Further research is needed to determine whether the model can be optimized further.

Figure 1. Study area. (a) Location of Hami City, Xinjiang; (b) The partition map of the Dannanhu No. 7 Coal Mine; (c) Location of the Dannanhu No. 7 Coal Mine in Hami City; (d) Schematic diagram of main coal seam and aquifer.

Within the Dannanhu No. 7 Coal Mine area, there are Quaternary clay aquifers and sand-mudstone aquifers at the bottom of the Toutunhe Formation in the Middle Jurassic, sand-mudstone aquifers at the base of the upper Xishanyao Formation and in the middle part of the Xishanyao Formation in the Middle Jurassic. The aquifers are the Quaternary permeable layer, Neogene Pliocene Putaogou Formation fracture pore weak aquifer, Jurassic Middle Toutunhe Formation fracture pore aquifer and Jurassic Middle Xishanyao Formation fracture pore aquifer. The Xishanyao Formation fracture pore aquifer is further subdivided into three coal mine roof aquifers and 3~7 coal mine fracture pore aquifers. The study mainly analyzes the water abundance of the three coal mine roof aquifers, with the third coal mine roof aquifer thickness being the largest (Figure 1d).

First, correlation analysis was carried out on the main control factors of aquifer water abundance, and the main factors with high correlation were removed. The remaining main control factors were then used to calculate and predict the unit water inflow. Finally, the prediction zones of water abundance were determined. The research route of this paper is shown in Figure 2.

Figure 4. Flowchart of PSO-GA algorithm to optimize BP neural network.


Water 2023, 15

Figure 5. Analysis of neural network prediction results. (a) Comparison between actual value and the predicted value of test set; (b) Comparison of prediction errors of test set.

Figure 6. Water abundance zoning of coal roof aquifer. (a) The data comes from the test set predictions; (b) The data comes from the actual value.

When |r| ≤ 0.2, the two control factors are either extremely weakly correlated or uncorrelated. When 0.2 < |r| ≤ 0.4, the two control factors are weakly correlated. When 0.4 < |r| ≤ 0.6, the two control factors are moderately correlated. When 0.6 < |r| ≤ 0.8, the two control factors are highly correlated. When 0.8 < |r| ≤ 1, the two control factors are extremely highly correlated.
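
The correlation bands above can be written as a small lookup. This is a sketch only: the function name and the label strings are hypothetical, and the lowest band is taken to cover |r| ≤ 0.2.

```python
def correlation_strength(r):
    """Map a Pearson coefficient r to the qualitative bands used in the text."""
    a = abs(r)
    if a <= 0.2:
        return "extremely weak or none"
    if a <= 0.4:
        return "weak"
    if a <= 0.6:
        return "moderate"
    if a <= 0.8:
        return "high"
    return "extremely high"

# Hypothetical coefficients, for illustration only.
labels = [correlation_strength(r) for r in (0.15, -0.35, 0.6, -0.75, 0.95)]
```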

Table 1. Pearson's correlation coefficients between control factors.

Table 2. Evaluation and prediction of water abundance of FAHP.

Table 3. The results of neural network prediction. Unit: L/(s·m).