# Classification of Water Source in Coal Mine Based on PCA-GA-ET

## Abstract

[…] Na⁺+K⁺, Ca²⁺, Mg²⁺, Cl⁻, SO₄²⁻, and HCO₃⁻ from observed water samples. An improved water source discrimination model is proposed that combines algorithms from data mining, classification models, and learning reinforcement. According to the Pearson correlation coefficient, Na⁺+K⁺ has a strong correlation with HCO₃⁻. To identify the major metrics, we performed principal component analysis (PCA), and the adaptive differential evolutionary genetic algorithm (GA) was utilized to optimize the depth of the extreme tree (ET) and the number of classifiers. Finally, the model distinguished 25 sets of studied samples from various water sources in the Pingdingshan coalfield. Comparative analysis demonstrated the efficacy of each stage of our work. PCA-GA-ET outperformed conventional approaches such as the support vector machine, BP artificial neural network, and random forest. The studies revealed that PCA-GA-ET can eliminate the information overlap between data and simplify the data structure, thereby improving the efficiency and accuracy of water source detection. We discovered that by utilizing the evolutionary algorithm to optimize parameters such as the depth of the extreme trees and the number of decision trees, we could get the model to converge faster and to be more stable and more accurate. The results suggest that PCA-GA-ET has good robustness and accuracy and can meet the needs of water source identification.

## 1. Introduction

## 2. The Theory of Methods

#### 2.1. Principal Component Analysis

#### 2.2. Genetic Algorithm

[…]$_{\min}$ is the minimum solution that can be obtained by the parameter, and $b_i$ is the binary number of the optimal solution.
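The role of a parameter minimum and a string of binary digits can be illustrated with the standard binary-to-real decoding used in many genetic algorithms. The sketch below is an assumption, not the paper's exact formula: the names `decode_chromosome`, `decode_tree_depth`, `x_min`, and `x_max`, the most-significant-bit-first ordering, and the depth bounds are all introduced here for illustration.

```python
from typing import Sequence


def decode_chromosome(bits: Sequence[int], x_min: float, x_max: float) -> float:
    """Map a binary chromosome linearly onto the interval [x_min, x_max]."""
    if not bits:
        raise ValueError("chromosome must contain at least one bit")
    if any(b not in (0, 1) for b in bits):
        raise ValueError("chromosome must contain only 0 and 1")
    n = len(bits)
    # Interpret the bits as an unsigned integer, most significant bit first.
    as_int = 0
    for b in bits:
        as_int = (as_int << 1) | b
    # Scale so that all zeros decode to x_min and all ones decode to x_max.
    return x_min + (x_max - x_min) * as_int / (2 ** n - 1)


def decode_tree_depth(bits: Sequence[int], min_depth: int = 1, max_depth: int = 32) -> int:
    """Decode an integer hyperparameter such as a tree depth (bounds are hypothetical)."""
    return round(decode_chromosome(bits, min_depth, max_depth))
```

With this mapping, a longer chromosome gives a finer grid of candidate values between the two bounds, which is the usual trade-off between search resolution and search-space size.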

#### 2.3. Extreme Tree (ET)

## 3. Data Collection and Preprocessing

#### 3.1. Study Area

#### 3.2. Statistical Analysis and Data Processing

Na⁺+K⁺, Ca²⁺, Mg²⁺, Cl⁻, SO₄²⁻, and HCO₃⁻ were extracted from the water samples. These are hereafter referred to as X1, X2, X3, X4, X5, and X6. The statistical data are shown in Table 1.

[…] X1 was more densely spread in the range of 0.14 to 100.00 mg·L⁻¹. X2 ranged from 2.4 to 417.33 mg·L⁻¹ and tended to show a double-wave peak. The X3 distribution ranged from 0 to 173.4 mg·L⁻¹. X4 had a similar distribution to X3, but the X4 waveform was shifted to the left and right-skewed, with a larger peak than that of X3. The distribution of X5 was comparable to that of X1. The X6 distribution varied from 52.48 to 2498.77 and was much greater than the other ion concentration distributions. Such a substantial difference in magnitude tends to increase the computational complexity and can reduce the discriminative model's accuracy. As a result, the sample data must be standardized so that they are proportionally constrained to the range [0, 1], thereby reducing the harmful impact of anomalous data. The following is the standardizing equation:
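A common form consistent with a proportional mapping onto [0, 1] is min-max scaling, $x' = (x - x_{\min}) / (x_{\max} - x_{\min})$, applied to each feature separately. The sketch below assumes that form. The function name `min_max_scale` and the sample rows are hypothetical; only the range endpoints quoted for X1 and X6 come from the text.

```python
import numpy as np


def min_max_scale(X: np.ndarray) -> np.ndarray:
    """Scale each column of X to [0, 1] using its own minimum and maximum."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # Constant columns have zero range; map them to 0 instead of dividing by zero.
    safe_range = np.where(col_range == 0, 1.0, col_range)
    return (X - col_min) / safe_range


# Hypothetical rows: column 0 mimics X1 and column 1 mimics X6.
# Only the first and last rows use range endpoints quoted in the text.
samples = np.array([
    [0.14, 52.48],
    [50.00, 1200.00],
    [100.00, 2498.77],
])
scaled = min_max_scale(samples)
```

After scaling, both columns span the same interval, so the much larger raw magnitude of X6 no longer dominates distance or variance calculations.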

Y1, Y2, Y3, Y4, and Y5 are the newly created features, and X1, X2, X3, X4, X5, and X6 are the original features.
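The construction of five new features from the six original ones can be sketched with a plain NumPy implementation of PCA. This is a minimal illustration rather than the authors' code: `pca_transform` is a name introduced here, the input is random placeholder data, and the 124 rows simply match the sum of the sample capacities listed in the category table.

```python
import numpy as np


def pca_transform(X: np.ndarray, n_components: int):
    """Project X onto its leading principal components.

    Returns the projected data and the explained-variance ratio of each
    retained component.
    """
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    # SVD of the centered data: the rows of Vt are the principal axes,
    # ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained_variance = S ** 2 / (X.shape[0] - 1)
    ratio = explained_variance / explained_variance.sum()
    Y = X_centered @ Vt[:n_components].T
    return Y, ratio[:n_components]


rng = np.random.default_rng(0)
X = rng.random((124, 6))  # placeholder for scaled X1..X6, not real measurements
Y, ratio = pca_transform(X, n_components=5)
```

Each column of `Y` is a linear combination of the centered original features, and the columns are mutually uncorrelated, which is how PCA removes the information overlap between correlated ion concentrations.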

## 4. Establishment of Water Source Discrimination Model

#### Comparison of Models

## 5. PCA-GA-ET Model Performance Verification

[…] Ca²⁺, Mg²⁺, and Na⁺+K⁺ have strong discrimination abilities.

## 6. Conclusions and Outlook

[…] Ca²⁺ had the highest, indicating that Ca²⁺ had the best discrimination capacity in identifying water inrush sources.

**Figure 7.** Performance comparison of ET, PCA-ET, PCA-GA-ET, GS-RF, PSO-SVM, and MLP in learning samples.

**Figure 8.** The discriminant results of the PCA-GA-ET water source identification model on the validation samples.

Category | Sample Capacity | Target | One-Hot Encoding |
---|---|---|---|
Surface water | 19 | 0 | [1, 0, 0, 0, 0] |
Quaternary pore water | 16 | 1 | [0, 1, 0, 0, 0] |
Carboniferous limestone karst water | 44 | 2 | [0, 0, 1, 0, 0] |
Permian sandstone water | 22 | 3 | [0, 0, 0, 1, 0] |
Cambrian limestone karst water | 23 | 4 | [0, 0, 0, 0, 1] |
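The target-to-vector mapping in the table can be reproduced in a few lines. The sketch below is illustrative: the `CATEGORIES` list and the `one_hot` helper are names introduced here, while the category labels and sample capacities are copied from the table.

```python
import numpy as np

# (category name, sample capacity) in target order 0..4, copied from the table.
CATEGORIES = [
    ("Surface water", 19),
    ("Quaternary pore water", 16),
    ("Carboniferous limestone karst water", 44),
    ("Permian sandstone water", 22),
    ("Cambrian limestone karst water", 23),
]


def one_hot(targets, n_classes: int) -> np.ndarray:
    """Return a (len(targets), n_classes) matrix with a single 1 per row."""
    targets = np.asarray(targets, dtype=int)
    # Row k of the identity matrix is the one-hot vector for class k.
    return np.eye(n_classes, dtype=int)[targets]


# Build the integer target vector with one entry per sample.
y = np.repeat(np.arange(len(CATEGORIES)), [count for _, count in CATEGORIES])
Y_onehot = one_hot(y, n_classes=len(CATEGORIES))
```

Summing `Y_onehot` over its rows recovers the sample capacity of each class, which is a quick check that the encoding and the label vector agree.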

Variable | X1 | X2 | X3 | X4 | X5 | X6 |
---|---|---|---|---|---|---|
X1 | 1.0000 | | | | | |
X2 | −0.3316 | 1.0000 | | | | |
X3 | 0.0129 | 0.4928 | 1.0000 | | | |
X4 | 0.4187 | 0.1846 | 0.3257 | 1.0000 | | |
X5 | 0.2354 | 0.3780 | 0.5666 | 0.1947 | 1.0000 | |
X6 | 0.8849 | −0.3039 | −0.1197 | 0.4391 | 0.0206 | 1.0000 |
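To make the coefficients in the table concrete, the sketch below shows how a Pearson coefficient is computed and how the strongest pair can be read off the matrix. The helper names `pearson` and `strongest_pair` are introduced here for illustration; the off-diagonal values are copied from the table.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


# Lower-triangular entries of the table, keyed by (row, column).
COEFFS = {
    ("X2", "X1"): -0.3316,
    ("X3", "X1"): 0.0129, ("X3", "X2"): 0.4928,
    ("X4", "X1"): 0.4187, ("X4", "X2"): 0.1846, ("X4", "X3"): 0.3257,
    ("X5", "X1"): 0.2354, ("X5", "X2"): 0.3780, ("X5", "X3"): 0.5666,
    ("X5", "X4"): 0.1947,
    ("X6", "X1"): 0.8849, ("X6", "X2"): -0.3039, ("X6", "X3"): -0.1197,
    ("X6", "X4"): 0.4391, ("X6", "X5"): 0.0206,
}


def strongest_pair(coeffs):
    """Return the pair of variables with the largest absolute coefficient."""
    return max(coeffs, key=lambda pair: abs(coeffs[pair]))
```

Applied to the table, `strongest_pair(COEFFS)` picks out X6 and X1, that is, HCO₃⁻ and Na⁺+K⁺, which matches the strong correlation noted in the abstract.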

**Table 3.** MLP, PSO-SVM, and GS-RF comparison model parameters (the maximum number of features of the RF model is abbreviated as MF, and the number of hidden layers, neurons per layer, and learning rate of the MLP model are abbreviated as HL, NL, and LR).

**GS-RF**

Parameter | Algorithm | MF | ES | DP | MSL | MSS |
---|---|---|---|---|---|---|
Value | CART | 6 | 30 | 16 | 2 | 1 |

**PSO-SVM**

Parameter | Kernel | C | Gamma | Cache_size | Class_weight | Tol |
---|---|---|---|---|---|---|
Value | RBF | 6 | 8 | 200 | 1 | 0.001 |

**MLP**

Parameter | Activation | HL | NL | LR | Batch_size | Alpha |
---|---|---|---|---|---|---|
Value | Relu | 2 | 15 | 0.001 | 64 | 0.0001 |

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Yang, Z.; Lv, H.; Wang, X.; Yan, H.; Xu, Z.
Classification of Water Source in Coal Mine Based on PCA-GA-ET. *Water* **2023**, *15*, 1945.
https://doi.org/10.3390/w15101945
