Article

Random Forest—Based Identification of Factors Influencing Ground Deformation Due to Mining Seismicity

Department of Geodesy and Geoinformatics, Faculty of Geoengineering, Mining and Geology, Wroclaw University of Science and Technology, Wybrzeże Wyspiańskiego 27, 50-370 Wrocław, Poland
*
Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(15), 2742; https://doi.org/10.3390/rs16152742
Submission received: 3 June 2024 / Revised: 22 July 2024 / Accepted: 24 July 2024 / Published: 26 July 2024
(This article belongs to the Special Issue Machine Learning and Remote Sensing for Geohazards)

Abstract

The goal of this study was to develop a model describing the relationship between ground displacements caused by tremors induced by underground mining and mining and geological factors, using the Random Forest Regression machine learning method. The Rudna mine (Poland), one of the largest deep copper ore mines in the world, was selected as the research area. Two SAR Interferometry methods were used: Differential Interferometric Synthetic Aperture Radar (DInSAR) to detect line-of-sight (LOS) displacements, and Small Baseline Subset (SBAS) to detect cumulative LOS displacements caused by mining tremors. The best-prediction LOS displacement model was characterized by R2 = 0.93 and RMSE = 5 mm, which indicates high effectiveness and a high degree of explanation of the variation of the dependent variable. The identified statistically significant driving variables included duration of exploitation, the area of the exploitation field, energy, goaf area, and the average depth of field exploitation. The results of the research indicate the great potential of the proposed solutions due to the availability of data (found in the resources of each mine) and the effectiveness of the methods used.

1. Introduction

Ground deformations are an inherent element of underground mining and constitute an important research area and practical issue [1,2]. A special case of ground movements in mining areas are deformations resulting from induced tremors due to the unpredictable nature of seismic events [3,4,5,6]. In general, seismic tremors occur as a result of the disturbance of the natural balance in the rock mass [7,8,9,10]. This imbalance of the rock mass causes the release of potential energy from the rocks. Then, a small part of the potential energy turns into seismic energy, which in turn propagates in the form of elastic waves from the focus of the tremor. Induced tremors can cause not only ground deformation, but also damage to infrastructure (on the surface and underground) and pose a threat to human health and life [11]. The mining companies conduct periodic geodetic measurement campaigns along specific leveling lines to determine ground deformation increments caused by underground extraction of minerals. However, these measurements are predominately not spatially and temporally congruent with locations/regions of the shock epicenter.
The proliferation of satellite measurement methods, such as InSAR (Interferometric Synthetic Aperture Radar), has made it possible to detect ground deformations that have occurred in the past. In recent years, there have been many papers using InSAR to determine LOS (line-of-sight) displacements caused by induced shocks [12,13,14]. This is due to the universal access to data collected by satellite missions used in Earth-monitoring programs (Sentinel-1) and the provision of free software for their processing [15,16]. The publications mainly focus on identification of short- and long-term ground deformations using the InSAR methods [17,18], spatial relations between the location of the epicenter of the tremor and mining activities [19,20], and the places where ground deformation is revealed [21,22]. The analysis of scientific papers shows that the problem of ground deformation caused by events induced in the areas of underground exploitation requires research in terms of better understanding and characterization of ground deformations caused by induced tremors [9,23].
The current state of knowledge shows that the issue of ground deformation caused by underground-mining-induced seismicity has not been fully researched, understood, and described, especially in the domain of the relationship between the observed magnitude of deformation and potential causative factors [24,25]. The most recent publications attempt to analyze this relationship, but with respect to the incremental deformations expected in mining grounds and not the sudden deformation caused by seismic events [26,27]. Other directions of research are the prediction of ground deformations [28] and the automatic identification of ground deformations [29] caused by mining using InSAR and machine learning or deep learning methods. Thus, the main aim of our research was to model and statistically analyze the relationship between ground deformation caused by the tremors induced in the area of underground copper ore mining, and mining and geological factors. For this purpose, the SAR Interferometry methods, Differential Interferometric Synthetic Aperture Radar (DInSAR) and Small Baseline Subset (SBAS), were first used to detect ground deformation caused by induced shocks. Then, a machine learning approach was implemented to develop the relationship model and identify statistically significant causative factors.
The supervised machine learning based on regression method was chosen to work on the modeling problem, more precisely, the Random Forest Regression (RFR) algorithm, which enables the prediction of a given phenomenon and determination of the statistical significance of individual explanatory variables. Random Forest is used for classification and regression tasks [30,31,32,33]. In addition, it is resistant to model overfitting and the lack of some data in the training dataset. This machine learning algorithm deals with nonlinearity and does not require assumptions about the relationship between the predictors [34]. The relevant papers concerned prediction of dam deformation and prediction of other structural behavior of the object, isolation of representative factors influencing the dam, assessment of thermal loads, and comparison of Random Forest with other machine learning models [34,35,36,37,38,39]. Other studies have focused on predicting landslide movement and identifying the stage of deformation, determining the significance of factors, and determining susceptibility to landslides [40,41,42,43,44]. The papers were also about predicting ground subsidence due to changes in the groundwater level [45,46] and tunneling [47]. The RFR method was used in research related to induced and natural seismicity, in particular in predicting hazards with induced tremors [48], identification of factors affecting induced seismicity [49,50], and predicting the effects of seismic events [51]. The RFR algorithm was also used to determine the causes of security threats to the foundations of metro-lines [52] and to predict the safety factor for deep excavations [53].
In the presented papers, the coefficient of determination (R2) of the dependent variables’ prediction ranged from 75% to 98%. Moreover, in each case, the predictors with the greatest influence on the dependent variable were identified. However, the issue of modeling LOS displacements caused by mining tremors with the RFR method has not been studied in the literature so far.
This paper is organized into six sections. Section 1 presents the current state of the art and the motivation for the research. Section 2 contains basic information about the characteristics of the research area. The descriptions of the data and the individual stages of the research are provided in Section 3. The results obtained in this study are presented in Section 4. Section 5 presents a discussion of the results, and Section 6 draws conclusions.
We use the terms “ground displacements” and “ground deformations” interchangeably. In both cases, they refer to the observed movements of the ground surface as a result of exploitation or induced shocks. The use of the term “displacements” results from the nomenclature used in Interferometric Synthetic Aperture Radar techniques [54,55]. We also use the terms “shock” and “tremor” interchangeably.

2. Area of Interest

The research site was the Rudna mining area in the south-western part of Poland, located between 16°2′10″E–16°14′31″E and 51°28′18″N–51°34′18″N. This area, together with 6 other mining areas, forms the Legnica-Głogów Copper District (LGCD; an urban–industrial area and a copper ore basin in the northern part of the Dolnośląskie Voivodeship). Rudna is the largest copper ore mine in Europe and one of the largest underground mines of this type in the world (Figure 1).
Geologically, the mine is located on the Fore Sudeten Monocline (the so-called New Copper Basin), where the richest copper deposits in Poland have been documented. The geological and tectonic structure of the monocline was shaped by the sedimentation processes and rock mass movements. Therefore, the copper ore deposit is heavily disturbed by numerous tectonic dislocations (Biedrzychowa, Paulinowa, and Ruda Główna faults). The depth of the Rudna copper ore deposit varies from 844 m to 1250 m b.g.l., and the average thickness is 4.26 m. Access to the copper ore deposit is through 10 shafts, with a depth of 941 m b.g.l. up to 1244 m b.g.l. The room-and-pillar systems with roof deflection and hydraulic backfill are used to extract the copper ore.
Induced seismicity is an important aspect of the underground mining of copper ore deposits. The analysis of seismic activity carried out for the Legnica-Głogów Copper District area showed that the Rudna mining area is the most seismically active. The seismic activity is influenced by the natural conditions of the area, where there is a thick layer structure of the rock mass and tectonic faults, and the exploitation is carried out at great depths [56]. Moreover, the high-energy mining tremors (≥10⁷ J) occurring in this area are related to the shock-producing rock formations lying above the exploited copper ore deposit. Additionally, the location of such tremors results from the intensity of exploitation of the deposit and the mining and geological conditions prevailing in this area.
Annually, on average, there are several thousand mining tremors, most of which are shocks with an energy of ≤10³ J, which are imperceptible on the ground surface. Mining tremors with energies of 10⁴ J and 10⁵ J occur several hundred times a year. Phenomena with an energy of 10⁶ J happen with a frequency of several dozen events per year. The smallest group, but the most dangerous, includes shocks with energy ≥10⁷ J (2016: 12 shocks, 2017: 14 shocks, 2018: 11 shocks, 2019: 8 shocks, and 2020: 6 shocks; Figure A1 in Appendix A). Induced seismic events with such energies can threaten human life and have a potentially destructive effect on the ground surface. These sudden and unexpected events are of great interest to the scientific and industrial communities due to the effects they cause (e.g., rapid ground deformation, potential to destroy mining and urban infrastructure, and risk to people’s safety).

3. Data and Methods

In this paper, we propose a research methodology consisting of five stages (Figure 2). In the first stage, seismic, geological, and mining data obtained from the KGHM Polish Copper Inc. (Measurement and Tremor Departments of the Rudna mine) were preprocessed in a geographic information system (GIS).

3.1. Data

The analyzed materials included a list of induced tremors with basic information about the event (e.g., location, date, energy, etc.), geological profiles, maps of mining excavations, and reports on periodic leveling measurements. The above materials were used to develop a database of the independent variables considered as potential causative factors for the observed ground deformations. The specific data could not be given in the paper (apart from the shock data in Table 1) due to the lack of consent from the mine. A list of the variables considered in our study, accompanied by a short description, is presented in Table A1 (Appendix A). The research focused on 11 induced shocks (Table 1), characterized by an energy above 10⁷ J. Ground deformations caused by these mining tremors were determined. Two SAR Interferometry methods:
  • Differential InSAR for the period from 28/11/2016 to 22/12/2019—104 images,
  • SBAS InSAR for the period from 11/10/2016 to 26/02/2020—189 images,
were used based on Sentinel-1 satellite imagery (Level-1 Single-Look Complex (SLC) products, ascending path 73, and descending path 22).
Table 1. List of mining tremor events recorded by the Mine Geophysics Station of the Rudna mine.
| Date | Time | Energy (J) | Magnitude ¹ | X | Y | Z |
|---|---|---|---|---|---|---|
| 29 November 2016 | 21:09:40 | 1.00 × 10⁸ | 3.50 | 5,709,717 | 5,580,593 | −963 |
| 16 December 2016 | 7:46:50 | 9.50 × 10⁷ | 3.50 | 5,707,594 | 5,578,375 | −790 |
| 22 January 2017 | 20:08:15 | 1.10 × 10⁷ | 3.01 | 5,708,119 | 5,577,792 | −810 |
| 9 April 2017 | 0:23:11 | 6.60 × 10⁷ | 3.40 | 5,709,224 | 5,576,982 | −816 |
| 10 November 2017 | 12:19:06 | 2.30 × 10⁷ | 3.17 | 5,709,245 | 5,576,989 | −806 |
| 7 December 2017 | 18:42:49 | 5.00 × 10⁷ | 3.34 | 5,707,653 | 5,576,009 | −775 |
| 26 December 2017 | 12:15:29 | 1.20 × 10⁸ | 3.54 | 5,709,065 | 5,576,432 | −824 |
| 15 September 2018 | 18:35:14 | 3.00 × 10⁸ | 3.74 | 5,705,223 | 5,579,084 | −649 |
| 20 November 2018 | 7:15:55 | 3.00 × 10⁷ | 3.23 | 5,706,339 | 5,579,021 | −717 |
| 29 January 2019 | 13:53:43 | 3.10 × 10⁸ | 3.75 | 5,708,800 | 5,577,770 | −773 |
| 30 November 2019 | 5:58:32 | 4.50 × 10⁷ | 3.32 | 5,707,506 | 5,577,645 | −799 |
¹ Calculated from the formula log E = a + b M [57], where: E is the energy; M is the magnitude; a and b are empirically determined coefficients (related to the dimensions of the quake focus, the density of the rock medium, and the speed of seismic waves), equal to 1.15 and 1.96, respectively.
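The footnote formula can be inverted to recover the magnitude from the energy. The short sketch below assumes that "log" denotes the base-10 logarithm and that a = 1.15 and b = 1.96; the function names are illustrative and not part of the original study.

```python
import math

# Coefficients quoted in the Table 1 footnote: log E = A + B * M.
# Assumption: "log" is the base-10 logarithm.
A = 1.15
B = 1.96


def magnitude_from_energy(energy_joules: float) -> float:
    """Invert log10(E) = A + B * M to obtain the magnitude M."""
    return (math.log10(energy_joules) - A) / B


def energy_from_magnitude(magnitude: float) -> float:
    """Forward relation: seismic energy in joules for a given magnitude."""
    return 10 ** (A + B * magnitude)


# Energies of three events listed in Table 1.
for energy in (1.10e7, 5.00e7, 3.10e8):
    print(f"E = {energy:.2e} J -> M = {magnitude_from_energy(energy):.2f}")
```

Under these assumptions the computed magnitudes agree with the tabulated ones to within rounding.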

3.2. Calculation of Ground Displacements

A detailed description of all methods used in this research is provided in the references included in the text. All the calculations using InSAR methods were performed in the GMT5SAR software. The first method enabled the calculation of LOS displacements on the basis of two subsequent satellite images at two different times. The DInSAR method was used to detect and monitor changes in the ground surface occurring between two passes of a satellite equipped with the SAR instrument over the same place on the ground. It was developed by the authors of [58,59] and is widely used to detect LOS displacements caused, among others, by landslides [60], earthquakes [61], mining exploitation [62], volcanic eruptions [63], and glacier movements [64]. For each shock, calculations were performed with time intervals of 6, 12, 18, and 24 days for both paths, while the reference image in each pair in individual paths was constant and closest to the moment of the shock’s occurrence. The second method, SBAS, allowed for the detection of cumulative LOS displacements on the basis of a larger number of images, for a longer period of time, and for two paths. The SBAS method makes it possible to track the development of deformation of the studied area over time by analyzing a series of interferograms. Time series methods ensure data redundancy and determination of LOS displacement increments at the millimeter level [65]. SBAS builds on the core principles of the DInSAR method and was developed by the authors of [66]. The SBAS time series, similar to DInSAR, is widely used to detect various types of LOS displacements related to, among others, landslides [67], earthquakes [68], mining [69], volcanic eruptions [70], and glacier movements [71].
The results from the SBAS method were a verification of the results obtained with the DInSAR method. Moreover, the LOS displacements for the intervals of 6, 12, 18, and 24 days were also calculated from the SBAS time series of cumulative LOS displacements. The calculations were performed in the following way: the cumulative LOS displacements after the shock minus the cumulative LOS displacements before the shock. This process allowed the transition from time series to “normal” LOS displacements at the above time intervals. The result of this operation was LOS displacements caused by a given shock. The calculated LOS displacements served as dependent variables. The third stage included the development of a set of training data based on the results of the DInSAR and SBAS methods (data on LOS displacements caused by induced tremors) and materials obtained from the Rudna mine (data on mining and geological conditions). The geodatabase of dependent and independent variables for predicting was developed in the ESRI ArcGIS Pro 2.8 software using the following tools: georeferencing, vectorization, and spatial analyses, e.g., generation of centroids of the analyzed exploitation fields and determination of the shortest distance between the epicenter and fault zones. The training set ultimately consisted of 89 observations with values of LOS displacements for the time intervals of 6, 12, 18, and 24 days as the dependent variables, and 28 independent variables relating to mining and geological conditions. The potential explanatory variables influencing the values of LOS displacements caused by mining tremors considered in the study are shown in Table A1 in Appendix A. The exploratory data analysis was carried out in the fourth stage. Pearson’s matrix of linear correlation coefficients (Figure A3 in Appendix A) and the scatterplot matrix (Figure A4 in Appendix A) were analyzed.
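The conversion from cumulative SBAS displacements to displacements over individual intervals can be sketched as a simple difference. The example below is a minimal illustration under stated assumptions: the last acquisition before the event is taken as the reference, acquisitions are spaced 6 days apart, and all dates and values are synthetic rather than study data. The function name is hypothetical.

```python
from datetime import date, timedelta


def interval_displacement(cumulative, event_day, interval_days):
    """Return the LOS displacement (mm) over an interval spanning the event.

    `cumulative` maps acquisition date -> cumulative LOS displacement in mm.
    Assumption: the reference is the last acquisition before the event.
    """
    before = max(day for day in cumulative if day < event_day)
    after = before + timedelta(days=interval_days)
    if after not in cumulative:
        raise KeyError(f"no acquisition on {after}")
    # Cumulative displacement after the shock minus the value before it.
    return cumulative[after] - cumulative[before]


# Synthetic 6-day time series around a hypothetical event.
series = {
    date(2019, 1, 18): -10.0,
    date(2019, 1, 24): -12.0,
    date(2019, 1, 30): -40.0,
    date(2019, 2, 5): -45.5,
    date(2019, 2, 11): -47.0,
    date(2019, 2, 17): -48.0,
}
event = date(2019, 1, 29)
by_interval = {n: interval_displacement(series, event, n) for n in (6, 12, 18, 24)}
print(by_interval)
```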

3.3. Identification of Ground Displacement Factors

In the next stage, the supervised machine learning approach using the Random Forest Regressor (RFR) algorithm was adopted. The main concept of the algorithm is to build a set of decision trees that act independently as regression functions [72,73,74]. The prepared dataset was randomly divided into separate subsets: training and testing, where 20% of the PLOS samples were assigned to the variables x_test and y_test and 80% of the samples to x_train and y_train. The selection of the appropriate proportion of the division resulted from the size of the dataset and practical rules related to machine learning [75,76]. Additionally, the normalization process was performed on the dataset. Normalization consisted of rescaling the samples to the range [0, 1].
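The 80/20 split and the rescaling to [0, 1] can be sketched with scikit-learn as follows. The arrays are synthetic stand-ins with the same shape as the dataset described above (89 observations, 28 predictors). Fitting the scaler on the training subset only is a choice made for this sketch; the text does not state whether normalization was applied before or after the split.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data: 89 observations, 28 independent variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(89, 28))
y = rng.normal(size=89)

# Random split: 80% for training, 20% for testing.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Rescale every predictor to the range [0, 1].
# Assumption: the scaler is fitted on the training subset only.
scaler = MinMaxScaler(feature_range=(0, 1))
x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.transform(x_test)

print(x_train.shape, x_test.shape)
```

With this choice, test values can fall slightly outside [0, 1] whenever they exceed the range seen in training.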
To generate an effective machine learning model, it was necessary to optimize the hyperparameters of the RFR algorithm with the scikit-learn library. In the analysis of scientific papers pertaining to the application of the RFR method [35,38,44], the following hyperparameters were tuned: number of trees in the forest (n_estimators), maximum depth of the trees (max_depth), minimum number of samples required to split an internal node (min_samples_split), minimum number of samples required to be at a leaf node (min_samples_leaf), and number of features to consider when looking for the best split (max_features). The grid search method was used to optimize the hyperparameters [77]. This method takes a list of candidate values for each hyperparameter, evaluates the model’s performance for every combination of these values, and returns the best-performing combination. The values of the hyperparameters used in the tests are presented in Table 2. Additionally, the bootstrap and oob_score parameters were set to True.
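A grid search over the five hyperparameters named above might look like the sketch below. The candidate values are hypothetical placeholders, since the values of Table 2 are not reproduced here, and the data are synthetic. Using R2 as the scoring function and 10 folds is an assumption of this sketch.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic training data (not the study dataset).
rng = np.random.default_rng(0)
x_train = rng.uniform(size=(71, 5))
y_train = 10 * x_train[:, 0] + rng.normal(scale=0.5, size=71)

# Hypothetical candidate values; the study's grid is given in its Table 2.
param_grid = {
    "n_estimators": [20, 50],
    "max_depth": [3, None],
    "min_samples_split": [2, 4],
    "min_samples_leaf": [1, 2],
    "max_features": [1.0, "sqrt"],
}

search = GridSearchCV(
    RandomForestRegressor(bootstrap=True, oob_score=True, random_state=0),
    param_grid=param_grid,
    scoring="r2",  # coefficient of determination
    cv=10,         # 10-fold cross-validation
)
search.fit(x_train, y_train)

print(search.best_params_)
print(round(search.best_score_, 3))
```

The number of fitted models grows as the product of the list lengths times the number of folds, which is why narrowing the grid over several iterations is a common practice.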
In the process of searching the grid, values of the coefficient of determination (R2) were generated for the training and test sets based on the combination of hyperparameters. This step also included 10-fold cross-validation, which was used to tune the model’s hyperparameters [78]. During processing, multiple iterations were performed with different sets of values for each hyperparameter to narrow down the search area and determine the optimal values. The final result was analyzed in terms of the model’s effectiveness, as well as the modeling accuracy of the largest LOS displacement values. For a more detailed analysis and interpretation of the prediction model, the statistical significance of the independent variables was determined using three methods: MDI (Mean Decrease in Impurity), MDA (Mean Decrease in Accuracy), and SHAP (SHapley Additive exPlanations). The first, the Mean Decrease in Impurity method, is the most common way to calculate the importance of variables [74]. Impurity is the measure by which the (locally) optimal split condition is selected at each tree node; the resulting importance values range from 0 to 1. The second method, Mean Decrease in Accuracy, is based on the direct measurement of the influence of each variable on the accuracy of the model (R2) and is calculated on the test set after fitting the model; therefore, neither the model nor the prediction was changed [79]. The MDI and MDA methods are conventional techniques for determining the importance of model variables. A more recent approach is to use classic Shapley values to explain models. On this basis, the SHAP algorithm was created, defined as a unified prediction interpretation framework [80].
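The first two importance measures can be illustrated with scikit-learn alone. In the sketch below, MDI is read from the fitted forest and MDA is approximated with scikit-learn's permutation_importance, used here as a stand-in for the eli5 library mentioned in the Results. The data are synthetic, and SHAP is left out because it requires the separate shap package.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# Synthetic data: feature 0 dominates, feature 1 is weaker, 2 and 3 are noise.
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 4))
y = 10 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.1, size=200)
x_train, x_test = X[:160], X[160:]
y_train, y_test = y[:160], y[160:]

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(x_train, y_train)

# MDI: impurity-based importances, normalized to sum to 1.
mdi = model.feature_importances_

# MDA: mean drop in R2 on the test set after shuffling one feature at a time.
mda = permutation_importance(
    model, x_test, y_test, scoring="r2", n_repeats=20, random_state=0
)

for i in range(X.shape[1]):
    print(f"feature {i}: MDI = {mdi[i]:.3f}, MDA = {mda.importances_mean[i]:.3f}")
```

Permutation scores of irrelevant features scatter around zero and can be slightly negative, which matches the behaviour described for small datasets.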

4. Results

The LOS displacement results that were used for the analyses had coherence values ranging from 0.26 to 0.46. Generally, the largest LOS displacement was −111 mm (shock from 29 January 2019) and the smallest was −15 mm (shock from 10 November 2017) for the DInSAR method. The LOS displacement was, respectively, −87 mm (shock from 26 December 2017) and −17 mm (shock from 10 November 2017) for the SBAS method (after converting cumulative LOS displacements into LOS displacements for individual time intervals). Figure A4 in Appendix A shows example results from both methods for each shock analyzed. The results may differ because not every map of ground displacement caused by a given shock shows LOS displacements from the same path and with the same time interval.
The scatterplots were analyzed in the next step, but they showed that the variables did not have strong linear relationships. The distribution of variables on the graph, among others, EN vs. PLOS and GH vs. PLOS, showed that in some cases, several observations had the same value of the feature (Figure A2 in Appendix A). In the correlation matrix, the values |r| > 0.70 occurred between the following variables: PLOS and EN, as well as IC and CPOW. Another strong relationship, |r| > 0.60, was found between GH and PC, DR and SECO, SGEP and PC, PPE and SWZU, and CTE and SWZU. The highest values of correlation were shown by the PLOS variable with CTE (|r| = 0.67) and SWZU (|r| = 0.63). However, a negligible |r| was found for the following variables: PLOS and CPDW, IC and EN, IC and PPE, SMP and GH, and CPOW and SMP.
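Screening a Pearson correlation matrix for strongly related pairs can be sketched as below. The variable names are borrowed from the text for readability only; the values are synthetic, and the helper function is illustrative rather than part of the study's workflow.

```python
import numpy as np

# Synthetic columns: PLOS is built to depend on EN, GH is independent noise.
rng = np.random.default_rng(1)
n = 89
en = rng.normal(size=n)
data = {
    "EN": en,
    "PLOS": -2.0 * en + rng.normal(scale=0.5, size=n),
    "GH": rng.normal(size=n),
}
names = list(data)

# Pearson correlation matrix (rows of the stacked array are variables).
r = np.corrcoef(np.vstack([data[name] for name in names]))


def strong_pairs(r, names, threshold=0.70):
    """List variable pairs whose absolute correlation exceeds the threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) > threshold:
                pairs.append((names[i], names[j], float(r[i, j])))
    return pairs


for a, b, value in strong_pairs(r, names):
    print(f"{a} vs {b}: r = {value:.2f}")
```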
In the machine learning process, 25,600 model combinations were generated based on the proposed hyperparameter values and cross-validation results. The optimal values of hyperparameters generated for the model describing the relationship between LOS displacements and mining and geological factors are presented in Table 3.
Two outliers were observed in the results, precisely −111 mm and −100 mm. These were related to the PLOS value for the shock of 29 January 2019. This shock was characterized by the highest energy among the analyzed seismic events and was described with only two PLOS values of −111 mm and −100 mm (low coherence values), with time intervals of 6 days and 24 days, respectively. The accuracy of the developed model increased after removing outliers (Table 4).
The RMSE (Root Mean Square Error) for the training dataset remained the same and for the test dataset it decreased by 4 mm. The MAE (Mean Absolute Error) also improved (training dataset—same value, test dataset—1 mm reduction after removing outliers). The R2 values were 0.97 and 0.93, respectively. This shows that the model was able to explain 93% of the variability of the dependent variable based on the input independent variables. ME (Mean Error) dropped to 13 mm on the training dataset and to 17 mm on the test dataset. MAPE (Mean Absolute Percentage Error) was the only one that slightly increased compared to the previous model, both for the training set by 0.4% and for the test set by 1.3%. Additionally, the OOB error was reduced by 5%. These metrics indicate that the accuracy of the developed model and the predictive effectiveness increased. In addition to the metrics, graphs showing the results of the predictions without outliers were analyzed. The generated residual values did not indicate the presence of high outliers (Figure 3).
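For reference, the four most common of these metrics can be computed as in the sketch below. The observed and predicted values are synthetic, the function name is illustrative, and ME is left out because the text does not define it precisely.

```python
import math


def regression_metrics(observed, predicted):
    """Return RMSE, MAE, MAPE (%) and R2 for paired observations."""
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(res * res for res in residuals)            # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)     # total sum of squares
    return {
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(res) for res in residuals) / n,
        "MAPE": 100 * sum(abs(res / o) for res, o in zip(residuals, observed)) / n,
        "R2": 1 - ss_res / ss_tot,
    }


# Synthetic LOS displacements in mm (not study data).
observed = [-20.0, -40.0, -80.0, -100.0]
predicted = [-22.0, -38.0, -75.0, -104.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

MAPE divides by the observed values, so it is undefined when an observed displacement is zero and grows quickly for small displacements.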
In total, five observations (three observations from the training dataset and two observations from the test dataset) were outside the range of residual values of −10 mm to 10 mm (highest number of observations). The graph showed a gap of −50 mm to −60 mm, because none of the analyzed shocks caused LOS displacements of such values. The LOS displacements formed two clusters, in the first (left side) there was higher deformation and residual values than in the second (right side). This relationship also applies to Figure 4.
The predicted values in the range from −20 mm to −40 mm were characterized by residuals of ±5 mm (Figure 5). Figure 5 shows the residuals plotted against the predicted values for the LOS displacement model, for the training and test datasets without the two outliers. On the other hand, the predicted values from −70 mm to −100 mm also had residuals beyond ±5 mm.
The histogram in Figure 6 shows the residual values for the training and test datasets. Residuals of ±5 mm constituted the most numerous group, around 83% of all observations. The less numerous residuals ranged from 5 mm to 10 mm and from −5 mm to −10 mm.
Next, three algorithms to determine statistical significance of the independent variables were applied, MDI, MDA, and SHAP. For the analyzed best-performing RFR model, a graph of statistical significance of independent variables based on the MDI approach is presented in Figure 7.
In the graph in Figure 7, three groups of predictor importance with similar MDI values can be distinguished. The first one included four variables, with the CTE being statistically the most significant (MDI value of 0.144) and the PPE the least in this group (MDI value of 0.081). The second group contained the following seven variables: SLWUS, LEC, EN, SGEP, SWZU, LEF, and CPDW, the values of which ranged from 0.041 to 0.062. The third group consisted of the following variables: LPPPE, CPOW, SMP, SEZ, and OGEH, with values between 0.031 and 0.036. The following variables: DR, PC, IC, OECP, GH, and SECO, had low MDI values, below 0.03. In addition, six potential predictors considered in the study: PFSE, PFNW, SLWCPS, OENUP, SET, and SLWPH, showed no effect on the model.
In the second approach, based on the MDA method, the influence of each independent variable on the accuracy of the analyzed model was calculated using the eli5 library (Figure 8). The black value (reduction in R2) shows how much the accuracy of the model decreased after random reshuffling. Negative permutation importance values refer to predictions where shuffled data turned out to be more accurate than the real data. In such a situation, the variable is irrelevant, and the importance should be close to 0. However, the randomness in the calculations made the predictions based on the shuffled values of variables more accurate. Such cases are specific to small datasets. The red value is the weight, which shows how much the accuracy changed between one shuffle and the next.
The variable CTE showed the greatest statistical significance (0.095). The impact of the PPE (0.046), SLWUS (0.040), and SOESP (0.038) variables was less than half that of the CTE variable. The following variables: PZ, EN, SWZU, and OGEH, had values in the range of 0.029 to 0.022. The potential remaining predictors: LEC, CPDW, SGEP, DR, CPOW, LEF, SECO, OECP, PC, and SMP, showed less statistical significance, as the MDA values ranged from 0.01 to 0.02. The other variables, with scores below 0.01, had an insignificant impact on the determination coefficient. Among them, there were two variables with no influence, SET and SLWPH.
The test dataset was used to plot the graph using the SHAP library. Figure 9 presents a summary graph of the statistical significance of variables with absolute mean SHAP values.
This graph provides a global interpretation. The mean SHAP values indicate the extent to which each independent variable contributed to the prediction of the dependent variable. The variables were arranged in a descending manner. Once again, the CTE predictor was the most important variable for the analyzed model. It significantly exceeded the mean SHAP values of the other variables. As in the case of the MDA approach, several classes of independent variables with close mean SHAP values could be distinguished in the graph. The first group contained three variables, two of which, EN and SWZU, had similar values, and the variable SOESP was slightly smaller. The second group consisted of LEC, CPOW, and SGEP variables, with the same SHAP values. The third group included six potential predictors: LEF, IC, PPE, SLWUS, OGEH, and CPDW, with mean SHAP values similar but relatively smaller when compared to the previous groups. The variables DR and PZ had equal values. The fourth group consisted of the variables SMP, LPPPE, SEZ, and PC, which had similar SHAP values. The remaining variables had a negligible impact on the model, with the significance of the OENUP, SET, and SLWPH ones equal to zero.

5. Discussion

To study the relationship between the ground deformation caused by induced seismicity and candidate mining and geological factors, an InSAR- and RFR-based approach was proposed, followed by analysis of the results with three methods, MDI, MDA, and SHAP, used to explain the results of the machine learning models. The training set constituting 80% of the data was used to train the model, and a test set with 20% of the data was used to verify the results. The dataset included 89 observations. Research on datasets of similar sizes also appeared in the scientific literature [81,82]. The values of the dependent variable, PLOS, were associated with high-energy shocks, with energy ≥10⁷ J. The InSAR results could not be verified by other measurement methods, as no other results were available. The mine conducts precise leveling measurements in the study area twice a year. Thus, these measurements are not connected to the shocks and do not cover the area of the shock epicenter. The InSAR results based on analysis of data from two satellite tracks currently provide the only available source of data on ground deformation. Processing and analyzing two satellite tracks makes it possible to check the consistency of the ground displacement results. The analyzed dataset contained two outliers for the 29 January 2019 shock that negatively affected the predictive accuracy of the model.
Outliers in the InSAR data could be caused by low coherence between images due to the presence or absence of snow cover in subsequent satellite passes [83]. Removal of the outliers improved the predictive accuracy of the model, especially on the test dataset, as R2 increased by 0.06 and RMSE decreased by 4 mm. Finally, the R2 value for the best-performing model was 0.93, which indicates high predictive accuracy. Thus, the proposed and developed independent variables made it possible to explain approximately 93% of the variation of the PLOS dependent variable. The developed model had a mean deviation (RMSE) of 7 mm between the observed values and the prediction. In addition, the mean value of the prediction errors (MAPE), expressed as a percentage of the actual values of the prediction variable, was 12%. The obtained metrics, especially R2, had values that characterized an effective regression model. We observed that predictions for values observed above −100 mm were underestimated due to their small number in the dataset. In other studies conducted on LOS displacements related to landslides or dams on water reservoirs, similar R2 values were obtained, i.e., above 0.90 [38,51,84]. No studies using the RFR algorithm for predicting LOS displacements caused by tremors induced by underground mining have been found in the scientific literature. Thus, it was impossible to directly compare the obtained results with similar research on this subject. However, there are publications reporting studies of identifying ground deformation factors in mining areas that are not related to induced seismicity and are based on machine learning approaches. Notably, Cieślik and Milczarek [85] determined the predicted values of LOS displacements within the active subsidence of the entire LGCD area. 
The authors used the results of the SBAS method for the 2014–2021 period and the following models: ARIMA, SARIMA, Holt, Holt–Winters, Ridge, Bayesian Ridge, Lasso, ElasticNet, Gradient Boosting, Decision Tree, Random Forest, Theta, Linear Regression, and Prophet. They obtained the best results using the ARIMA and Holt models, with RMSE values of 13.13 and 13.14, respectively. Another study [85] concerned the prediction of slope deformation in an open-pit mine in Anjialing (China). Ground-based interferometric radar (GB-SAR) was used to collect slope deformation data, which were combined with 12 geographical, climatic, and hydrographic parameters and modeled with 5 algorithms: BPNN, SVM, recurrent neural network (RNN), adaptive network-based fuzzy inference system (ANFIS), and relevance vector machine (RVM). The smallest RMSE was 2.64 mm for the RVM model. In the next study [86], the research team used the STARM model to predict deformations in the Chengchao underground iron mine in China. They used data from surface deformation monitoring that were collected by the mine. The RMSE ranged from 0.84 to 2.67 cm. In another paper [87], the authors used the Gray Model First-Order One Variable (GM(1,1)), support vector machine regression (SVR), and gray support vector machine regression (GM-SVR) models for the Panji mining area in China. The dataset consisted of SBAS-InSAR results from the period 27 December 2016 to 20 May 2017. The GM-SVR model predicted the deformation of this research area accurately, with an RMSE of around 2 mm.
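The three accuracy measures used throughout this comparison (R², RMSE, and MAPE) follow directly from the observed and predicted values. The sketch below shows one conventional way to compute them; the function names and the four displacement values are invented for illustration and are not taken from the study's dataset.

```python
import math


def r2_score(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root-mean-square error, in the units of the data (here mm)."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(observed, predicted):
    """Mean absolute percentage error, in percent (observed values must be non-zero)."""
    ratios = [abs((o - p) / o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Invented LOS displacements in mm (negative = movement away from the satellite).
observed = [-100.0, -80.0, -50.0, -20.0]
predicted = [-90.0, -85.0, -45.0, -25.0]

print(f"R2   = {r2_score(observed, predicted):.3f}")
print(f"RMSE = {rmse(observed, predicted):.2f} mm")
print(f"MAPE = {mape(observed, predicted):.2f} %")
```

Because RMSE is expressed in the units of the target, values reported in mm and in cm (as in the studies cited above) have to be converted before they are compared.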
Moreover, in our approach, the influence of the individual predictors on the model was determined using three methods: MDI, MDA, and SHAP values. Establishing the statistical significance of variables is important, as it is an integral part of the RFR results, which also characterize the model’s performance and facilitate its interpretation [35,77,88]. The choice of predictors for the machine learning process resulted from a study of the literature and from discussions with practitioners from the Rudna mine, as well as the availability of data representing the independent variables. The top ten predictors in each of the three methods are shown in Table 5. The results from these algorithms allowed for a more accurate indication of the statistically significant independent variables that affect the values of ground deformation. In all three methods, the CTE variable (duration of the exploitation) was statistically the most significant predictor. Additionally, the following variables were returned as significant predictors by the MDI, MDA, and SHAP approaches alike: SOESP (average distance between the epicenter and adjacent exploitation fields), PPE (area of the exploitation field), LEC (location of the epicenter in the unexploited part of the field), SWZU (average value of faults’ throws in the exploitation field), and EN (shock energy). The variables that appeared in two of the three considered approaches were PZ (goaf area), SLWUS (liquidation of the excavation with the deflection of the roof), SGEP (average depth of field exploitation), and LEF (location of the epicenter at the mining front). This indicates that five of the analyzed variables should be treated as the factors statistically contributing the most to the magnitude of induced ground deformation, with an additional four also being strong predictors. The last group consisted of variables that were returned once: OGEH, CPOW, IC, and CPDW. 
The acronyms are explained in Appendix A.
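The idea behind MDA can be illustrated with a permutation test: shuffle one predictor at a time and record how much the prediction error grows. The following is a minimal sketch under stated assumptions, not the procedure used in this study. The function `toy_model` is a hypothetical stand-in for a fitted regressor, the coefficients and data are synthetic, and only the symbols CTE, PPE, and IC are borrowed from Table A1. Random Forest implementations typically evaluate MDA on out-of-bag samples, which this sketch does not do.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equally long sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, targets, n_repeats=20, seed=0):
    """Mean increase in MSE after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = mse(targets, [predict(r) for r in rows])
    importances = {}
    for name in rows[0]:
        increases = []
        for _ in range(n_repeats):
            column = [r[name] for r in rows]
            rng.shuffle(column)  # break the link between this feature and the target
            permuted = [dict(r, **{name: v}) for r, v in zip(rows, column)]
            increases.append(mse(targets, [predict(r) for r in permuted]) - baseline)
        importances[name] = sum(increases) / n_repeats
    return importances


def toy_model(row):
    """Hypothetical stand-in for a fitted model; IC is deliberately ignored."""
    return -8.0 * row["CTE"] - 1.0 * row["PPE"]


# Synthetic data: 89 rows, with value ranges loosely following Table A1.
rng = random.Random(42)
rows = [
    {"CTE": rng.uniform(0, 10), "PPE": rng.uniform(12, 46), "IC": rng.choice([6, 12, 18, 24])}
    for _ in range(89)
]
targets = [toy_model(r) for r in rows]

scores = permutation_importance(toy_model, rows, targets)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

A feature that the model never uses (IC in this toy setup) receives an importance of zero, while features with a strong effect on the prediction produce a large error increase when shuffled.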
An approach that uses such a broad set of explanatory variables to model ground deformation caused by induced shocks, and that determines their impact by combining RFR with the MDI, MDA, and SHAP methods, has not yet been reported in the published literature. Therefore, this research contributes significantly to the understanding of ground deformation resulting from induced tremors and the causative forces behind them. The factors influencing the magnitude and nature of ground deformation in areas of underground mining of minerals, reported in the literature, included the depth of exploitation, the shape and size of the mined part of the deposit, the height of the mine face, the speed of the mining front, the method of filling the post-mining void, the geological structure of the overlying rock mass, and the slope of the overburden [1,2]. In the case of induced seismicity, the contributing factors were the depth of the deposit, the production rate, mining geometry and geological discontinuities (dykes and faults), tectonics, the active front, and the interaction between mining and crustal states of stress on a local and regional scale [89,90]. Thus, similarities can be seen between these factors and the results presented in Table 5. The CTE, PPE, and PZ factors can be considered in the context of the advance of the front, which affects the process of deformation of the rock mass and the surface. Another important factor is EN. It is worth emphasizing here that high values of shock energy do not always cause large deformations, because the outcome also depends on, among other things, the geological structure of the rock mass. In addition, the location of the epicenter, at the mining front (LEF) or in the unexploited part of the field (LEC), also turned out to be important. These factors may be related to the advance of the front and the geological structure. The geological structure is a difficult parameter to present due to its high complexity, but it is significant in the context of deformation and induced seismicity. 
Therefore, the SWZU factor showed statistical significance in these studies. The important factors included parameters characterizing operation, i.e., SGEP and SLWUS, which also appeared in the literature as important. The application of the MDI, MDA, and SHAP value algorithms for analysis of the RFR results allowed us to obtain additional insights into the contribution of factors describing mining and geology to the deformation of the ground surface caused by sudden tremors in mining areas.
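For readers unfamiliar with SHAP, the quantity it estimates can be written out exactly for a very small model. The sketch below enumerates all feature coalitions and applies the Shapley weighting directly. It is an illustration only: `toy_model`, its coefficients, and the instance and background rows are assumptions made for this example, and practical SHAP implementations for tree ensembles rely on much faster tree-specific algorithms rather than brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, background):
    """Exact Shapley values for one instance by enumerating all coalitions.

    Features outside a coalition take their values from the background rows,
    and the prediction is averaged over those rows.
    """
    names = list(instance)
    n = len(names)

    def coalition_value(coalition):
        total = 0.0
        for ref in background:
            mixed = {k: (instance[k] if k in coalition else ref[k]) for k in names}
            total += predict(mixed)
        return total / len(background)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        contribution = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (coalition_value(s | {name}) - coalition_value(s))
        phi[name] = contribution
    return phi


def toy_model(row):
    """Hypothetical linear stand-in for a fitted regressor; IC has no effect."""
    return -8.0 * row["CTE"] - 1.0 * row["PPE"]


# Invented instance and background rows (symbols borrowed from Table A1).
instance = {"CTE": 6.0, "PPE": 40.0, "IC": 12.0}
background = [
    {"CTE": 2.0, "PPE": 20.0, "IC": 6.0},
    {"CTE": 4.0, "PPE": 40.0, "IC": 24.0},
]

phi = shapley_values(toy_model, instance, background)
print(phi)
```

For this linear toy model each attribution reduces to the coefficient multiplied by the difference between the instance value and the background mean (approximately −24 for CTE and −10 for PPE), and the attributions sum to the difference between the prediction for the instance and the mean background prediction.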
Finally, this research used proprietary data from the resources of the Rudna mine, which are not generally available to the scientific community. This was made possible thanks to cooperation with the mining operator. Our study is therefore an example of why close collaboration with, and support from, industry is needed to carry out complete and comprehensive research that benefits both academia and industry, and of why the scientific community must keep convincing mining operators of the value of such cooperation. Although the industry data used in our study are important, as they provide the explanatory variables for the predictive modeling, the proposed methodology can be applied to other mining seismicity cases and to different datasets, provided that the effect of data availability limitations on model performance is taken into account.

6. Conclusions

Seismic events induced by mining result in ground deformation that is difficult to measure and predict due to the sudden and unpredictable nature of this phenomenon. Traditional geodetic methods are unable to provide good results with high temporal and spatial resolution. Therefore, ground deformation monitoring based on satellite radar interferometry nowadays provides means to accurately map this deformation. Machine-learning-based methods, such as the RFR algorithm, and statistical approaches for explaining their results, such as the SHAP values, provide tools for determining the impact of potential driving factors on the modeled dependent variable and its prediction. Our study contributes to the state of knowledge of mining-seismicity-induced ground deformation, with the following main results:
  • The RFR method enabled modeling the relationship between LOS displacements caused by high-energy tremors, i.e., ≥10⁷ J, and a set of explanatory variables characterizing mining and geological conditions.
  • The best-performing final model was characterized by RMSE = 7 mm, R² = 0.93, and most of the residual values were within the range of ±5 mm.
  • The identified statistically significant explanatory variables behind the observed LOS displacements caused by induced tremors tested independently with three methods (MDI, MDA, and SHAP) included CTE, as well as SOESP, PPE, LEC, EN, PZ, SLWUS, SGEP, SWZU, LEF, OGEH, CPOW, and CPDW.
In our research, the following original aspects can be distinguished: the use of the Random Forest method for predicting deformation caused by induced tremors in the area of underground copper mining, with a high number of independent variables, and the application of the Mean Decrease in Impurity (MDI), Mean Decrease in Accuracy (MDA), and SHAP (SHapley Additive exPlanations) methods to analyze the machine learning results. Our findings expand the state of knowledge in the domain of ground deformation caused by induced tremors in mining and their causative factors. The developed model describes the relationship between ground deformation and mining–geological factors, and the presented approach is applicable to the prediction and analysis of ground deformation caused by high-energy seismic events in other underground mining areas. In addition, this research provides insights for further research that, as discussed, could lead to developing an automatic system for predicting sudden ground deformation caused by underground mining, supporting the control and management of mining operations to minimize the occurrence of induced shocks.

Author Contributions

Conceptualization, K.O. and J.B.; methodology, K.O. and J.B.; software, K.O.; validation, K.O. and J.B.; formal analysis, K.O.; investigation, K.O.; resources, K.O.; data curation, K.O.; writing—original draft preparation, K.O. and J.B.; writing—review and editing, K.O. and J.B.; visualization, K.O.; supervision, J.B. All authors have read and agreed to the published version of the manuscript.

Funding

The research has been supported by the statutory grant no 8211204604 at the Faculty of Geoengineering, Mining, and Geology, Wroclaw University of Science and Technology.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

The research has been realized with the support of Rudna mine of KGHM Polish Copper Ltd.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Designations of individual variables from the training dataset.
No. | Variable (description) | Symbol | Range of values
1 | LOS displacements (mm; dependent variable). (Displacements calculated using InSAR methods.) | PLOS | −111 to −16
2 | Time interval between imagery dates (days). (The number of days between a pair of images used to calculate LOS displacements.) | IC | 6, 12, 18, or 24
3 | Time between the reference image date and shock date (days). (The number of days between the pre-shock reference image date and the shock date.) | CPDW | 0 to 17
4 | Time between shock date and slave image (days). (Number of days between the shock date and the post-shock image date.) | CPOW | 1 to 24
5 | Energy (J). (Registered energy of the seismic events obtained from the Mine Geophysics Station of the Rudna mine.) | EN | 1.10 × 10⁷ to 3.10 × 10⁸
6 | Hypocenter depth (m b.s.l.). (The values come from the list of mining tremors recorded by the Mine Geophysics Station of the Rudna mine.) | GH | −1090 to −841
7 | Annual deformations (mm). (The ground deformation values determined on the basis of leveling measurement campaigns conducted by the mine.) | DR | −250 to 100
8 | Location of the epicenter in the unexploited part of the field. ¹ (Occurrence of an epicenter in unexploited parts of the copper field.) | LEC | 0 or 1
9 | Location of the epicenter—front. ¹ (Occurrence of an epicenter at the mine operation front.) | LEF | 0 or 1
10 | Average depth of field exploitation (m b.s.l.). (The depth at which exploitation is carried out.) | SGEP | −1099.4 to 895.3
11 | Average thickness in the field (m). (Variable representing the thickness of the copper field, determined from mining excavation maps.) | SMP | 4.8 to 9.8
12 | The direction of the advance of the front NW. ¹ (Direction of the NW mining front.) | PFNW | 0 or 1
13 | The direction of the advance of the front SE. ¹ (Direction of the SE mining front.) | PFSE | 0 or 1
14 | The area of the exploitation field (ha). (Variable representing the area of a given copper-mining field calculated from mining excavation maps.) | PPE | 11.8692 to 46.2168
15 | Goaf area (ha). (Variable representing the area of goaf calculated from mining excavation maps.) | PZ | 0 to 21.7382
16 | The unexploited area of the field (ha). (Variable representing the area of the unmined part of the copper field calculated from mining excavation maps.) | PC | 0 to 35.5129
17 | The ratio of the exploited area to the area of the department. (Variable representing the ratio of the exploited area in the field where the shock occurred to the area of the entire department.) | SECO | 0.19 to 0.69
18 | Operation status in progress. ¹ (Variable representing the present status of the exploitation in the field where the shock occurred.) | SET | 0 or 1
19 | Operation status complete. ¹ (Variable representing completed status of the operation in the field where the shock occurred.) | SEZ | 0 or 1
20 | Method of liquidation of the excavation—partial dry filling. ¹ (Liquidation of an excavation with partial dry backfill in the field where the shock occurred.) | SLWCPS | 0 or 1
21 | The method of liquidation of the excavation with the deflection of the roof. ¹ (Liquidation of an excavation with a deflection of the roof in the field where the shock occurred.) | SLWUS | 0 or 1
22 | Method of liquidation of the excavation—hydraulic backfilling. ¹ (Liquidation of the excavation and hydraulic backfilling in the field where the shock occurred.) | SLWPH | 0 or 1
23 | Duration of exploitation (years). (Variable representing the number of years of operation in the field where the shock occurred.) | CTE | 0 to 10
24 | Distance between the depth of exploitation and the hypocenter (m). (Variable representing the distance measured between variables #6 and #10.) | OGEH | −115 to 20
25 | Distance between the epicenter and the centroid of the exploitation field (m). (Variable representing the distance measured from the epicenter to the centroid of the field where the epicenter appeared.) | OECP | 121 to 320
26 | Distance between the epicenter and the nearest fault in the exploitation field (m). (Variable representing the distance measured from the epicenter to the nearest fault.) | OENUP | 6 to 328
27 | Average value of a fault’s throw in the exploitation field (m). (Variable representing the average throw of a fault within the mining field where the shock occurred, calculated on the basis of information contained in mining excavation maps.) | SWZU | 2 to 8.4
28 | Average distance between the epicenter and adjacent exploitation fields (m). (Variable representing the distance measured from the epicenter to the edge of the adjacent exploitation field.) | SOESP | 0 to 520
29 | Number of exploitation fields adjacent to the exploitation field from the epicenter. (Variable determined on the basis of information contained in maps of mining excavations. The number of exploitation fields bordering the field in which the shock occurred.) | LPPPE | 0 to 6
¹ Dummy variable.
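Several predictors in Table A1 are dummy variables that take the value 0 or 1. As a minimal sketch of how such columns can be produced from a categorical attribute, the example below encodes the method of excavation liquidation into the SLWCPS, SLWUS, and SLWPH indicators. The category labels and the function name are hypothetical, and the sketch assumes that exactly one liquidation method applies to the field in which a shock occurred, which the source does not state explicitly.

```python
# Hypothetical category labels mapped to the dummy-variable symbols of Table A1.
LIQUIDATION_METHODS = {
    "partial_dry_filling": "SLWCPS",
    "roof_deflection": "SLWUS",
    "hydraulic_backfilling": "SLWPH",
}


def encode_liquidation(method):
    """Return the 0/1 indicators for one liquidation method."""
    if method not in LIQUIDATION_METHODS:
        raise ValueError(f"unknown liquidation method: {method!r}")
    return {
        symbol: int(label == method)
        for label, symbol in LIQUIDATION_METHODS.items()
    }


encoded = encode_liquidation("roof_deflection")
print(encoded)
```

Tree-based models such as Random Forest can use these indicators directly, without scaling.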
Figure A1. Map of the location of high-energy tremors (energy ≥ 10⁶ J) that occurred from January 2016 to 11 October 2020. Map based on data from the Rudna O/ZG, Rock Burst Department.
Figure A2. LOS displacements determined based on the DInSAR and SBAS methods.
Figure A3. The matrix of Pearson’s (r) correlation coefficients for the 20 variables of the dataset without taking into account the dummy variables.
Figure A4. The matrix of the scatter plots of the variables (20) in the dataset without taking into account the dummy variables.

References

  1. Kratzsch, H. Mining Subsidence Engineering; Springer: Berlin/Heidelberg, Germany, 1983; ISBN 978-3-642-81925-4.
  2. Whittaker, B.N.; Reddish, D.J. Subsidence: Occurrence, Prediction and Control; Elsevier: Amsterdam, The Netherlands, 1989.
  3. Sokoła-Szewioła, V. Method of Prediction the Probability of a Strong Tremor on the Basis of Observed Changes of Mining Ground Subsidences. Arch. Min. Sci. 2009, 54, 725–737.
  4. Temporim, F.A.; Gama, F.F.; Mura, J.C.; Paradella, W.R.; Silva, G.G. Application of Persistent Scatterers Interferometry for Surface Displacements Monitoring in N5E Open Pit Iron Mine Using TerraSAR-X Data, in Carajás Province, Amazon Region. Braz. J. Geol. 2017, 47, 225–235.
  5. Milczarek, W.; Kopeć, A.; Głąbicki, D. Estimation of Tropospheric and Ionospheric Delay in DInSAR Calculations: Case Study of Areas Showing (Natural and Induced) Seismic Activity. Remote Sens. 2019, 11, 621.
  6. Mutke, G. Oddziaływanie Górniczych Wstrząsów Sejsmicznych Na Powierzchnię; Główny Instytut Górnictwa: Katowice, Poland, 2019.
  7. Gibowicz, G. Seismicity in Mines; Pageoph Topical Volumes; Birkhäuser: Basel, Switzerland, 1989; ISBN 978-3-7643-2273-1.
  8. Gibowicz, S.J. Seismicity Induced by Mining: An Overview. In Monitoring a Comprehensive Test Ban Treaty; Husebye, E.S., Dainty, A.M., Eds.; NATO ASI Series; Springer: Berlin/Heidelberg, Germany, 1996; pp. 385–409. ISBN 978-94-011-0419-7.
  9. Larsson, K. Seismicity in Mines: A Review; Luleå University of Technology, Department of Civil and Environmental Engineering Division of Rock Mechanics: Luleå, Sweden, 2004; p. 118.
  10. Verdon, J.P.; Kendall, J.-M.; Butcher, A.; Luckett, R.; Baptie, B.J. Seismicity Induced by Longwall Coal Mining at the Thoresby Colliery, Nottinghamshire, U.K. Geophys. J. Int. 2018, 212, 942–954.
  11. Foulger, G.R.; Wilson, M.P.; Gluyas, J.G.; Julian, B.R.; Davies, R.J. Global Review of Human-Induced Earthquakes. Earth-Sci. Rev. 2018, 178, 438–514.
  12. Tama, A.; Guzy, A.; Witkowski, W.T.; Hejmanowski, R.; Malinowska, A. Mapping Vertical Ground Movement Caused by Human-Induced Seismicity Applying Satellite Radar Interferometry and Geostatistics. In Proceedings of the ResearchGate, Vienna, Austria, 31 October–2 November 2018.
  13. Hejmanowski, R.; Malinowska, A.A.; Witkowski, W.T.; Guzy, A. An Analysis Applying InSAR of Subsidence Caused by Nearby Mining-Induced Earthquakes. Geosciences 2019, 9, 490.
  14. Wang, S.; Jiang, G.; Weingarten, M.; Niu, Y. InSAR Evidence Indicates a Link Between Fluid Injection for Salt Mining and the 2019 Changning (China) Earthquake Sequence. Geophys. Res. Lett. 2020, 47, e2020GL087603.
  15. GMTSAR. Available online: https://topex.ucsd.edu/gmtsar/downloads/ (accessed on 5 July 2024).
  16. Science Toolbox Exploitation Platform. Available online: https://step.esa.int/main/download/snap-download/ (accessed on 23 July 2024).
  17. Kubanek, J.; Liu, Y.; Harrington, R.M.; Samsonov, S. Observation of Surface Deformation Associated with Hydraulic Fracturing in Western Canada Using InSAR. In Proceedings of the EUSAR 2018; 12th European Conference on Synthetic Aperture Radar, Aachen, Germany, 5–7 June 2018; pp. 1–6.
  18. Milczarek, W. Application of a Small Baseline Subset Time Series Method with Atmospheric Correction in Monitoring Results of Mining Activity on Ground Surface and in Detecting Induced Seismic Events. Remote Sens. 2019, 11, 1008.
  19. Malinowska, A.; Witkowski, W.T.; Guzy, A.; Hejmanowski, R. Study of Dynamic Displacement Phenomena with the Use of Imaging Radars from the Sentinel Mission. Zesz. Nauk. Inst. Gospod. Surowcami Miner. I Energią PAN 2017, 101, 229–246.
  20. Krawczyk, A.; Grzybek, R. An Evaluation of Processing InSAR Sentinel-1A/B Data for Correlation of Mining Subsidence with Mining Induced Tremors in the Upper Silesian Coal Basin (Poland). E3S Web Conf. 2018, 26, 00003.
  21. Barnhart, W.D.; Yeck, W.L.; McNamara, D.E. Induced Earthquake and Liquefaction Hazards in Oklahoma, USA: Constraints from InSAR. Remote Sens. Environ. 2018, 218, 1–12.
  22. Deng, F.; Dixon, T.H.; Xie, S. Surface Deformation and Induced Seismicity Due to Fluid Injection and Oil and Gas Extraction in Western Texas. J. Geophys. Res. Solid Earth 2020, 125, e2019JB018962.
  23. Wang, G.; Zhu, S.; Zhang, X.; Wen, Y.; Zhu, Z.; Zhu, Q.; Xie, L.; Li, J.; Tan, Y.; Yang, T.; et al. Prediction of Mining-Induced Seismicity and Damage Assessment of Induced Surface Buildings in Thick and Hard Key Stratum Working Face: A Case Study of Liuhuanggou Coal Mine in China. Front. Earth Sci. 2023, 11, 1238055.
  24. Cieślik, K.; Milczarek, W.; Warchala, E.; Kosydor, P.; Rożek, R. Identifying Factors Influencing Surface Deformations from Underground Mining Using SAR Data, Machine Learning, and the SHAP Method. Remote Sens. 2024, 16, 2428.
  25. Wang, X.; Chen, S.; Xia, Y.; Niu, Y.; Gong, J.; Yang, Y. Analysis of Surface Deformation and Related Factors over Mining Areas Based on InSAR: A Case Study of Fengcheng Mine. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2024, XLVIII-1–2024, 697–712.
  26. Cieślik, K.; Milczarek, W. Application of Machine Learning in Forecasting the Impact of Mining Deformation: A Case Study of Underground Copper Mines in Poland. Remote Sens. 2022, 14, 4755.
  27. Kopeć, A.; Bugajska, N.; Milczarek, W.; Głąbicki, D. Long-term monitoring of the impact of mining operations on the ground surface at the regional scale based on the InSAR-SBAS technique, the Upper Silesian Coal Basin (Poland). Case study. Acta Geodyn. Et Geomater. 2022, 19, 93–110.
  28. Sui, L.; Ma, F.; Chen, N. Mining Subsidence Prediction by Combining Support Vector Machine Regression and Interferometric Synthetic Aperture Radar Data. ISPRS Int. J. Geo-Inf. 2020, 9, 390.
  29. Xi, N.; Mei, G.; Liu, Z.; Xu, N. Automatic Identification of Mining-Induced Subsidence Using Deep Convolutional Networks Based on Time-Series InSAR Data: A Case Study of Huodong Mining Area in Shanxi Province, China. Bull. Eng. Geol. Env. 2023, 82, 78.
  30. Naghibi, S.A.; Pourghasemi, H.R.; Dixon, B. GIS-Based Groundwater Potential Mapping Using Boosted Regression Tree, Classification and Regression Tree, and Random Forest Machine Learning Models in Iran. Environ. Monit Assess 2015, 188, 44.
  31. Jaiswal, J.K.; Samikannu, R. Application of Random Forest Algorithm on Feature Subset Selection and Classification and Regression. In Proceedings of the 2017 World Congress on Computing and Communication Technologies (WCCCT), Tiruchirappalli, India, 2–4 February 2017; pp. 65–68.
  32. Lagomarsino, D.; Tofani, V.; Segoni, S.; Catani, F.; Casagli, N. A Tool for Classification and Regression Using Random Forest Methodology: Applications to Landslide Susceptibility Mapping and Soil Thickness Modeling. Environ. Model Assess 2017, 22, 201–214.
  33. Izquierdo-Verdiguier, E.; Zurita-Milla, R. An Evaluation of Guided Regularized Random Forest for Classification and Regression Tasks in Remote Sensing. Int. J. Appl. Earth Obs. Geoinf. 2020, 88, 102051.
  34. Belmokre, A.; Mihoubi, M.K.; Santillán, D. Analysis of Dam Behavior by Statistical Models: Application of the Random Forest Approach. KSCE J. Civ. Eng. 2019, 23, 4800–4811.
  35. Dai, B.; Gu, C.; Zhao, E.; Qin, X. Statistical Model Optimized Random Forest Regression Model for Concrete Dam Deformation Monitoring. Struct. Control Health Monit. 2018, 25, e2170.
  36. Li, X.; Su, H.; Hu, J. The Prediction Model of Dam Uplift Pressure Based on Random Forest. IOP Conf. Ser. Mater. Sci. Eng. 2017, 229, 012025.
  37. Guo, Z.; Huang, H. Application of RS-RF Model in Deformation Prediction of Concrete Dam. IOP Conf. Ser. Earth Environ. Sci. 2020, 474, 072003.
  38. Li, X.; Wen, Z.; Su, H. An Approach Using Random Forest Intelligent Algorithm to Construct a Monitoring Model for Dam Safety. Eng. Comput. 2021, 37, 39–56.
  39. Su, Y.; Weng, K.; Lin, C.; Zheng, Z. An Improved Random Forest Model for the Prediction of Dam Displacement. IEEE Access 2021, 9, 9142–9153.
  40. Chen, W.; Xie, X.; Wang, J.; Pradhan, B.; Hong, H.; Bui, D.T.; Duan, Z.; Ma, J. A Comparative Study of Logistic Model Tree, Random Forest, and Classification and Regression Tree Models for Spatial Prediction of Landslide Susceptibility. Catena 2017, 151, 147–160.
  41. Dou, J.; Yunus, A.P.; Tien Bui, D.; Merghadi, A.; Sahana, M.; Zhu, Z.; Chen, C.-W.; Khosravi, K.; Yang, Y.; Pham, B.T. Assessment of Advanced Random Forest and Decision Tree Algorithms for Modeling Rainfall-Induced Landslide Susceptibility in the Izu-Oshima Volcanic Island, Japan. Sci. Total Environ. 2019, 662, 332–346.
  42. Sun, D.; Wen, H.; Wang, D.; Xu, J. A Random Forest Model of Landslide Susceptibility Mapping Based on Hyperparameter Optimization Using Bayes Algorithm. Geomorphology 2020, 362, 107201.
  43. Rahmati, O.; Kornejady, A.; Deo, R.C. Spatial Prediction of Landslide Susceptibility Using Random Forest Algorithm. In Intelligent Data Analytics for Decision-Support Systems in Hazard Mitigation: Theory and Practice of Hazard Mitigation; Deo, R.C., Samui, P., Kisi, O., Yaseen, Z.M., Eds.; Springer Transactions in Civil and Environmental Engineering; Springer: Singapore, 2021; pp. 281–292. ISBN 9789811557729.
  44. Hu, X.; Wu, S.; Zhang, G.; Zheng, W.; Liu, C.; He, C.; Liu, Z.; Guo, X.; Zhang, H. Landslide Displacement Prediction Using Kinematics-Based Random Forests Method: A Case Study in Jinping Reservoir Area, China. Eng. Geol. 2021, 283, 105975.
  45. Ilia, I.; Loupasakis, C.; Tsangaratos, P. Land Subsidence Phenomena Investigated by Spatiotemporal Analysis of Groundwater Resources, Remote Sensing Techniques, and Random Forest Method: The Case of Western Thessaly, Greece. Environ. Monit Assess 2018, 190, 623.
  46. Chen, Y.; Tong, Y.; Tan, K. Coal Mining Deformation Monitoring Using SBAS-InSAR and Offset Tracking: A Case Study of Yu County, China. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 6077–6087.
  47. Kohestani, V.R.; Bazarganlari, M.R.; Asgari Marnani, J. Prediction of Maximum Surface Settlement Caused by Earth Pressure Balance Shield Tunneling Using Random Forest. J. AI Data Min. 2017, 5, 127–135.
  48. Limbeck, J.; Bisdom, K.; Lanz, F.; Park, T.; Barbaro, E.; Bourne, S.; Kiraly, F.; Bierman, S.; Harris, C.; Nevenzeel, K.; et al. Using Machine Learning for Model Benchmarking and Forecasting of Depletion-Induced Seismicity in the Groningen Gas Field. Comput Geosci 2021, 25, 529–551.
  49. Rouet-Leduc, B.; Hulbert, C.; Lubbers, N.; Barros, K.; Humphreys, C.J.; Johnson, P.A. Machine Learning Predicts Laboratory Earthquakes. Geophys. Res. Lett. 2017, 44, 9276–9282.
  50. Amini, A. Investigation of Induced Seismicity Mechanisms and Magnitude Distributions under Different Stress Regimes, Geomechanical Factors, and Fluid Injection Parameters. Ph.D. Dissertation, University of British Columbia, Kelowna, BC, Canada, 2020.
  51. Miao, T.Y.; Wang, M. Susceptibility Analysis of Earthquake-Induced Landslide Using Random Forest Method; Atlantis Press: Amsterdam, The Netherlands, 2015; pp. 771–775.
  52. Zhou, Y.; Li, S.; Zhou, C.; Luo, H. Intelligent Approach Based on Random Forest for Safety Risk Prediction of Deep Foundation Pit in Subway Stations. J. Comput. Civ. Eng. 2019, 33, 05018004.
  53. Zhang, W.; Zhang, R.; Wu, C.; Goh, A.T.C.; Wang, L. Assessment of Basal Heave Stability for Braced Excavations in Anisotropic Clay Using Extreme Gradient Boosting and Random Forest Regression. Undergr. Space 2020, 7, 233–241.
  54. Furuya, M. SAR Interferometry. In Encyclopedia of Solid Earth Geophysics; Gupta, H.K., Ed.; Springer: Dordrecht, The Netherlands, 2011; pp. 1041–1049. ISBN 978-90-481-8702-7.
  55. Bürgmann, R.; Rosen, P.A.; Fielding, E.J. Synthetic Aperture Radar Interferometry to Measure Earth’s Surface Topography and Its Deformation. Annu. Rev. Earth Planet. Sci. 2000, 28, 169–209.
  56. Dąbski, J.; Dunaj, A.; Markiewicz, M.; Mikoda, A.; Paździor, J.; Rydzewski, A.; Siewierski, S. Historia Rozwoju KGHM Polska Miedź S.A. In MONOGRAFIA KGHM Polska Miedź S.A.; KGHM CUPRUM Sp. z o.o. CBR; Lubin: Wrocław, Poland, 2007.
  57. Butra, J. Eksploatacja Złoża rud Miedzi w Warunkach Zagrożenia Tąpaniami i Zawałami; Cuprum Centrum Badawczo-Rozwojowe: Wrocław, Poland, 2010; ISBN 978-83-929275-8-7.
  58. Gabriel, A.K.; Goldstein, R.M.; Zebker, H.A. Mapping Small Elevation Changes over Large Areas: Differential Radar Interferometry. J. Geophys. Res. Solid Earth 1989, 94, 9183–9191.
  59. Hartl, P.; Thiel, K.-H. Fields of Experiments in ERS-1 SAR Interferometry in Bonn and Naples; Proc. Of Symposium “From Optics to Radar: SPOT and ERS Applications”, Cépaduès- Èditions: Paris, France, 1993.
  60. Huang, J.; Xie, M.; Farooq, A.; Williams, E.J. DInSAR Technique for Slow-Moving Landslide Monitoring Based on Slope Units. Surv. Rev. 2019, 51, 70–77.
  61. Govorčin, M.; Herak, M.; Matoš, B.; Pribičević, B.; Vlahović, I. Constraints on Complex Faulting during the 1996 Ston–Slano (Croatia) Earthquake Inferred from the DInSAR, Seismological, and Geological Observations. Remote Sens. 2020, 12, 1157.
  62. Wajs, J.; Milczarek, W.J. Detection of Surface Subsidence Using SAR SENTINEL 1A Imagery and the DInSAR Method—A Case Study of the Belchatow Open Pit Mine, Central Poland. EDP Sci. 2018, 55, 00004.
  63. Novellis, V.D.; Atzori, S.; Luca, C.D.; Manzo, M.; Valerio, E.; Bonano, M.; Cardaci, C.; Castaldo, R.; Bucci, D.D.; Manunta, M.; et al. DInSAR Analysis and Analytical Modeling of Mount Etna Displacements: The December 2018 Volcano-Tectonic Crisis. Geophys. Res. Lett. 2019, 46, 5817–5827.
  64. Nela, B.R.; Bandyopadhyay, D.; Singh, G.; Glazovsky, A.F.; Lavrentiev, I.I.; Kromova, T.E.; Arigony-Neto, J. Glacier Flow Dynamics of the Severnaya Zemlya Archipelago in Russian High Arctic Using the Differential SAR Interferometry (DInSAR) Technique. Water 2019, 11, 2466.
  65. Ferretti, A.; Savio, G.; Barzaghi, R.; Borghi, A.; Musazzi, S.; Novali, F.; Prati, C.; Rocca, F. Submillimeter Accuracy of InSAR Time Series: Experimental Validation. IEEE Trans. Geosci. Remote Sens. 2007, 45, 1142–1153.
  66. Berardino, P.; Fornaro, G.; Lanari, R.; Sansosti, E. A New Algorithm for Surface Deformation Monitoring Based on Small Baseline Differential SAR Interferograms. IEEE Trans. Geosci. Remote Sens. 2002, 40, 2375–2383.
  67. Yastika, P.E.; Shimizu, N.; Verbovšek, T. A Case Study on Landslide Displacement Monitoring by SBAS DInSAR in the Vipava River Valley, Slovenia. In Proceedings of the OnePetro, The 5th ISRM Young Scholars’ Symposium on Rock Mechanics and International Symposium on Rock Engineering for Innovative Future, Okinawa, Japan, 1–4 December 2019.
  68. Huang, J.; Khan, S.D.; Ghulam, A.; Crupa, W.; Abir, I.A.; Khan, A.S.; Kakar, D.M.; Kasi, A.; Kakar, N. Study of Subsidence and Earthquake Swarms in the Western Pakistan. Remote Sens. 2016, 8, 956.
  69. Gama, F.F.; Cantone, A.; Mura, J.C.; Pasquali, P.; Paradella, W.R.; dos Santos, A.R.; Silva, G.G. Monitoring Subsidence of Open Pit Iron Mines at Carajás Province Based on SBAS Interferometric Technique Using TerraSAR-X Data. Remote Sens. Appl. 2017, 8, 199–211.
  70. Grzesiak, K.; Milczarek, W.J. LOS Displacements of Mauna Loa Volcano, Hawaii Island, as Determined Using SBAS-InSAR. E3S Web Conf. 2018, 55, 00006.
  71. Brencher, G.; Handwerger, A.L.; Munroe, J.S. InSAR-Based Characterization of Rock Glacier Movement in the Uinta Mountains, Utah, USA. Cryosphere 2021, 15, 4823–4844.
  72. Breiman, L. Out-of-Bag Estimation. Statistics Department. University of California: Berkeley, CA, USA, 1996.
  73. Breiman, L. Bagging Predictors. Mach. Learn 1996, 24, 123–140.
  74. Breiman, L. Random Forests. Mach. Learn 2001, 45, 5–32. [Google Scholar] [CrossRef]
  75. Han, S.; Kim, H. On the Optimal Size of Candidate Feature Set in Random Forest. Appl. Sci. 2019, 9, 898. [Google Scholar] [CrossRef]
  76. Raschka, S.; Mirjalili, V. Python Uczenie Maszynowe, 2nd ed.; Helion SA: Boston, MA, USA, 2019. [Google Scholar]
  77. Zhao, X.; Yu, B.; Liu, Y.; Chen, Z.; Li, Q.; Wang, C.; Wu, J. Estimation of Poverty Using Random Forest Regression with Multi-Source Data: A Case Study in Bangladesh. Remote Sens. 2019, 11, 375. [Google Scholar] [CrossRef]
  78. Kohavi, R. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. Ijcai 1995, 14, 1137–1145. [Google Scholar]
  79. Breiman, L. Manual on Setting Up, Using, And Understanding Random Forests V3.1.; UC Berkeley, Department of Statistics: Berkeley, CA, USA, 2002. [Google Scholar]
  80. Lundberg, S.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  81. Rawlings, J.O.; Pantula, S.G.; Dickey, D.A. Applied Regression Analysis: A Research Tool; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2001; ISBN 978-0-387-98454-4. [Google Scholar]
  82. Li, Y.; Zou, C.; Berecibar, M.; Nanini-Maury, E.; Chan, J.C.W.; van den Bossche, P.; Van Mierlo, J.; Omar, N. Random Forest Regression for Online Capacity Estimation of Lithium-Ion Batteries. Appl. Energy 2018, 232, 197–210. [Google Scholar] [CrossRef]
  83. Kumar, V.; Venkataraman, G. SAR Interferometric Coherence Analysis for Snow Cover Mapping in the Western Himalayan Region. Int. J. Digit. Earth 2011, 4, 78–90. [Google Scholar] [CrossRef]
  84. Rahmati, O.; Falah, F.; Naghibi, S.A.; Biggs, T.; Soltani, M.; Deo, R.C.; Cerdà, A.; Mohammadi, F.; Tien Bui, D. Land Subsidence Modelling Using Tree-Based Machine Learning Algorithms. Sci. Total Environ. 2019, 672, 239–252. [Google Scholar] [CrossRef]
  85. Du, S.; Feng, G.; Wang, J.; Feng, S.; Malekian, R.; Li, Z. A New Machine-Learning Prediction Model for Slope Deformation of an Open-Pit Mine: An Evaluation of Field Data. Energies 2019, 12, 1288. [Google Scholar] [CrossRef]
  86. Ren, M.; Cheng, G.; Zhu, W.; Nie, W.; Guan, K.; Yang, T. A Prediction Model for Surface Deformation Caused by Underground Mining Based on Spatio-Temporal Associations. Geomat. Nat. Hazards Risk 2022, 13, 94–122. [Google Scholar] [CrossRef]
  87. Li, J.; Gao, F.; Lu, J.; Tao, T. Deformation Monitoring and Prediction for Residential Areas in the Panji Mining Area Based on an InSAR Time Series Analysis and the GM-SVR Model. Open Geosci. 2019, 11, 58. [Google Scholar] [CrossRef]
  88. Iannace, G.; Ciaburro, G.; Trematerra, A. Wind Turbine Noise Prediction Using Random Forest Regression. Machines 2019, 7, 69. [Google Scholar] [CrossRef]
  89. Gibowicz, S.J.; Lasocki, S. Seismicity Induced by Mining: Ten Years Later. In Advances in Geophysics; Dmowska, R., Saltzman, B., Eds.; Elsevier: Amsterdam, The Netherlands, 2001; Volume 44, pp. 39–181. [Google Scholar]
  90. Guha, S.K. Mining Induced Seismicity. In Induced Earthquakes; Guha, S.K., Ed.; Springer: Dordrecht, The Netherlands, 2000; pp. 159–215. ISBN 978-94-015-9452-3. [Google Scholar]
Figure 1. (A) Location of the area of interest in Poland. (B) Rudna mining area of the Legnica-Głogów Copper District. (C) Number of tremors registered in the Rudna mining area in 2016–2020.
Figure 2. Research methodology.
Figure 3. Residuals for the observed values in the LOS displacement model, the training, and the test datasets with the outliers.
Figure 4. Residuals for the observed values in the LOS displacement model, the training, and the test datasets without the two outliers.
Figure 5. Residuals for the predicted values in the LOS displacement model, training, and test datasets without the two outliers.
Figure 6. Residual values for the LOS displacement model, training, and test datasets without the two outliers.
Figure 7. Statistical significance of independent variables for the LOS displacement model based on the MDI approach. The higher the value, the more important the variable (for explanations of the variables’ acronyms, please refer to Table A1 in Appendix A).
Figure 8. Importance of the independent variables for the RFR model based on the MDA approach implemented in the eli5 library (for explanations of the variables’ acronyms, please refer to Table A1 in Appendix A).
Figure 9. Global variable importance graph on the basis of absolute mean SHAP values (for explanations of the variables’ acronyms, please refer to Table A1 in Appendix A).
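The MDA importances shown in Figure 8 rest on the idea of permuting one variable at a time and measuring how much the prediction error grows. The sketch below illustrates that idea only; it is not the eli5 implementation, and the synthetic data, the `toy_model` function, and the repeat count are assumptions made for illustration.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of `model` on the given rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_repeats=10, seed=0):
    """Mean increase in MSE when each column is shuffled (MDA-style score)."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    scores = []
    for col in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle one column while leaving the others untouched.
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
            increases.append(mse(model, permuted, targets) - baseline)
        scores.append(sum(increases) / n_repeats)
    return scores


# Synthetic stand-in data (not from the study): column 0 drives the target,
# column 1 is pure noise.
data_rng = random.Random(42)
rows = [[data_rng.uniform(0, 10), data_rng.uniform(0, 10)] for _ in range(200)]
targets = [3.0 * r[0] + data_rng.gauss(0, 0.5) for r in rows]


def toy_model(row):
    """Hypothetical fitted model; a real analysis would use the trained forest."""
    return 3.0 * row[0]


scores = permutation_importance(toy_model, rows, targets)
print(scores)
```

Shuffling the informative column raises the error sharply, while shuffling the noise column leaves it unchanged, which is the behaviour an MDA ranking relies on.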
Table 2. Hyperparameters used in the grid search method and their corresponding ranges of values.
| No. | Parameter name | Parameter values |
|---|---|---|
| 1 | n_estimators ¹ | {10, 20, 30, 40, 50, 60, 70, 80, 90, 100} |
| 2 | max_depth ² | {4, 6, 8, 10} |
| 3 | min_samples_split ³ | {2, 4, 6, 8} |
| 4 | min_samples_leaf ⁴ | {1, 3, 5, 7} |
| 5 | max_features ⁵ | {4, 5, 6, 28} |

¹ number of trees in the forest. ² maximum depth of the trees. ³ minimum number of samples required to split an internal node. ⁴ minimum number of samples required to be at a leaf node. ⁵ number of features to consider when looking for the best split.
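To give a sense of the size of the search space in Table 2, the following minimal sketch enumerates every combination in the grid. The parameter names match the arguments of scikit-learn's `RandomForestRegressor`, and a dictionary of lists is the form that `GridSearchCV` accepts as `param_grid`, but whether the authors ran the search this way is an assumption. The fold count is also an assumption, since it is not stated in this section.

```python
from itertools import product

# Search space copied from Table 2.
param_grid = {
    "n_estimators": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
    "max_depth": [4, 6, 8, 10],
    "min_samples_split": [2, 4, 6, 8],
    "min_samples_leaf": [1, 3, 5, 7],
    "max_features": [4, 5, 6, 28],
}


def iter_candidates(grid):
    """Yield every hyperparameter combination in the grid as a dict."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


candidates = list(iter_candidates(param_grid))
n_candidates = len(candidates)  # 10 * 4 * 4 * 4 * 4 = 2560

# Assumed 5-fold cross-validation; the actual fold count is not given here.
assumed_folds = 5
n_fits = n_candidates * assumed_folds

print(n_candidates, n_fits)
```

Under these assumptions an exhaustive search would train 2560 candidate forests per fold, which is manageable for forests of at most 100 trees.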
Table 3. Optimal values of the hyperparameters determined on the basis of the grid search method.
| No. | Parameter name | Optimal value |
|---|---|---|
| 1 | n_estimators | 10 |
| 2 | max_depth | 6 |
| 3 | min_samples_split | 2 |
| 4 | min_samples_leaf | 1 |
| 5 | max_features | 5 |

1 number of trees in the forest. 2 maximum depth of the trees. 3 minimum number of samples required to split an internal node. 4 minimum number of samples required to be at a leaf node. 5 number of features to consider when looking for the best split.
Table 4. Error metrics determining the accuracy of both models.
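| Error | Model with outliers, training dataset | Model with outliers, test dataset | Model without outliers, training dataset | Model without outliers, test dataset |
|---|---|---|---|---|
| MSE | 29 mm² | 112 mm² | 20 mm² | 48 mm² |
| RMSE | 5 mm | 11 mm | 5 mm | 7 mm |
| MAE | 4 mm | 7 mm | 4 mm | 6 mm |
| R² | 0.95 | 0.87 | 0.97 | 0.93 |
| ME | 27 mm | 36 mm | 13 mm | 17 mm |
| MAPE | 7.4% | 10.7% | 7.8% | 12.0% |

OOB: 18% (model with outliers), 13% (model without outliers).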
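The error metrics in Table 4 can be computed from paired observed and predicted displacements as in the minimal sketch below. The sample values are made up for illustration and are not data from the study, and reading ME as the maximum absolute error is an assumption based on its magnitude relative to RMSE.

```python
import math


def regression_errors(observed, predicted):
    """Return Table 4 style error metrics for two equal-length sequences."""
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    mse = sum(r * r for r in residuals) / n
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(r) for r in residuals) / n,
        "R2": 1.0 - (mse * n) / ss_tot,
        # Assumed meaning of ME: largest absolute residual.
        "ME": max(abs(r) for r in residuals),
        # MAPE in percent; observed values must be non-zero.
        "MAPE": 100.0 * sum(abs(r / o) for r, o in zip(residuals, observed)) / n,
    }


# Made-up LOS displacements in mm, for illustration only.
observed = [-40.0, -55.0, -62.0, -80.0, -100.0]
predicted = [-44.0, -50.0, -60.0, -86.0, -97.0]

errors = regression_errors(observed, predicted)
for name, value in errors.items():
    print(f"{name}: {value:.3f}")
```

MSE is in squared millimetres, RMSE, MAE, and ME are in millimetres, and R² and MAPE are unitless, which matches the units reported in the table.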
Table 5. Summary of the statistical significance of the explanatory variables according to the methods MDI, MDA, and SHAP, in the order of decreasing influence on the model (for explanations of the variables’ acronyms, please refer to Table A1 in Appendix A).
| No. | MDI method | MDA method | SHAP method |
|---|---|---|---|
| 1 | CTE | CTE | CTE |
| 2 | SOESP | PPE | EN |
| 3 | PZ | SLWUS | SWZU |
| 4 | PPE | SOESP | SOESP |
| 5 | SLWUS | PZ | LEC |
| 6 | LEC | EN | CPOW |
| 7 | EN | SWZU | SGEP |
| 8 | SGEP | OGEH | LEF |
| 9 | SWZU | LEC | IC |
| 10 | LEF | CPDW | PPE |
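One simple way to compare the three rankings in Table 5 is to find the variables that all three methods place in their top ten and order them by mean rank. The sketch below does this with the acronyms as read from the table. The mean-rank aggregation is an illustrative choice, not a step reported by the authors.

```python
# Top-10 rankings as read from Table 5 (acronyms are explained in Table A1).
rankings = {
    "MDI": ["CTE", "SOESP", "PZ", "PPE", "SLWUS", "LEC", "EN", "SGEP", "SWZU", "LEF"],
    "MDA": ["CTE", "PPE", "SLWUS", "SOESP", "PZ", "EN", "SWZU", "OGEH", "LEC", "CPDW"],
    "SHAP": ["CTE", "EN", "SWZU", "SOESP", "LEC", "CPOW", "SGEP", "LEF", "IC", "PPE"],
}

# Variables that every method places in its top ten.
shared = set.intersection(*(set(order) for order in rankings.values()))


def mean_rank(variable):
    """Average 1-based position of a variable across the three methods."""
    positions = [order.index(variable) + 1 for order in rankings.values()]
    return sum(positions) / len(positions)


# Sort by mean rank, breaking ties alphabetically.
consensus = sorted(shared, key=lambda v: (mean_rank(v), v))
for name in consensus:
    print(f"{name}: mean rank {mean_rank(name):.2f}")
```

Six variables appear in all three lists, and CTE is the only one ranked first by every method.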
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Owczarz, K.; Blachowski, J. Random Forest—Based Identification of Factors Influencing Ground Deformation Due to Mining Seismicity. Remote Sens. 2024, 16, 2742. https://doi.org/10.3390/rs16152742
