Analysis of the Contribution Rate of the Inﬂuencing Factors to Land Subsidence in the Eastern Beijing Plain, China Based on Extremely Randomized Trees (ERT) Method

: As a common geological hazard, land subsidence is widely distributed in the Eastern Beijing Plain. The pattern of evolution of this geological phenomenon is controlled by many factors, including groundwater level change in di ﬀ erent aquifers, compressible layers of di ﬀ erent thicknesses, and static and dynamic loads. First, based on the small baseline subset Interferometric Synthetic Aperture Radar (SBAS-InSAR) technique, we employed 47 ENVISAT ASAR images and 48 RADARSAT-2 images to acquire the ground deformation of the Beijing Plain from June 2003 to November 2015 and then validated the results using leveling benchmark monitoring. Second, we innovatively calculated additional stress to obtain static and dynamic load information. Finally, we evaluated the contribution rate of the inﬂuencing factors to land subsidence by using the Spearman’s rank correlation coe ﬃ cient (SRCC) and extremely randomized trees (ERT) machine learning methods. The SBAS-InSAR outcomes revealed that the maximum deformation rate was 110.7 mm / year from 2003 to 2010 and 144.4 mm / year from 2010 to 2015. The SBAS-InSAR results agreed well with the leveling benchmark monitoring results; the correlation coe ﬃ cients were 0.97 and 0.96 during the 2003–2010 and 2013–2015 periods, respectively. The contribution rate of the second conﬁned aquifer to the cumulative land subsidence was 49.3% from 2003 to 2010, accounting for the largest proportion; however, its contribution rate decreased to 23.4% from 2010 to 2015. The contribution rate of the third conﬁned aquifer to the cumulative land subsidence increased from 2003 to 2015. Although the contribution of additional stress engendered from static and dynamic loads to the cumulative land subsidence was slight, it had a signiﬁcant e ﬀ ect on the uneven land subsidence, with a contribution rate of 33.8% from 2003 to 2010 and 23.1% from 2010 to 2015. These ﬁndings provide scientiﬁc support for mitigating hazards associated with land subsidence.


Introduction
With the reduction in land surface elevation, land subsidence is mainly caused by the consolidation and compression of loose stratum [1][2][3]. When serious land subsidence occurs, especially for the spatial correlation between cumulative land subsidence, uneven subsidence, and their influencing factors. Finally, we summarize the main conclusions in Section 6.

Study Area
Covering an area of more than 16,000 km 2 , the city of Beijing is located in the northwest fringe of the North China Plain. The BP lies in the southeast of Beijing and accounts for approximately 38% of the total area of Beijing ( Figure 1). The BP has an annual average precipitation of 602 mm (measured from 1949 to 2016), of which more than 70% is concentrated from June to August, causing an extremely uneven distribution of precipitation across the year [12]. With the expansion of the city and the rapid growth of the urban residential populations, water consumption has increased dramatically [6]. With a decrease in the average annual precipitation of 23%, the BP suffered a continuous drought from 2000 to 2010. A large amount of groundwater was extracted to meet the needs of urban population survival and economic development, with pumping rates of approximately 2.5 × 10 9 m 3 /year, which greatly exceeded the natural recharge capacity [6]. The overexploitation of groundwater resulted in an obvious recession in the level of said groundwater in the BP from 2003 to 2014, as well as in an increase in effective stress, which led to the consolidation of aquifer systems, thus causing serious land subsidence [3,16]. Quaternary loose sediments are widely distributed in the BP, which provide basic geological conditions for land subsidence, although the thickness of compressible layers varies greatly [21]. As shown in Figure 1c, the thickness of compressible layers gradually increases from the top to the bottom of the alluvial plain [22]. At the top of the alluvial plain, the lithology comprises sand and pebbles, which have poor compressibility and rich permeability. Meanwhile, at bottom of the alluvial plain, the lithology comprises fine and silty sands that have a high water content and strong compressibility [17]. According to the characteristics of Quaternary sediments, the aquifer system in the BP can be divided into four parts in the vertical direction: a phreatic aquifer (a depth of approximately 25 m), a first confined aquifer (a depth of 80-100 m), a second confined aquifer (a depth of 100-180 m), and a third confined aquifer (a depth of 200-300 m) [23].
As a surface response to groundwater exploitation, the urbanization process, and the utilization of underground space, land subsidence in the BP has experienced a stage of rapid development since the 1970s [22]. Research shows that five subsidence funnels with a deformation rate of over 50 mm/year have been formed in the BP [24]. The Laiguangying-Dongbalizhuang-Dajiaoting (LDD) land subsidence funnel, located in the eastern section of the Chaoyang District and the northwestern sections of the Tongzhou District, was selected as the study area in this paper.

Input Data
In this paper, surface deformation information was obtained based on ENVISAT ASAR (EA) and RADARSAT-2 (R2) data. Additional stress engendered from static and dynamic loads were calculated by using the classical formula proposed by Boussinesq [25]. The SAR datasets used were as follows.
One set included 47 descending track EA images collected during the period from June 2003 to August 2010. The European Space Agency successfully launched the ENVISAT ASAR radar satellite sensor on March 2002. The images were acquired in the VV single polarization mode. The EA satellite has a revisiting time of 35 days and an incidence angle of 22.9 • , and it operates in the 5.6 cm C-band. The other set included 48 descending track R2 images obtained from November 2010 to November 2015. The Canadian Space Agency successfully launched the RADARSAT-2 radar satellite sensor in December 2007. The R2 satellite has a revisiting time of 24 days and an incidence angle of 27.6 • , and it also operates in the 5.6 cm C-band. The spatial resolutions of these datasets were both 30 m; the detailed parameters for EA images can be found in Supplementary Table S1 and for the R2  images in Supplementary Table S2. To obliterate the topographic phase in interferograms, the digital elevation model (DEM) with a 90 m resolution supplied by the Shuttle Radar Topography Mission (more information is available at http://dds.cr.usgs.gov/srtm/) was utilized.
Thirty-seven leveling benchmarks-17 and 20 leveling benchmarks measuring precise ground deformation data during the 2003-2010 and 2013-2015 periods, respectively-were used to validate the reliability of the SBAS-InSAR results. Their locations are shown in Figure 1a. The contours of the groundwater level in different aquifers in 2007 and 2014 were used to extract the value of said groundwater level in the study area by using the inverse distance weighted (IDW) interpolation tool provided by ArcGIS. The groundwater level represents the amount of groundwater extracted; in general, the lower the groundwater level, the greater the exploitation of groundwater. The groundwater levels in different aquifers were used to analyze the influence of excessive extraction of groundwater on land subsidence. The groundwater level in different aquifers after 2015 is not available because the Beijing Institute of Hydrogeology and Engineering Geology has not published the relevant data. Quaternary sediments are widely distributed in the study area, and their thicknesses vary greatly; these data were used to analyze the impact of compressible layers on land subsidence. We calculated the additional stress engendered from building loads to represent static load information and the additional stress engendered from subway lines, expressways, and urban roads to represent the dynamic load information. Figure 2 shows the flowchart of this research. First, we adopted the SBAS-InSAR method for the EA and R2 images to obtain land subsidence information and to validate the SBAS-InSAR results using the leveling benchmark monitoring results. Second, the additional stress engendered from the static and dynamic loads was calculated. Third, the contribution rate of the influencing factors to cumulative and uneven land subsidence was analyzed by Spearman's rank correlation coefficient (SRCC) and extremely randomized trees (ERT) machine learning methods.

SBAS-InSAR Processing for the EA and R2 Datasests
The SBAS-InSAR method has been widely used for deriving slow ground deformation [15,26]. To maximize the correlations between interferograms, the SBAS-InSAR method was used to obtain the minimum value of the separation in the Doppler frequency and in the time intervals of acquisition pairs. The SBAS-InSAR method was also used in the GAMMA software to obtain ground deformation. First, the SAR images were registered to the same master image within each stack. Then, 267 interferograms were generated for the EA data when the temporal and perpendicular baseline thresholds were set as 500 days and 500 m, respectively. For the R2 data, 163 interferograms were created when the temporal and perpendicular baseline thresholds were set as 500 days and 500 m, respectively. Figure 3 shows the network of the spatial and temporal baselines for the SBAS interferograms. The temporal coherence threshold was set as 0.7, which has been widely applied in InSAR applications. The least-squares method was used in the phase unwrapping, and the threshold was set as half of the satellite wavelength. The spatial filters were used to eliminate the phase noise. Finally, the ground deformation information along the line-of-sight (LOS) direction was obtained. Considering that the horizontal deformation in the BP can be neglected [21], the land deformation velocity was estimated by using Equation (1).
where V and V los represent the deformation rate in the vertical (i.e., land subsidence) and LOS directions, respectively, and ϕ is the incidence angle of the radar satellite sensor.

Additional Stress Engendered from Static and Dynamic Loads
Based on the building information provided by the BIGMAP software, we learned that on average, buildings in Beijing weigh 1300 kg/m 2 and bear 200 kg/m 2 . Furthermore, the soil weighs 2000-2200 kg/m 2 in the range of 0-100 m underground, and the depth of the foundations of buildings should not be less than 1/15 the building's height when buildings are greater than seven stories. The weight of the building was calculated by multiplying the area of the building by 1500 kg and then subtracting the weight of the soil replaced by the foundation. To simplify the calculation process, the dynamic load of subways and expressways was considered as the equivalent static load. According to the data provided by the Beijing Mass Transit Railway Operation Corp (Table 1), the equivalent static load of each metro line crossing the study area was calculated using Equation (2) [27].
where P E and V E represent the load and operating speed of metro vehicles, respectively, and P ES is the equivalent static load of metro vehicles. According to the information provided by the Beijing Traffic Management Bureau (Table 2), the equivalent static load of different cars, buses, and trucks was calculated using Equation (2). Based on the proportion, the average equivalent static load of expressways was calculated using Equation (3).
where n is the number of types of vehicle, M i and P i represent the proportion and equivalent static load of cars, buses and trucks, respectively, and P VS is the average equivalent static load of expressways. We generated a series of grid points with an interval of 10 m along the longitude, latitude, and vertical dimensions in World Geodetic System 1984 (WGS84). The additional stress of each point in the grid was calculated by summing the additional stress engendered from the static and dynamic loads by using Equation (4). Table 3 shows the meaning of each parameter in Equation (4). Additional stress and stress gradient maps can be obtained by using the IDW interpolation tool.

Spearman's Rank Correlation Coefficient
SRCC, proposed by Spearman and widely used in the field of statistics, is a statistical index that reflects the correlation between two groups of variables. The value of SRCC ranges from −1 to 1, and the larger the value, the stronger the correlation. Compared to Pearson's product motion correlation coefficient (PMMCC), it overcomes the disadvantage of variables following normal distribution. In this study, we used SRCC to quantitatively measure the correlations among the ground water level, the thickness of compressible layer, the additional stress engendered from static and dynamic loads, and the land subsidence. SRCC can be calculated using Equation (5).
where n represent the number of variables in the first group and d i is the difference of the rank of the corresponding variable in the second group.

Extremely Randomized Trees
As an extension of the random forest method, the extremely randomized trees (ERT) method is an efficient supervised machine learning algorithm that is widely used in regression and classification problems. The ERT method can obtain more reliable results than the random forest method by using all of the sample sets in the training process. Based on decision tree learning, ERT is a more advanced algorithm that contains many decision trees. As an ensemble machine learning method, the ERT method has the following advantages. (1) This method is unexcelled in accuracy among current algorithms, because its results are determined by voting of the results of the many decision trees included in ERT. (2) ERT can handle thousands of input variables with multi-dimensional characteristics without variable deletion. (3) ERT can evaluate the importance of variables in the process of algorithm implementation. (4) This method is very efficient, especially for large datasets.
In this paper, the ERT algorithm was implemented by using the sklearn.ensemble library in Python. We used the ExtraTreesRegressor regression model in the sklearn.ensemble library to quantitatively measure the contribution rate of changes in the level of groundwater in different aquifers, the thickness of compressible layers, and additional stress engendered from static and dynamic loads to cumulative and uneven land subsidence. The results of the ERT algorithm and the SRCC method were cross-validated.

Surface Deformation Measured by SBAS-InSAR in Two Time Periods
Based on the surface displacement information measured by SBAS-InSAR, a land subsidence rate distribution map during the observation period was obtained for the BP. Figure 4 shows that land subsidence is widely distributed in the BP, and the spatial discrepancy is huge. The areas with serious land subsidence are mainly distributed in the middle and east of the Chaoyang District, the northwest of the Tongzhou District, the middle and south of the Changping District, the west of the Shunyi District, and the south of the Daxing District. From June 2003 to August 2010, 19,090 pixels were detected in the study area based on the EA data, with a density of 181 pixel/km 2 , and the average land subsidence rate varied from −110.7 to +8.3 mm/year. For the R2 data, we detected 17,605 pixels in the study area from November 2010 to November 2015, with a density of 167 pixel/km 2 , and the average land subsidence rate varied from −144.4 to +10.5 mm/year.  Figure 5 shows that the discrepancy of the land subsidence is obvious from 2003 to 2015 in the LDD land subsidence funnel. To further study the temporal and spatial evolution patterns of the land subsidence, we obtained the maximum land subsidence rate and area with a land subsidence rate greater than 100 mm/year. Figure 6 shows that land subsidence experienced a rapid expansion period from 2004 to 2007. The maximum land subsidence rate and area with a land subsidence rate greater than 100 mm/year increased from 111.8 to 143.5 mm/year and from 0.1 to 24.9 km 2 , respectively. From 2007 to 2008, land subsidence showed a decreasing trend, in which the maximum land subsidence rate and area with a land subsidence rate greater than 100 mm/year decreased from 143.5 to 126.4 mm/year and from 24.9 to 8.2 km 2 , respectively. Land subsidence developed rapidly from 2008 to 2010, with the maximum land subsidence rate and area with a land subsidence rate greater than 100 mm/year increasing to 157.6 mm/year and 68.9 km 2 , respectively. From 2010 to 2011, the land subsidence showed a decreasing trend, in which the maximum land subsidence rate and area with a land subsidence rate greater than 100 mm/year decreased to 134.7 mm/year and 63.1 km 2 , respectively. Land subsidence remained in a relatively stable state from 2011 to 2015, during which time the maximum land subsidence rate and area with a land subsidence rate greater than 100 mm/year rarely changed.

Accuracy Assessment of the SBAS-InSAR Results
As shown in Figure 4, 37 leveling benchmarks evenly distributed in the BP were selected for accuracy assessment of the SBAS-InSAR results-that is, 17

Acquisition of Additional Stress Engendered from Static and Dynamic Loads
The additional stress information engendered from static and dynamic loads in the LDD land subsidence funnel was obtained by using the method outlined in Section 3.2. This additional stress will not cause significant land subsidence in this stratum when the additional stress is less than 20% of the stress engendered from the gravity of the soil above the depth [28]. Table 4 shows the detailed parameters of the soil in the LDD land subsidence funnel, while Figure 8 shows that the maximum value of the additional stress engendered from the static and dynamic loads is less than 20% of the stress engendered from the gravity of the soil when the depth is greater than 65.1 m by assuming that the density of the soil above this depth is 1930 kg/m 3 . This result reveals that the influence of the depth of additional stress engendered from the static and dynamic loads was 65.1 m in the LDD land subsidence funnel. As shown in Figure 9, the superposition of additional stress engendered from the static and dynamic loads in the LDD land subsidence funnel obviously enhanced as the depth increased. Figures 8 and 9 show that the additional stress decreased sharply as the depth increased between 10 and 70 m. These results indicate that the influence of additional stress engendered from static and dynamic loads on land subsidence may decrease as the depth increase, especially for uneven land subsidence.

Evaluating the Spatial Correlation between Groundwater Level in Different Aquifers and the Cumulative Land Subsidence
Previous studies have shown that land subsidence in the LDD land subsidence funnel is mainly caused by overexploitation of groundwater [6,24]. Analysis of the spatial correlation between the groundwater level in different aquifers and cumulative land subsidence was implemented in two different time periods.
We used the IDW interpolation method for the surface deformation information measured by SBAS-InSAR technique to obtain land subsidence rate maps during the 2003-2010 and 2010-2015 periods. Overlay analysis was applied to explore the spatial correlation between the groundwater level in different aquifers and the land subsidence rate. The spatial distribution of the second confined aquifer was more consistent with cumulative land subsidence than that of the phreatic aquifer, the first confined aquifer, and the third confined aquifer from 2003 to 2010 ( Figure 10). From 2003 to 2010, a large amount of the groundwater from the second confined aquifer was extracted for agricultural irrigation.
From 2010 to 2015, groundwater exploitation from the phreatic aquifer increased, while the exploitation of the second confined aquifer decreased; thus, the groundwater level of the phreatic aquifer decreased significantly during this period. This result reveals that the contribution of the phreatic aquifer to cumulative land subsidence may have increased. In order to further study the spatial correlation between the groundwater level in different aquifers and cumulative land subsidence, the SRCC between the land subsidence rate and the groundwater level in different aquifers was calculated in two different time periods. Table 5 shows that the SRCC between the groundwater level in the second aquifers and the land subsidence rate was 0.67, which was significantly larger than that of the other aquifers from 2003 to 2010. This result indicates that the second confined aquifer was the main layer contributing to cumulative land subsidence from 2003 to 2010. The SRCC between the groundwater level in the phreatic aquifers and the land subsidence rate was 0.5 from 2010 to 2015, which indicates that the phreatic aquifer was the main layer that contributed to cumulative land subsidence during this period.

Evaluation of the Spatial Correlation betweent Compressible Layers of Different Thicknesses and Cumulative Land Subsidence
The existence of compressible layers provides basic geological conditions for the occurrence of land subsidence. Figure 11a shows the distribution of compressible layers of different thicknesses. We divided the compressible layers into five categories and obtained the maximum and mean cumulative deformation from 2003 to 2015 in each compressible layer of a different thickness. The results show that the maximum and the mean cumulative deformation increased with the increase in the thickness of the compressible layers, except for the maximum cumulative deformation at 140-170 m, which was greater than the maximum cumulative deformation at 170-200 m. To further study the spatial correlation between compressible layers of different thicknesses and cumulative land subsidence, the cumulative deformation from 2003 to 2015 was divided into four categories, and the maximum and mean thicknesses of the compressible layers in each cumulative deformation were calculated. Figure 12b shows that the maximum and mean thicknesses of the compressible layers increased as the cumulative deformation increased, except for the maximum thickness at 900-1200 mm, which was greater than the maximum thickness at 1200-1500 mm.

Evaluation of the Spatial Correlation between Uneven Land Subsidence and Additional Stress Gradients
The additional stress gradient at different depths and the land subsidence rate gradient were calculated and both were divided into nine categories. Figure 13 shows the distribution of the differences between the land subsidence rate gradient and the additional stress gradient with depths less than 65.1 m. The results show that the differences increased rapidly as the depth increased, which indicates that the influence of additional stress engendered from static and dynamic loads on uneven land subsidence decreased rapidly as the depth increased. This can be explained by the fact that the superposition of the additional stress enhanced as the depth increased, while the differences between the additional stresses decreased as the depth increased.
We calculated the SRCC between the land subsidence rate gradient and the additional stress gradient, the gradient of compressible layers of different thickness, and the groundwater level gradient in different aquifers from 2003 to 2010 and 2010 to 2015. Table 6 shows that the SRCC between the land subsidence rate gradient and the additional stress gradient was significantly larger than that of the other factors from 2003 to 2010 and 2010 to 2015. This result indicates that additional stress plays an important role in controlling uneven land subsidence. The SRCC between the groundwater level gradient in the phreatic aquifer and the land subsidence rate gradient increased from 2003 and 2015, which indicates that the contribution of the phreatic aquifer to uneven subsidence increased. Table 6. Spearman's rank correlation coefficient (SRCC) between uneven land subsidence and its influencing factors.

Quantitative Analysis of the Contribution Rate of the Influencing Factors on Land Subsidence Using the ERT Method
In this study, we selected 36,695 pixels detected by the SBAS-InSAR technique as research data to quantify the contribution rate of the influencing factors to land subsidence using the ERT method. To automatically optimize the ERT model parameters, the coordinate descent method was used in this study. Of the research data, 70% were selected as the training data, and the rest were selected as the verification data.

Contribution Rate of the Influencing Factors to Cumulative Land Subsidence
We selected the cumulative deformation from 2003 to 2010 and 2010 to 2015 as the dependent variable, while the groundwater level in the different aquifers, the thickness of compressible layers, and additional stress engendered from static and dynamic loads in same time period were the explanatory variables. Figure 14 shows that the contribution rate of the second confined aquifer was 49.3% from 2003 to 2010, which indicates that the second confined aquifer was the main contribution layer during this period. We found that the influence of the second confined aquifer on cumulative land

Contribution Rate of the Influencing Factors to Uneven Land Subsidence
We calculated the gradient of cumulative deformation to represent uneven deformation and explored the relationship between uneven deformation and its influencing factors. The gradient of cumulative deformation from 2003 to 2010 and 2010 to 2015 was selected as the dependent variable and groundwater level gradient in the different aquifers, the gradient of compressible layers of different thicknesses, and the gradient of additional stress engendered from static and dynamic loads were selected as the explanatory variables. Figure 15a shows that the contribution rate of the static and dynamic loads accounted for 33.8% from 2003 to 2010, including 28.3% with a depth of less than 30 m and 5.5% with a depth greater than 30 m. This result indicates that the additional stress engendered from static and dynamic loads plays an important role in controlling uneven land subsidence, especially in the stratum with a depth of less than 30 m. We found that the contribution rates of the phreatic aquifer, the first confined aquifer, the second confined aquifer, the third confined aquifer, and the compressible layers of different thicknesses were 6%, 16.1%, 14.1%, 19.3%, and 10.7%, respectively from 2003 to 2010. The contribution rate of the phreatic aquifer increased sharply from 6% between 2003 and 2010 to 20.8% between 2010 and 2015. This result is consistent with that of SRCC to some extent. We found that the contribution rate of the static and dynamic loads decreased from 33.

Conclusions
In this study, we obtained surface deformation information from June 2003 to November 2015 in the BP by applying the SBAS-InSAR technique to 47 EA images and 48 R2 images, and then we verified the accuracy of the SBAS-InSAR results. Based on the ground deformation monitoring results, we analyzed the evolution pattern of land subsidence in the LDD land subsidence funnel. The spatial correlation between land subsidence and its influencing factors (i.e., the groundwater level in the phreatic aquifer, the first confined aquifer, the second confined aquifer, and the third confined aquifer, the compressible layers of different thicknesses, and the additional stress engendered from static and dynamic loads) were explored using the overlay analysis method, the spatial statistics method, the SRCC method, and the ERT method in machine learning. The following conclusions can be drawn. (2) We found that the greatest contribution to cumulative land subsidence was the second confined aquifer from 2003 to 2010, with a contribution rate of 49.3%. However, its contribution rate decreased to 23.4% from 2010 to 2015. The phreatic aquifer was the main contribution layer from 2010 to 2015, with a contribution rate of 33.9%. The contribution rate of the third confined aquifer increased from 2003 to 2015 due to the upper part of third confined aquifer gradually becoming the main mining layer. We innovatively calculated additional stress to represent static and dynamic load information to show the superposition of additional stress; however, the influence of additional stress engendered from the static and dynamic loads on cumulative land subsidence was slight. (3) The contribution rate of the static and dynamic loads to uneven land subsidence was 33.8% from 2003 to 2010 and 23.1% from 2010 to 2015, accounting for the largest proportion. The contribution rate of static and dynamic loads with depths of less than 30 m accounted for more than 80% of the total contribution rate of said static and dynamic loads. This result shows that said additional stress has a significant effect on the distribution of uneven settlement, especially in the stratum with a depth of less than 30 m.
In summary, the spatial distribution of cumulative land subsidence is mainly affected by geological conditions, and the additional stress engendered from static and dynamic loads has a significant impact on the distribution of uneven land subsidence. This research has improved our understanding of the relationship between land subsidence and its influencing factors.