Integration of InSAR Time-Series Data and GIS to Assess Land Subsidence along Subway Lines in the Seoul Metropolitan Area, South Korea

The aims of this research were to map and analyze the risk of land subsidence in the Seoul Metropolitan Area, South Korea, using satellite interferometric synthetic aperture radar (InSAR) time-series data and three ensemble machine-learning models: Bagging, LogitBoost, and Multiclass Classifier. Of the types of infrastructure present in the Seoul Metropolitan Area, subway lines may be vulnerable to land subsidence. In this study, we analyzed Persistent Scatterer InSAR time-series data using the Stanford Method for Persistent Scatterers (StaMPS) algorithm to generate a deformation time-series map. Subsidence occurred at four locations, with deformation rates ranging from 6 to 12 mm/year. Subsidence inventory maps were prepared using deformation time-series data from Sentinel-1. Additionally, 10 potential subsidence-related factors were selected and subjected to Geographic Information System analysis. The relationship between each factor and subsidence occurrence was analyzed using the frequency ratio. Land subsidence susceptibility maps were generated using the Bagging, Multiclass Classifier, and LogitBoost models, and map validation was carried out using the area under the curve (AUC) method. Of the three models, Bagging produced the largest AUC (0.883), with LogitBoost and Multiclass Classifier producing AUCs of 0.871 and 0.856, respectively.


Introduction
Land subsidence is a threat faced by large, extensively developed cities, and it can negatively impact the environment, social systems, and the economy [1]. Subsidence results from anthropogenic processes such as massive urban development, infrastructure development [2,3], tunneling [4][5][6], and water extraction [7][8][9], or from geological causes such as earthquakes [10]. Subsidence has been observed in several metropolitan cities, including Mexico City [11], Shanghai [12], and Jakarta [13][14][15]. The Seoul Metropolitan Area is the center of governance, commerce, and culture in South Korea. It has been extensively developed and is the most densely populated city in Asia [16]. Industrial development and economic growth have led to city developments such as the expansion of subway lines and the construction of many structures and buildings [17]. Monitoring land deformation to examine the potential effects of land subsidence could be the first step of a mitigation process. Seoul, which has a high population density and extensive developments, is extremely vulnerable to land subsidence. Given the severe negative impacts of subsidence, it is necessary to elucidate the factors that cause land subsidence from integrated observations, to assess damage risks, and to prevent damage to roads, bridges, railways, and other infrastructure.
Monitoring land deformation is essential for reducing losses due to subsidence and developing a sound mitigation plan. With advancements in knowledge and technology, monitoring techniques have

Study Area
Seoul is the capital city of South Korea. It is located in the midwestern region of the Korean Peninsula at 126°59′40″E and 37°33′59″N and covers an area of 605.5 km² [51]. The Han River, one of the largest rivers crossing Seoul, divides the city into northern and southern areas. Seoul has a population of approximately 10 million people with a density of 16,364 people/km², making it one of the most populous metropolitan cities in Asia [52]. The geological setting of Seoul consists of Jurassic granite, Precambrian metamorphic rocks (gneiss and schist), and Quaternary alluvium. A predominantly coarse-grained, sandy alluvium sequence (<20 m thick) occurs along the Han River and its tributaries [53]; it is composed of coarse- to fine-grained sediments, often with high permeability. The alluvium and soil tend to be thicker close to the river, particularly in its lower reaches, and thinner in mountainous areas [54]. The study area has two types of aquifer: unconsolidated alluvial aquifers and bedrock aquifers [17]. The alluvial aquifers are composed dominantly of silt and fine to coarse sand and occur along the Han River and its tributaries. The bedrock aquifers are mainly composed of fractured gneiss, schist, and granite.
As a metropolitan city that has experienced urbanization in recent years, Seoul has seen extensive construction of office, business, and residential buildings, which has increased building density in the area. The use of groundwater and other utilities in densely populated areas weakens soil conditions. This can indirectly lead to subsidence, which can cause substantial losses, especially in areas with high population density such as Seoul.
Together with the increase in population, the economy has grown quickly, followed by industrial development. To meet the needs of its inhabitants, Seoul undertook massive developments, including infrastructure, buildings, and transportation networks. At the end of 2019, a total of 23 rapid transit, light metro, commuter rail, and airport rail lines had been integrated into the Seoul Metropolitan Subway system [55]. This system operates in the Seoul Metropolitan Area, including Incheon and some satellite cities in Gyeonggi Province. Several regional lines, such as those in Chungnam and Gangwon provinces, are also connected to this system. Figure 1 shows a map of the subway lines that are operating or under construction in the Seoul Metropolitan Area. New transportation routes have since been added, such as the Gimpo Gold-Line in 2019, and the Line 7 and Line 5 extensions to Hanam City are currently under construction and slated to open in 2020. To improve connectivity in this metropolitan area, several future subway lines (until 2028) are still being planned. In densely populated areas like Seoul, ground subsidence can cause far more casualties and property damage than in less populated areas. For this reason, it is of utmost importance to monitor ground subsidence comprehensively to prevent damage to roads, railroads, and other infrastructure.
Remote Sens. 2020, 12, 3505

SAR Datasets
In this study, SAR data from Sentinel-1B were used to generate representations of surface deformation. SAR images derived from C-band data can be used to map surface deformation over broad areas while providing time-consistent ground-deformation data. Sentinel-1B has an acquisition cycle of 12 days. We used 93 SAR scenes from descending tracks. The descending datasets are listed in Table 1; the reference date (11 October 2018), which has zero temporal and perpendicular baselines, is shown in bold text.

StaMPS Processing
StaMPS is one of the established methods for generating time-series data on surface deformation, and it is suitable for urban areas such as this study area. With this method, persistent scatterers are commonly identified on man-made objects such as buildings, infrastructure, and roads in urbanized areas like Seoul. Another major advantage is that it does not require a prior deformation model, thus allowing analysis of different regions and several deformation causes [24]. The PSI-StaMPS analysis began with the generation of interferograms from the 93 SAR scenes in the descending track. Prior to interferogram generation, the SAR data underwent co-registration, in which two SAR images were aligned to subpixel accuracy to reduce noise and form accurate interferometric pairs. The SAR images were then resampled such that the slave images matched the master image. When co-registration was complete, the co-registered images were cropped to the study area before interferogram generation began. During interferogram processing, a topographic phase was generated. Once the interferograms were generated, the topographic phase was subtracted from each interferogram using the Shuttle Radar Topography Mission digital elevation model (SRTM DEM) as the reference [56]. After the topographic phase was removed, a DInSAR phase was obtained, which contained only the deformation phase.
For StaMPS processing, the SAR image acquired on 11 October 2018 was chosen as the master image, and 92 interferograms were generated from the descending track. After generating the interferograms, the StaMPS module was used to calculate the displacements of persistent scatterers. To begin the StaMPS process, a subset of candidate pixels was selected based on amplitude analysis. Then, the phase stability of each pixel was estimated through phase analysis [57]. Once the phase noise associated with all selected persistent scatterer (PS) pixels was estimated, the selected PS pixels were weeded to separate persistent points from noise, and the wrapped phase of the selected pixels was corrected for spatially uncorrelated look-angle errors in the DEM. The corrected phase was then unwrapped, and the PS output was generated. The parameters of the StaMPS process are shown in Table 2. Upon completion of the StaMPS process, PS results were plotted as a time-series map and a mean line-of-sight (LOS) deformation map [58,59]. The mean deformation map was converted into vertical deformation by assuming that horizontal deformation is very small compared to the vertical deformation caused by land subsidence [60,61]. Recent studies have used vertical deformation to monitor land subsidence in several places; therefore, horizontal deformation in this study was assumed to be negligible and the LOS deformation was converted into vertical deformation [13,14]. Vertical deformation is expressed as a negative value relative to the initial ground level, indicating land subsidence measured vertically at that point. The vertical deformation can be calculated using Equation (1) [62]:

$$d_{v} = \frac{d_{LOS}}{\cos\theta} \quad (1)$$

where $d_{v}$ is the vertical deformation, $d_{LOS}$ is the deformation in the line of sight, and $\theta$ is the incidence angle.
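The LOS-to-vertical conversion can be illustrated with a minimal sketch. It assumes the usual relation in which vertical deformation equals LOS deformation divided by the cosine of the incidence angle, under the negligible-horizontal-motion assumption stated above. The function name and the 39 degree incidence angle are illustrative assumptions, not values taken from this study.

```python
import math


def los_to_vertical(d_los_mm, incidence_deg):
    """Project LOS deformation onto the vertical, assuming no horizontal motion."""
    return d_los_mm / math.cos(math.radians(incidence_deg))


# Hypothetical input: -10 mm of LOS deformation at an assumed 39 degree incidence angle.
d_vertical = los_to_vertical(-10.0, 39.0)
print(f"vertical deformation: {d_vertical:.2f} mm")
```

Because the cosine of the incidence angle is below one, the vertical estimate is always larger in magnitude than the LOS measurement and keeps its sign.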

Generation of Susceptibility Map
The workflow to generate land subsidence susceptibility maps using machine-learning algorithms is illustrated in Figure 2. The methodology is summarized as follows:
1. The land subsidence inventory was generated by analyzing Sentinel-1 SAR datasets from descending tracks, acquired from 2017 to 2020, using the time-series InSAR technique based on the StaMPS algorithm.
2. To generate land subsidence susceptibility maps, the persistent scatterer (PS) points of the time series were randomly divided into training data (50%) and test data (50%); the test data were used to validate the land subsidence susceptibility map. The training data were used to train the machine-learning models to predict subsidence, whereas the test data were used to measure the performance of the algorithms used to build the susceptibility model. This method of preparing training and test datasets has been used in several studies of land subsidence susceptibility with good results [6,63,64].
3. Preparation of land subsidence conditioning factors: Spatial correlation analysis was applied to assess each factor before the land subsidence model was generated. In the spatial correlation analysis, the spatial relationship between historical subsidence events and each factor was examined [65]. Spatial correlation analysis was also used to investigate the weight of each factor class to assess the strength of the relationship between each factor class and subsidence occurrence. Frequency ratios were calculated to reflect spatial correlations by calculating the proportion of cells in which subsidence occurred in each class; the factors were then reclassified. Frequency ratios have been commonly used to determine spatial correlations [40,42,66]. Here, each frequency ratio represents the quantitative relationship between subsidence in a selected class and all subsidence in the area for all classes as a percentage of the entire map [67]. If the ratio is greater than one, the relationship between subsidence and the factor class is considered strong. By contrast, if the ratio is less than one, the spatial relationship is weak [40].
4. Generation of the land subsidence susceptibility map: In this step, we constructed land subsidence susceptibility maps using the Bagging, LogitBoost, and Multiclass Classifier algorithms. The model inputs were the land subsidence conditioning factors, expressed as frequency ratio values.
5. After the land subsidence susceptibility maps were generated, all maps were evaluated using receiver operating characteristic (ROC) analysis.
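Steps 2 and 5 of the workflow can be sketched in a few lines. The following is a minimal illustration, not the software used in this study: `split_half` performs a random 50/50 split of inventory points, and `roc_auc` computes the area under the ROC curve as the probability that a subsidence point receives a higher susceptibility score than a non-subsidence point. Both function names and the sample labels and scores are hypothetical.

```python
import random


def split_half(points, seed=0):
    """Randomly divide points into training and test halves."""
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute the AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical test labels (1 = subsidence) and model susceptibility scores.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.6, 0.2, 0.4, 0.5]
train_ids, test_ids = split_half(range(10))
print(len(train_ids), len(test_ids), round(roc_auc(labels, scores), 3))
```

The pairwise formulation is quadratic in the number of points, which is acceptable for a sketch; for large PS datasets a rank-based computation would be preferable.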

Bagging
Bagging is a commonly used meta-algorithm that was developed to enhance the stability and accuracy of machine-learning algorithms used in statistical classification and regression [43]. Bagging was one of the earliest ensemble techniques to use the bootstrap sampling method [68]. The bootstrap method entails random sampling with replacement to generate multiple samples, each of which forms a training set. Each generated subset is used to build a decision tree, and all trees are later aggregated into the final model. This improves classification accuracy by reducing the variance of the classification error. We used Bagging to obtain a more accurate land subsidence model because this algorithm performs well in predicting land subsidence susceptibility, as it is sensitive to small adjustments in the training data [43,46]. Bagging ensembles reduce uncertainty and bias more effectively than other ensembles [69]. In addition, this algorithm can reflect complex non-linear interactions between land subsidence and related factors, although it lacks a statistical significance test, which can limit quantitative hypothesis testing [43]. Bagging first uses a classifier to reduce variance, then carries out classification and regression by relying on bottom-up learning.
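The bootstrap-and-aggregate idea can be sketched as follows. This is a simplified illustration under stated assumptions: the base learner is a one-level decision tree (a stump) rather than a full decision tree, aggregation is by majority vote, and all function names and sample data are hypothetical rather than taken from the software used in this study.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Find the single-feature threshold rule with the fewest training errors."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for left, right in ((0, 1), (1, 0)):
                errors = sum((left if row[j] <= t else right) != label
                             for row, label in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, j, t, left, right)
    return best[1:]


def predict_stump(stump, row):
    j, t, left, right = stump
    return left if row[j] <= t else right


def fit_bagging(X, y, n_estimators=25, seed=0):
    """Train one stump per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_bagging(models, row):
    """Aggregate the base learners by majority vote."""
    votes = Counter(predict_stump(m, row) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical one-factor dataset: low values = no subsidence (0), high values = subsidence (1).
X = [[0.1], [0.2], [0.3], [0.4], [0.6], [0.7], [0.8], [0.9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
models = fit_bagging(X, y)
print(predict_bagging(models, [0.05]), predict_bagging(models, [0.95]))
```

Each bootstrap sample yields a slightly different threshold, and the vote averages out these differences, which is the variance-reduction effect described above.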

LogitBoost
LogitBoost is a boosting algorithm developed by Friedman et al. [70] to reduce bias and variance. LogitBoost was modified from AdaBoost, a commonly used boosting method, to handle noisy data by performing additive logistic regression with least-squares fits for each class [48,71]. LogitBoost reduces training errors and enhances classification accuracy [72] by using additive logistic regression for classification with a base-learning regression scheme, and it can perform multiclass classification. The land subsidence inventory map was divided into two classes, subsidence occurrence and subsidence non-occurrence, using the following equation [71]:

$$L_{c}(x) = \sum_{i=1}^{D} \beta_{i} x_{i} + \beta_{0}$$

where $D$ is the number of subsidence-related factors and $\beta_{i}$ is the coefficient of the $i$-th component of the input vector $x$. Probabilities were constructed using the linear logistic regression method, as follows:

$$P(c \mid x) = \frac{\exp\left(L_{c}(x)\right)}{\sum_{c'=1}^{C} \exp\left(L_{c'}(x)\right)}$$

where $C$ is the number of classes and the least-squares fit $L_{c}(x)$ is resolved such that $\sum_{c=1}^{C} L_{c}(x) = 0$ to set up the least number of instances per node of the logistic model trees.
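The probability step can be illustrated with a short sketch. It assumes the standard additive logistic regression form, in which each class has a linear score and class probabilities are obtained by normalizing the exponentiated scores after centering them to sum to zero. It does not implement the boosting iterations themselves, and the coefficients and factor values are hypothetical.

```python
import math


def linear_scores(x, betas, intercepts):
    """Per-class linear score: intercept plus the weighted sum of factor values."""
    return [b0 + sum(b * xi for b, xi in zip(bc, x))
            for bc, b0 in zip(betas, intercepts)]


def center(scores):
    """Shift the scores so that they sum to zero across classes."""
    mean = sum(scores) / len(scores)
    return [s - mean for s in scores]


def class_probabilities(scores):
    """Normalize exponentiated scores into class probabilities."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical two-factor input and coefficients for two classes
# (index 0 = subsidence occurrence, index 1 = non-occurrence).
x = [1.2, 0.8]
betas = [[0.5, -0.3], [-0.5, 0.3]]
intercepts = [0.1, -0.1]
scores = center(linear_scores(x, betas, intercepts))
print([round(p, 3) for p in class_probabilities(scores)])
```

Centering changes every score by the same constant, so it leaves the resulting probabilities unchanged while enforcing the sum-to-zero constraint.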

Multiclass Classifier
Multiclass Classifier is a meta-classifier that is used to process multiclass datasets with two-class classifiers. It can also apply error-correcting output codes to enhance accuracy [47]. In the field of machine learning, multiclass classification assigns events to one of three or more classes. Although some classification algorithms naturally handle more than two classes, others are binary by nature and must be converted to multinomial classifiers through one of several strategies. Multiclass classification techniques can be divided into categories such as transformation to binary, extension from binary, and hierarchical classification [73].
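The transformation-to-binary strategy can be sketched with a one-vs-rest decomposition. This is a minimal illustration, not the meta-classifier implementation used in this study: the two-class base learner is a toy centroid-distance scorer chosen only for brevity, and the class names and sample points are hypothetical.

```python
import math


def mean_point(rows):
    """Component-wise mean of a list of points."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit_binary(X, y01):
    """Toy two-class scorer: positive when x is closer to the positive centroid."""
    pos = mean_point([r for r, t in zip(X, y01) if t == 1])
    neg = mean_point([r for r, t in zip(X, y01) if t == 0])
    return lambda x: math.dist(x, neg) - math.dist(x, pos)


def fit_one_vs_rest(X, y):
    """Train one binary scorer per class, relabelling that class as 1 and the rest as 0."""
    return {k: fit_binary(X, [1 if t == k else 0 for t in y])
            for k in sorted(set(y))}


def predict_one_vs_rest(models, x):
    """Pick the class whose binary scorer is most confident."""
    return max(models, key=lambda k: models[k](x))


# Hypothetical two-factor samples with three classes.
X = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]]
y = ["low", "low", "moderate", "moderate", "high", "high"]
models = fit_one_vs_rest(X, y)
print(predict_one_vs_rest(models, [0.2, 0.5]))
```

With only two classes, as in the occurrence versus non-occurrence setting, the decomposition reduces to a single binary problem.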

Factors Related to Land Subsidence
The increase in land subsidence occurrence in megacities is mainly due to lowered groundwater levels and the presence of heavy buildings [17]. Excessive use of groundwater decreases pore pressure in the soil, and the load of heavy buildings leads to further soil compaction [74,75]. Large amounts of groundwater leakage and dewatering are accompanied by a decrease in groundwater levels. These conditions weaken the surrounding land and lead to subsidence. In addition, according to field investigations reported by the Seoul government, the causes of subsidence include excessive groundwater use and damage to water and sewage utilities [3,17].
A combination of several environmental factors can influence land subsidence susceptibility. The training and test data points were chosen from the land subsidence inventory map shown in Figure 3a. We investigated 10 subsidence-related factors (Figure 3b-k) and evaluated the correlation between each factor and land subsidence occurrence, as shown in Table 3. In a previous study in Iran, where the main cause of subsidence was groundwater drawdown, conditioning factors such as altitude, slope, aspect, plan curvature, profile curvature, lithology, distance to the river, land use, normalized difference vegetation index (NDVI), and piezometric data (groundwater drawdown) were used [49]. Another study evaluated several of these factors to identify land subsidence in a mining area in Malaysia [40]. Subsidence-related factors were reclassified into several classes using the quantile method to objectively identify and analyze the effect of each class over a specific range of values. The quantile classification method can handle unbalanced distributions by placing an equal number of grid cells in each class [76]. Thus, the range of each class is determined automatically by the quantile method.
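Quantile reclassification can be sketched as follows. This minimal example assumes that class boundaries are the cut points returned by Python's standard `statistics.quantiles` function and that classes are numbered from 1; the function name and the sample values are hypothetical, and GIS software may use a slightly different quantile definition.

```python
from bisect import bisect_left
from statistics import quantiles


def quantile_reclassify(values, n_classes=5):
    """Assign each value a class (1..n_classes) holding roughly equal counts."""
    cuts = quantiles(values, n=n_classes)  # n_classes - 1 cut points
    classes = [bisect_left(cuts, v) + 1 for v in values]
    return classes, cuts


# Hypothetical slope values (degrees) for ten raster cells.
slopes = [0.4, 1.1, 1.9, 2.5, 3.9, 5.2, 7.8, 9.6, 14.3, 22.0]
classes, cuts = quantile_reclassify(slopes, n_classes=5)
print(cuts)
print(classes)
```

Because the boundaries follow the data distribution, a skewed factor such as slope still yields classes with similar cell counts, at the cost of unequal class widths.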
Features derived from the digital elevation model (DEM) capture hydrogeological and topographic conditions, such as hydrological zone response and the concentration and containment of runoff volume in the landscape, which directly or indirectly affect the occurrence of land subsidence. Topographic features, namely elevation, slope, aspect, topographic wetness index (TWI), and profile curvature (Figure 3b-f), were extracted from the 1 arc-second SRTM DEM. These features have been widely used as conditioning factors in land subsidence susceptibility models [37,43,49]. Elevation acts as a bridge between lithology and rainfall characteristics in the area. A higher area has a lower probability of additional precipitation than a lower area, which has the potential for high precipitation [49]. Elevation in the study area varies between 0 and 813 m.
Slope and aspect, as secondary features of the DEM, may relate to land subsidence occurrence because they can affect soil infiltration in the landscape and the condition of water utilities [77,78]. Groundwater or sewage infiltrating through a damaged pipe may erode soil particles [77], which can indirectly alter soil conditions and lead to land subsidence.
TWI is a secondary topographic variable that specifies the degree of water accumulation at a given location; it is commonly used to quantify topographic influence on hydrological processes. The TWI of the study area was derived from the DEM and categorized into five classes. Profile curvature is a geomorphic property that reflects flow intensity, the amount of sediment, and erosion [79]. The profile curvature map was categorized into three classes: convex (less than −0.01), flat (−0.01 to 0.01), and concave (greater than 0.01).
Land use is related to the ecological conditions and anthropogenic activities associated with land subsidence occurrence [80,81]. Variation in land use can explain the highly dissected zones within the region and provide insight into the land subsidence activity that is likely to occur. A land-use map was created from a digital characteristics map provided by the National Geographic Information Institute (NGII); six land-use categories were analyzed in this study, and the map is shown in Figure 3g. Underground water utilities are one of the most frequently cited factors affecting land subsidence [54,82].
Groundwater extraction can be correlated with land subsidence events, especially around underground structures [49]. The groundwater extraction map of the study area was prepared from annual average groundwater outflow data measured at 231 points by the Seoul government. Prior to spatial analysis, the groundwater extraction data had to be converted into raster data. The accuracy of the raster map depends on the number of data points; however, the availability of groundwater data was limited in this study. To generate raster maps from these limited data, we applied the inverse distance weighted method, which infers values at unsampled locations from observed values, to interpolate raster maps of groundwater extraction with a 30 m × 30 m cell size, as shown in Figure 3h.
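Inverse distance weighted interpolation onto a regular grid can be sketched as follows. This is a minimal illustration in which the distance exponent of 2, the function names, and the sample well locations and extraction rates are assumptions; the source specifies only the method and the 30 m cell size.

```python
import math


def idw(x, y, samples, power=2.0):
    """Estimate a value at (x, y) from (xs, ys, value) samples by inverse distance weighting."""
    num = 0.0
    den = 0.0
    for xs, ys, v in samples:
        d = math.hypot(x - xs, y - ys)
        if d == 0.0:
            return v  # exactly on a sample point
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den


def idw_grid(samples, x0, y0, n_cols, n_rows, cell=30.0, power=2.0):
    """Interpolate at the cell centers of a regular grid with the given cell size (m)."""
    return [[idw(x0 + (c + 0.5) * cell, y0 + (r + 0.5) * cell, samples, power)
             for c in range(n_cols)]
            for r in range(n_rows)]


# Hypothetical wells: (easting m, northing m, extraction rate in m^3/day).
wells = [(0.0, 0.0, 100.0), (60.0, 0.0, 300.0), (30.0, 90.0, 500.0)]
grid = idw_grid(wells, x0=0.0, y0=0.0, n_cols=3, n_rows=3)
print([[round(v, 1) for v in row] for row in grid])
```

Interpolated values always lie between the minimum and maximum observed values, so sparse sampling smooths local extremes rather than extrapolating them.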
To evaluate the relationship between groundwater conditions and the occurrence of subsidence, several groundwater-related factors, such as distance to rivers, can be evaluated [43,83]. The distance to rivers represents the proximity to rivers and drainages in the study area [40]. The distance-to-river map was calculated from the river map provided by the National Geographic Information Institute (NGII). Buffers (measured in meters) were created around the rivers, and the resulting raster map was divided into five classes, as shown in Figure 3i.
The geological parameters of an area may influence the occurrence of land subsidence through lithological and structural variation, which leads to differences in the strength and permeability of rocks and soil. Lithology is also an important feature for understanding the land subsidence process because it describes the structure of underground materials; most cases of subsidence occur on landfills and alluvium layers that undergo natural consolidation. In addition, groundwater withdrawal and building loads increase the compaction rate of the alluvium [17,62]. Any fluid present in a porous medium is under pressure because of the weight of the structure above it. If the fluid is withdrawn from below the surface, pore pressure can decrease, resulting in a loss of support and possibly leading to subsidence [74]. The lithology map is shown in Figure 3j, and the lithology is described in Table 4. Figure 3k shows the raster map of distance to faults used in this research. The presence of a fault line may weaken the porous medium structure and influence subsidence occurrence, as in the case of Las Vegas, USA [84]. In this study, we used fault lines as one of the land subsidence factors to consider the effect of faults on ground deformation. We computed buffer distances from the fault lines using 1:50,000-scale data published by the Korea Institute of Geoscience and Mineral Resources (KIGAM) and categorized the result into five classes.
The spatial correlation for each factor was calculated using the frequency ratio and is shown in Table 3. Classes with frequency ratio values close to or greater than one show a high correlation between subsidence and that factor class, and vice versa [40]. The calculation indicates that subsidence is correlated with low elevation (0-30 m) and flat areas. Three classes of the slope map (0-1.8, 1.8-3.86, and 3.86-7.97 degrees) and four classes of the aspect map (Flat, North, East, and Southeast) show a spatial correlation with land subsidence. In addition, the land-use factor shows that subsidence is correlated with dry areas covered by buildings and impermeable surfaces. The alluvium (Qa), which occurs predominantly around the Han River, also exhibits a correlation with subsidence, and six categories of the lithology map show a correlation in this calculation. The land subsidence area is correlated with fault distances between 0 and 1972 m. Areas with a groundwater extraction rate above 350 m³/day are spatially correlated with land subsidence occurrences.
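The frequency ratio calculation can be sketched as follows, assuming the common definition in which the share of subsidence cells falling in a class is divided by the share of all cells falling in that class. The function name, the class labels other than 0-30 m, and the cell counts are hypothetical and do not reproduce Table 3.

```python
from collections import Counter


def frequency_ratio(class_ids, subsided):
    """Frequency ratio per class: (subsidence share in class) / (area share of class)."""
    total = len(class_ids)
    total_sub = sum(subsided)
    cells = Counter(class_ids)
    sub_cells = Counter(c for c, s in zip(class_ids, subsided) if s)
    return {c: (sub_cells[c] / total_sub) / (cells[c] / total) for c in cells}


# Hypothetical elevation classes for ten cells and their subsidence flags (1 = subsided).
class_ids = ["0-30 m"] * 4 + ["30-100 m"] * 4 + [">100 m"] * 2
subsided = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
for name, fr in frequency_ratio(class_ids, subsided).items():
    print(name, round(fr, 3))
```

A class that holds 40% of the cells but 75% of the subsidence cells therefore has a ratio well above one, which matches the interpretation of values greater than one as a strong spatial relationship.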

Results
The results of the PSI-StaMPS time-series deformation analysis and the land subsidence susceptibility maps are presented below. The time-series results were obtained by selecting location points at which subsidence occurred.

Land Subsidence Inventory Map
The land subsidence map of the Seoul Metropolitan Area was generated via the PSI-StaMPS method, using descending-track InSAR images captured from 2017 to 2020, and is shown in Figure 4a. To enhance measurement reliability, the vertical deformation map was generated using the mean line-of-sight velocity [62]. Enlarged views of each selected point are shown in Figure 4b-e, and the time-series graphs of all points are shown in Figure 4f,g.
Figure 4b shows the subsidence map of Gimpo, in the western part of the study area, with the black line indicating the subway line. At point A, subsidence of 33.5 mm was recorded, with a mean deformation velocity of 12.57 mm/year from 2017 to 2020. Subsidence in Gimpo occurs mostly along the subway line that became operational in 2019. At this location, the subsidence is associated with compressible alluvial deposits.
As can be seen in Figure 4c, subsidence occurred around the subway line where Shincheon subway station is located. Point B recorded maximum subsidence of up to 29 mm from 2017 to 2020, with a mean deformation velocity of 7.34 mm/year. This location includes the intersection of subway lines 5 and 2, as well as a residential area with high-density buildings. The subsidence at point B correlates with groundwater extraction and high building density, both of which influence the subsidence rate in this area. Figure 4d shows an overview of the subsidence near Haengsin Station, known to be a depot for metro trains. The observation at point C revealed total subsidence of 34.35 mm from 2017 to 2020 and a mean deformation velocity of 10.25 mm/year. The location is characterized by the alluvial deposits that predominate around the Han River, and the geological features in this area show a correlation with the subsidence. Figure 4e shows the subsidence map of Hanam City, in the eastern part of the study area, with the black line indicating the subway line. The StaMPS result at point D shows maximum subsidence of 26.17 mm from 2017 to 2020, with a mean deformation velocity of 8.42 mm/year. Hanam City has seen many developments, such as residential areas and commercial buildings, and subway construction is also under way in this area. Land subsidence can be related to underground work and building construction that pump large amounts of groundwater [85]. In Hanam City, the subsidence is associated with urban land use and groundwater usage. Figure 4f,g shows the time-series graphs for the four selected points in the study area. In general, periodic subsidence appears in the vertical deformation graphs. A possible reason for the periodic subsidence in these areas is seasonal variation in groundwater level and surface water loading.
This pattern reflects the seasonal effect of groundwater extraction: the selected points are surrounded by high-density buildings that mostly use groundwater as a water source. During the season of high groundwater withdrawal, the groundwater level decreases. After the rainy season, the groundwater level rises and the aquifer system recovers (uplift) [84]. These conditions may influence the deformation velocity in the study area.
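A mean deformation velocity of the kind reported for points A to D can be estimated from a displacement time series by fitting a straight line. The sketch below uses an ordinary least-squares slope as an illustrative assumption; it is not the StaMPS velocity estimator, it does not model the seasonal component discussed above, and the acquisition times and displacements are hypothetical.

```python
def mean_velocity(times_yr, disp_mm):
    """Least-squares slope of displacement (mm) against time (years), in mm/year."""
    n = len(times_yr)
    t_mean = sum(times_yr) / n
    d_mean = sum(disp_mm) / n
    num = sum((t - t_mean) * (d - d_mean) for t, d in zip(times_yr, disp_mm))
    den = sum((t - t_mean) ** 2 for t in times_yr)
    return num / den


# Hypothetical series sampled every 12 days with a steady trend of -10 mm/year.
times = [i * 12 / 365.25 for i in range(30)]
disp = [-10.0 * t for t in times]
print(round(mean_velocity(times, disp), 2))
```

When a seasonal signal is present, the fitted slope depends on how many full cycles the series covers, so a multi-year record such as the 2017 to 2020 period gives a more stable estimate than a single season.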

Land Subsidence Susceptibility Map
Land subsidence susceptibility maps were constructed using the training dataset compiled from the InSAR time-series data (the land subsidence inventory map), ten land subsidence conditioning factors, and three different algorithms. Once the model training process was completed, susceptibility maps were constructed to visualize vulnerability to subsidence in the study area. In each land-subsidence-susceptibility map, every pixel in the study area was assigned a specific susceptibility value using the quantile method [47].
Five susceptibility classes were used to reflect vulnerability to land subsidence: very low, low, moderate, high, and very high. Areas of very high susceptibility (marked red in Figure 5a-c) were most frequently found near the Han River and subway lines. The algorithms indicated that the northwestern area is very susceptible to subsidence, which may be due to several factors. For example, the geology of the northwestern area, which is near the Han River, is dominated by alluvium, which likely increases subsidence susceptibility. Most cases of observed subsidence have occurred on alluvium layers exhibiting natural consolidation; additionally, the increasing number of buildings and use of groundwater can exacerbate this condition [15,37,86]. A highly susceptible area was observed in the east, which may be associated with on-going construction in the same area. A few susceptible areas were observed along the northern Han River, mostly comprising high-density buildings and subway stations, but most of the northern area has low-to-moderate susceptibility to subsidence. Groundwater extraction in this area may increase the risk of subsidence, as some areas in which groundwater was extracted are now used for subway stations. Thus, the ground conditions in these areas might have been affected.
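As an illustration of quantile-based classification into the five classes named above, the following minimal sketch derives class breaks from sorted scores and labels each pixel. The scores are hypothetical, and the helper names `quantile_breaks` and `classify` are assumptions introduced here; the GIS software used in the study may handle ties and break placement differently.

```python
from bisect import bisect_left

CLASS_NAMES = ["very low", "low", "moderate", "high", "very high"]


def quantile_breaks(values, n_classes=5):
    """Return the upper bound of every class except the last.

    Assumes at least n_classes values are supplied.
    """
    ordered = sorted(values)
    n = len(ordered)
    # The k-th break is the value at or below which about k/n_classes of pixels fall.
    return [ordered[(k * n) // n_classes - 1] for k in range(1, n_classes)]


def classify(value, breaks):
    """Map one susceptibility value to a class name (breaks are inclusive upper bounds)."""
    return CLASS_NAMES[bisect_left(breaks, value)]


# Hypothetical susceptibility scores for 100 pixels: 0.01, 0.02, ..., 1.00.
scores = [i / 100 for i in range(1, 101)]
breaks = quantile_breaks(scores)
labels = [classify(s, breaks) for s in scores]
counts = {name: labels.count(name) for name in CLASS_NAMES}
print(counts)
```

When all scores are distinct, each class receives the same number of pixels. When many pixels share the same score, the classes can no longer hold equal numbers of pixels.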

Figure 6 shows the distribution of pixels in each susceptibility map generated by the meta-ensemble models. In the land-subsidence-susceptibility map generated using the Bagging model, 63.67% of the area exhibited very low susceptibility to subsidence, whereas 15.53%, 7.55%, 6.42%, and 7.04% of the area exhibited low, moderate, high, and very high susceptibility, respectively. In the map constructed using the Multiclass Classifier model, 63.66% and 18.31% of the area exhibited very low and low susceptibility, respectively, whereas 10.98%, 1.62%, and 5.42% of the area exhibited moderate, high, and very high susceptibility, respectively. Lastly, based on the LogitBoost model, most of the area was not very susceptible to subsidence, with 64.44%, 18.07%, 6.62%, 4.41%, and 6.66% of the map classified as areas of very low, low, moderate, high, and very high susceptibility, respectively. The distributions of pixels in the very low and very high classes in Figure 6 follow a similar pattern across the algorithms. The very high class can be considered the area where subsidence is occurring. The moderate and high classes are considered areas of potential future land subsidence, and the very low and low classes are areas with the lowest probability of future land subsidence. With this description, it is possible to identify the location and extent of potential future subsidence. In general, the consistency of a model can be evaluated based on the presence of past land subsidence in the susceptibility classes: a higher percentage of land subsidence pixels in the higher susceptibility classes indicates higher consistency, and vice versa.
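The consistency check described here (comparing how the whole map and the observed subsidence pixels are distributed over the susceptibility classes) can be sketched as follows. This is a toy example with hypothetical pixel labels, not the data behind Figure 6, and `percent_by_class` is a helper name introduced for illustration.

```python
from collections import Counter

CLASS_NAMES = ["very low", "low", "moderate", "high", "very high"]


def percent_by_class(labels):
    """Percentage of entries that fall in each susceptibility class."""
    counts = Counter(labels)
    total = len(labels)
    return {name: 100.0 * counts[name] / total for name in CLASS_NAMES}


# Hypothetical toy map of 20 pixels: a class label per pixel and a flag that
# marks pixels where subsidence was observed in the inventory.
pixel_classes = (
    ["very low"] * 12 + ["low"] * 3 + ["moderate"] * 2 + ["high"] * 1 + ["very high"] * 2
)
observed = [False] * 12 + [False] * 3 + [True, False] + [True] + [True, True]

area_share = percent_by_class(pixel_classes)
subsidence_share = percent_by_class(
    [cls for cls, hit in zip(pixel_classes, observed) if hit]
)

for name in CLASS_NAMES:
    print(f"{name:>9}: {area_share[name]:5.1f}% of area, "
          f"{subsidence_share[name]:5.1f}% of observed subsidence")
```

In this toy case the very high class covers 10% of the area but holds 50% of the observed subsidence pixels, which is the kind of concentration that indicates a consistent map.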
Remote Sens. 2019, 12, 3505 17 of 27
Figure 6. Distribution of pixels classified as areas of very low, low, moderate, high, and very high susceptibility in the land-subsidence-susceptibility maps generated by three machine-learning algorithms.

Model Validation
A good land-subsidence-susceptibility map should be able to predict future subsidence in the target area and provide initial information for preventative actions. To validate our susceptibility maps, the accuracy of all algorithms used in this study was evaluated by receiver operating characteristic (ROC) curve analysis. ROC curve analysis is a standard way of validating the probability models used to generate land subsidence susceptibility maps, based on the area under the curve (AUC) [6]. The AUC, which typically ranges from 0.5 to 1, was used to assess model accuracy. An AUC value near 0.5 indicates that a model performs no better than random guessing, whereas a value near 1.0 indicates an ideal model with a good fit [50]. AUCs were calculated to compare model performance, and the model with the highest AUC value was taken to be the best model. The Bagging model produced the largest AUC (0.883), followed by the LogitBoost model (0.871) and the Multiclass Classifier model (0.856), as shown in Figure 7. Thus, the Bagging model generated the best subsidence-susceptibility map in this study. However, all models produced good AUC values, indicating that they all performed well in predicting land-subsidence susceptibility in the study area.
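For readers unfamiliar with the AUC, it equals the probability that a randomly chosen subsidence pixel receives a higher susceptibility score than a randomly chosen non-subsidence pixel, with ties counted as one half. The sketch below computes it directly from that definition. The labels and scores are hypothetical, and `roc_auc` is an illustrative helper, not the validation tool used in the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for subsidence pixels, 0 for non-subsidence pixels.
    scores: predicted susceptibility for each pixel.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required to compute an AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical test pixels: three with observed subsidence, three without.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of pixels, so a rank-based formulation would be preferable for full-size maps.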



Land Subsidence Inventory Map
StaMPS was employed to analyze land subsidence in the Seoul Metropolitan Area, with a deformation time-series map generated for all terrain in the area based on descending-track data acquired from March 2017 to May 2020. Subsequently, a vertical deformation map was generated from the time-series analysis.
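The text does not state how the vertical deformation map was derived from the line-of-sight (LOS) measurements. A common approximation, shown here only as a hedged sketch, assumes that the motion is purely vertical and divides the LOS displacement by the cosine of the incidence angle. The function name and the 39 degree incidence angle are assumptions chosen for illustration.

```python
import math


def los_to_vertical(d_los_mm, incidence_deg):
    """Project an LOS displacement onto the vertical, assuming no horizontal motion."""
    return d_los_mm / math.cos(math.radians(incidence_deg))


# Hypothetical example: -8 mm along the LOS at an assumed incidence angle of 39 degrees.
vertical = los_to_vertical(-8.0, 39.0)
print(f"vertical displacement = {vertical:.2f} mm")
```

Because the cosine is smaller than one, the vertical estimate is always larger in magnitude than the LOS value. With a single viewing geometry, any real horizontal motion would be misattributed to the vertical component.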
The results indicate that subsidence was distributed over several locations in the study area, such as Gimpo City (Figure 4a, point A). At this location, subsidence occurred near a new subway line that opened in 2019. Subsidence also occurred near a station serving two subway lines near the southern Han River, in an area that also contains high-density buildings (Figure 4a, point B). Other areas where subsidence occurred include a Seoul metro depot for several subway lines (Figure 4a, point C) and a newly developed area with several recently constructed residential and commercial buildings (Figure 4a, point D). Cases of land subsidence in the Seoul Metropolitan Area mostly occurred in the vicinity of subway lines or where the ground was weak. In particular, all areas of subsidence were located near subway lines and stations, implying that subway operation may be associated with subsidence in the study area [38,87]. Additionally, subway-tunnel excavations might have impacted the surrounding soil and the environment. During construction, both the method of discharging underground water and the excavation method should be taken into consideration. Further analysis is needed to examine the impacts of construction on land subsidence in this area.
In terms of urbanization, the construction of buildings near subway lines and stations may add to the load on the soil and increase the risk of subsidence. High demand for transportation and urbanization increases the intensity of building construction and the amount of groundwater extracted. In addition, groundwater extraction to meet social demand can influence the subsidence rate in the study area [3,88]. The time-series deformation graph in Figure 4f shows a seasonal variation in the subsidence rate in the study area between 2017 and 2020. High subsidence tends to appear in the summer. By contrast, the subsidence rate decreases after the rainy season, which occurs in July-August [3]; the aquifer conditions after the rainy season are expected to have some influence on the subsidence rate. Further study based on water-level data is required to better assess the deformation, and further details of the structure and hydrologic parameters of the groundwater system should be resolved [89]. Combining InSAR time-series analysis with analysis of the hydrology of the subsiding area and the geomechanical parameters of the underlying aquifer structure is a potential research topic for identifying the cause of subsidence.
However, extracting information from StaMPS results is sometimes difficult because of the large number of PSs, which leads to long interpretation times. The large number of PSs may cause several deficiencies in the analysis process, such as a reduction in the useful information extracted from the dataset. To obtain better time-series analysis results, several optimization steps can be applied to the persistent scatterer points, and several optimization methods for selecting PS points have been proposed [20,90]. These could serve as a reference for future land subsidence studies, increasing efficiency and potentially leading to better deformation analyses. Further studies should also investigate other factors related to land subsidence. The results of the GIS analysis are discussed below.

Land Subsidence Susceptibility Maps
Land subsidence should be mapped accurately to prepare subsidence inventories for target areas, which are an essential part of susceptibility analysis. To generate a subsidence-inventory map, we used InSAR remote-sensing data, which cover a broad area and can be collected efficiently. Moreover, the InSAR time-series (StaMPS) method allows land subsidence and its rate to be measured at the millimeter level [37]. ArcGIS software was used for database construction, coordinate conversion, overlay analysis, and susceptibility modeling. Subsidence-related factors were identified based on information derived from the literature before susceptibility maps were generated [40,49]. Meta-ensemble machine learning was applied to estimate land-subsidence susceptibility using three algorithms: Bagging, LogitBoost, and Multiclass Classifier.
The land subsidence susceptibility maps revealed that the northwestern and eastern areas, as well as a small area in the center, were most susceptible to land subsidence. We analyzed subsidence-related factors by comparing general patterns of subsidence with factor maps. The results revealed that most cases of subsidence occurred in areas where the ground consisted of alluvial layers, especially near the Han River. However, there is also a potential for subsidence in the central area, whose geological conditions differ from those of the northwestern and eastern areas, which are dominated by alluvium. In this case, factors other than geology influence the subsidence modeling. The central area was assessed as moderately to highly susceptible to subsidence. In this area, there is a high density of buildings, and groundwater has been extracted at several spots near a subway station. Accordingly, groundwater outflow during subway operation could be another cause of land subsidence in this area [91]. If large amounts of groundwater are extracted, the surrounding soil structure may be affected, and the weakened soil may be less able to withstand the pressure from aboveground buildings. Based on spatial distribution analysis, land use and groundwater extraction most strongly influence subsidence. In this study, the groundwater-extraction map was obtained by interpolating data on groundwater extraction near the subway infrastructure in Seoul, which might have introduced errors in the spatial distribution. Access to groundwater data for areas outside Seoul would allow for more accurate analyses of land subsidence.
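The interpolation method used for the groundwater-extraction map is not specified in the text. Purely as an illustration of why interpolating from spatially clustered samples can introduce errors, the sketch below uses inverse distance weighting (IDW), which is an assumption made here and not necessarily the method applied in the study. The station coordinates and values are hypothetical.

```python
import math


def idw_interpolate(samples, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (sx, sy, value) samples."""
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in samples:
        dist = math.hypot(x - sx, y - sy)
        if dist == 0.0:
            return value  # the query point coincides with a sample
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical extraction volumes (arbitrary units) at two nearby stations.
stations = [(0.0, 0.0, 10.0), (2.0, 0.0, 30.0)]
midpoint = idw_interpolate(stations, 1.0, 0.0)
far_away = idw_interpolate(stations, 100.0, 0.0)
print(f"midpoint = {midpoint:.1f}, far from the stations = {far_away:.1f}")
```

Far from the samples, the weights become nearly equal and the estimate approaches the plain average of the sampled values regardless of local conditions, which is one way extrapolating beyond the sampled area can distort a factor map.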
In addition, several other factors that were not considered in this study could be evaluated. The factors used here were selected based on previous literature, in which the conditions of the study area and the subsidence mechanism may differ from those in Seoul. For this reason, additional analysis of factors related to the subsidence mechanism is needed to account for the mechanisms that could plausibly operate in this area. Adding details such as aquifer conditions and groundwater levels could help evaluate the correlation of these factors and has the potential to improve the land subsidence susceptibility maps.
Susceptibility maps were validated based on ROC curves and AUC values and by comparing map predictions with testing data, which comprised 50% of the total dataset. The results indicate that the meta-ensemble approach performed better than the traditional approach. A traditional model based on the frequency ratio produced an AUC of 0.844, whereas the AUCs produced by the meta-ensemble models Multiclass Classifier, LogitBoost, and Bagging were 1.2, 2.7, and 3.9 percentage points larger (0.856, 0.871, and 0.883), respectively. Therefore, these techniques can reduce bias and account for factor weights to improve the accuracy of predictions.
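For context, the frequency ratio of a factor class is the share of subsidence pixels that fall in that class divided by the share of all pixels in that class, so values above 1 indicate a positive association with subsidence. The sketch below uses a hypothetical ten-pixel geology layer. The class names other than alluvium and the helper name `frequency_ratio` are illustrative assumptions.

```python
from collections import Counter


def frequency_ratio(factor_classes, subsided):
    """Frequency ratio per factor class.

    factor_classes: class label of the factor at each pixel.
    subsided: True where the inventory shows subsidence (at least one True required).
    """
    all_counts = Counter(factor_classes)
    hit_counts = Counter(c for c, hit in zip(factor_classes, subsided) if hit)
    n_all = len(factor_classes)
    n_hit = sum(hit_counts.values())
    return {
        c: (hit_counts[c] / n_hit) / (all_counts[c] / n_all)
        for c in all_counts
    }


# Hypothetical layer: 4 alluvium pixels (3 subsiding) and 6 granite pixels (1 subsiding).
geology = ["alluvium"] * 4 + ["granite"] * 6
subsided = [True, True, True, False] + [True] + [False] * 5
fr = frequency_ratio(geology, subsided)
print(fr)
```

Here alluvium holds 75% of the subsidence pixels but only 40% of the area, giving a ratio well above 1, whereas the second class falls below 1.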
Although all models produced good AUC values and thus performed well in predicting land subsidence in the study area, the susceptibility map constructed using the Bagging model was the most accurate. These results agree well with previous findings that the use of machine learning improves model performance in predicting subsidence [92]. Therefore, the Bagging model should be used for the susceptibility map. The Bagging model uses recent, well-established soft-computing techniques that not only improve on a single classifier but can also deal with complex and high-dimensional modeling problems. Given the complexity of land subsidence and the interaction of several related factors, novel combinations of models and methods can considerably improve the accuracy of land subsidence prediction.
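To make the idea behind Bagging concrete, the sketch below trains several simple base learners (decision stumps) on bootstrap resamples of the training pixels and averages their votes into a susceptibility score. It is a toy illustration and not the implementation used in this study: the two features, their values, and the choice of decision stumps as base learners are all assumptions. Skipping resamples that contain a single class is a simplification added here to keep the toy example well behaved.

```python
import random


def stump_predict(stump, x):
    """Predict 1 (subsidence) or 0 from a single-feature threshold rule."""
    feature, threshold, polarity = stump
    return 1 if polarity * (x[feature] - threshold) > 0 else 0


def fit_stump(xs, ys):
    """Return the (feature, threshold, polarity) stump with the fewest training errors."""
    best = None
    for feature in range(len(xs[0])):
        for threshold in sorted({x[feature] for x in xs}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                errors = sum(stump_predict(stump, x) != y for x, y in zip(xs, ys))
                if best is None or errors < best[0]:
                    best = (errors, stump)
    return best[1]


def fit_bagging(xs, ys, n_estimators=25, seed=0):
    """Fit one stump per bootstrap resample (sampling with replacement)."""
    if len(set(ys)) < 2:
        raise ValueError("training data must contain both classes")
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    while len(stumps) < n_estimators:
        idx = [rng.randrange(n) for _ in range(n)]
        if len({ys[i] for i in idx}) < 2:
            continue  # simplification: skip resamples with only one class
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps


def bagging_score(stumps, x):
    """Fraction of stumps voting for subsidence, usable as a susceptibility score."""
    return sum(stump_predict(s, x) for s in stumps) / len(stumps)


# Hypothetical pixels: [normalized groundwater extraction, slope in degrees].
xs = [[0.1, 5.0], [0.2, 4.0], [0.3, 6.0], [0.7, 5.5], [0.8, 4.5], [0.9, 5.2]]
ys = [0, 0, 0, 1, 1, 1]
model = fit_bagging(xs, ys)
high = bagging_score(model, [0.95, 5.0])
low = bagging_score(model, [0.05, 5.0])
print(f"score at high extraction = {high}, score at low extraction = {low}")
```

Averaging votes over resamples is what lets the ensemble smooth out the instability of any single base learner; pixels near the class boundary receive intermediate scores because the stumps disagree about them.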

Conclusions
This study aimed to assess and map land-subsidence susceptibility in the Seoul Metropolitan Area using InSAR data from Sentinel-1 acquired between 2017 and 2020. A deformation time-series map was generated using StaMPS, which revealed that land subsidence occurred in four areas (Figure 4a), with subsidence rates of 6-12 mm/year. Subsidence mostly occurred near subway lines and where a new subway line was being constructed. In addition, subsidence occurred in areas with high-density buildings and heavy groundwater extraction, which may weaken the ground.
To identify the factors influencing land subsidence, 10 potential subsidence-related factors were analyzed. Factor maps were overlaid with subsidence maps and each pixel within a layer was evaluated in a GIS environment. The training and testing datasets were prepared from time-series InSAR from the Sentinel-1 SAR dataset using the StaMPS method. Then, the spatial correlation for each factor was calculated using the frequency ratio. Meta-ensemble algorithms (Bagging, LogitBoost, and Multiclass Classifier) were employed to generate land-subsidence-susceptibility maps, and model performance in terms of reliability and prediction accuracy was compared using ROC analysis.
The land-subsidence-susceptibility maps revealed that the northwestern and eastern areas, as well as a small central area, were most vulnerable to land subsidence. The susceptibility of the northwestern and eastern areas is mostly associated with geological conditions dominated by alluvium. By contrast, in the central area, which is moderately to highly susceptible, land use and groundwater extraction are the main factors influencing subsidence risk. From the ROC analysis, the AUC produced by each model was computed. All models performed well (AUC > 0.8). Bagging produced the largest AUC of 0.883, followed by LogitBoost (0.871) and Multiclass Classifier (0.856). Compared with the frequency-ratio method, machine-learning models produced more accurate predictions and are thus more appropriate for subsidence analysis in this study area.
Accurate predictions are essential for environmental planning to control and mitigate the impacts of land subsidence. Land-subsidence-susceptibility mapping is a valuable method for identifying areas with a high risk of land subsidence. Despite limitations associated with the datasets used in this study, we demonstrated that the analysis of remote-sensing and GIS spatial data via the machine-learning approach generates reliable and accurate predictions of land subsidence. Further research is needed to determine the effects of aquifer conditions and of subway construction and operation on land subsidence. A large dataset of PS points may hinder the extraction of useful information. Optimization approaches for selecting PS points, such as optimized hotspot analysis and other statistical methods, should be applied in future work to overcome this limitation [20,90]. Furthermore, given the highly complex relationship between land subsidence and other factors, a hybrid method that combines machine learning with a meta-heuristic algorithm could improve the land subsidence susceptibility map.