Establishing an ANN-Based Risk Model for Ground Subsidence Along Railways

Abstract: Ground subsidence occurrences have drastically increased in the Seoul area of the Republic of Korea. The structural defects of underground utilities were found to be the primary cause of ground subsidence based on several field investigations. This paper presents a risk model that assesses the probability of occurrence of ground subsidence along railways. In this study, support vector machine (SVM) and multi-layer perceptron (MLP) approaches were successfully employed to develop an artificial neural network (ANN)-based risk model. The risk model, in conjunction with a database composed of underground utilities and geological boring data along urban railway networks, was utilized to develop a hazard map system. A limited field experimental program was conducted for the purpose of verification, resulting in a promising tool to effectively maintain railway networks.


Introduction
The number of ground subsidence occurrences has dramatically increased in the city of Seoul, Republic of Korea, since 2010, as illustrated in Figure 1 [1]. This phenomenon can be extremely detrimental for various infrastructural and residential reasons. Ground subsidence herein needs to be defined differently from "sinkhole", which is common in limestone karsts and is associated with sub-circular surface depressions or collapsing structures.

• Since 2010, around 600 ground subsidence events have occurred annually, mainly in roadways, sidewalks, and the vicinity of underground construction areas.
• Around 40 percent of the events occurred during the summer season (June to August), which has frequent heavy rainfall of 50 to 100 mm/h.
• The causes of the events have been categorized as: (1) damage to water and sewage utilities, which allows loss of the surrounding soil through the resulting holes; (2) inadequate backfill compaction during excavation activities, including open-cut construction and installation of underground utilities; and (3) a drop in the groundwater table due to pumping activities or damage to sheet pile walls.
Several previous studies have established hazard models with respect to sinkholes, incorporating various data interpretation techniques [2,5,6]. These models mainly depend on the frequency of event occurrences of sinkholes, since they are mostly associated with a specific geological condition which can be relatively clearly identified. Since the causes of ground subsidence in this study are associated with the deterioration of buried utilities and improper construction activities, it is crucial to establish a relevant database to develop a risk model. With a given robust database, various methodologies can be adapted to develop ground subsidence susceptibility and hazard models. There are previous studies that deal with ground subsidence risk assessment due to underground mining and installation of underground box structures based on field measurements and numerical analysis [7,8].
An artificial neural network (ANN) is a mathematical, or computational, model inspired by the structure of biological neural networks; such models are widely used in the modelling of nonlinear systems and in system identification [8]. Park [9] conducted an extensive review of ANN applications in geotechnical engineering. According to this review, ANNs have been widely employed since the early 1990s in applications such as constitutive modelling, geo-material characterization, assessment of pile bearing capacity, slope stability, evaluation of liquefaction, shallow foundations, and tunnels and underground openings. In this study, two ANN-based models, support vector machine (SVM) [10] and multi-layer perceptron (MLP) [11], were employed to establish the risk model with respect to the underground utilities (mainly water supply and sewer pipe systems) along the railway network.

Establishing of Database
To establish the risk model, extensive efforts were made to link the water and sewer pipe databases to the locations where ground subsidence has taken place. Oh et al. [12] conducted a series of numerical analyses to examine the impact of ground subsidence on the railroad, as presented in Figure 2. The Mohr-Coulomb plasticity theory was applied with plane-strain 15-node elements, and the ground was assumed to consist of weathered soil and rock, which account for the majority of the railroad roadbed in South Korea. For the loading condition, a static distributed load of 50 kPa was applied along the upper subgrade layer. A parametric numerical analysis was used to determine the critical depth of underground cavities with sizes of 1.5 m and 3.0 m, taking into account realistic design practice [13]. The position of the underground cavity was varied among five locations, as illustrated in Figure 2, and the vertical displacement of the upper subgrade was predicted.
It was found that when the underground cavity was located at position No. 3 in Figure 2, the displacement of the railroad decreased drastically. Based on this finding, a lateral distance of 25 m from the center of the railroad was taken as the influence zone, and the corresponding data were extracted within this zone.
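The extraction of data inside the influence zone can be sketched as a simple distance filter. This is a minimal illustration, not the authors' procedure: it assumes each pipe is represented by a single point in a projected coordinate system in meters, and that the railroad centerline is a polyline. The names `lateral_distance`, `within_influence_zone`, and all coordinates are hypothetical.

```python
import math


def lateral_distance(point, seg_start, seg_end):
    """Shortest distance (m) from a point to one centerline segment."""
    px, py = point
    ax, ay = seg_start
    bx, by = seg_end
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: fall back to point-to-point distance.
        return math.hypot(px - ax, py - ay)
    # Project the point onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def within_influence_zone(point, centerline, max_offset_m=25.0):
    """True if the point lies within max_offset_m of any centerline segment."""
    return any(
        lateral_distance(point, a, b) <= max_offset_m
        for a, b in zip(centerline, centerline[1:])
    )


# Hypothetical centerline vertices and pipe locations (meters).
centerline = [(0.0, 0.0), (100.0, 0.0), (200.0, 50.0)]
pipes = {"P-001": (50.0, 10.0), "P-002": (50.0, 40.0), "P-003": (150.0, 30.0)}

selected = [pid for pid, xy in pipes.items() if within_influence_zone(xy, centerline)]
print(selected)  # ['P-001', 'P-003']
```

In a GIS workflow the same selection would typically be done with a buffer around the centerline; the point-based version above only shows the geometric idea.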
The following data were summarized and analyzed: pipe age in years, pipe cover depth in meters, pipe diameter in meters, and pipe length in meters. Figure 3 shows the distribution of water pipe data obtained at the locations where ground subsidence occurred. As noted, while the number of relatively new pipes (less than 10 years) is increasing due to the growth of residents and business sectors within this area, the number of old pipes (more than 25 years) is also significant. In terms of cover depth, most of the water pipes are buried within 1.5 m. Although the range of cover depth appears to comply with the regulation on underground utility installation, inadequate structural conditions due to improper backfilling and degradation of pipe material may result in excessive ground subsidence. Given the location information, the network of water and sewer pipe systems along the railroad was embedded in the mapping system.
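The four pipe attributes and the age grouping discussed for Figure 3 can be captured in a small record type. The sketch below is illustrative only: the `PipeRecord` class, the field names, and the sample values are assumptions, while the 10-year, 25-year, and 1.5 m thresholds are taken from the text.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class PipeRecord:
    age_years: float
    cover_depth_m: float
    diameter_m: float
    length_m: float


def age_class(age_years):
    """Group a pipe by age using the thresholds mentioned in the text."""
    if age_years < 10:
        return "new (<10 yr)"
    if age_years > 25:
        return "old (>25 yr)"
    return "intermediate (10-25 yr)"


# Hypothetical records, not data from the study.
records = [
    PipeRecord(4, 1.2, 0.30, 85.0),
    PipeRecord(31, 1.0, 0.15, 40.0),
    PipeRecord(18, 1.4, 0.60, 120.0),
    PipeRecord(27, 0.9, 0.20, 55.0),
]

age_counts = Counter(age_class(r.age_years) for r in records)
# Share of pipes buried within 1.5 m of the surface.
shallow_share = sum(r.cover_depth_m <= 1.5 for r in records) / len(records)
print(age_counts, shallow_share)
```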

Risk Model
In this subsection, the support vector machine (SVM) [10] and multi-layer perceptron (MLP) [11] methods were employed to develop a risk model. Both have been used in a variety of pattern classification, data mining, and data analysis applications. To develop the risk model, two datasets were used: one contains the water and wastewater pipe data from locations where ground subsidence has taken place, and the other contains data collected from locations without such events.

Support Vector Machine (SVM)
A support vector machine learns a hyperplane from labeled input data by supervised learning and determines the class of new data from that hyperplane. Suppose that $m$ data $X = \{X_1, X_2, \ldots, X_m\}$ for the water supply in a certain area are given, each of which is labeled with a binary class $Y_i \in \{-1, 1\}$, where $Y_i = 1$ if ground subsidence occurs and $Y_i = -1$ otherwise. The goal of the SVM is to design a decision hyperplane

$$g(X) = W^{T} X + w_0 = 0$$

that, for the training data $X_i$, $i = 1, \ldots, m$, maximally separates the two classes $S^{+} = \{X_j \mid Y_j = 1\}$ and $S^{-} = \{X_j \mid Y_j = -1\}$, where $W$ and $w_0$ are the weight and bias of the decision function, respectively, as shown in Figure 4. In other words, SVMs are trained with samples from two classes to find the maximum-margin hyperplane (the margin is the shaded area in Figure 4). Samples on the margin are called the support vectors. The general SVM model can predict which class the data belong to; however, unlike the logistic regression model, it does not provide the probability of belonging to each class. To overcome this problem, a probabilistic SVM [14] model was used to estimate the posterior probability from the distance between the test data and the hyperplane.
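The step from a signed distance to a posterior probability can be illustrated with a sigmoid mapping. The sketch below assumes a Platt-style sigmoid, which is one common way to build a probabilistic SVM; the text does not state the exact form used in [14]. The weights, bias, and sigmoid parameters `a` and `b` are hypothetical values chosen for illustration, not fitted to any data.

```python
import math


def decision_value(x, weights, bias):
    """Linear decision function g(X) = W^T X + w_0."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def platt_probability(g, a, b):
    """Platt-style sigmoid (assumed form): P(Y = 1 | g) = 1 / (1 + exp(a * g + b))."""
    return 1.0 / (1.0 + math.exp(a * g + b))


# Hypothetical hyperplane for two scaled features.
weights, bias = [0.8, -0.5], -0.2
# Hypothetical sigmoid parameters; a < 0 so that a larger g gives a higher probability.
a, b = -2.0, 0.0

sample = [1.0, 0.4]
g = decision_value(sample, weights, bias)   # about 0.4
label = 1 if g >= 0 else -1                 # hard class decision
prob = platt_probability(g, a, b)           # roughly 0.69
print(label, round(prob, 3))
```

A sample lying exactly on the hyperplane (g = 0) maps to a probability of 0.5 when `b` is zero, which matches the intuition that the hyperplane is the point of maximum uncertainty.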


Multi-Layer Perceptron (MLP)
Multi-layer perceptron imitates the human brain to perform intelligent tasks [15]. It can represent complicated relationships between input and output and acquire knowledge about these relationships directly from the data. Suppose we are given a data set $X = \{X_1, X_2, \ldots, X_m\}$ for the water supply in a certain area, and each $X_n = [x^n_1, x^n_2, \ldots, x^n_d]^T$ $(n = 1, 2, \ldots, m)$ belongs to one of two classes (ground subsidence occurs or does not occur). The output of the MLP, as shown in Figure 5, is represented by

$$g_l(X_n) = f\left( \sum_{j=1}^{n_h} w^{oh}_{lj} \, f\left( \sum_{i=1}^{d} w^{hi}_{ji} \, x^n_i \right) \right)$$

where $w^{hi}_{ji}$ is a weight between the $i$th input node and the $j$th hidden node, $w^{oh}_{lj}$ is a weight between the $j$th hidden node and the $l$th output node, $n_h$ is the number of hidden nodes, and $f(\cdot)$ is an activation function.
We train the neural network such that the output $g_l(X_n)$ approaches the target value

$$t_l(X_n) = \begin{cases} 1, & X_n \in \omega_l \\ 0, & \text{otherwise.} \end{cases}$$

As in [16], the network parameters are chosen to minimize the following function

$$J(W) = \sum_{n=1}^{m} \left[ g_l(X_n) - t_l(X_n) \right]^2 = m \left\{ \frac{N_l}{m} \frac{1}{N_l} \sum_{X_n \in \omega_l} \left[ g_l(X_n) - 1 \right]^2 + \frac{m - N_l}{m} \frac{1}{m - N_l} \sum_{X_n \notin \omega_l} \left[ g_l(X_n) \right]^2 \right\} \quad (3)$$

Here, $W$ is the weight vector of the network, and $N_l$ is the number of samples belonging to $\omega_l$.
In the limit of infinite data, we can use Bayes' formula to express (3) as

$$\lim_{m \to \infty} \frac{1}{m} J(W) = \int \left[ g_l(X) - P(\omega_l \mid X) \right]^2 p(X) \, dX + \int P(\omega_l \mid X) \left[ 1 - P(\omega_l \mid X) \right] p(X) \, dX \quad (4)$$

Since the second term in Equation (4) is independent of $W$, minimizing $J(W)$ yields $g_l(X_n) \approx P(\omega_l \mid X_n)$.
Therefore, we can compute the probabilities of ground subsidence occurrences by using the output values of the MLP [11].
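The MLP output described above, a nested sum over input-to-hidden and hidden-to-output weights, can be sketched as a forward pass. This is a minimal illustration under stated assumptions: a logistic sigmoid is assumed for the activation function, bias terms are omitted to mirror the weights named in the text, and all weights and inputs are hypothetical and untrained, so the resulting number has no physical meaning.

```python
import math


def sigmoid(z):
    """Logistic activation, assumed here for f(.)."""
    return 1.0 / (1.0 + math.exp(-z))


def mlp_output(x, w_hidden, w_output):
    """Forward pass of a one-hidden-layer MLP.

    w_hidden[j][i] is the weight from input node i to hidden node j.
    w_output[l][j] is the weight from hidden node j to output node l.
    """
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_output]


# Hypothetical scaled inputs: pipe age, cover depth, diameter, length.
x = [0.6, 0.3, 0.2, 0.5]
# Hypothetical weights for 3 hidden nodes and 1 output node.
w_hidden = [
    [0.5, -0.4, 0.1, 0.2],
    [-0.3, 0.8, 0.0, -0.1],
    [0.2, 0.2, -0.5, 0.4],
]
w_output = [[1.0, -1.2, 0.7]]

# After training, this output would be read as an estimate of P(subsidence | x).
p_subsidence = mlp_output(x, w_hidden, w_output)[0]
print(p_subsidence)
```

With a sigmoid output node the value always lies strictly between 0 and 1, which is what allows it to be interpreted as a probability once the network has been trained against 0/1 targets.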

Evaluation of Risk Model
A total of 310 water and 101 wastewater data records were obtained within 50 m of the areas where ground subsidence occurred. Furthermore, 154 water and 828 wastewater data records were collected at locations where no event had occurred. Table 1 shows examples of the water and wastewater data collected. As shown in Table 1, the main attributes of the water and wastewater data were installation date, diameter, length, and average cover depth. In this study, polynomial and radial basis function (RBF) kernels were used in the SVM approach to map the given data to a new vector space. The classification performance of the SVM with the polynomial and RBF kernels is shown in Table 2. In the table, an event indicator of "0.00" represents the absence of ground subsidence and "1.00" its presence. The classification accuracy was 81.5% with the polynomial kernel and 86.6% with the RBF kernel. Overall, the prediction accuracy for the presence of the event was superior to that for its absence. As stated earlier, the MLP places a hidden layer between the input and output layers and learns the weights and biases of the nodes in the three layers. To establish the water supply risk model, the classification accuracy was measured while varying the number of hidden nodes, that is, the neurons in the hidden layer.
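The two SVM kernels named above can be written out in a few lines. The sketch uses the standard textbook forms of the polynomial and RBF kernels; the parameter names and default values (`degree`, `gamma`, `coef0`) are assumptions, since the kernel settings used in the study are not given in the text.

```python
import math


def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """Polynomial kernel: (gamma * <x, y> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree


def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Two hypothetical scaled feature vectors.
u = [1.0, 2.0]
v = [3.0, 4.0]

print(polynomial_kernel(u, v, degree=2))  # (1*3 + 2*4 + 1) ** 2 = 144.0
print(rbf_kernel(u, u))                   # identical points give 1.0
print(rbf_kernel(u, v))                   # decays with squared distance
```

Both functions return a similarity between two samples; the SVM never needs the coordinates of the mapped vector space explicitly, only these pairwise values.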
As presented in Table 3, the classification accuracy for the absence of the event (87.7% with 700 hidden nodes) was significantly improved compared to the SVM model. As the number of hidden nodes increased, higher accuracy was generally achieved. The wastewater risk model was also built using the SVM and MLP. As in the water risk model assessment, RBF and polynomial functions were used as SVM kernel functions, and their performance is shown in Table 4. In Table 4, the overall classification accuracy is as high as 89.6% and 90.3%; however, it should be noted that the classification accuracy for the presence of the event was substantially lower than in the water risk model. This is attributed to the fact that the wastewater risk model was trained with a dataset that is relatively imbalanced between the absence and presence of ground subsidence, unlike the water risk dataset. Table 5 presents the results of the wastewater risk model assessment using the MLP model. The classification accuracy for the presence of the event was improved compared to that of the SVM model. Consequently, the MLP model was chosen for the risk model of water supply and wastewater utilities because of its better classification accuracy for the presence of the event. Further, to utilize the risk model, a simple program was developed in the MATLAB environment to compute the probability of risk.
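The effect of class imbalance on overall accuracy can be made concrete with a short calculation. The sketch below uses only the wastewater class sizes stated in the text (101 records with an event, 828 without) and a deliberately trivial classifier that always predicts "absent". The predictions are a thought experiment, not results from the study.

```python
def class_accuracies(y_true, y_pred):
    """Return overall accuracy and a dict of per-class accuracies."""
    overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    per_class = {}
    for label in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == label]
        per_class[label] = sum(y_pred[i] == label for i in idx) / len(idx)
    return overall, per_class


# 1 = ground subsidence present, 0 = absent (wastewater class sizes from the text).
y_true = [1] * 101 + [0] * 828
# Trivial classifier: always predicts "absent".
y_pred = [0] * len(y_true)

overall, per_class = class_accuracies(y_true, y_pred)
print(round(overall, 3), per_class)  # 0.891 {0: 1.0, 1: 0.0}
```

Even this uninformative classifier reaches about 89% overall accuracy while detecting none of the events, which is why the per-class accuracy for the presence of the event is the more meaningful figure for an imbalanced dataset.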

Field Evaluation of Risk Model
Incorporating the risk model established herein, this study developed a QGIS-based mapping system, as illustrated in Figure 6. Figure 6a shows the railway networks of the Seoul area in the hazard map system, and Figure 6b shows the variation of risk levels at the tested locations boxed in Figure 6a. This system enables users to view the distribution of risk levels along with inventory data that include basic geotechnical boring data and the age, cover depth, and length of water supply and wastewater utilities, where available. The geological risk model was developed based on regression analysis taking into account the groundwater level, alluvial layer thickness, and standard penetration test (SPT) data, where GT is the groundwater table in meters, T is the alluvial layer thickness in meters, and SPT_N is the sum of the number of blows required for the second and third 150 mm of penetration in the SPT. The total risk index can be calculated using Equation (7) and is classified into five categories, as shown in Table 6.

Table 6. Classification of the total risk index.

| Risk index | Risk level | Required action |
|---|---|---|
| 80~100 | Very high risk | Closing the line for repair |
| 65~79 | High risk | Immediate maintenance action |
| 50~64 | Medium risk | Periodic maintenance action |
| 25~49 | Low risk | A maintenance plan |
| <25 | Very low risk | No maintenance action |

Figure 7 illustrates the procedure to quantify the risk level in the hazard map system.
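The classification in Table 6 amounts to a threshold lookup, which can be sketched as follows. The function name `risk_category` is hypothetical, and two assumptions are made that the table does not state: the index is bounded between 0 and 100, and a non-integer index falling between two listed ranges (for example 79.5) is assigned to the lower category.

```python
def risk_category(total_risk_index):
    """Map a total risk index (assumed range 0 to 100) to its Table 6 category."""
    if not 0 <= total_risk_index <= 100:
        raise ValueError("total risk index is assumed to lie between 0 and 100")
    if total_risk_index >= 80:
        return "Very high risk: closing the line for repair"
    if total_risk_index >= 65:
        return "High risk: immediate maintenance action"
    if total_risk_index >= 50:
        return "Medium risk: periodic maintenance action"
    if total_risk_index >= 25:
        return "Low risk: a maintenance plan"
    return "Very low risk: no maintenance action"


# Hypothetical index values for illustration.
for index in (92, 65, 50, 30, 10):
    print(index, "->", risk_category(index))
```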

In this study, field validation of the established risk model was conducted using nondestructive surveys and pneumatic cone penetration (PCP) tests. Twelve field test sections were selected to cover risk indices from low to high. A series of nondestructive surveys was then conducted using ground penetrating radar (GPR) equipped with a 500 MHz antenna and an electrical resistance (ER) survey with twelve electrodes installed at one-meter spacing, to investigate the following: the presence of buried utilities, abnormality of the ground condition, and the presence of underground cavities. Once the nondestructive tests were completed, PCP tests were conducted at the locations where ground abnormalities were detected and at normal locations for comparison.
Figure 8 shows examples of nondestructive test results. As shown in Figure 8, as the risk index increases, the subsurface profile obtained from GPR becomes irregular, and the average ER values become relatively lower. To determine the average ER value per test section, image analysis was employed to quantify the area of individual ER zones. Once the area was quantified for the individual ER zones, the area-weighted average was computed to represent the test section. In this manner, the ER survey was found to be effective in the assessment of the risk index along with the GPR survey, which was useful for detecting the presence of buried utilities as long as the depth of cover was within 3 m.
Pneumatic cone penetration testing was conducted in accordance with DIN ISO 22476-2 up to 2 m below the surface. Compared to a typical dynamic cone penetrometer test, the PCP test has the advantage of penetrating to greater depths with constant impact energy, controlled by pneumatic force. From this test, two indices were assigned to evaluate the bearing capacity of the subsurface. For instance, N50 indicates the number of blows required to penetrate the pneumatic cone to 0.5 m. A higher number of blows indicates a larger load bearing capacity of the subsurface. Generally, the number of blows tends to decrease at locations where ground abnormalities were detected compared to normal conditions. The relationships between average electrical resistivity, the number of blows from PCP tests, and the risk index were established based on the risk model, as shown in Figure 9. The N-value of the PCP test was found to have a promising relationship with the risk index and average electrical resistivity, even though the R-square value was not sufficiently high to ensure statistical accuracy. Consequently, it was deemed that further research needs to be conducted to validate this relationship.
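The area-weighted average used to represent each ER test section can be sketched as below. The zone areas and representative ER values are hypothetical; in the study the areas come from image analysis of the ER survey results, and the unit of the ER value is assumed here only for labeling.

```python
def area_weighted_mean(zones):
    """Area-weighted average of zone values.

    zones is a list of (area, value) pairs, one pair per ER zone.
    """
    total_area = sum(area for area, _ in zones)
    if total_area <= 0:
        raise ValueError("total zone area must be positive")
    return sum(area * value for area, value in zones) / total_area


# Hypothetical zones for one test section: (area in m^2, representative ER value).
zones = [(4.0, 120.0), (10.0, 60.0), (6.0, 30.0)]

section_er = area_weighted_mean(zones)
print(section_er)  # (480 + 600 + 180) / 20 = 63.0
```

Weighting by area keeps a small high-resistance pocket from dominating the section value, which a plain average of the zone values would allow.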

Concluding Remarks
Damage from aged underground utilities in metropolitan areas was found to be the primary cause of ground subsidence in this study. The present paper developed a risk model to assess the probability of ground subsidence along railways surrounding the Seoul area of the Republic of Korea. The following conclusions and perspectives can be obtained.
(1) Of the two approaches (i.e., SVM and MLP) employed in this study to develop a risk model, the MLP approach was found to be more effective, improving the classification accuracy for the presence of ground subsidence.
(2) The number of hidden nodes played a significant role in determining the overall accuracy of the risk model. Considering the number of input data for water and wastewater pipes (564 for water and 929 for wastewater pipes), it was deemed appropriate to have more hidden nodes than inputs to ensure the reliability of the risk model. A previous study recommended that the number of hidden nodes be 1.5 times the number of parameters in the input layer [17].
(3) A series of field experimental programs using GPR, ER survey, and PCP tests showed that the correlations between measurements are promising for the assessment of the risk index with respect to ground subsidence, in spite of the limited field validation. Further investigation is strongly recommended to verify this finding. The ground penetrating radar survey is useful for visually detecting underground utilities and underground homogeneity. The electrical resistance survey gives the variation of electrical resistance, which can be quantified for the tested section. Pneumatic cone penetration testing provides the bearing capacity of the tested ground, which is associated with the risk index and ER values. Consequently, the use of hazard map systems in conjunction with field investigation is recommended as a proactive approach to mitigate the progress of ground subsidence along railways.