Support Vector Regression for the Relationships between Ground Motion Parameters and Macroseismic Intensity in the Sichuan–Yunnan Region

Featured Application: Prediction of macroseismic intensity from ground motion parameters based on a support vector regression model is better than that based on a linear regression model.

Abstract: In this paper, a nonlinear regression method called support vector regression (SVR) is presented to establish the relationship between engineering ground motion parameters and macroseismic intensity (MSI). Sixteen ground motion parameters, including peak ground acceleration (PGA), peak ground velocity (PGV), Arias intensity, Housner intensity, acceleration spectrum intensity, velocity spectrum intensity, and others, are considered as candidates for feature selection to generate optimal SVR models. All available datasets with both usable strong ground motion records and corresponding investigated MSIs in the Sichuan–Yunnan region, China, are collected, and these 125 pairs of data are used for selecting features and comparing regression results. Nine ground motion parameters are selected as the most relevant features: PGA is the first fundamental one and PGV is the fifth relevant feature. Based on performance measures on the testing dataset, the best SVR model is given for each number of features from one up to nine. According to prediction accuracy, SVR models with a Gaussian kernel give much better MSI prediction than linear kernel SVR models and linear regression models. In particular, the Gaussian kernel SVR of PGA gives much higher MSI prediction accuracy than the linear regression models of PGV and PGA. The proposed SVR models are valid for MSI values from VI to IX, and they can be used for rapid mapping of damage potential and reporting of seismic intensity for this high-seismic-activity region.


Introduction
Macroseismic intensity (MSI) is a local measure of the degree of earthquake damage and ground motion shaking in an earthquake, as evidenced by observed damages and human responses. MSI is crucial for seismic hazard, seismic design, and seismic loss, and has been widely used in the seismological, engineering, and loss-modeling communities [1][2][3][4][5]. MSI can also provide guidelines on seismic retrofitting of existing structures after strong ground motion shaking, in order to reduce the seismic risk of the structures in future earthquakes. There are many different MSI scales, and the most used ones are the modified Mercalli intensity (MMI) scale, European macroseismic scale (EMS), Japan

Study Area and Datasets
The Sichuan-Yunnan region is located in the area where the Eurasian plate and the Indian plate collide with each other and squeeze strongly. It covers the Sichuan-Yunnan diamond block, the southern Yunnan block, the western Yunnan block, and the eastern part of the Bayanhar block. This region consists of main faults like the Longmen Mountain fault, Anning River fault, Zemu River fault, Xiaojiang River fault, Honghe River fault, and Xiaojinhe River fault, and is the most significant area for strong earthquakes in western China [36]. The area is about 865,000 square kilometers, and is twice as large as California and almost three times larger than Italy. The seismicity in the Sichuan-Yunnan region is at a high level. Up to now, more than 30 earthquakes larger than magnitude 7.0 have occurred, and these earthquakes have caused significant casualties and property losses. For example, the great Wenchuan earthquake killed more than 69,000 people and caused 852.3 billion Yuan direct economic losses [37]. After a destructive earthquake occurs, a reconnaissance team will be sent to conduct field investigation and produce an MSI map that reflects the scope and degree of the ground impact caused by the earthquake. These MSI maps are very valuable for earthquake emergency response and post-disaster rehabilitation. To measure the ground motions caused by earthquakes, there are now 400 permanent, strong ground motion observation stations mounted in this region: 224 stations in Sichuan Province and 176 in Yunnan Province. These stations have recorded high-quality, strong ground motions in the past few years [38].
Appl. Sci. 2020, 10, 3086
The MSI for a station in an earthquake is determined by adding the location of the station to the MSI map and determining which isoseismal line encircles it. Nine moderate-to-large earthquakes that have both investigated MSI maps and ground motion records are analyzed in this study. The locations of these earthquakes, as well as the spatial distribution of the strong motion stations, are displayed in Figure 1. There are 106 different strong motion stations, and some stations recorded more than one earthquake event. The detailed information of the nine earthquakes and the numbers of strong motion records are shown in Table 1. The surface magnitude (Ms), which is used in China to measure earthquake magnitude, of these earthquakes varies from 5.8 to 8.0, and the depths are from 5 km to 33 km. The epicentral distances of stations are from 6 km to 312 km, and the MSIs are from VI to IX. In total, 125 pairs of ground motion records and MSIs are used to analyze the relationship. The complete information of the 125 sets of data on MSIs, station names, site conditions, and epicentral distances is shown in Supplementary Materials Table S1.
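The assignment of an MSI to a station can be illustrated with a point-in-polygon test. The following is only a minimal sketch, assuming that each isoseismal line is available as a closed polygon of planar (x, y) vertices; the function names and the toy coordinates are hypothetical and are not taken from the study.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if point (x, y) lies inside the closed polygon."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray starting at the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def station_msi(station, isoseismals):
    """Return the highest intensity whose isoseismal encircles the station.

    isoseismals: dict mapping integer intensity -> polygon (list of (x, y)).
    Returns None if the station lies outside every isoseismal.
    """
    enclosing = [msi for msi, poly in isoseismals.items()
                 if point_in_polygon(station, poly)]
    return max(enclosing) if enclosing else None


# Toy nested squares standing in for isoseismal lines (made-up coordinates).
toy_isoseismals = {
    6: [(0, 0), (10, 0), (10, 10), (0, 10)],
    7: [(3, 3), (7, 3), (7, 7), (3, 7)],
}
print(station_msi((5, 5), toy_isoseismals))  # station inside both squares
print(station_msi((1, 1), toy_isoseismals))  # station inside the outer square only
```

Taking the highest enclosing intensity reflects the fact that isoseismal lines are nested, so the innermost line around a station determines its MSI.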

Ground Motion Parameters
Ground motion generated by an earthquake is complicated, and multiple parameters rather than a single parameter are used to quantitatively reflect its characteristics. The amplitude, frequency content, and duration are the most significant characteristics in the engineering community [39]. Some ground motion parameters, such as PGA and PGV, provide information on amplitude, while other parameters, such as acceleration spectrum intensity and Arias intensity, reflect two or three of these characteristics. A total of 16 ground motion parameters are used to characterize the recorded ground motion in the relationship study. The single amplitude parameters include PGA and PGV. Peak ground displacement (PGD) is not included because it is sensitive to long-period noise, and different choices of baseline correction and filtering of the acceleration may give quite different displacements. The individual frequency content parameters include the central frequency, which measures the frequency where the power spectral density is most concentrated, and the ratio $v_{max}/a_{max}$, which gives the period where the ground motion is most significant. The individual duration parameters include bracketed duration and significant duration. Bracketed duration is the total time elapsed between the first and the last excursions of a given acceleration level; an absolute level of 5 gal and a relative level of 5% of PGA are both considered. Significant duration is defined as the time interval over which a given proportion of the total Arias intensity is accumulated, and the interval between the 5% and 95% thresholds is chosen here. The ground motion parameters reflecting more than one characteristic include quantities derived from the acceleration time history, such as the root mean square of acceleration, cumulative absolute velocity, Arias intensity, characteristic intensity, JMA equivalent peak acceleration, and destructive index.
Spectrum-based intensities, such as the acceleration spectrum intensity, velocity spectrum intensity, and Housner intensity, are also considered. For the completeness and readability of the paper, the definitions, explanations, and calculation formulas of these ground motion parameters are given as follows [39][40][41].
The root mean square of acceleration ($a_{RMS}$) stands for the effective average acceleration in the significant duration, given by

$$a_{RMS} = \sqrt{\frac{1}{T_s}\int_{t_1}^{t_2} a^2(t)\,dt} \qquad (1)$$

where $a(t)$ is the acceleration time history, $T_s$ is the significant duration, and $t_1$ and $t_2$ are the start and end time instants, respectively. The cumulative absolute velocity (CAV) was proposed by the U.S. Electrical Power Research Institute for indicating the onset of structural damage caused by an earthquake, and is given by

$$CAV = \int_{0}^{t_{tot}} |a(t)|\,dt \qquad (2)$$

where $t_{tot}$ is the total duration of the record. The Arias intensity (AI) was proposed for indicating the damage potential to nuclear power plants, and is given by

$$AI = \frac{\pi}{2g}\int_{0}^{t_{tot}} a^2(t)\,dt \qquad (3)$$

where $g$ is the gravitational acceleration. The characteristic intensity ($I_c$) was proposed to indicate structural damage caused by maximal deformation and dissipative hysteretic energy, and is given by

$$I_c = a_{RMS}^{1.5}\,T_s^{0.5} \qquad (4)$$

The JMA equivalent peak acceleration is used to calculate JMA seismic intensity. It is the value of the vector composition of three-component band-pass accelerations, each of which is filtered by a compound filter composed of an amplitude filter, a high-cut filter, and a low-cut filter, such that the total duration when the vector composite acceleration is larger than this value is longer than 0.3 s, as shown in Equation (5). The schematic diagram of JMA equivalent peak acceleration, compound filter, and total duration with respect to peak values are shown in [11]. Since the JMA seismic intensity scale of 0-VII is quite different from the MSI scale of I-XII in China, the JMA equivalent peak acceleration ($A_{0.3}$) rather than JMA seismic intensity is used to characterize ground motion:

$$A_{0.3} = \max\left\{a_0 \mid \tau(a_0) \ge 0.3\ \mathrm{s}\right\} \qquad (5)$$

where $a_0$ is the vector composite acceleration, and $\tau(a_0)$ is the duration of composite acceleration larger than $a_0$.

The destructive index (DI) has been proposed by Nakamura [42] to estimate the damage potential of ground motion by calculating the logarithm of the product of vertical acceleration and velocity, and is given by

$$DI = \log\left(\max_t \left|a_{UD}(t)\,v_{UD}(t)\right|\right) \qquad (6)$$

where $a_{UD}(t)$ and $v_{UD}(t)$ are the up-down acceleration and velocity. The acceleration spectrum intensity (ASI) was proposed to analyze the ground motion effect on short-period structures like concrete dams, and is given by

$$ASI = \int_{0.1}^{0.5} S_a(\xi = 0.05, T)\,dT \qquad (7)$$

where $S_a(\xi = 0.05, T)$ is the acceleration response spectrum with damping ratio $\xi = 0.05$. The velocity spectrum intensity (VSI) was proposed to indicate ground motion damage potential on most structures whose fundamental periods are between 0.1 and 2.5 s:

$$VSI = \int_{0.1}^{2.5} PSV(\xi = 0.05, T)\,dT \qquad (8)$$

where $PSV(\xi = 0.05, T)$ is the pseudo-velocity response spectrum with damping ratio $\xi = 0.05$. The Housner intensity (HI) is quite similar to the velocity spectrum intensity, except that the damping ratio is selected as 0.2, since the damping ratio will become larger when the structure is damaged by an earthquake:

$$HI = \int_{0.1}^{2.5} PSV(\xi = 0.2, T)\,dT \qquad (9)$$

Three-component accelerations are used to calculate the JMA equivalent peak acceleration, while only the up-down (UD) component acceleration is used for the destructive index. For the remaining ground motion parameters, the geometric means of the two horizontal component accelerations are used. The natural frequency of mid- and high-rise buildings is mainly within 0.1-2.0 Hz, and that of low-rise buildings is within 5.0-10.0 Hz. The corrected acceleration is filtered by a second-order Butterworth bandpass filter with a passing band of 0.1-10.0 Hz. A complete list of the 16 calculated ground motion parameters from the set of 125 ground motion records is shown in Supplementary Materials Table S1. The scatter plots of MSI versus ground motion parameter are shown in Figure 2, and the corresponding absolute values of Pearson correlation coefficients are shown in Figure 3.
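Several of the time-domain definitions above can be evaluated directly from a sampled acceleration record. The following is a minimal sketch for a single component, assuming equally spaced samples in m/s^2, trapezoidal integration, and the 5%-95% significant duration; the function names and the synthetic decaying sine record are hypothetical. It omits the filtering, the geometric mean of the horizontal components, and the spectrum-based intensities used in the study.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed unit system)


def cumulative_trapz(values, dt):
    """Running trapezoidal integral of equally spaced samples."""
    total, out = 0.0, [0.0]
    for prev, cur in zip(values, values[1:]):
        total += 0.5 * (prev + cur) * dt
        out.append(total)
    return out


def ground_motion_parameters(acc, dt):
    """Return PGA, CAV, Arias intensity, significant duration, and a_RMS.

    acc: acceleration samples in m/s^2; dt: sampling interval in s.
    """
    pga = max(abs(a) for a in acc)
    cav = cumulative_trapz([abs(a) for a in acc], dt)[-1]
    energy = cumulative_trapz([a * a for a in acc], dt)
    arias = math.pi / (2.0 * G) * energy[-1]
    # i1, i2: first samples where 5% and 95% of the total energy are reached.
    i1 = next(i for i, e in enumerate(energy) if e >= 0.05 * energy[-1])
    i2 = next(i for i, e in enumerate(energy) if e >= 0.95 * energy[-1])
    ts = (i2 - i1) * dt
    # Root mean square acceleration over the significant duration.
    a_rms = math.sqrt((energy[i2] - energy[i1]) / ts) if ts > 0 else 0.0
    return {"PGA": pga, "CAV": cav, "AI": arias, "Ts": ts, "aRMS": a_rms}


# Synthetic record: a 2 Hz sine with exponential decay (not real data).
dt = 0.01
record = [math.exp(-0.5 * i * dt) * math.sin(2.0 * math.pi * 2.0 * i * dt)
          for i in range(1001)]
print(ground_motion_parameters(record, dt))
```

The characteristic intensity of Equation (4) follows from the returned values as `aRMS ** 1.5 * Ts ** 0.5`.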
It can be seen that ASI, $A_{0.3}$, $I_c$, PGA, DI, HI, PGV, VSI, AI, $a_{RMS}$, and CAV have higher linear correlations with MSI, while the duration parameters $Td_{b,5}$, $Td_{s}$, and $Td_{b,5\%}$ and the frequency parameters central frequency (CF) and $v_{max}/a_{max}$ have almost no linear correlation.
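The absolute Pearson correlation coefficient used for this screening can be computed as in the following minimal sketch. The function name and the toy numbers are hypothetical, and whether the parameters are correlated in raw or logarithmic form is left open here.

```python
import math


def abs_pearson(x, y):
    """Absolute value of the Pearson correlation coefficient of x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return abs(sxy) / math.sqrt(sxx * syy)


# Made-up values: one parameter that tracks intensity and one that does not.
msi = [6, 6, 7, 7, 8, 9]
related = [1.2, 1.4, 1.9, 2.1, 2.6, 3.0]
unrelated = [5.0, 1.0, 4.0, 2.0, 3.0, 3.0]
print(abs_pearson(related, msi), abs_pearson(unrelated, msi))
```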

Support Vector Regression
For a given dataset $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, $x_i = (x_{i1}, x_{i2}, \ldots, x_{id})^T \in \mathbb{R}^d$ is the ith sample with feature dimensionality d, $x_{ij}$ is the value of the jth feature, $y_i \in \mathbb{R}$ is the corresponding target value of the sample, and m is the number of samples. As shown in Figure 4, the linear kernel SVR finds a function $f(x) = \omega^T x + b$, where $\omega = (\omega_1, \omega_2, \ldots, \omega_d)^T \in \mathbb{R}^d$ is a normal vector of the hyperplane and $b \in \mathbb{R}$ is the offset between the hyperplane and the coordinate origin, such that the function is as flat as possible and has at most $\varepsilon$ deviation from each sample target value $y_i$. The optimization objective is given by [26]:

$$\min_{\omega, b}\ \frac{1}{2}\|\omega\|^2 + C\sum_{i=1}^{m}\left|f(x_i) - y_i\right|_{\varepsilon} \qquad (10)$$

where C is a regularization constant and $|\xi|_{\varepsilon}$ is the $\varepsilon$-insensitive loss function, given by

$$|\xi|_{\varepsilon} = \begin{cases} 0, & |\xi| \le \varepsilon \\ |\xi| - \varepsilon, & \text{otherwise} \end{cases} \qquad (11)$$

The first term of Equation (10) describes the flatness of the function, and is also called "structural risk". The second term of the equation, without C, describes the fit between the function and the actual sample target values, and is also called "empirical risk". The constant C is a compromise between the two terms. In traditional linear regression, loss is counted whenever the function value is not equal to the sample target value. This rule is strict and can result in overfitting, because the determination of the sample target value could be disturbed by subjective or objective factors, so that the sample target value contains a certain level of noise. To overcome this disadvantage, SVR counts loss only when the difference between the function value and the sample target value is larger than a given threshold $\varepsilon$. The condition C = 0 means that only flatness is considered, while C approaching infinity means that every sample target value is forced to lie within the $\varepsilon$ deviation of the corresponding function value.

To describe the real deviation, two slack variables $\xi_i$ and $\xi_i^*$ are introduced, and the primal objective function can be deduced from Equation (10) as

$$\min_{\omega, b, \xi_i, \xi_i^*}\ \frac{1}{2}\|\omega\|^2 + C\sum_{i=1}^{m}\left(\xi_i + \xi_i^*\right) \qquad (12)$$

subject to

$$f(x_i) - y_i \le \varepsilon + \xi_i, \quad y_i - f(x_i) \le \varepsilon + \xi_i^*, \quad \xi_i \ge 0,\ \xi_i^* \ge 0, \quad i = 1, 2, \ldots, m$$

To efficiently solve the above optimization problem with inequality constraints, the dual problem is obtained by using the Lagrange function method, given by

$$\max_{\alpha, \alpha^*}\ \sum_{i=1}^{m} y_i\left(\alpha_i^* - \alpha_i\right) - \varepsilon\sum_{i=1}^{m}\left(\alpha_i^* + \alpha_i\right) - \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\left(\alpha_i^* - \alpha_i\right)\left(\alpha_j^* - \alpha_j\right)x_i^T x_j \qquad (13)$$

subject to

$$\sum_{i=1}^{m}\left(\alpha_i - \alpha_i^*\right) = 0, \quad 0 \le \alpha_i, \alpha_i^* \le C$$

Solving the above quadratic programming problem gives $\alpha$ and $\alpha^*$, and the solution of the SVR is given by

$$f(x) = \sum_{i=1}^{m}\left(\alpha_i^* - \alpha_i\right)x_i^T x + b \qquad (14)$$

where b is the mean of all possible b values for the support vectors with $-\alpha_i + \alpha_i^* \ne 0$, obtained using the Karush-Kuhn-Tucker (KKT) conditions [26][27][28]:

$$b = \frac{1}{|S|}\sum_{j \in S}\left(y_j + \varepsilon - \sum_{i=1}^{m}\left(\alpha_i^* - \alpha_i\right)x_i^T x_j\right) \qquad (15)$$

Here, S is the subscript set such that $\alpha_j$ is greater than 0, given by $S = \{j \mid \alpha_j > 0,\ j = 1, 2, \ldots, m\}$ (the KKT conditions give this expression exactly for samples with $0 < \alpha_j < C$).

For real-world problems, it is often impossible to find a hyperplane that satisfies both good flatness and good fit simultaneously. One possible approach is to find a hypersurface $f(x) = \omega^T \phi(x) + b$, which maps sample x into a feature space by a mapping $\phi(x)$. With the help of the kernel function method, the solution of nonlinear SVR is given by

$$f(x) = \sum_{i=1}^{m}\left(\alpha_i^* - \alpha_i\right)K(x, x_i) + b$$

where $K(x, x_i)$ is the kernel function, and b is similar to Equation (15), with $x_i^T x_j$ replaced by $K(x_i, x_j)$. The linear kernel and the Gaussian kernel,

$$K(x_i, x_j) = x_i^T x_j, \qquad K(x_i, x_j) = \exp\left(-\gamma\|x_i - x_j\|^2\right)$$

where $\gamma$ is a parameter representing the width of the Gaussian kernel, are widely used as kernel functions for most SVRs. To choose the best SVR model, parameter C for the linear kernel and parameters (C, $\gamma$) for the Gaussian kernel need to be selected. A simple grid search, with grids from $2^{-9}$ to $2^{9}$ for C and $2^{-8}$ to $2^{2}$ for $\gamma$, both in steps of $2^{1}$, is used to select model parameters. To reduce overfitting, there are several strategies for model selection, such as hold-out, bootstrap, and n-fold cross-validation [27]. For a small dataset, a simple and effective strategy is n-fold cross-validation, which divides the dataset into n mutually exclusive and complementary subsets, and each time uses n - 1 subsets as the training set and the remaining subset as the testing set. The best parameters are selected by choosing the model that gives the minimum average mean squared error (MSE) over all n subsets.
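The building blocks described in this section can be sketched in a few lines. This is an illustration only, not the LIBSVM implementation used in the study: the dual coefficients are taken as given rather than solved for, the folds are interleaved without shuffling, and all function names are hypothetical.

```python
import math


def eps_insensitive_loss(residual, eps):
    """Epsilon-insensitive loss: zero inside the eps-tube, linear outside it."""
    return max(0.0, abs(residual) - eps)


def gaussian_kernel(u, v, gamma):
    """K(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def svr_predict(x, support_vectors, coeffs, b, gamma):
    """Dual-form prediction: sum_i coeffs[i] * K(x, x_i) + b.

    coeffs[i] stands for (alpha_i^* - alpha_i), assumed to come from a solver.
    """
    return b + sum(c * gaussian_kernel(x, sv, gamma)
                   for c, sv in zip(coeffs, support_vectors))


def power_of_two_grid(lo_exp, hi_exp):
    """Grid 2^lo_exp, 2^(lo_exp + 1), ..., 2^hi_exp."""
    return [2.0 ** k for k in range(lo_exp, hi_exp + 1)]


def k_fold_indices(n_samples, n_folds):
    """Split indices 0..n_samples-1 into n_folds disjoint (train, test) pairs."""
    folds = [list(range(i, n_samples, n_folds)) for i in range(n_folds)]
    splits = []
    for test in folds:
        held_out = set(test)
        train = [i for i in range(n_samples) if i not in held_out]
        splits.append((train, test))
    return splits


C_grid = power_of_two_grid(-9, 9)       # candidate values of C
gamma_grid = power_of_two_grid(-8, 2)   # candidate values of gamma
splits = k_fold_indices(98, 10)         # 10 folds over 98 training samples
print(len(C_grid), len(gamma_grid), [len(test) for _, test in splits])
```

In a full grid search, each (C, gamma) pair would be trained on every training fold and scored by the average MSE over the held-out folds; the pair with the minimum average MSE is kept.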
Besides the MSE and correlation coefficient, the accuracy percentage is proposed to evaluate the performance of the regression model. The accuracy percentage ($P_a$) is defined as the number of correctly predicted data, for which the rounded value of the predicted MSI equals the actual MSI, divided by the number of testing data:

$$P_a = \frac{1}{l}\sum_{i=1}^{l}\mathbb{I}\left(\mathrm{round}(f(x_i)) = y_i\right) \times 100\%$$

where round is a function that rounds the element to the nearest integer, $\mathbb{I}(\cdot)$ is the indicator function, and l is the total number of testing data. Since the accuracy percentage is based on the condition that the predicted value is within MSI ± 0.5, the deviation $\varepsilon$ is set to 0.5 for the SVR. The procedure of SVR for establishing the relationship between MSI and ground motion parameters is summarized as follows: (1) choosing some ground motion parameters as features; (2) applying a logarithmic transformation to the ground motion parameters, except for the destructive index; (3) scaling the chosen features linearly to the range of [−1, +1]; (4) selecting the optimal regularization constant C and kernel parameter γ for the regression model, using 10-fold cross-validation on the training dataset; and (5) assessing the performance of regression on the testing dataset. The support vector machine toolbox LIBSVM, developed by Chang and Lin [28], was used to perform the training and testing.
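The preprocessing in steps (2) and (3) and the accuracy percentage can be sketched as follows. This is a minimal illustration under stated assumptions: the logarithm is taken as base 10, the scaling range is taken from the values passed in, ties at .5 are rounded upward, and the function names and numbers are hypothetical.

```python
import math


def log_transform(values):
    """Base-10 logarithm of positive parameter values (the base is an assumption)."""
    return [math.log10(v) for v in values]


def scale_to_unit_range(values, lo, hi):
    """Map the interval [lo, hi] linearly onto [-1, +1]."""
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


def round_half_up(x):
    """Round to the nearest integer, with .5 rounded upward."""
    return math.floor(x + 0.5)


def accuracy_percentage(predicted, actual):
    """Share of predictions whose rounded value equals the actual MSI, in percent."""
    hits = sum(1 for p, a in zip(predicted, actual) if round_half_up(p) == a)
    return 100.0 * hits / len(actual)


# Made-up PGA values, transformed and scaled with their own minimum and maximum.
logs = log_transform([10.0, 100.0, 1000.0])
scaled = scale_to_unit_range(logs, min(logs), max(logs))
print(scaled)

# Made-up predictions: three of the four round to the actual intensity.
print(accuracy_percentage([6.4, 7.6, 7.2, 8.5], [6, 7, 7, 9]))
```

Python's built-in `round` rounds halves to the nearest even integer, which is why an explicit half-up rule is used here.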

Results and Discussion
The observations in earlier earthquakes, such as the Ninger, Wenchuan, Panzhihua, and Lushan earthquakes, are used as the training set, and the trained model is tested on the observations in later earthquakes, such as the Ludian, Jinggu, Kangding01, Kangding02, and Jiuzhaigou earthquakes. The training set contains 98 observations (78.4%), and the testing set has 27 observations (21.6%). The numbers of MSI VI, VII, VIII, and IX observations in the training and testing sets are shown in Figure 5. After the optimal model parameters were obtained in cross-validation, the final regression model was trained on all observations from past earthquakes, and will be used to predict the MSIs of future earthquakes.
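Splitting by earthquake event rather than at random keeps all records of one event on the same side of the split. A minimal sketch of such a split is given below; the record layout and the toy records are hypothetical, and only the event names are taken from the text.

```python
TRAIN_EVENTS = {"Ninger", "Wenchuan", "Panzhihua", "Lushan"}
TEST_EVENTS = {"Ludian", "Jinggu", "Kangding01", "Kangding02", "Jiuzhaigou"}


def split_by_event(records):
    """Separate records into training and testing sets by earthquake event."""
    train = [r for r in records if r["event"] in TRAIN_EVENTS]
    test = [r for r in records if r["event"] in TEST_EVENTS]
    return train, test


# Toy records with made-up values, only to show the mechanics.
toy_records = [
    {"event": "Wenchuan", "pga": 120.0, "msi": 7},
    {"event": "Lushan", "pga": 80.0, "msi": 6},
    {"event": "Jiuzhaigou", "pga": 95.0, "msi": 7},
]
train, test = split_by_event(toy_records)
print(len(train), len(test))
```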

Feature Selection
As mentioned in Section 4, some ground motion parameters have no linear correlation with MSI, so it is important to focus on the most relevant features and eliminate the irrelevant ones. The inclusion of irrelevant features in the SVR gives poor prediction results, because the model overfits the irrelevant information. To check each ground motion parameter for relevancy, each parameter in turn is selected as the sole feature. The performance of the Gaussian kernel SVR is shown in Figure 6, where the optimal model parameters are C = 256 and γ = 1/16, using 10-fold cross-validation on the 98-observation training dataset. It can be seen that parameters ASI, $A_{0.3}$, $I_c$, PGA, DI, HI, PGV, VSI, and AI all have MSEs smaller than 0.5 and accuracy percentages of more than 50%. PGA gives the highest accuracy percentage, followed by $A_{0.3}$ and HI, while $a_{RMS}$, CAV, $Td_{b,5}$, $Td_{s}$, $Td_{b,5\%}$, CF, and $v_{max}/a_{max}$ have MSEs greater than 0.5 and accuracy percentages of less than 50%.

In this regard, the first nine ground motion parameters are considered as relevant features, and the latter seven as irrelevant features. Here, only the first nine ground motion parameters are used for further regression study.
There are $\sum_{i=1}^{9} C_9^i = 511$ possible combinations of those nine features, and the best performances for SVRs having up to nine features are shown in Figure 7. It can be seen that seven features, including PGA, $A_{0.3}$, ASI, HI, PGV, VSI, and $I_c$, give the highest accuracy percentage, followed by six features, then by one feature (PGA). It is noted that one feature, PGA, gives almost the same level of accuracy as the combination of seven features, meaning PGA is the most fundamental feature for all cases. One reason for this is that the other six ground motion parameters have relatively high cross-correlation coefficients with PGA, and are partially linearly dependent on one another. When the number of features becomes larger than seven, the accuracy percentage drops below 50%, which means that more features do not necessarily give better prediction. With the development of the strong ground motion observation network, many more stations will be constructed. In the future, a large number of stations will be triggered in an earthquake event. Calculating PGA, $A_{0.3}$, ASI, HI, PGV, VSI, and $I_c$ from ground motion requires much more time than just calculating PGA, and the time for calculating them for many stations will be even longer. Since every millisecond is important in rapid seismic intensity reporting, the SVR of PGA will be more effective than the SVR of these seven ground motion parameters.
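The count of 511 feature combinations, and the search for the best subset of each size, can be illustrated as follows. This is a sketch only: the `score` argument is a hypothetical stand-in for a cross-validated performance measure, and the toy score at the end is made up.

```python
from itertools import combinations
from math import comb

FEATURES = ["ASI", "A0.3", "Ic", "PGA", "DI", "HI", "PGV", "VSI", "AI"]

# Number of non-empty subsets: C(9,1) + C(9,2) + ... + C(9,9) = 2^9 - 1.
n_subsets = sum(comb(len(FEATURES), k) for k in range(1, len(FEATURES) + 1))
print(n_subsets)


def best_subset_per_size(features, score):
    """For each subset size, return the subset with the highest score."""
    best = {}
    for k in range(1, len(features) + 1):
        best[k] = max(combinations(features, k), key=score)
    return best


# Toy score (not a real performance measure): favors subsets containing PGA.
toy_best = best_subset_per_size(FEATURES, lambda subset: "PGA" in subset)
print(toy_best[1])
```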

Gaussioan Kernel Versus Linear Kernel and Linear Regression Method
To demonstrate the advantage of the Gaussian kernel SVR, the prediction performances of the linear kernel SVR and of linear regression are also calculated. For brevity, only the results of the best models with one, two, and seven features are shown here. The linear regressions for the three cases are obtained by least squares on the training dataset. The predicted MSIs versus the actual ones on the testing dataset are shown in Figure 8a–c for one, two, and seven features, respectively. To show the scatter of the three methods more clearly, the linear regression results are offset to the left of the Gaussian kernel SVR results, and the linear kernel SVR results to the right. It can be seen from Figure 8a that most of the predicted values are within ±0.5 of the actual values for the Gaussian kernel. The predicted points of MSI VI are concentrated in a smaller range, and even MSI IX is well predicted. In contrast, more points fall outside the ±0.5 range for the linear kernel SVR and linear regression, so the Gaussian kernel SVR has better prediction performance than the other two methods. Figure 8b for two features and Figure 8c for seven features show similar results. Since the Gaussian kernel SVR gives the lowest MSE and the highest accuracy percentage, it is the best of the three regression models. Comparing Figure 8a with Figure 8c, it can also be seen that the performance of the MSIs predicted using PGA alone is almost the same as that using seven features.
Present empirical relationships for the MMI or MSI are mainly based on PGA and PGV [1–5,7–10], so the Gaussian kernel SVR and the linear regression of PGA and PGV are also compared. As the dataset is not the same as those used in previous studies [1–5,7], the linear regression equations are refitted on this training set. The model of the PGA is given by Equation (18), and the model of the PGV is given by
MSI = 1.442 log(PGV) + 5.299
The performance results are shown in Table 2. The linear regression model of PGV is better than that of PGA: the accuracy percentage increases from 44.3% to 66.7%. From the linear regression of PGV to the Gaussian kernel SVR of PGA, the MSE decreases from 0.374 to 0.214 and the prediction accuracy increases from 66.7% to 74.1%. It is noted that the Gaussian kernel SVR of PGV is not better than that of PGA, which can also be seen from Figure 6. The well-accepted notion that PGV is better than PGA for estimating MSI rests on linear regression. The Gaussian kernel SVR of PGA and PGV together has almost the same MSE and correlation coefficient as that of PGA alone, but the accuracy percentage decreases from 74.1% to 68.6%. This comparison shows that the Gaussian kernel SVR of PGA gives the best regression model for predicting MSI and is much better than the linear regression of PGV or PGA.
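As an illustration of how the three performance measures can be evaluated, the following minimal sketch applies the linear regression model of PGV quoted above to a few hypothetical (PGV, MSI) pairs. Several points are assumptions rather than statements of this paper: the logarithm is taken as base 10, PGV is taken in cm/s, the accuracy percentage is taken as the share of predictions within ±0.5 of the investigated MSI, and the sample values and function names are invented for illustration.

```python
import math


def msi_from_pgv(pgv):
    """Linear model from the text: MSI = 1.442 log(PGV) + 5.299.

    Assumption: base-10 logarithm and PGV in cm/s.
    """
    return 1.442 * math.log10(pgv) + 5.299


def mse(actual, predicted):
    """Mean squared error between investigated and predicted MSI."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def correlation(actual, predicted):
    """Pearson correlation coefficient."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def accuracy_percentage(actual, predicted, tol=0.5):
    """Share of predictions within +/- tol of the actual value (assumed definition)."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= tol)
    return 100.0 * hits / len(actual)


# Hypothetical (PGV in cm/s, investigated MSI) pairs, for illustration only.
samples = [(2.0, 6), (8.0, 7), (20.0, 8), (40.0, 8), (90.0, 9)]
actual = [msi for _, msi in samples]
predicted = [msi_from_pgv(pgv) for pgv, _ in samples]

print("MSE:", round(mse(actual, predicted), 3))
print("Correlation:", round(correlation(actual, predicted), 3))
print("Accuracy (%):", accuracy_percentage(actual, predicted))
```

The same three functions could be applied unchanged to the predictions of an SVR model, so that all models are scored on an identical testing set.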

Gaussian Kernel SVR of PGA Versus Models from Previous Studies
The final Gaussian kernel SVR of PGA is obtained by training on the whole available dataset, and it can be used for predicting MSIs in future earthquakes. The final SVR model is compared with regression models from three previous studies [3,5,7] to check its regression performance. These three models have regression equations based on both PGA and PGV. The results are shown in Figure 9a,b, and the performance measures are shown in Table 3. It is clear from Figure 9 that the SVR model performs much better than the other three models, especially at MSI VI and VII. The predicted points of MSI VI and VII are concentrated in a much smaller range in the SVR model, while the points of the other three models have much larger scatter. Interestingly, all models behave relatively well at MSI IX. From Table 3, although the correlation coefficients of the four models are at the same level, the accuracy percentage of the SVR model is much higher than that of the other three, because these three models have too much prediction dispersion at MSI VI and VII. It should be noted that one study [3] was based on California data, and another [5] on global data. Regional variation and differences in datasets result in poor performance for the Sichuan–Yunnan earthquakes. Because the linear regression equations in Table 2 were obtained from the same dataset as the SVR model, their MSEs are much smaller than those of the previous models in Table 3. Thus, for a specific region, one should be very careful when using a regression model from another region. The accuracy percentage of this study is also better than that of the third model [7], for two reasons. One is that the study in [7] contained datasets from other areas of western China besides the Sichuan–Yunnan region, and did not contain the Jiuzhaigou earthquake records. The other is that its filtering process was different from that of this paper, so the PGA and PGV values are not exactly the same in the two datasets. To obtain comparable results for different regression methods, not only the earthquake records but also the ground motion parameters after filtering should be the same, as far as possible.
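The prediction dispersion at each intensity level discussed above can be quantified by grouping the predicted values by the investigated MSI and computing their mean and standard deviation. The following is a minimal sketch; the function name, the two sets of predictions, and the labels "regional" and "imported" are hypothetical and do not reproduce the data behind Figure 9 or Table 3.

```python
from collections import defaultdict
from statistics import mean, pstdev


def scatter_by_intensity(actual, predicted):
    """Return {intensity: (mean prediction, standard deviation)} per actual MSI."""
    groups = defaultdict(list)
    for a, p in zip(actual, predicted):
        groups[a].append(p)
    return {level: (mean(vals), pstdev(vals)) for level, vals in sorted(groups.items())}


# Hypothetical investigated MSIs and predictions from two models, for illustration only.
actual = [6, 6, 6, 7, 7, 7, 9, 9]
regional_pred = [6.1, 5.9, 6.2, 7.0, 7.2, 6.8, 8.8, 9.0]  # model fitted on regional data
imported_pred = [5.2, 6.9, 7.4, 6.1, 7.9, 8.3, 8.7, 9.1]  # model taken from another region

regional_scatter = scatter_by_intensity(actual, regional_pred)
imported_scatter = scatter_by_intensity(actual, imported_pred)

for level in regional_scatter:
    print(level,
          "regional std:", round(regional_scatter[level][1], 3),
          "imported std:", round(imported_scatter[level][1], 3))
```

A smaller standard deviation at a given intensity level corresponds to the tighter concentration of predicted points described for the SVR model at MSI VI and VII.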

Discussion of Earthquake Magnitude and Epicentral Distance
Since the MSI at a location is related to the earthquake magnitude and epicentral distance, SVR models with and without these two parameters are also compared. As shown in Figure 10, the performance of the SVR of PGA is almost the same as that of the SVRs that include earthquake magnitude and epicentral distance. This means that ground motion parameters are sufficient for predicting MSI, and it is not necessary to include magnitude and distance terms in the SVR model.

Conclusions
In this study, SVR was used to model the relationship between discrete MSI and continuous ground motion parameters. MSI is treated as the sample target, and 16 ground motion parameters are considered as feature candidates. In the Sichuan–Yunnan region, 125 sets of ground motion records with corresponding investigated MSIs were used as the complete dataset for analysis. Based on this limited dataset, the main conclusions are as follows:
(1) In the single-feature scanning test, PGA, JMA equivalent acceleration, acceleration spectrum intensity, Housner intensity, PGV, velocity spectrum intensity, Arias intensity, characteristic intensity, and damage index are the most relevant features. Unlike in linear regression, PGA is better than PGV for predicting MSI in an SVR model.
(2) The best model parameters for Gaussian kernel SVRs with one to nine features are provided. The SVR of PGA gives almost the same performance as the SVR with nine features. According to the MSE, the correlation coefficient, and the accuracy percentage, the Gaussian kernel SVRs perform much better than the linear kernel SVRs and linear regressions.
(3) Gaussian kernel SVRs perform much better than previous models [3,5,7], especially with regard to the accuracy percentage. The comparison also suggests that regression is better done with a regional dataset.
(4) Gaussian kernel SVRs with or without earthquake magnitude and epicentral distance give similar prediction performance.
Since MSI and ground motion parameters have strong regional dependence, and the number of datasets for establishing the relationship in the studied area is limited, the conclusions may no longer hold when another dataset is used for regression. However, a Gaussian kernel SVR of PGA is a good starting point for the regression.
Supplementary Materials: The following are available online at www.mdpi.com/xxx/s1. Supplement Table S1.
