An Earthquake Fatalities Assessment Method Based on Feature Importance with Deep Learning and Random Forest Models

Abstract: This study analyzes and compares the importance of the features affecting earthquake fatalities in mainland China and establishes a deep learning model to assess potential fatalities based on the selected factors. The random forest (RF) model, classification and regression tree (CART) model, and AdaBoost model were used to assess the importance of nine features, and the analysis showed that the RF model performed better than the other models. Furthermore, we compared the contributions of 43 different structure types to casualties based on the RF model. Finally, we proposed a model for estimating earthquake fatalities based on seismic data from 1992 to 2017 in mainland China. The results indicate that the deep learning model produced in this study performs well in predicting seismic fatalities. The method could help reduce casualties during emergencies and inform future building construction.


Introduction
Earthquakes pose numerous threats to China (Table 1). A proper rapid estimate of the number of casualties in an earthquake can reduce the impact and losses of the disaster [1], and the human and material resources of emergency management can be allocated by predicting the death toll [2]. We use the surface-wave magnitude (Ms) in this study. According to the current emergency response regulations of the relevant Chinese departments, the categories of emergency personnel and materials are as follows:
(1) When the magnitude is less than 6 and the predicted number of deaths is 0-10, the government will need 10-50 emergency personnel and 200-300 tents.
(2) When the magnitude is greater than or equal to 6 and less than 6.5, and the predicted number of deaths is 0-10, 50-100 emergency personnel and 1000-3000 tents will be needed.
(3) When the magnitude is greater than or equal to 6.5 and less than 7, and the predicted death toll is 0-10, 200-500 emergency personnel and 3000-5000 tents will be needed; if the predicted number is more than 10, 500-1000 emergency personnel and 5000-10,000 tents will be needed.
(4) When the magnitude is greater than 7 and the death toll is less than 10, 500-1000 emergency personnel and 5000-10,000 tents will be required; when the death toll is between 10 and 100, 1000-5000 emergency personnel and 10,000-20,000 tents will be required; and when the death toll is 100-1000, 5000-10,000 emergency personnel and more than 20,000 tents will be needed.
(5) When the number of deaths is greater than 1000, the necessary emergency personnel and material distribution must be drawn up according to the specific economic and political conditions of the local area.
However, many factors may affect fatalities, and not every factor has a decisive impact on earthquake casualties. Therefore, a suitable method is needed to evaluate the importance of each factor.
Linear models are the most commonly used methods for assessing feature correlation [3]. Reference [4] gives the relationship between human losses and factors such as population density and the intensity and magnitude of earthquakes based on linear models. Nevertheless, owing to the uncertainty and fuzziness in the factor data [5], ensemble models were proposed and applied to feature importance assessment [6][7][8][9] to improve the accuracy and generalization ability of the traditional linear models [10]. Existing studies have shown that ensemble algorithms outperform linear models in prediction ability and generalization capacity [3]. However, so far, no research has used machine learning methods to evaluate the importance of influencing factors and of different structure types on earthquake casualties. Previous studies of earthquake casualties either assigned influencing factors [2,11] and structure types [12] directly from experience, or derived them from statistical methods [13].
Different methods have been developed to estimate casualties in earthquakes. Most studies used empirical analysis methods [12,14,15], and some used software systems to assess casualties, for instance, geographic information systems (GIS) [11,16], the U.S. Geological Survey's Prompt Assessment of Global Earthquakes for Response (PAGER) system [17], and the Disaster Management Tool (DMT) software [18]. Reference [18] presents a casualty estimation model that is part of the DMT software. The model is based on the evaluation of laser-scanning data collected by airborne sensors, and it can also be used to detect collapsed buildings, assess their damage type, and compute the number of casualties. The PAGER system, which draws on the EERI World Housing Encyclopedia (WHE) project (including non-engineered buildings) [19], can estimate the fatalities of large earthquakes within two hours [17]. However, these systems cannot assess losses within a few minutes. Empirical methods usually established linear models that were evaluated by fitting one or more functions [20]. These models have many disadvantages: the workloads tend to be large and the amount of data small; abnormal points are usually deleted instead of being fitted together within the models; and they are strongly subjective. These shortcomings can be compensated for by neural networks in the field of machine learning [21].
With the rise of machine learning algorithms, studies estimating fatalities with the back-propagation neural network (BPNN) method have begun to emerge [21]. Because earthquakes differ in intensity, population density, and structure types, it is extremely difficult to define a definite relationship for evaluating the fatalities caused by an earthquake. Hence, deep learning, with its ability to model complex relationships, could be an outstanding method for evaluating fatalities. The BPNN method, however, is not a perfect network and has many shortcomings: (1) the convergence speed is slow, and it takes hundreds of iterations or more to converge [22]; (2) it cannot guarantee convergence to a global minimum [23,24]; (3) the numbers of hidden layers and neurons are not theoretically guided but determined empirically, so the network tends to be large [22], and the redundancy invisibly increases the network's learning time [25]; and (4) the learning and memory of the network are unstable. Deep learning optimization algorithms can remedy these shortcomings of the BPNN method.
Therefore, we assessed the importance of the factors with three machine learning methods and selected the random forest algorithm as the optimal classifier. We then evaluated the contribution of 43 different structure types with the random forest algorithm. Finally, a deep learning assessment model was established with population density, magnitude, focal depth, epicentral intensity, and time as factors. Figure 1 shows the flowchart of the entire assessment process.

Data
An important question in studies of expected human losses is whether a conditioning variable is actually useful and needed for the assessment and prediction. Many factors affect earthquake casualties, such as the intensity of the earthquake, the vulnerability of houses, and the economic development of the affected area. However, some factors lack data for every earthquake. We chose the following ten features: date, time, magnitude, epicentral intensity, abnormal intensity, focal depth, secondary disasters, population density, economic situation, and damage ratio of different structure types.

1.
The regularity of people's working and resting hours means that differences in earthquake occurrence time have a great impact on fatalities [4,13]. Time was divided into two parts in this study: daytime (7:00-21:00) and sleeping time (21:00 to 7:00 the next day).

2.
Magnitude is a parameter measuring the energy released by seismic waves; generally, the greater the magnitude, the greater the disaster caused. Magnitude is expressed here by the surface-wave magnitude Ms commonly used in China.

3.
Considering the seismic source as a point, the vertical distance from that point to the ground is called the focal depth. Often, the smaller the focal depth, the closer the source is to the surface and the greater the damage caused.

4.
In general, the higher the intensity, the greater the casualties [4]. This study used the epicentral intensity listed in the Earthquake Disasters and Losses Assessment Report in Chinese Mainland.

5.
Population density has a great impact on the number of earthquake deaths [4]. There are obvious differences between densely populated and sparsely populated areas. For example, the population density of Tibet is low, and there are even places with no inhabitants; hence, casualties in an earthquake there are bound to be small. Conversely, high population densities contribute to an increase in the number of deaths [2]. The location of the epicenter can be obtained from the China Earthquake Networks Center after an earthquake, and the population density can be obtained from data published by the Statistical Bureau.

6.
Generally, seismic intensity decreases with increasing distance from the epicenter, but high-intensity points sometimes appear in low-intensity areas (or vice versa) owing to geological structure, topography, or the superposition of deep seismic reflection waves; this is called an intensity anomaly. Abnormal intensity in this paper refers to the occurrence of high-intensity points in low-intensity areas, which often aggravates disaster losses. Abnormal intensity was expressed in two cases: yes and no.

7.
The economic situation has a great impact on disaster losses [4]. Usually, the better the economic situation, the lighter the disaster under the same earthquake intensity, and the higher the population density in areas of concentrated social wealth. According to the Earthquake Disasters and Losses Assessment Report in Chinese Mainland, the economic situation was divided into seven categories, improving from top to bottom: (1) national poverty region; (2) special poverty area, deep poverty area, remote and poor mountainous area, remote and poor area, border poverty area, provincial poverty region, remote area; (3) minority poverty area, general poverty area; (4) financial deficit area; (5) economically backward area, minority area; (6) general area; (7) western medium-developed area.

8.
Secondary disasters cause secondary damage to the disaster area, and their impacts cannot be underestimated: they mostly manifest as mountain collapses, landslides, and debris flows, with a very few resulting from fires. Because they occur rarely, they were divided into only two cases in this study: yes and no.

9.

Most fatalities are caused by building damage [2], and this factor is vital to the number of deaths [13]. Therefore, this paper chooses the damage ratio of houses as a feature. Damage ratios consist of collapse, heavy damage, moderate damage, and slight damage.

10.

Different earthquake occurrence dates can sometimes aggravate earthquake damage; for instance, rainy and snowy weather will affect rescue efforts. Dates were processed quarterly, with the year divided into four quarters in this study.
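As a sketch of the categorical encodings described above (the helper functions are ours, not from the paper), the time and date features could be derived as follows:

```python
from datetime import datetime

# Hypothetical helpers illustrating the two encodings above.
def encode_time(hour):
    """Daytime is 7:00-21:00; everything else counts as sleeping time."""
    return "daytime" if 7 <= hour < 21 else "sleeping"

def encode_quarter(month):
    """Map a month (1-12) to its quarter of the year (1-4)."""
    return (month - 1) // 3 + 1

quake = datetime(2014, 8, 3, 16, 30)  # Ludian earthquake: 2014-08-03, 16:30
print(encode_time(quake.hour), encode_quarter(quake.month))  # daytime 3
```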

Importance Assessments of 9 Features
The selected supervised classifiers were random forest (RF), adaptive boosting (AdaBoost), and the classification and regression tree (CART). The CART algorithm, a decision tree model and non-parametric data mining method, has many advantages, including ease of handling numerical and categorical data and multiple-output situations [25]. CART serves as a component learner using the Gini index as its division standard, while ensemble learning combines multiple weak classifiers in different ways. The most common ensemble methods are bootstrap aggregating (bagging) and boosting, where bagging is a parallel algorithm and boosting is sequential. The RF algorithm, an extension of bagging, exploits random binary trees to discriminate and classify data [7]. The AdaBoost approach, a boosting algorithm, constructs a strong classifier from weak classifiers and updates the weights of samples based on the learning error, as shown in Figure 2. Each sample in the training data is initially given an equal weight. A weak classifier is trained on the data and its error rate ε is calculated; the classifier is then trained again on the re-weighted data, in which the weights of samples misclassified by the first classifier are increased and those of correctly classified samples are reduced. AdaBoost calculates ε for each weak classifier and assigns a weight α to each classifier. In Figure 2, the first row is the data set, where the widths of the histograms represent the weights of the samples; after passing through a classifier, the data set is re-weighted by α in the third row [26]. The final output is obtained by summing the weighted results. The error rate ε (Equation (1)) and the weight α (Equation (2)) are calculated as follows:

ε = N1 / (N1 + N2)      (1)

α = (1/2) ln((1 − ε)/ε)      (2)

where N1 and N2 are the numbers of incorrectly and correctly classified samples, respectively.
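Equations (1) and (2) can be sketched numerically; note that the paper's exact equations were lost in extraction, so the formulas below are the standard AdaBoost forms implied by the surrounding text:

```python
import math

def error_rate(n_wrong, n_right):
    """Equation (1): epsilon = misclassified samples / total samples."""
    return n_wrong / (n_wrong + n_right)

def classifier_weight(eps):
    """Equation (2): alpha = 0.5 * ln((1 - eps) / eps), standard AdaBoost."""
    return 0.5 * math.log((1.0 - eps) / eps)

eps = error_rate(20, 80)        # 0.2
alpha = classifier_weight(eps)  # low-error classifiers get larger weights
print(eps, round(alpha, 4))
```

A classifier that misclassifies 20 of 100 samples thus gets weight α = 0.5 ln(4) ≈ 0.693, and a classifier at chance level (ε = 0.5) gets weight 0.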

We worked with the CART, RF, and AdaBoost models implemented in Jupyter notebooks from the Anaconda Navigator. Data generally need to be standardized in machine learning, so we used the StandardScaler preprocessing method of the sklearn library to process the magnitude, focal depth, epicentral intensity, and population density. Equation (3) presents the process:

x′ = (x − µ)/δ      (3)

where x is the feature value, µ is the mean of the data, and δ is the variance of the data. The parameters selected for each model are shown in Table 2. Table 3 presents the results of verifying the models with the cross-validation score function of the sklearn library, and the importance of the nine features in the casualty assessment is shown in Figure 3.

Table 2. Parameters of random forest (RF), CART, and AdaBoost (unset parameters were left at their sklearn defaults; AdaBoost has two classification algorithms, SAMME and SAMME.R, of which SAMME.R, based on class probabilities, performs better).

Random Forest: number of estimators = 82; number of jobs = −1; max features = none
CART: criterion = gini; max depth = 10; max features = none; number of estimators = 500
AdaBoost: criterion = gini; base estimator = decision tree classifier; algorithm = SAMME.R; learning rate = 0.5; number of estimators = 379

Table 3. Results of testing the models on the unseen validation set of 9 features.

Algorithm       Random Forest   CART    AdaBoost
Mean accuracy   0.820           0.745   0.766
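The comparison above can be sketched in sklearn. The snippet uses a synthetic data set in place of the paper's earthquake data and the headline parameters from Table 2 (the AdaBoost boosting variant is left at sklearn's default here, as the `SAMME.R` option is deprecated in recent sklearn versions), so the printed scores will not match Table 3:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real 9-feature, 228-case earthquake data set.
X, y = make_classification(n_samples=228, n_features=9, random_state=0)
X = StandardScaler().fit_transform(X)  # Equation (3): subtract mean, rescale

models = {
    "Random Forest": RandomForestClassifier(n_estimators=82, n_jobs=-1,
                                            random_state=0),
    "CART": DecisionTreeClassifier(criterion="gini", max_depth=10,
                                   random_state=0),
    "AdaBoost": AdaBoostClassifier(n_estimators=379, learning_rate=0.5,
                                   random_state=0),
}
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in models.items()}
print(scores)
```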

Importance Assessments of Structure Types
The importance of different structure types was assessed separately because of the complexity of the structure types and their great contribution to deaths [27]. For accuracy and comprehensiveness, this study listed 43 common and special structure types from the Earthquake Disasters and Losses Assessment Report in Chinese Mainland from 1992 to 2017: reinforced concrete frame structure, masonry structure, brick-wood structure, civil structure, national brick-wood structure, brick-concrete structure, bucket-piercing frame structure, brick-concrete structure (building of two or more floors), shed, brick-concrete structure (building of only one floor), brick-column civil structure, wing-room, national civil structure, simple house with dry fortified earth wall, brick-column adobe structure, dry brick building, brick structure, timber structure, timber stack structure, timber framework, brick-masonry structure, frame structure, adobe structure, wood-column adobe structure, reinforced frame structure, mixed house, brick adobe structure, general houses (owned by citizens), stone-wood structure, stone structure, simple house, old Tibetan house, stone-grass structure, stone-concrete structure, alunite house, earth rock house, industrial plant, steel frame structure, soil tamper structure, cave dwelling, wooden frame house, flag stone house, and general house.
Structural damage was divided into five grades: collapse, heavy damage, moderate damage, slight damage, and basically undamaged. We chose building collapse, heavy damage, and population density as input parameters. First, deaths are almost always caused by the collapse and heavy damage of buildings. Second, population density is closely related to seismic casualties and to the number of buildings. We selected only the random forest algorithm to assess the importance of the structure types because the mean accuracy of the RF model was higher than that of the other algorithms, as seen in Table 3. Table 4 presents the steps of the RF feature assessment procedure, and the importance of the structure types can be seen in Figure 4.

Table 4. Steps of the random forest algorithm.
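The RF importance ranking described above can be sketched with sklearn's `feature_importances_` attribute. The data here are synthetic stand-ins (one row per earthquake, three inputs as described), so the printed ranking is illustrative only:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the paper's inputs: collapse ratio, heavy-damage
# ratio, and population density for 228 earthquake cases.
X, y = make_classification(n_samples=228, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)
rf = RandomForestClassifier(n_estimators=82, random_state=0).fit(X, y)

names = ["collapse ratio", "heavy-damage ratio", "population density"]
ranking = sorted(zip(names, rf.feature_importances_),
                 key=lambda pair: pair[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

The importances reported by `feature_importances_` are mean decreases in Gini impurity across the forest and always sum to 1, which is why the paper can compare relative contributions across the 43 structure types.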


Model
We chose population density, magnitude, focal depth, epicentral intensity, and time as the input parameters. The criteria and reasons for selecting these five input parameters among the ten features were as follows:
(1) Features were selected in descending order of importance, taking into account their subjectivity, their frequency of occurrence, and how soon they can be obtained after an earthquake.
(2) The results of the importance assessment show the relative contributions of the nine factors to the casualty assessment, rather than the absolute value of each factor alone. Time ranked seventh rather than higher because we divided it into only two parts; in the deep learning model, we did not need to classify the time. We therefore chose time because it can be obtained almost at the same moment as an earthquake and it is not subjective in the deep learning model.
(3) The division of economic status is strongly subjective, and the data selected in the test set reflected the economic situation of the year in which the earthquake occurred. With annual inflation and currency depreciation, that year's situation may not apply in the future.
(4) Although the date was more important than the time, it was also highly subjective, being divided into only four seasons.
(5) Secondary disasters and abnormal intensity were divided only into yes and no; the combined number of these phenomena was small, and not every earthquake was accompanied by them.
(6) The structure types were too complex: the structures destroyed in each earthquake differed, and every destruction was divided into five grades.

Data
We collected 289 destructive earthquakes that occurred in mainland China from 1992 to 2017 from the Earthquake Disasters and Losses Assessment Report in Chinese Mainland (Table A1). The Excel data were pre-processed with the openpyxl module, and a data set of 228 earthquake cases without missing values was obtained. Among these, we selected 180 cases as the training set and 38 as the test set; the remaining ten were used as the validation set. Time was recorded in minutes, with hours converted to minutes at 1440 min per day: a time of x:y is processed as (60x + y)/1440. For example, 04:32 becomes 0.19. The other parameters required no special processing.
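The time normalization above can be sketched as a one-line conversion:

```python
def time_to_fraction(hhmm):
    """Convert an HH:MM occurrence time to a fraction of the 1440-minute day."""
    hours, minutes = map(int, hhmm.split(":"))
    return (60 * hours + minutes) / 1440

print(round(time_to_fraction("04:32"), 2))  # 0.19, the paper's own example
```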

Hyper Parameter
A deep learning model is a multilayer stack of simple modules, many of which compute non-linear input-output mappings. With 3 to 18 hidden layers, a deep learning model can implement extremely complex functions of the original variables that are sensitive to minute details [23]. Given the small data set in this study, we chose a four-layer back-propagation network. The number of neurons in the hidden layers was optimized to obtain accurate output [21]. To select the numbers of neurons in the two hidden layers, training began with ten and three neurons, respectively, and was repeated with more neurons. The numbers of neurons in the hidden layers were set to 40 and 5, respectively, to reduce the complexity and running time of the model.
There are different methods to avoid overfitting during training and to obtain models that generalize well, for instance, early stopping, regularization, dropout, and data set expansion. Early stopping and data set expansion are not applicable to this model, and because dropout randomly deletes half of the hidden-layer nodes, it is also not applicable here. Finally, regularization was chosen. Regularization in deep learning includes the L1 and L2 penalties; we chose the most commonly used, L2 regularization, which adds a regularization term to the loss function, as presented in Equation (4).
c = c0 + (λ/2n) Σ w²      (4)
where c and c0 are the new and old loss functions, respectively, λ is the L2 regularization rate, n is the number of samples, and w is the weight. Through a large number of tests, the optimal parameters were obtained, as shown in Table 5. Figure 5 shows the model, with two hidden layers of forty and five neurons, respectively, five input neurons, and one output neuron.
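Equation (4) can be sketched directly; the penalty form c0 + (λ/2n)·Σw² is the standard L2 term implied by the variable definitions above, and the toy weight values are arbitrary:

```python
import numpy as np

def l2_regularized_loss(c0, weights, lam, n):
    """Equation (4): c = c0 + (lambda / 2n) * sum of squared weights."""
    penalty = sum(np.sum(w ** 2) for w in weights)
    return c0 + lam / (2 * n) * penalty

# Toy weight matrices shaped like the paper's 5-40-5-1 network.
w1 = np.ones((5, 40))
w2 = np.ones((40, 5))
c = l2_regularized_loss(c0=1.0, weights=[w1, w2], lam=0.01, n=180)
print(c)
```

Because the penalty grows with the squared weights, minimizing c pushes the network toward small weights, which is what discourages overfitting.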

Optimization Algorithm
The optimization algorithms used in the model include adaptive moment estimation (Adam) [28], mini-batch gradient descent [29], and the moving average model [30]. We used the Adam algorithm to make the learning-rate updates independent for each parameter by calculating the first and second moment estimates of the gradients. Mini-batch training reduces randomness, because the samples in a batch jointly determine the direction of the gradient, reducing the possibility of deviation during the descent [31]. The model ran on a high-performance workstation, and the data volume of the model was below 2000. The batch size is usually chosen as a power of 2; through experiments, we chose 16 samples per batch. The moving average model prevents sudden changes when updating parameters [32].
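The per-parameter Adam update described above can be sketched in NumPy; this is the standard Adam rule from [28], not code from the paper:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update using first/second moment estimates of the gradient."""
    m = beta1 * m + (1 - beta1) * grad          # first moment (mean)
    v = beta2 * v + (1 - beta2) * grad ** 2     # second moment (magnitude)
    m_hat = m / (1 - beta1 ** t)                # bias correction for step t
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w = np.zeros(3)
m = np.zeros(3)
v = np.zeros(3)
grad = np.array([0.1, -0.2, 0.3])
w, m, v = adam_step(w, grad, m, v, t=1)
print(w)
```

On the first step the bias-corrected moments cancel the gradient's scale, so each parameter moves by roughly the learning rate in the direction opposite its gradient, which is what makes the effective step size per parameter independent of the gradient magnitude.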

Process of the Model
The network structure definition and forward propagation used the TensorFlow framework [30]. Only the weights w and biases b of the network needed to be defined, and the normal distribution was chosen as the weight-initialization function. The ReLU activation function [23] was used in the first hidden layer; the second hidden layer and the output layer are essentially linear regressions. We selected the Adam algorithm as the back-propagation algorithm and optimized the learning rate. The mean squared error loss function was selected because the data set was small (Equation (5)):

c = (1/n) Σ (y − t)²      (5)
where c is the loss function, n is the number of samples, y is the predicted value, and t is the true value.
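The forward pass and Equation (5) can be sketched in NumPy (a framework-free stand-in for the TensorFlow definition, with normally distributed initial weights as described; the input values are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)

# The paper's layout: 5 inputs -> 40 (ReLU) -> 5 (linear) -> 1 output.
w1, b1 = rng.normal(size=(5, 40)), np.zeros(40)
w2, b2 = rng.normal(size=(40, 5)), np.zeros(5)
w3, b3 = rng.normal(size=(5, 1)), np.zeros(1)

def forward(x):
    h1 = np.maximum(0.0, x @ w1 + b1)  # first hidden layer with ReLU
    h2 = h1 @ w2 + b2                  # second hidden layer (linear)
    return h2 @ w3 + b3                # output layer (linear)

def mse(y, t):
    """Equation (5): mean squared error between predictions and true values."""
    return np.mean((y - t) ** 2)

batch = rng.normal(size=(16, 5))       # one mini-batch of 16 samples
loss = mse(forward(batch), np.zeros((16, 1)))
print(loss)
```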

Results
We chose ten seismic cases with different parameters as the test data, of which the eighth, ninth, and tenth cases were the 2014 Ludian earthquake, the 2010 Yushu earthquake, and the 2008 Wenchuan earthquake. The specific test set selection is shown in Table 6. We compared our model with the intensity-based mortality estimation method of the Assessment of Earthquake Disaster Situation in the Emergency Period [33] (the China National Standard), presented in Equations (6) and (7). The Standard is an important basis for government decision-making on earthquake relief. After an earthquake, the primary emergency tasks are to investigate and assess the disaster in a short time and to provide the government with timely and necessary information on the situation. The relevant departments therefore formulated the Standard as a comprehensive summary of the research methods and field practices of earthquake disaster assessment in China, focusing on rapid and dynamic disaster assessment.
where N_D is the number of deaths, I_max is the intensity of the meizoseismal area, A_j is the distribution area of intensity value j, ρ is the population density, R_j is the death rate corresponding to intensity value j, and I_j is intensity value j. Table 7 and Figure 6 show the comparison of the deep learning model, the China National Standard, and the true values. Table 8 presents the accuracy of the two methods on cases 1 to 7 and cases 8 to 10 separately. The results demonstrate the following: (1) The Adam algorithm and moving average model in the deep learning method accelerate convergence and improve the accuracy of the model's predictions. (2) The deep learning model achieved higher accuracy and considered more factors than the Standard.
(3) The prediction accuracy for the last three earthquake cases was generally not high.
(4) The China National Standard model selects fewer parameters. The model is based on the experience of many experts and has been tested for many years. In cases 1-7, its predicted values and the true values were of the same order of magnitude, but its performance on cases 8-10 was far worse than that of the deep learning model; the predictions were far from the true values and offer no reference for actual earthquakes.
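The exact form of the Standard's Equations (6) and (7) was lost in extraction, but the variable definitions above suggest a death toll of the form N_D = ρ · Σ_j A_j · R_j. The sketch below assumes that form, and all zone areas and death rates are hypothetical illustration values:

```python
# Assumed form of the intensity-based estimate: deaths = population density
# times the sum over intensity zones of (zone area * zone death rate).
def standard_deaths(pop_density, areas, death_rates):
    return pop_density * sum(a * r for a, r in zip(areas, death_rates))

n_d = standard_deaths(pop_density=200,                 # persons per km^2
                      areas=[1500.0, 400.0, 50.0],     # km^2, zones VII-IX
                      death_rates=[1e-5, 1e-4, 1e-3])  # deaths per person
print(round(n_d))  # 21
```

This structure makes the Standard's limitation visible: the estimate depends only on intensity zonation and a single population density, with no terms for occurrence time, focal depth, or structure types.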

Discussion
Three issues were considered in this study: the importance of nine features to human losses in mainland China, the importance of 43 structure types to the fatalities, and the assessment of human losses. The first issue was addressed by adopting three traditional machine learning methods to investigate the influence of different features on seismic fatalities; the results can provide a basis for further seismic assessment. The second issue was addressed with the RF algorithm to assess the importance of the 43 structure types. The third issue was addressed by establishing a deep learning model to predict seismic deaths. A detailed investigation of these issues is presented as follows.


Importance of Different Features
We propose an automatic classifier based on RF, CART, and AdaBoost, together with a set of attributes, to describe the feature importance for seismic fatalities. Random forest is the most stable of the three methods; Figure 7 exhibits the results of three tests on these methods. RF provides probability estimates on the classification that are useful to accept or reject a new classification [8]. Thus, we chose the RF algorithm as the main method. The analysis fully proved the importance of population density, magnitude, focal depth, and epicentral intensity: the summed importance of these four features is 74.68%, far more than the other factors. Time is vital to seismic losses; for example, the Tangshan earthquake occurred at 3:24 on July 28, 1976, when people were asleep in their houses, and the time aggravated the disaster. Losses are relatively mitigated in the daytime because most people are awake and can escape quickly. The reason that time ranks only seventh in this study is that it was divided into just two classes: daytime and sleeping time. The more classes discriminated by an attribute, the more important the attribute is [8]. In the same way, the importance ratios of abnormal intensity and secondary disasters are only 2.63% and 1.18%, respectively, because of their few classes. Moreover, time was modeled as discrete levels rather than the actual time, which interferes with the results. Secondary disasters and abnormal intensity were treated as classes because they occur infrequently. In summary, the features selected by the RF algorithm experiments are consistent with those used by other scholars to study earthquake casualties (population density [2], magnitude [2], focal depth [34], epicentral intensity [16], and time [13]).
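The ranking step can be illustrated with a minimal sketch using scikit-learn's random forest (synthetic data and illustrative feature names only, not the study's dataset; the label is driven by the first four features to mimic the dominance reported above):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
features = ["population density", "magnitude", "focal depth",
            "epicentral intensity", "time", "date", "economic status",
            "abnormal intensity", "secondary disasters"]

# Synthetic sample: the casualty class depends mainly on the
# first four features, mimicking the dominance found in the study.
X = rng.normal(size=(500, len(features)))
y = (X[:, :4].sum(axis=1) > 0).astype(int)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances, normalised to sum to 1, ranked descending.
ranking = sorted(zip(features, rf.feature_importances_),
                 key=lambda kv: kv[1], reverse=True)
for name, score in ranking:
    print(f"{name:22s} {score:.3f}")
```

The same `feature_importances_` attribute underlies both the nine-feature ranking and the structure-type ranking discussed below.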

Importance of Different Structure Types
We assessed all 43 structure types in China mainland, far more than the number of types in previous studies [35]. Although the structure types of the WHE project are suitable for most regions of the world, the project lacks some characteristically Chinese structures, such as the national civil structure, the old Tibetan house, and the national brick-wood structure. Hence, the data from the Earthquake Disasters and Losses Assessment Report in Chinese Mainland for every earthquake in China mainland are more suitable for this study than the WHE project data. The HAZUS (Hazards U.S. Multi-Hazard) system has only twelve structure types and estimates casualties for collapsed buildings but not for heavily damaged buildings [18]. In reference [18], casualties are estimated at the level of single buildings, not for an entire zone, in an adapted HAZUS system. Because population density varies greatly across different parts of China, neither the HAZUS system nor the adapted system is suitable for casualty estimation in China.
The importance of the different structural types was then obtained with the RF method. During data collection, structural forms were not consolidated, and some may overlap slightly, so that no structure type would be omitted; for instance, the brick-column civil structure, the brick-concrete structure (buildings of two or more floors), and the brick-concrete structure (single-floor buildings). However, even when the repeated structures were combined, their importance remained close to zero. Figure 4 shows that the reinforced concrete structure contributes most to the casualties, followed by the civil structure, stone-concrete structure, brick-concrete structure, brick-wood structure, brick house, and frame house. The remaining structures are not shown in Figure 4 and can be neglected. We conclude that a structure with a large contribution is not necessarily poor in seismic behavior; rather, the result indicates that once such a building is destroyed, the damage is greater.
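The check that repeated structure types stay unimportant even when combined amounts to summing the importances of the variant labels; a small sketch (hypothetical importance values, not the study's results):

```python
# Hypothetical RF importance values for near-duplicate structure labels
# (illustrative numbers only). Merging a repeated structure type means
# summing the importances of its variants before comparing types.
importances = {
    "brick-concrete structure (two or more floors)": 0.012,
    "brick-concrete structure (only one floor)": 0.009,
    "reinforced concrete structure": 0.310,
}

merged = {}
for label, value in importances.items():
    key = label.split(" (")[0]  # collapse the bracketed variant suffix
    merged[key] = merged.get(key, 0.0) + value

print(merged)
```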

Human Losses Assessment Model
We proposed a rapid prognostic human losses assessment model based on the above feature engineering. The results indicate that the predictions for large earthquakes of magnitude 6.5 or more are lower than the others. For example, the actual death toll of the Wenchuan earthquake was 69,227, but the estimate of the deep learning model is 37,406, that of the PAGER model is 50,000, and that of another empirical model [20] is 30,000. Evidently, the accuracy of casualty assessment for a large destructive earthquake is not high according to the traditional accuracy calculation (Equation (8)). The reason may be that more factors affect large earthquakes than small ones [2], and the uncertainty is greater [34]. For example, the mountainous area of Ludian County is as high as 87.9%, and the high incidence of secondary disasters (e.g., debris flows and landslides) caused a large number of casualties. The accuracy of general prediction models is reduced when geological conditions are neglected [12]. Besides geological conditions, there are other important reasons for the difference in accuracy between cases 1-7 and cases 8-10. Earthquake casualties arise from both the direct disaster and secondary disasters, and the latter are more difficult to assess. However, there is currently no established method to separate the human losses of the direct disaster from those of secondary disasters after an earthquake. Therefore, although secondary disasters were evaluated in the importance assessment, they were not selected as an input variable. For destructive earthquakes such as cases 8-10, the types of secondary disasters are more varied, so their impact is greater. Moreover, the traffic network is very important for post-earthquake rescue. Small earthquakes with magnitudes less than 6.5, such as cases 1-7, usually cause limited damage to the traffic network and rarely block traffic completely.
However, in a large earthquake with a magnitude above 6.5, such as cases 8-10, the traffic network is usually interrupted, which seriously hinders rescue and greatly increases the number of deaths. That the study did not consider damage to the traffic network is thus another main reason for the lower accuracy in cases 8-10.
The purpose of this study is to rapidly assess human losses; hence, the input features should be obtainable within a short time after an earthquake. In reference [21], the training set came only from the 2003 Bam earthquake, so the results were not representative. Compared with empirical methods [36], our data set is larger [4], covering almost all events from 1990 to 2017 without gaps. The accuracy of the results is higher, the data set is larger, and more factors are considered than in the China-National Standard [33]. This shows that deep learning can be used to estimate casualties without imposing the assumptions required by statistical methods, with the data processed directly by the network's internal functions.
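The two optimization ingredients named earlier, the Adam update and a moving average of the weights, can be illustrated with a self-contained toy sketch (a one-parameter problem with our own illustrative settings, not the study's network):

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.02, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update with bias-corrected first/second moment estimates."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

def ema_update(shadow, w, decay=0.99):
    """Exponential moving average of the weights; the averaged (shadow)
    weights, not the raw ones, are used at prediction time."""
    return decay * shadow + (1 - decay) * w

# Toy problem: minimise f(w) = (w - 3)^2 starting from w = 0.
w = np.zeros(1)
m = np.zeros(1)
v = np.zeros(1)
shadow = w.copy()
for t in range(1, 3001):
    grad = 2.0 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t)
    shadow = ema_update(shadow, w)
```

Adam's normalized steps speed up the early traversal, while the shadow weights smooth out the residual oscillation of the raw iterates, which is the sense in which the combination accelerates convergence and stabilizes predictions.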

Conclusions
We proposed a method to assess the importance of the nine factors affecting earthquake casualties and to rank the features with the random forest algorithm. The 43 structural types were evaluated with the same method, and the contribution of each structural form to the death toll was obtained, which provides a basis for the choice of structural forms in future construction. Based on these importance evaluations, we reached the following conclusions: (1) The feature importance analysis fully proves the contributions of population density, magnitude, focal depth, and epicentral intensity to the death toll. The importance of time is lower than that of date and economic status because time was divided into only two classes. (2) The random forest algorithm performs better than the AdaBoost and CART algorithms in both stability and accuracy. (3) The reinforced concrete structure, national civil structure, and civil structure contribute most to the death toll. The contributions of the stone-concrete structure, brick-wood structure, brick structure, and frame structure are small; the other structures contribute even less and can hardly be displayed in the figure.
A deep learning model for estimating human losses was established based on the results of the random forest algorithm. We selected five important features and compared the results with the China-National Standard because the Standard is designed for rapid assessment work. The results demonstrate that the accuracy is higher than that of the other methods and that the running time is suitable for emergency rescue work. Therefore, this method can be used to evaluate fatalities for future earthquakes in China mainland and can serve the China Earthquake Administration and the Chinese government.

Extensions of the Work
Further extensive studies are needed, and some recommendations for future research are given as follows. First, this research is based on the evaluation of factor importance; future studies can extend it by increasing the number of factors. Second, this paper estimates the importance of different structures to the death toll with the random forest algorithm; future studies can add sparse learning to process the data so that the classifier obtains better results. Third, the deep learning model assesses human losses with several optimization algorithms; future studies can add hidden layers and continue to optimize the algorithms. It will be of great interest to focus on death prediction in future works.

Acknowledgments:
We would like to thank Chen Zhao for guidance and help with applying machine learning algorithms.

Conflicts of Interest:
The authors declare no conflict of interest.
Appendix A