Prediction of Structural Type for City-Scale Seismic Damage Simulation Based on Machine Learning

Abstract: The structural types of buildings are necessary input data for city-scale seismic damage simulations and must be collected for the whole city. To this end, a method for predicting the structural types of buildings based on machine learning (ML) is proposed herein. Specifically, using training data of 230,683 buildings in Tangshan city, China, a supervised ML solution based on a decision forest model was designed for the prediction. The scale sensitivity and regional applicability of the designed solution are discussed, and the results show that the supervised ML solution maintains high accuracy at different scales; however, it is only suitable for cities similar to the sample city. For wide applicability to various cities, a semi-supervised ML solution was designed based on sampling investigation and self-training procedures. The downtowns of the Daxing and Tongzhou districts in Beijing were selected as a case study for the designed semi-supervised ML solution. The overall prediction accuracies of structural types for the Daxing and Tongzhou downtowns reach 94.8% and 99.5%, respectively, which are acceptable for seismic damage simulations. Based on the predicted results, the distributions of seismic damage in the Daxing and Tongzhou downtowns were output. This study provides a smart and efficient method for obtaining structural types for a city-scale seismic damage simulation.
This study comprises four parts: training data, supervised ML solution, semi-supervised ML solution, and case study.


Introduction
Generally, cities are densely organized, with many buildings and civil infrastructures. If a city is affected by a strong earthquake, many casualties and significant losses will occur. For example, the 2011 Christchurch earthquake of New Zealand caused 185 deaths and a loss of US$ 11-15 billion [1].
Earthquakes pose a serious threat to many cities in China. For instance, Tangshan, a medium-sized city in China, was hit by an Ms 7.8 intraplate earthquake on 28 July 1976, which caused more than 240,000 deaths and razed the city [2]. In fact, two-thirds of the cities in China with more than one million people are located in areas of high earthquake risk (i.e., the seismic precautionary intensities of these cities are more than 6 according to the seismic design code of China [3]). For example, Beijing, the capital of China, and Taiyuan, a large city in northern China, are both located in the area of seismic precautionary intensity 8. Therefore, the earthquake safety of these cities deserves further study.
Training data: This part includes five attributes of a building, i.e., structural type, construction year, story number, story height, and story area. The building data of two cities in China were used for the training in this study. One city is Tangshan, which has 230,683 buildings; the other is Taiyuan, whose downtown has 31,154 buildings. These data were provided by the departments of urban planning in the local governments.
Supervised ML solution: The ML models and implementation platforms suitable for the prediction of structural types are determined by comparing prediction accuracies. In addition, the scale sensitivity and regional applicability of the designed solution are discussed.
Semi-supervised ML solution: First, the sampling fraction of the building investigation is determined based on the supervised ML solution above. Subsequently, the semi-supervised self-training procedure is designed based on the sample data. Finally, the prediction performance of the semi-supervised ML solution is assessed.
Case study: The downtowns of Daxing and Tongzhou, two districts of Beijing, were selected as a case study; they have 69,180 and 34,763 buildings, respectively. The structural types of buildings in the downtowns of Daxing and Tongzhou were predicted using the designed semi-supervised ML solution, and the prediction performance was assessed based on the sample data. Furthermore, the seismic damage of the Daxing and Tongzhou downtowns was simulated using the predicted structural types.

Determination of Models and Platforms
Many ML models and implementation platforms exist [22][23][24][25][26][27], and each has its own advantages and disadvantages. Therefore, appropriate models and platforms must be determined.
(1) ML models
The purpose of this study is to predict the structural type from other building attribute data. The number of structural types is generally limited (e.g., masonry, frame, and shear-wall structures); therefore, the prediction of structural type is a multi-class classification problem in ML. Existing studies indicate that artificial neural networks [22], decision forests [23], support vector machines (SVMs) [24], and logistic regression [25] are suitable for classification problems.
The artificial neural network model, which simulates the synaptic connections of the brain, comprises a large number of neurons and their interconnections; it can be used for multi-class classification problems [22]. The decision forest model [23] is an ensemble extension of the decision tree model: it is composed of many decision trees, each of which is independent. For classification problems, the class that receives the most votes across all the decision trees is selected as the result of the decision forest. Logistic regression is an appropriate regression analysis when the dependent variable is binary or multi-class, because it can describe the data and explain the relationship between one dependent variable and one or more independent variables. The SVM method is mainly used to separate two classes, whereas the prediction of structural types in this study is a multi-class problem. Therefore, the SVM model was excluded, and the artificial neural network, decision forest, and logistic regression models were adopted in this study. The prediction results of these three models can be compared with each other, and the most accurate results will be applied in the city-scale seismic damage simulation.
(2) Implementation platforms
Currently, many implementation platforms exist for ML, e.g., BigML [26], Microsoft Azure (hereinafter referred to as Azure) [27], Google's TensorFlow [28], and Amazon Machine Learning [29]. Azure integrates many existing ML models, e.g., the artificial neural network, decision forest, and logistic regression models, and it can be used free of charge for a long time. Therefore, Azure was adopted in this study.

Data Processing
The data (i.e., building footprints, construction years, story heights, story areas, and story numbers) of 230,683 buildings in Tangshan were provided by the department of urban planning in the local government of Tangshan city.
First, the department validated the data through extensive surveying and mapping work; thus, these data can be considered clean before the training.
Subsequently, the correlation matrix of the building data was calculated to evaluate possible dependencies. Taking Tangshan city as an example, the correlation matrix of the building data is shown in Figure 2. It can be observed that the building attributes are only weakly dependent on each other and can be used as the input data for predicting the structural types of buildings.
Finally, the building data were normalized using the min-max normalization method, because building attributes are generally concentrated within a certain range. For example, the story numbers of most buildings in Tangshan are 1-6. With min-max normalization, the effect of the different scales of the attributes on the prediction can be avoided.
Note that only four types of building data are used to predict structural types in this study; thus, the building data need to be carefully checked to avoid incorrect or missing data. The purpose of predicting structural types in this study is to support urban planning for earthquake preparedness; hence, the data provider is the department of urban planning of a city, which can guarantee the accuracy of the provided data.
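The two preprocessing steps above (correlation check and min-max normalization) can be sketched as follows. This is a minimal illustration in plain Python, not the Azure workflow used in the study; the attribute values and the names `min_max_normalize`, `pearson`, and `buildings` are hypothetical.

```python
from math import sqrt

def min_max_normalize(values):
    """Scale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical attribute columns (illustrative values, not the Tangshan data).
buildings = {
    "construction_year": [1978, 1985, 1992, 2003, 2010],
    "story_number": [1, 2, 6, 5, 18],
    "story_height": [3.0, 3.0, 2.8, 3.3, 3.0],
    "story_area": [80.0, 120.0, 450.0, 900.0, 650.0],
}

# Normalize every attribute column to [0, 1].
normalized = {name: min_max_normalize(col) for name, col in buildings.items()}

# Pairwise correlation matrix, stored as a dict keyed by attribute pairs.
names = list(buildings)
corr = {(a, b): pearson(buildings[a], buildings[b]) for a in names for b in names}
```

Because min-max normalization is a positive linear rescaling, it leaves the Pearson correlations unchanged, so the order of the two steps does not affect the correlation matrix.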

Appl. Sci. 2020, 10, x FOR PEER REVIEW 5 of 23

Model Training
The data of 230,683 buildings in Tangshan were used to train the artificial neural network, decision forest, and logistic regression models. Azure packages each operation as a component that can be defined and organized through visual programming, so the prediction solution can be created efficiently using components. Using the decision forest model as an example, the supervised ML solution was designed using the components, as shown in Figure 3. The components in Figure 3 are specified as follows:
Component 1 (Select Data): Select the uploaded source data, i.e., the Tangshan data.
Component 2 (Select Model): Select the "Multi-class Decision Forest" model in Azure for the prediction.
Component 3 (Split Data): Split the source data into training data (i.e., 80% of the source data) and assessment data (i.e., the remaining 20% of the source data).
Component 4 (Train Model): Train the prediction model using the training data.
Component 5 (Score Model): Score the prediction results using the assessment data.
Component 6 (Evaluate Model): Evaluate the accuracy of the prediction model.
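The six components above can be mimicked outside Azure with a small script. The sketch below is only an illustration under stated assumptions: it uses a hand-written bagged decision-tree ensemble with majority voting as a stand-in for Azure's "Multi-class Decision Forest", and the data come from a hypothetical generator (`synthetic_building`) rather than the Tangshan records.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, depth=0, max_depth=4):
    """Grow a small classification tree; leaves store the majority label."""
    if depth == max_depth or len(set(labels)) == 1:
        return Counter(labels).most_common(1)[0][0]
    best = None
    for f in range(len(rows[0])):
        # Candidate thresholds: every observed value except the largest.
        for t in sorted(set(r[f] for r in rows))[:-1]:
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no valid split (all rows identical)
        return Counter(labels).most_common(1)[0][0]
    _, f, t = best
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[i] for i in li], [labels[i] for i in li], depth + 1, max_depth),
            build_tree([rows[i] for i in ri], [labels[i] for i in ri], depth + 1, max_depth))

def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def train_forest(rows, labels, n_trees=10, seed=0):
    """Train each tree on a bootstrap resample of the training data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx]))
    return forest

def predict_forest(forest, row):
    """Majority vote over the trees."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]

def synthetic_building(rng):
    """Hypothetical building: (story number, construction year) and a rule-based type."""
    stories = rng.randint(1, 20)
    year = rng.randint(1960, 2015)
    label = "masonry" if stories <= 3 else "frame" if stories <= 10 else "shear wall"
    return (stories, year), label

# Components 1-2: select data and model (synthetic data, hand-written forest).
rng = random.Random(42)
data = [synthetic_building(rng) for _ in range(500)]
# Component 3: split into 80% training data and 20% assessment data.
rng.shuffle(data)
cut = int(0.8 * len(data))
train, assess = data[:cut], data[cut:]
# Component 4: train the model.
forest = train_forest([x for x, _ in train], [y for _, y in train])
# Component 5: score the assessment data.
predicted = [predict_forest(forest, x) for x, _ in assess]
# Component 6: evaluate the overall accuracy.
accuracy = sum(p == y for p, (_, y) in zip(predicted, assess)) / len(assess)
```

The synthetic labels follow a deterministic rule, so the accuracy obtained here says nothing about real building data; the script only shows how the split, train, score, and evaluate steps fit together.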
Using the designed supervised ML solution above, the overall accuracy for predicting structural type in Tangshan reaches 98.3%, as shown in Figure 4a. If the artificial neural network and logistic regression models are adopted, the corresponding overall accuracies will be 98.0% and 97.0%, as shown in Figure 4b,c. Therefore, the designed supervised ML solution can predict the structural type of a city with high accuracy.
Besides, according to the precisions and recalls in Figure 4, the macro and micro F1 scores of the above three predictions were calculated to further evaluate the performance of the different ML models, as shown in Table 1. The F1 score combines precision and recall into a single metric (their harmonic mean). According to Table 1, the macro and micro F1 scores of the decision forest model are the highest, while those of the logistic regression model are the lowest. In addition, the decision forest model has the highest accuracy, and the logistic regression model has the lowest. Therefore, although the artificial neural network and logistic regression models also have high accuracy, the decision forest model is recommended for the prediction of structural types in this study, because it performs best among the three models.
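As an illustration of how macro and micro F1 scores follow from per-class precision and recall, the sketch below computes both from a confusion matrix. The matrix values are hypothetical and are not the values in Figure 4 or Table 1; the function name `f1_scores` is likewise an assumption.

```python
def f1_scores(confusion):
    """Return (macro F1, micro F1) for a square confusion matrix.

    confusion[i][j] is the number of buildings of true class i
    that were predicted as class j.
    """
    per_class_f1 = []
    tp_sum = fp_sum = fn_sum = 0
    for k in range(len(confusion)):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                  # class k predicted as something else
        fp = sum(row[k] for row in confusion) - tp   # other classes predicted as k
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            per_class_f1.append(2 * precision * recall / (precision + recall))
        else:
            per_class_f1.append(0.0)
        tp_sum += tp
        fp_sum += fp
        fn_sum += fn
    macro = sum(per_class_f1) / len(per_class_f1)         # unweighted mean over classes
    micro = 2 * tp_sum / (2 * tp_sum + fp_sum + fn_sum)   # pooled over all classes
    return macro, micro

# Hypothetical confusion matrix; rows and columns: masonry, frame, shear wall.
confusion = [
    [90, 10, 0],
    [5, 40, 5],
    [0, 0, 50],
]
macro_f1, micro_f1 = f1_scores(confusion)
```

For single-label multi-class problems such as this one, the micro F1 score equals the overall accuracy, whereas the macro F1 score weights every structural type equally and is therefore more sensitive to poorly predicted minority types.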

Scale Sensitivity Assessment
In actual application scenarios, the number of buildings to be predicted (i.e., the building scale) is uncertain. To assess the scale sensitivity of the prediction model, predictions were performed for different building scales, with the building data randomly selected from all buildings in Tangshan city. The prediction results are shown in Figure 5. When the building scales are 1000, 3000, 5000, 10,000, and 30,000, the corresponding prediction accuracies are 97.7%, 98.1%, 98.3%, 98.2%, and 98.3%, respectively. The results show that the building scale has no significant effect on the prediction accuracy. Therefore, the designed supervised ML solution can accurately predict the structural type for different building scales.

Regional Applicability Assessment
The structural types of buildings may differ between regions. Therefore, if the prediction model is trained with the data of one city, the prediction accuracy may decrease when it is used in other cities.
To assess the effect of different regions on the prediction accuracy, the model trained with the Tangshan data was used to predict the structural types of 31,154 buildings in the downtown of Taiyuan city, China. Note that the building data of Taiyuan were provided by the department of urban planning in the local government. As demonstrated in Figure 6, the model trained on the Tangshan data predicts the structural types in Taiyuan with an accuracy of 90.6%, whereas its accuracy for Tangshan itself is 98.3%.
It is noted that Taiyuan and Tangshan are both typical northern cities in China and are therefore similar. If the model is applied to a southern city, the prediction accuracy may be lower. Therefore, the designed supervised ML solution is recommended for cities similar to the sample city. Furthermore, a semi-supervised prediction model is designed for better regional applicability.

Determination of Sampling Fraction
The designed semi-supervised ML solution is based on a sampling investigation of buildings in the city to be predicted. In detail, the sample buildings are randomly selected, and their structural types are then determined by consulting the relevant engineering drawings from the urban archive administration, so that the sampled structural types are accurate. The building data obtained by the sampling investigation are used to predict the structural types of all buildings in the city. In the sampling investigation, too few building data may cause an inaccurate prediction result, while too many incur extensive manual work; therefore, the number of buildings to be investigated (i.e., the sampling fraction) is a critical question.
Using the existing building data of the two cities (i.e., Taiyuan and Tangshan), predictions were performed with different sampling fractions to determine the optimal fraction. In detail, six sampling fractions (i.e., 0.05%, 0.1%, 0.5%, 1%, 5%, and 10%) were tested for the two cities. The predictions employ the decision forest model. The sample data were used to train the model, while all the remaining building data in the city were used for assessing the accuracy of the prediction.
For Taiyuan and Tangshan, the data were randomly sampled 50 times at each sampling fraction, and the corresponding 50 predictions were performed with the sampled data. The accuracies of the predictions are shown in Figures 7 and 8. The curves of the prediction accuracies and their variance against the sampling fraction are illustrated in Figures 9 and 10, respectively.
According to the prediction results of Taiyuan and Tangshan (shown in Figures 9 and 10, respectively), when the sampling fraction is 1% or greater, the prediction is highly accurate (i.e., above 97% in both cases) and the variance of the prediction accuracies decreases. Therefore, a sampling fraction of 1% is recommended for predicting the structural types of a city.
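The repeated-sampling experiment described above can be sketched as follows. This is a hedged illustration only: the classifier is a simple per-story-number lookup table standing in for the decision forest, the city is generated by a hypothetical function (`synthetic_city`), and only the six sampling fractions and the 50 repetitions are taken from the text.

```python
import random
from collections import Counter, defaultdict
from statistics import mean, pvariance

def train_lookup(sample):
    """Stand-in classifier: majority structural type per story number."""
    by_story = defaultdict(Counter)
    overall = Counter()
    for stories, label in sample:
        by_story[stories][label] += 1
        overall[label] += 1
    default = overall.most_common(1)[0][0]  # fallback for unseen story numbers
    table = {s: c.most_common(1)[0][0] for s, c in by_story.items()}
    return lambda stories: table.get(stories, default)

def sampling_experiment(city, fraction, repeats, rng):
    """Sample, train, and assess on the remaining buildings; repeat and summarize."""
    accuracies = []
    n = max(1, round(fraction * len(city)))
    for _ in range(repeats):
        chosen = set(rng.sample(range(len(city)), n))
        sample = [city[i] for i in chosen]
        rest = [city[i] for i in range(len(city)) if i not in chosen]
        model = train_lookup(sample)
        accuracies.append(sum(model(s) == y for s, y in rest) / len(rest))
    return mean(accuracies), pvariance(accuracies)

def synthetic_city(n, rng):
    """Hypothetical city: type follows story number, with 5% of labels re-drawn at random."""
    types = ["masonry", "frame", "shear wall"]
    city = []
    for _ in range(n):
        stories = rng.randint(1, 20)
        label = types[0] if stories <= 3 else types[1] if stories <= 10 else types[2]
        if rng.random() < 0.05:
            label = rng.choice(types)
        city.append((stories, label))
    return city

rng = random.Random(0)
city = synthetic_city(5000, rng)
fractions = [0.0005, 0.001, 0.005, 0.01, 0.05, 0.10]
results = {f: sampling_experiment(city, f, repeats=50, rng=rng) for f in fractions}
for f, (avg, var) in results.items():
    print(f"{f:.2%}: mean accuracy {avg:.3f}, variance {var:.5f}")
```

Reporting both the mean and the variance over the repetitions mirrors the two curves described above: a suitable sampling fraction is one beyond which the mean accuracy is high and the spread between random samples is small.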

Semi-Supervised ML Solution
The designed semi-supervised solution is shown in Figure 11. First, the sample data are obtained by the building investigation. The aforementioned prediction results indicate that a sampling fraction of 1% yields a highly accurate prediction. In this study, a sampling fraction of 3% was adopted: one third of the sample data (i.e., 1% of all buildings) was used for training the model, and the remaining two thirds (i.e., 2% of all buildings) were used for assessing the accuracy of the prediction. Subsequently, because the designed supervised ML solution indicated that the decision forest model performs best for the prediction of structural type, the decision forest model was selected. Finally, a self-training process [30][31][32] was performed iteratively until the prediction accuracy was accepted. Specifically, the ML model was trained with the sample data, and the prediction of the trained model was then scored and evaluated. If the prediction accuracy of the trained model is accepted, the training process ends; otherwise, the building data predicted with high scored probabilities are selected and added to the data for the next training to obtain better prediction results. Such an iterative training process is defined as a self-training process.
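The control flow of such a self-training loop can be sketched as below. Several points are assumptions rather than the authors' implementation: the model is a toy frequency table instead of a decision forest, the high-confidence buildings are drawn from an unlabeled pool (the usual self-training setup), and acceptance is reduced to a single overall-accuracy threshold (`target_accuracy`). All names are hypothetical.

```python
from collections import Counter, defaultdict

def train(labeled):
    """Stand-in model: label frequencies per story number (not a decision forest)."""
    counts = defaultdict(Counter)
    for stories, label in labeled:
        counts[stories][label] += 1
    overall = Counter(label for _, label in labeled)

    def predict(stories):
        c = counts.get(stories) or overall  # fall back to overall frequencies
        label, hits = c.most_common(1)[0]
        return label, hits / sum(c.values())  # (predicted type, scored probability)

    return predict

def self_train(train_set, assess_set, unlabeled, target_accuracy,
               top_fraction=0.01, max_rounds=5):
    """Retrain until the assessment accuracy is accepted or the rounds run out."""
    labeled = list(train_set)
    pool = list(unlabeled)
    history = []
    for _ in range(max_rounds):
        model = train(labeled)
        # The assessment data stay the same in every round.
        accuracy = sum(model(s)[0] == y for s, y in assess_set) / len(assess_set)
        history.append(accuracy)
        if accuracy >= target_accuracy or not pool:
            break
        # Rank the pool by scored probability and keep the top fraction.
        scored = sorted(pool, key=lambda s: model(s)[1], reverse=True)
        k = max(1, int(top_fraction * len(scored)))
        labeled += [(s, model(s)[0]) for s in scored[:k]]  # add pseudo-labels
        pool = scored[k:]
    return model, history

# Hypothetical data: (story number, structural type) pairs.
train_set = [(1, "masonry"), (2, "masonry"), (5, "frame"), (6, "frame"), (12, "shear wall")]
assess_set = [(1, "masonry"), (3, "masonry"), (5, "frame"),
              (8, "frame"), (12, "shear wall"), (15, "shear wall")]
unlabeled = list(range(1, 21)) * 5  # story numbers of buildings with unknown type

model, history = self_train(train_set, assess_set, unlabeled, target_accuracy=0.95)
```

With this toy model the pseudo-labels merely reinforce what the table already predicts, so the accuracy in `history` stays flat; the sketch is meant to show the loop structure (train, score, evaluate, select, retrain), not the accuracy gains reported for the decision forest.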
Using Tangshan as an example, the designed semi-supervised training process is demonstrated as follows.
(1) First self-training
The first self-training can be implemented using the components in Azure, as shown in Figure 12. Train the selected ML model using the training data.
Component 6 (Score Model): Score the prediction results with the assessment data, as shown in Figure 13. In this way, the building data predicted with high scored probabilities are identified. In this study, the building data ranked in the top 1% by scored probability are selected for the next training.
Component 7 (Evaluate Model): Evaluate the performance of the prediction model and output the evaluation results. As shown in Figure 14, the overall prediction accuracy of the first self-training reaches 95.5%. However, the accuracy for the frame structure is only 81.7%, which is not acceptable; therefore, a second self-training is required.
Component 8 (Convert to CSV): Convert the building data with high scored probabilities (see Component 6) to CSV format so that these data can be used for the second self-training.
(2) Second self-training
The second self-training was implemented using components in Azure, as shown in Figure 15. However, the training data differed from those of the first self-training: the original training data (the CSV file in Component 3 in Figure 12) and the identified building data with high scored probabilities (the CSV file in Component 8 in Figure 12) were combined as the training data for the second self-training. It is noted that the assessment data (i.e., 2/3 of the sample data) were the same for all self-trainings.
Similar to the first self-training, the model was trained and evaluated. As shown in Figure 16, the overall accuracy is above 95.1%, which is similar to that of the first self-training (95.5%). In particular, the prediction accuracy for the frame structure increased from 81.7% to 86.9%. Similarly, the building data with high scored probabilities were converted to CSV format for the next self-training.
(3) Third self-training
After the third self-training, the prediction results are as shown in Figure 17. The overall prediction accuracy of the model is 95.2%. In detail, the prediction accuracies for the masonry, frame, and shear wall structures are 96.1%, 87.0%, and 100.0%, respectively. Neither the overall accuracy nor the accuracy for each structural type increases significantly compared with the second self-training. In addition, the corresponding precision and recall are also very high, as shown in Figure 17. Therefore, additional self-trainings are not required.
The structural types of all the buildings predicted by the third self-training were compared with the real data in Tangshan, and the error is shown in Table 2. The results indicate that the designed semi-supervised ML solution can achieve a high prediction accuracy even when using 1% of all the building data. Both the other attribute data of the buildings and the structural types of the sample buildings are accurate, which supports the prediction accuracy of the designed ML solution. Furthermore, the self-training process can improve the prediction accuracy. Therefore, the prediction results in Table 2 agree closely with the real structural types of buildings in Tangshan.

Introduction of Case Study
Daxing and Tongzhou are two districts in Beijing, as shown in Figure 18. The earthquake risk of these two districts is very high (i.e., the precautionary seismic intensity is 8). At this intensity, the peak ground acceleration corresponding to the service level earthquake, whose probability of exceedance in 50 years is 63.3%, reaches 0.2 g [3]. Seismic damage simulation of these two districts will provide decision-making references for their urban planning on earthquake preparedness. However, the structural types of buildings are unavailable for these two districts. Therefore, the designed ML solution is applied to the downtowns of Daxing and Tongzhou to obtain the structural type data.


Structural Type Prediction for Daxing Downtown
The department of urban planning in the local government provided the GIS data of building footprints of Daxing downtown, which include the attributes of story area, construction year, story height, and story number. According to the existing GIS data, 69,180 buildings exist in Daxing downtown. In detail, the proportions of buildings constructed before 1989, from 1989 to 2001, and after 2001 are 62%, 32%, and 6%, respectively. The distribution of construction year is shown in Figure 19. Note that the codes for the seismic design of buildings were updated in 1989 and 2001; therefore, buildings constructed later exhibit higher seismic resistance. Additionally, most buildings in this area are low-rise: buildings of no more than two stories account for 91% of the total. The distribution of story number is shown in Figure 20.


The GIS data of Daxing downtown contain no structural type attribute; therefore, the semi-supervised ML solution designed in this study was used to predict the structural types of the buildings.
(1) Building sampling investigation
In this case study, 3% of the total buildings in Daxing downtown were investigated to obtain the complete attribute data of the buildings. According to the sampling fraction recommended in this study, 1% of the total buildings were used for training the model, while the remaining 2% were used for assessing the model.
As mentioned previously, 69,180 buildings exist in Daxing downtown; therefore, 2075 buildings (i.e., 3% of the total buildings) were randomly investigated. The distribution of structural types among the investigated buildings is shown in Table 3.

Table 3. Structure types and numbers of investigated buildings.
          Masonry   Frame   Shear Wall   Light Steel   Total
Number    385       238     761          691           2075
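The sampling arithmetic above can be sketched as follows. The function name, the fixed random seed, and the rounding of the 1% training share are assumptions; the text only states the 3% survey size (2075 buildings) and the 1%/2% split.

```python
import random

def split_sample(building_ids, survey_fraction=0.03, train_fraction=0.01, seed=0):
    """Randomly survey a fraction of buildings and split them into train/test."""
    rng = random.Random(seed)
    total = len(building_ids)
    n_survey = int(total * survey_fraction)   # 3% of all buildings, truncated
    n_train = round(total * train_fraction)   # 1% of all buildings (assumed rounding)
    surveyed = rng.sample(building_ids, n_survey)
    return surveyed[:n_train], surveyed[n_train:]

ids = list(range(69180))  # placeholder IDs for the 69,180 Daxing buildings
train_ids, test_ids = split_sample(ids)
print(len(train_ids), len(test_ids), len(train_ids) + len(test_ids))
```

With 69,180 buildings this yields 2075 surveyed buildings in total, of which roughly one third train the model and two thirds assess it.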
(2) Training model
The sample data of the 2075 buildings were uploaded to the Azure platform. Using the designed semi-supervised ML solution, three self-trainings were performed, and the optimal training results are shown in Figure 21.
As shown in Figure 21, the overall accuracy of the prediction results is 94.8%. The accuracy for the frame structure is 84.0%, whereas those for the other structures are above 92.2%. In addition, the F1 score can be calculated from the precision and recall in Figure 21. In detail, the macro and micro F1 scores are 94.8% and 92.9%, respectively, which are acceptable. Overall, the trained model exhibits high accuracy and performance for the prediction of structural type.
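For reference, macro and micro F1 scores can be computed from a confusion matrix as in the sketch below. The matrix values are hypothetical and are not the data behind Figure 21; the function name is likewise an assumption.

```python
def f1_scores(confusion):
    """Return (macro F1, micro F1).

    confusion[i][j] is the number of buildings of true class i
    that were predicted as class j.
    """
    n = len(confusion)
    per_class_f1 = []
    tp_sum = fp_sum = fn_sum = 0
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp
        fp = sum(confusion[i][k] for i in range(n)) - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        per_class_f1.append(2 * precision * recall / denom if denom else 0.0)
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn
    macro = sum(per_class_f1) / n          # unweighted mean of per-class F1
    micro_denom = 2 * tp_sum + fp_sum + fn_sum
    micro = 2 * tp_sum / micro_denom if micro_denom else 0.0  # pooled counts
    return macro, micro

# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted).
example = [[90, 8, 2], [10, 80, 10], [0, 5, 95]]
macro_f1, micro_f1 = f1_scores(example)
print(round(macro_f1, 4), round(micro_f1, 4))
```

Macro averaging weights every structural type equally, so a weaker class such as the frame structure lowers it more than it lowers the micro average, which pools all buildings. When each building has exactly one true and one predicted type, the micro F1 computed this way equals the overall accuracy on the same set.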
(3) Prediction of structural type
Using the trained model, the structural types of all buildings in Daxing downtown were predicted. The proportions of the different structural types are shown in Figure 22. Most buildings in Daxing downtown are masonry buildings, which account for 66% of the total, while frame shear wall buildings are the fewest, at only 1% of the total. The distribution of structural type in Daxing downtown is shown in Figure 23. Note that only 3% of all the buildings must be investigated manually when the designed semi-supervised ML solution is used. Compared with a manual investigation of all 69,180 buildings, the designed solution saves 97% of the manual work and significantly improves the efficiency of obtaining structural type data.

Figure 23. Distribution of building structure type in Daxing downtown.

Seismic Damage Simulation for Daxing Downtown
The Sanhe-Pinggu M 8.0 earthquake [33], which occurred in 1679, is the most recent M 8.0 earthquake in the history of Beijing. The ground motion of the Sanhe-Pinggu M 8.0 earthquake was simulated [34], and the corresponding acceleration time histories are shown in Figure 24. These acceleration time histories were used as the input for the seismic damage simulation of Daxing downtown.
Based on the predicted building data, the MDOF models of the buildings were created, and the seismic damage of Daxing downtown was simulated using the MDOF models and nonlinear time-history analysis. The distribution of seismic damage in Daxing downtown is shown in Figure 25. Most buildings suffer slight damage. To reveal the features of the seismic damage more clearly, the damage ratios for different structural types and construction years are shown in Figure 26. As shown, masonry buildings and buildings constructed before 1989 suffer severe damage, which provides an important reference for seismic retrofit decisions. Note that this city-scale seismic damage simulation relies on the structural types obtained from the designed semi-supervised ML solution, which makes the solution useful for improving the seismic capability of the city.
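The damage ratios by structural type and construction period discussed above amount to a grouped count over the simulated buildings. The sketch below shows one possible way to compute them; the record layout, field names, and sample records are assumptions for illustration and are not simulation output.

```python
from collections import Counter, defaultdict

def damage_ratios(records, key):
    """Share of each damage state within every group defined by `key`."""
    counts = defaultdict(Counter)
    for rec in records:
        counts[rec[key]][rec["damage"]] += 1
    return {
        group: {state: n / sum(c.values()) for state, n in c.items()}
        for group, c in counts.items()
    }

# Hypothetical per-building results, for illustration only.
records = [
    {"type": "masonry", "era": "before 1989", "damage": "severe"},
    {"type": "masonry", "era": "before 1989", "damage": "slight"},
    {"type": "masonry", "era": "1989-2001", "damage": "moderate"},
    {"type": "frame", "era": "after 2001", "damage": "slight"},
]
by_type = damage_ratios(records, "type")
by_era = damage_ratios(records, "era")
print(by_type)
print(by_era)
```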
Appl. Sci. 2020, 10, x FOR PEER REVIEW 19 of 23

Structural Type Prediction for Tongzhou Downtown
The GIS data of building footprints of Tongzhou downtown were also provided by the department of urban planning in the local government. According to the GIS data, Tongzhou downtown has 37,463 buildings.
Through the building sampling investigation, the structural types of 3% of the total buildings in Tongzhou downtown were determined. According to the sampling fraction recommended in this study, 1% of the total buildings were used for training the model, while the remaining 2% were used for assessing the model.
Using the designed semi-supervised ML solution, four self-trainings were performed, and the optimal training results are shown in Figure 27. The overall accuracy of the prediction results is 99.5%, and the accuracy for each structural type is above 97.9%. The corresponding precisions and recalls are above 99.0%, as also shown in Figure 27. The trained model therefore exhibits a high prediction accuracy.
Using the trained model, the structural types of all buildings in Tongzhou downtown were predicted. The proportions of the different structural types are shown in Figure 28. Most buildings in Tongzhou downtown are masonry buildings, which is similar to Daxing downtown. The distribution of structural type in Tongzhou downtown is shown in Figure 29.

Seismic Damage Simulation for Tongzhou Downtown
The ground motion of the Sanhe-Pinggu M 8.0 earthquake was also used for the seismic damage simulation of Tongzhou downtown. Based on the predicted building data and the MDOF models, the seismic damage of Tongzhou downtown was simulated, as shown in Figure 30. Most buildings suffer slight or moderate damage. The simulated seismic damage will provide decision-making references for urban planning on earthquake preparedness (e.g., seismic retrofit planning).


Conclusions
An ML-based method for predicting the structural types of buildings was proposed, and a case study of the Daxing and Tongzhou downtowns in Beijing was investigated. The following conclusions are drawn:
(1) The prediction results of the designed supervised ML solution for Tangshan, with 230,683 buildings, indicated that the decision forest, artificial neural network, and logistic regression models exhibited high prediction accuracy. In particular, the decision forest model performed best and is recommended for predicting structural types.
(2) The designed supervised ML solution could maintain high prediction accuracy for different building scales; however, it should only be applied to cities similar to the sample city.
(3) The designed semi-supervised ML solution was applicable to different cities, based on a sampling investigation. According to the predictions with different sampling fractions, a sampling fraction of 1% is recommended. Through multiple self-trainings, the semi-supervised ML solution achieved high accuracy in predicting structural types.
(4) This study provided a smart and efficient method to predict structural types for city-scale seismic damage simulation.