Article
Peer-Review Record

Identifying Exposure of Urban Area to Certain Seismic Hazard Using Machine Learning and GIS: A Case Study of Greater Cairo

Sustainability 2022, 14(17), 10722; https://doi.org/10.3390/su141710722
by Omar Hamdy 1, Hanan Gaber 2, Mohamed S. Abdalzaher 3,* and Mahmoud Elhadidy 3
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3:
Submission received: 12 June 2022 / Revised: 30 July 2022 / Accepted: 15 August 2022 / Published: 29 August 2022

Round 1

Reviewer 1 Report

Dear Authors


The presented work shows significant effort but needs further corrections to become acceptable for this journal. The current version of the manuscript is too long and not suitable for the esteemed Sustainability journal; it is not concise and to the point. I suggest making it shorter and more direct for the reader. The manuscript also needs substantial English editing.


1- The Abstract does not provide the important points. It should be between 250 and 300 words and concisely mention the problems of previous works and the novelty and implications of this paper. Including quantitative results would be effective.

2- The Introduction is too long and very poorly structured. The background of the study should be presented briefly. A detailed literature review is needed that highlights previous works, the main problem statement and how it can be improved or overcome, the research gap, the novelty, and the objectives. The Introduction contains many vague sentences, and authors who have done similar work are not mentioned in the literature review.

3- More information about the study area, the data repository, and the data collection process is required. The study area map should be improved by adding fault names, geology, and the earthquake catalog.

4- The Methodology section is not well described. Which methodology was implemented for PGA calculation, and how was the hazard map developed? A flowchart of PGA and hazard estimation is necessary for readers to understand the study. A detailed description of the implementation is required.

5- The Results and Discussion are not properly organized; they should show the significant achievements of the proposed method and discuss each table and figure in detail. The Results section describes the methodology and the process of work, e.g., (Landsat satellite images give a useful method of showing the historical urban area, and this section will demonstrate the results of a Landsat analysis to extract the land uses in 1992. Figure 4(a) depicts the outcome of the first stage, which began with integrating bands for the case study region in 1992 in natural colors without the presence of an atmosphere. Figure 4(b) depicts the same image after being subjected to supervised classification using GIS.)

6- The Discussion section is short and weak, while the Results section describes the methodology. Significant improvement is needed. It would be valuable to compare your proposed method with other known methods to show its efficiency.

7- It would be useful to provide a general framework or flowchart showing how others can implement or use your proposed method for their own assessments.

8- In general, the Conclusion section is weak; you could briefly discuss the achievements again, along with the limitations and drawbacks of your proposed method.


Overall, I did not find any novelty in this manuscript. The manuscript needs to be modified to clarify its novelty and research gap. I hope these suggestions will be helpful for revising the manuscript.

Author Response

Reply to the Editor and Reviewers’ Comments

Paper ID: sustainability-1791175

Paper title: Identifying Exposure of Urban Area to Certain Seismic Hazard Using Machine Learning and GIS: A Case Study of Greater Cairo

We would like to thank the editor and the reviewers for their valuable comments on the paper (sustainability-1791175) entitled “Identifying Exposure of Urban Area to Certain Seismic Hazard Using Machine Learning and GIS: A Case Study of Greater Cairo”. We have revised the manuscript according to the reviewers' comments, which helped improve the quality and presentation of the paper. To facilitate our reply, the amendments are highlighted in the revised manuscript. We hope that the revised version addresses the reviewers' comments. Our detailed replies to the comments are given below.

Reviewer 1 comments

1- The Abstract does not provide the important points. It should be between 250 and 300 words and concisely mention the problems of previous works and the novelty and implications of this paper. Including quantitative results would be effective.

Reply: Thank you for this important comment. Following your comment, the abstract has been modified as follows:

Abstract: The 1992 Cairo earthquake, with a moment magnitude of 5.8, is the most catastrophic earthquake to strike Greater Cairo (GC) in recent decades. Owing to the very limited number of seismological stations at that time, the peak ground acceleration (PGA) caused by this event was not recorded. Calculating PGA requires identifying the nature of the earthquake source, the geologic setting of the path between the source and the site under investigation, and the dynamic properties of the soil at the site. Soil dynamic properties are acquired through geotechnical investigations and/or geophysical surveys; both are costly and are not applicable in regional studies. This study presents an adaptive and reliable PGA prediction model using machine learning (ML), along with six standard geographic information system (GIS) interpolation methods (IDW, Kriging, Natural, Spline, TopoToR, and Trend), to predict the spatial distribution of PGA caused by this event over GC. The model is employed to estimate the exposure of the urban area and population in GC based on the available geotechnical and geophysical investigations. The exposure data (urban area and population) are taken from freely accessible sources, e.g., Landsat images and the Global Human Settlement Population Grid (GHS-POP). The results show that the Natural, Spline, and Trend methods are not suitable GIS interpolation techniques for generating seismic hazard maps (SHMs), whereas the Kriging method provides sufficient predictions. Notably, the ML model outperforms the classical GIS methodologies, with an accuracy of 96%.

2- The Introduction is too long and very poorly structured. The background of the study should be presented briefly. A detailed literature review is needed that highlights previous works, the main problem statement and how it can be improved or overcome, the research gap, the novelty, and the objectives. The Introduction contains many vague sentences, and authors who have done similar work are not mentioned in the literature review.

Reply: Thank you for this important comment. Following your comment, the Introduction has been modified as follows:

Natural disasters are among the most common problems that human settlements throughout the globe, particularly megacities, have to deal with [1]. Natural catastrophes therefore pose a significant threat to megacities. According to statistics published by UN-Habitat (the United Nations Human Settlements Programme), all megacities are prone to natural hazards, ranging from geological hazards (earthquake ground shaking and mass movements) and meteorological hazards (flash floods and storms) to extreme weather events (extreme heat and cold) and wildfires, which indicates the need to develop different risk reduction strategies for the various conditions in megacities [2]. Earthquakes can also trigger liquefaction, landslides, fires, and tsunamis, all of which result in much greater damage and losses [3].

Seismic hazard maps (SHMs) represent the regional distribution of the hazard caused by earthquake ground motion in an area. Seismic intensity parameters such as peak ground acceleration (PGA) and peak ground velocity (PGV) are therefore usually used as hazard indicators in conventional SHMs. Many SHMs have been proposed and developed around the world [4]. Planners, engineers, and developers use SHMs to save lives and money; SHMs are also used in building codes, seismic risk assessment, and disaster management. Because tsunamis, landslides, and liquefaction are all possible secondary effects of earthquakes, combining SHMs with information compiled from tectonic maps and geological, geodetic, and geophysical datasets can help identify the potential sites of these effects. Kim et al. (2020) used GIS interpolation methodologies to map soil classification and to produce SHMs based on remote sensing and geotechnical information in Daejeon, South Korea. They confirmed the applicability of GIS interpolation methodologies for producing SHMs and strongly recommended integrating remote-sensing-based and in-situ geotechnical information when producing seismic zonation maps.

The location of many major cities in hazard-prone regions, together with expanding urbanization, population growth, and rising incomes, increases exposure in these regions [5]. Therefore, urban planning based on reliable hazard maps is essential to safeguard inhabitants from the consequences of natural disasters [12,13]. Producing SHMs with the earthquake scenario approach requires knowledge of the faulting mechanism, the earthquake source parameters, the crustal structure between the earthquake source and the site under investigation, and the soil dynamic properties at that site. The faulting mechanism and source parameters can be inferred directly from digital earthquake records, and the nature of the crustal structure can be found in previous geophysical and seismological studies. However, the dynamic properties of the soil at the site of interest must be obtained from in-situ geotechnical boreholes and/or geophysical measurements, which are costly, time-consuming, and not applicable for regional studies. Producing regional-scale SHMs that consider local site conditions is therefore a great challenge, the main obstacle being the limited number of available geotechnical and geophysical measurements. Consequently, new approaches that apply recent methodologies (e.g., remote sensing, GIS, and machine learning (ML)) to freely accessible data offer a quick, low-cost way to overcome these data limits and the shortcomings of traditional techniques.

Remote sensing technology has advanced dramatically in the last decade, enabling more precise urban monitoring. Remote sensing data have several benefits and play an important role in inventorying, evaluating, and monitoring environmental assets based on spatial data; hence, the technology is increasingly used across a wide range of industries [6,7]. Remote sensing applications are particularly crucial for developing nations, where governments find it difficult to update their databases using standard methods because of the time and expense involved [8]. Remote sensing data can also be used to determine land use in urban areas [9]. Many sources of satellite imagery exist, for example, Landsat, IKONOS-2, and OrbView-3, but Landsat was selected as the best choice for monitoring spatial detail because its spectral data cover reasonably long timespans with suitable accuracy [10]. Landsat data are freely available and easy to download from the United States Geological Survey website [11], and such data can yield findings that are close to real-life conditions [12].

The Joint Research Centre (JRC) published a recent worldwide gridded population dataset, the Global Human Settlement Population Grid (GHS-POP) [13]. This geographic raster dataset, released in 2018, gives the population as the number of people per cell [14] [15]. The estimates were derived from the CIESIN GPWv4.10 datasets and disaggregated from census or administrative units to grid cells based on the distribution and density of built-up areas, as mapped in the Global Human Settlement Layer (GHSL) for each epoch [14]. Global population raster maps are crucial for a variety of policy-making evaluations (from environmental assessment through disaster risk studies to city planning and management), so accurate and up-to-date population statistics are critical. Simple GIS statistics and analyses are used to check the correctness of the collected data. According to the reported error values, GHS-POP statistics are highly reliable regardless of the study topic [16].

A geographic information system (GIS) commonly stores spatial data as discrete points or partitioned data in its spatial database. Gathering comprehensive geographic data would therefore require sampling attribute values across the entire geographic area, which wastes time and money [17,18]. GIS spatial interpolation methods provide an effective way of predicting the geographic distribution of data, increasing data density, filling in missing data, and producing a dense data distribution from a small observational dataset. Spatial interpolation makes informed estimates of the value of a continuous field at locations where it has not been measured, combining the investigator's judgment with the GIS [19].
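Inverse distance weighting (IDW), one of the six GIS methods compared in this study, shows the idea of spatial interpolation in its simplest form: the estimate at an unmeasured point is a weighted average of the measured values, with weights that decay with distance. The following is a minimal Python sketch, assuming planar (projected) coordinates and a power parameter of 2, a common default that is not necessarily the setting used by the authors; the sample points and PGA values are hypothetical.

```python
import math


def idw(samples, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    samples: list of (xi, yi, value) tuples at measured locations.
    power: distance-decay exponent (2 is a common default).
    """
    num = 0.0
    den = 0.0
    for xi, yi, vi in samples:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            # The target coincides with a sample: return the measured value.
            return vi
        w = 1.0 / d ** power
        num += w * vi
        den += w
    return num / den


# HYPOTHETICAL PGA samples (Gal) at projected coordinates (km).
pga_samples = [(0.0, 0.0, 300.0), (10.0, 0.0, 200.0), (0.0, 10.0, 250.0)]
print(round(idw(pga_samples, 5.0, 0.0), 2))  # prints 250.0
```

GIS implementations usually add options such as a search radius or a maximum number of neighbors; for brevity, this sketch uses every sample.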

Machine learning (ML) algorithms have recently emerged to tackle many research problems [20,21], starting with the success of convolutional neural networks (CNNs) in image recognition. This success, in turn, raised interest in applying other ML algorithms to a wide range of challenging research problems. ML tools can learn very complex relational models that classical approaches might not capture because of their model restrictions, and they have been applied successfully to problems ranging from recommendation systems to autonomous cars [22]. In the seismic field, ML has proved beneficial in both classification and regression problems for calculating the ground-motion parameters that are directly employed in seismic hazard assessment and human safety [23–25]. Moreover, it can be used to create effective seismic zonation maps [26].

The Greater Cairo (GC) region is one of the most densely populated areas in the world (about 13 million inhabitants), with densely populated suburbs. Historical and recent earthquake catalogs show that this megacity has experienced severe earthquakes that destroyed many historical and archaeological structures [27,28]. The Dahshour seismic source generates the most catastrophic earthquakes affecting GC. On October 12, 1992, it generated the most significant natural hazard in this region in more than a decade, causing disproportionate destruction and the loss of numerous lives [6] in the GC region, the Nile Valley, and the Nile Delta [29]. This event caused 561 deaths and 9,832 injuries, left more than 20,000 people homeless, damaged or destroyed more than 8,000 structures, displaced 50,000 people in the Cairo region alone, and left a damage bill of more than $35 million [30]. Studying the impact of such an event is very important for urban planning, seismic risk assessment, seismic risk reduction, and disaster management. The purpose of this research is threefold. The first goal is to compare six GIS interpolation methods and the proposed ML method in order to recommend the most accurate method for predicting PGA values and creating SHMs. The second goal is to determine the contribution of freely accessible data sources to estimating the exposed urban area and population. The final goal is to examine the exposure of the urban area of GC to the shaking caused by the October 12, 1992 Dahshour earthquake, using all tested interpolation methods and freely accessible data sources.

3- More information about the study area, the data repository, and the data collection process is required. The study area map should be improved by adding fault names, geology, and the earthquake catalog.

Reply: We thank the reviewer for this valuable comment. Figure 1 has been modified to show the faults and the earthquake catalog, as follows:

 

4- The Methodology section is not well described. Which methodology was implemented for PGA calculation, and how was the hazard map developed? A flowchart of PGA and hazard estimation is necessary for readers to understand the study. A detailed description of the implementation is required.

Reply: Thank you for your comment. The following section has been added to the manuscript:

In the current study, the scenario-based seismic hazard approach, which is widely used in seismic risk assessment, is implemented to estimate the ground-motion parameters of the October 1992 earthquake. This approach requires defining the characteristics of the seismic source, the nature of the path between the seismic source and the site of interest, and the local site condition, expressed as the time-averaged shear-wave velocity in the upper 30 m of the soil (Vs30). These inputs are combined deterministically to predict the ground-motion parameters at different sites in GC.
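Vs30 is conventionally computed as 30 m divided by the vertical shear-wave travel time through the top 30 m, that is, Vs30 = 30 / Σ(h_i / v_i), where h_i and v_i are the thickness (m) and shear-wave velocity (m/s) of layer i. The Python sketch below illustrates that calculation for a layered profile such as one derived from a borehole or geophysical survey. It is not the authors' code; the example profile is hypothetical, and extending a profile shallower than 30 m with its deepest velocity is an assumption (practice varies).

```python
def vs30(layers):
    """Time-averaged shear-wave velocity of the upper 30 m (m/s).

    layers: list of (thickness_m, vs_m_per_s) tuples ordered from the
    surface downward. Material below 30 m depth is ignored. If the profile
    ends above 30 m, the deepest layer's velocity is ASSUMED to continue
    down to 30 m (an illustrative choice; practice varies).
    """
    depth = 0.0
    travel_time = 0.0  # vertical shear-wave travel time in seconds
    for thickness, vs in layers:
        if depth >= 30.0:
            break
        h = min(thickness, 30.0 - depth)  # clip the layer at 30 m depth
        travel_time += h / vs
        depth += h
    if depth < 30.0:
        travel_time += (30.0 - depth) / layers[-1][1]
    return 30.0 / travel_time


# HYPOTHETICAL profile: 10 m at 200 m/s over 25 m at 400 m/s.
print(round(vs30([(10.0, 200.0), (25.0, 400.0)]), 1))  # prints 300.0
```

Because the average is taken over travel time rather than thickness, slow near-surface layers pull Vs30 down more than a simple thickness-weighted mean would (the thickness-weighted mean of the same profile is about 333 m/s).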

5- The Results and Discussion are not properly organized; they should show the significant achievements of the proposed method and discuss each table and figure in detail. The Results section describes the methodology and the process of work, e.g., (Landsat satellite images give a useful method of showing the historical urban area, and this section will demonstrate the results of a Landsat analysis to extract the land uses in 1992. Figure 4(a) depicts the outcome of the first stage, which began with integrating bands for the case study region in 1992 in natural colors without the presence of an atmosphere. Figure 4(b) depicts the same image after being subjected to supervised classification using GIS.)

Reply: We thank the reviewer for this valuable comment and have made the following additions.

For Table 2, we have added: “Desert was the most common land-use type in GC, and water was the least common.”

For Table 3, we have added: “The classification accuracy per class ranged from 0.79 for urban areas to 1.00 for water bodies (0.95 for desert and 0.97 for green areas), giving an overall accuracy of 93 percent and a Kappa value of 0.87, which shows that the classification is highly reliable.”
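The overall accuracy and Kappa value quoted above are standard statistics of a classification confusion matrix. Overall accuracy p_o is the fraction of correctly classified validation samples, and Cohen's kappa corrects it for chance agreement, κ = (p_o - p_e) / (1 - p_e), where p_e is computed from the row and column totals. The Python sketch below computes both, plus per-class (producer's) accuracy, assuming rows hold reference classes and columns hold mapped classes; whether the per-class figures in Table 3 are producer's or user's accuracy is not stated in this record, and the example matrix is hypothetical, not the study's data.

```python
def classification_accuracy(matrix):
    """Overall accuracy, Cohen's kappa and per-class accuracy.

    matrix[i][j] = number of validation samples whose reference class is i
    and whose mapped class is j (rows = reference, columns = map; this
    orientation is an assumption, as conventions differ between tools).
    Per-class values are producer's accuracy: correct / reference total.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    p_o = sum(matrix[i][i] for i in range(k)) / n  # observed agreement
    p_e = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2  # chance agreement
    kappa = (p_o - p_e) / (1.0 - p_e)
    per_class = [matrix[i][i] / row_totals[i] for i in range(k)]
    return p_o, kappa, per_class


# HYPOTHETICAL two-class matrix (e.g. urban vs. non-urban), not Table 3.
overall, kappa, per_class = classification_accuracy([[45, 5], [5, 45]])
print(overall, round(kappa, 3), per_class)
```

In this balanced example, chance agreement is 0.5, so an overall accuracy of 0.9 corresponds to a kappa of 0.8; kappa is always lower than overall accuracy whenever p_e is positive.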

For Table 4, we have added: “Notably, the proposed ML model outperforms the other techniques, achieving the highest accuracy (0.96) and the lowest error (residual) (8.54). It is followed by the Kriging and TopoToR methods, both with an accuracy of 0.95 and with errors of 9.06 and 9.45, respectively.”

For Table 5, we have added: “Table 5 shows the PGA-based intensity scale adopted in this study. Earthquake intensity scales describe the severity of an earthquake's effects on the Earth's surface, humans, and buildings at different locations around the epicenter. The degree of shaking due to an earthquake can be estimated either quantitatively, in terms of peak ground-motion parameters, or qualitatively, using the felt intensity. Felt-intensity values are determined from the observed structural damage and/or the response of people to the ground shaking, and they are useful as input to rapid loss modeling. The intensity scale was extended to eight levels (0 to 7) because of the widespread pattern of high ground accelerations (400 Gal and greater). At the lower levels of the scale (0 to 3), intensity is generally assessed in terms of how people feel the shaking; the higher levels (4 to 7) are based on structural damage surveyed by professionals.”
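Assigning each interpolated PGA value to a class of a PGA-based intensity scale is a simple binning step. The thresholds of Table 5 are not reproduced in this record, so the Python sketch below uses HYPOTHETICAL boundaries in Gal. Only the eight classes (0 to 7) and the 400 Gal figure come from the text; using 400 Gal as the last boundary is itself an assumption. The sketch also shows the unit conversion 1 g = 980.665 Gal (1 Gal = 1 cm/s²).

```python
import bisect

GAL_PER_G = 980.665  # standard gravity in cm/s^2; 1 Gal = 1 cm/s^2

# HYPOTHETICAL class boundaries in Gal, for illustration only; replace them
# with the thresholds of the adopted intensity scale (Table 5).
# Seven ascending boundaries split PGA into eight classes (0 to 7).
HYPOTHETICAL_BOUNDS_GAL = [10.0, 50.0, 100.0, 150.0, 200.0, 300.0, 400.0]


def gal_to_g(pga_gal):
    """Convert PGA from Gal (cm/s^2) to units of g."""
    return pga_gal / GAL_PER_G


def pga_to_class(pga_gal, bounds=HYPOTHETICAL_BOUNDS_GAL):
    """Return the intensity class (0-based) for a PGA value in Gal.

    A value equal to a boundary is assigned to the higher class.
    """
    return bisect.bisect_right(bounds, pga_gal)


for pga in (5.0, 120.0, 450.0):
    print(pga, "Gal =", round(gal_to_g(pga), 3), "g -> class", pga_to_class(pga))
```

Binary search keeps the lookup independent of how many classes the scale has, so the same function works if the scale is extended again.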

For Table 6, we have added: “This area was generated by the Trend method, which yields the highest percentage among the methods; an intermediate percentage (94.47%) was generated by the Natural method, and the lowest percentage of urban area in the same perceived-shaking class (approximately 87.43%) was generated by the Spline method.”

 

As Figure 10(a) shows, the ML, Kriging, and Spline methods generate similar values (approximately 5.7-5.9) for urban areas exposed to moderate shaking. The comparison in Figure 10(b) shows that the ML and Kriging methods also generate similar values (approximately 91.9-92.2) for urban areas exposed to moderate shaking.

For Table 7, we have added: “Most of the population was concentrated in the ‘Strong’ class, which accounted for 93-99 percent of the total population, followed by the ‘Violent’ class with 2-3 percent, while the smallest share was in the ‘Moderate’ class with 1-3 percent.”
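The exposure figures in Tables 6 and 7 amount to overlaying a shaking-class raster on an exposure raster and summing the exposure per class. The Python sketch below shows that tabulation, assuming both rasters have already been resampled to the same grid (an assumption of this sketch). The value grid can hold GHS-POP people per cell for population exposure, or 1 for urban and 0 for non-urban Landsat cells for urban-area exposure. The grids in the example are hypothetical.

```python
from collections import defaultdict


def exposure_by_class(class_grid, value_grid):
    """Total and percentage of an exposure value per shaking class.

    class_grid: 2-D list of class labels (None marks no-data cells).
    value_grid: aligned 2-D list of exposure values per cell, e.g.
    people per cell, or 1/0 for urban/non-urban cells.
    """
    if len(class_grid) != len(value_grid):
        raise ValueError("grids must have the same number of rows")
    totals = defaultdict(float)
    for class_row, value_row in zip(class_grid, value_grid):
        if len(class_row) != len(value_row):
            raise ValueError("grids must have the same number of columns")
        for cls, value in zip(class_row, value_row):
            if cls is not None:  # skip cells outside the hazard map
                totals[cls] += value
    grand_total = sum(totals.values())
    percentages = {cls: 100.0 * v / grand_total for cls, v in totals.items()}
    return dict(totals), percentages


# HYPOTHETICAL 2 x 3 grids: shaking class per cell and people per cell.
classes = [["Strong", "Strong", "Violent"], ["Moderate", "Strong", None]]
population = [[120.0, 80.0, 30.0], [20.0, 50.0, 999.0]]
totals, shares = exposure_by_class(classes, population)
print(totals)
print(shares)
```

Running the same function once per interpolation method's class raster gives the method-by-method comparison reported in the tables.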

In Section 4.1, we revised the following paragraph:

“This section presents the results of a Landsat analysis to extract the land uses in 1992. Figure 5(a) shows the outcome of the first stage, which began by combining the bands for the case study region in 1992 in natural colors, without atmospheric effects. Figure 5(b) shows the same image after supervised classification using GIS.”
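Supervised classification assigns each pixel's vector of band values to a land-use class learned from training samples. The classifier used by the authors is not specified in this record, so the Python sketch below uses a minimum-distance-to-means classifier, one of the simplest supervised methods, purely for illustration (maximum likelihood is a common alternative in GIS software). The three-band training values and the tiny image are hypothetical.

```python
import math


def train_centroids(training):
    """Mean band vector of each class.

    training: dict mapping class name -> list of pixel vectors
    (one value per band, e.g. digital numbers or reflectance).
    """
    centroids = {}
    for cls, pixels in training.items():
        n_bands = len(pixels[0])
        centroids[cls] = [sum(p[b] for p in pixels) / len(pixels) for b in range(n_bands)]
    return centroids


def classify_pixel(pixel, centroids):
    """Assign the class whose mean vector is nearest in Euclidean distance."""
    return min(centroids, key=lambda cls: math.dist(pixel, centroids[cls]))


# HYPOTHETICAL 3-band training samples for the four land-use classes.
training = {
    "water": [(20, 30, 10), (22, 28, 12)],
    "green": [(40, 90, 30), (44, 96, 34)],
    "urban": [(110, 105, 100), (116, 99, 96)],
    "desert": [(180, 170, 150), (176, 166, 154)],
}
centroids = train_centroids(training)
image = [[(21, 29, 11), (112, 101, 99)], [(178, 169, 151), (42, 92, 31)]]
labels = [[classify_pixel(px, centroids) for px in row] for row in image]
print(labels)
```

A minimum-distance rule ignores how spread out each class is in band space; that limitation is one reason probabilistic classifiers are often preferred for land-use mapping.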

6- The Discussion section is short and weak, while the Results section describes the methodology. Significant improvement is needed. It would be valuable to compare your proposed method with other known methods to show its efficiency.

Reply: We sincerely thank the reviewer for this valuable suggestion, which we have addressed as follows.

To the best of our knowledge, no similar study has used an integrated approach to tackle this problem. The integration involves three main differences: first, it compares six of the most common GIS interpolation methods; second, it creates a new ML-based method; and third, it uses all GIS methods and the ML method to produce SHMs. This study also examines the exposure of metropolitan areas and the population distribution on the generated SHMs.

The outcomes of this investigation were compared with earlier studies that dealt with similar themes, in particular those exploring data interpolation. These studies can be described as follows:

•    Previous studies compared only two or three methods [31] [32] [33], whereas this study compared six GIS methods and identified the most accurate interpolation method from the data.

•    The second group of publications applied four or five methods to different types of data, such as climatic data, soil type, and digital elevation models [34] [35].

•    This paper's findings are in line with earlier research showing that Kriging interpolation can improve spatial prediction accuracy for different data types [36] [34] [32], while other studies show that Inverse Distance Weighted (IDW) and Kriging perform nearly the same [32] [33]. In this study, Kriging was the best GIS method, consistent with reports that Kriging gives superior interpolation for unmeasured quantities [37] [38] [39].
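Unlike IDW, Kriging derives its weights from a model of spatial correlation (a semivariogram) and forces them to sum to 1, which is what makes ordinary Kriging an unbiased, exact interpolator. The Python sketch below solves the ordinary Kriging system for one target point, assuming a spherical semivariogram with HYPOTHETICAL nugget, partial sill, and range values; in practice these are fitted to the empirical semivariogram of the data, and this sketch is not the GIS implementation used by the authors. The PGA samples are also hypothetical.

```python
import math


def spherical_gamma(h, nugget, partial_sill, range_):
    """Spherical semivariogram; gamma(0) = 0 by definition."""
    if h == 0.0:
        return 0.0
    if h >= range_:
        return nugget + partial_sill
    r = h / range_
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ordinary_kriging(samples, x, y, nugget=0.0, partial_sill=1.0, range_=15.0):
    """Ordinary kriging estimate at (x, y) from (xi, yi, value) samples."""
    n = len(samples)

    def gamma(p, q):
        return spherical_gamma(math.hypot(p[0] - q[0], p[1] - q[1]),
                               nugget, partial_sill, range_)

    # Kriging system: semivariances between samples, bordered by ones
    # that enforce the unbiasedness constraint (weights sum to 1).
    a = [[gamma(samples[i], samples[j]) for j in range(n)] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(s, (x, y)) for s in samples] + [1.0]
    solution = solve_linear(a, b)
    weights = solution[:n]  # solution[n] is the Lagrange multiplier
    return sum(w * s[2] for w, s in zip(weights, samples))


# HYPOTHETICAL PGA samples (Gal) at projected coordinates (km).
pga_samples = [(0.0, 0.0, 300.0), (10.0, 0.0, 200.0), (0.0, 10.0, 250.0), (8.0, 8.0, 220.0)]
print(round(ordinary_kriging(pga_samples, 4.0, 3.0), 1))
```

Partial pivoting is needed here because the diagonal of the semivariance matrix is zero; for distinct sample locations and a valid variogram model the system remains solvable.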

Most studies employ the same accuracy and error comparison strategy as this investigation: cross-validation using RMSE, which accounts for stationary points and extrema [35] [32] [34]. In this article, however, the R² value and MSE were used to strengthen the accuracy test of all interpolation methods.
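Cross-validation of an interpolator is usually done leave-one-out: each measured point is removed in turn, predicted from the remaining points, and the prediction errors are summarized. The Python sketch below computes R², MSE, and RMSE that way for any predictor with the hypothetical signature predictor(known_samples, x, y), so an IDW or Kriging function can be plugged in. R² is taken here as 1 - SS_res / SS_tot, which is an assumption, since some studies instead report the squared correlation between observed and predicted values. The example uses a location-blind mean predictor as a baseline, with hypothetical data.

```python
import math


def loocv_metrics(samples, predictor):
    """Leave-one-out cross-validation of a spatial predictor.

    samples: list of (x, y, value) tuples.
    predictor: callable(known_samples, x, y) -> estimated value.
    Returns (r2, mse, rmse), where r2 = 1 - SS_res / SS_tot.
    """
    observed, predicted = [], []
    for i, (x, y, value) in enumerate(samples):
        others = samples[:i] + samples[i + 1:]  # hold out sample i
        predicted.append(predictor(others, x, y))
        observed.append(value)
    n = len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    mse = ss_res / n
    r2 = 1.0 - ss_res / ss_tot
    return r2, mse, math.sqrt(mse)


def mean_predictor(known, x, y):
    """Baseline that ignores location and returns the mean known value."""
    return sum(v for _, _, v in known) / len(known)


# HYPOTHETICAL samples along a line with a steady trend in value.
samples = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (2.0, 0.0, 3.0)]
print(loocv_metrics(samples, mean_predictor))
```

A negative R², as this baseline produces, means the predictor does worse than simply using the sample mean, which is why the definition of R² should be stated alongside reported accuracies.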

 

7- It would be useful to provide a general framework or flowchart showing how others can implement or use your proposed method for their own assessments.

Reply: We greatly appreciate the reviewer's suggestion. We have added the following framework and hope it will be useful.

8- In general, the Conclusion section is weak; you could briefly discuss the achievements again, along with the limitations and drawbacks of your proposed method.

Reply: Many thanks for this important comment. We have modified the conclusion section as follows:

Reliable SHMs play a significant role in urban planning, seismic risk assessment, and seismic risk reduction in megacities. Obtaining such maps requires detailed geotechnical investigations, which are costly and time-consuming. In this paper, the efficiency of six common GIS-based interpolation approaches and several linear and non-linear ML methodologies in producing SHMs (i.e., PGA maps) from a limited geotechnical dataset is examined. These methods were applied to the geotechnical data available for GC. The developed model also relied on open-access data such as Landsat and GHS-POP to identify the exposure distribution on the SHMs. The results demonstrated that the Kriging method delivered the best performance among the GIS approaches, with an accuracy of 95%, and that the ET ML model achieved the best performance in predicting PGA values, with an accuracy of 96%. Finally, we recommend that decision makers in developing countries such as Egypt use the proposed methodologies to estimate SHMs, urban areas, and populations accurately based on location and perceived shaking.

 

According to the findings of this study, the proposed ML-based method outperformed the GIS methods in terms of accuracy, so it can be used to create seismic hazard maps. In the future, other influencing variables, such as topography, could be included; this requires more data and possibly major funding, which is one of the most significant obstacles to developing the model. Moreover, the ML model needs to be improved using updated datasets representing different regions.

Finally, the authors note that the proposed approach has been applied to GC. The obtained geotechnical data reflect the geological setting of the study area, which is composed of three geological units that are not intercalated with each other; the geologic setting of the study area is therefore considered homogeneous. Applying this approach to regions with heterogeneous geologic settings requires more attention. In addition, the approach cannot replace the site-specific seismic hazard studies required to obtain the ground-motion parameters that serve as the main inputs for the design response analysis and structural response analysis of critical facilities, megastructures, and infrastructure.


Author Response File: Author Response.pdf

Reviewer 2 Report

The manuscript will be acceptable after major revision. The authors have sound knowledge of the underlying theory. The manuscript presents a case study on identifying the exposure of an urban area (Greater Cairo) to a certain seismic hazard using machine learning and GIS.

Abstract

This section needs revision. I suggest adding one sentence about your key problem and its significance.

Introduction

The Introduction needs major revision to convey the significance of the paper. This section is not written in a professional scientific manner. I suggest rewriting and shortening it as described below; in its current form, it is merely a literature review.

It should be arranged in the following order:

a)      Key Problem

b)      Review of Research History (this part is okay)

c)      Significance/ Scope

d)      My Progress/ Development

e)      Advantages 

 

Figure 2. Distribution of Population in GC.

I suggest producing a color version of this figure and stating its data source.

Table 2. Description of the different interpolation methods implemented in GIS [76].

This is very generic information; I suggest removing it.

3.5. Machine Learning Methods and Evaluation Metrics

I suggest shortening this section and removing irrelevant information. Discuss only the key parameters, in summarized form.

Table 3. Results of Landsat classification.

I suggest presenting the dataset in a single form, either a table or a pie chart.

 

Conclusions

 

This section needs to be shortened. I suggest discussing only your key findings and removing irrelevant sentences.

Author Response

Reply to the Editor and Reviewers’ Comments

Paper ID: sustainability-1791175

Paper title: Identifying Exposure of Urban Area to Certain Seismic Hazard Using Machine Learning and GIS: A Case Study of Greater Cairo

We would like to thank the editor and the reviewers for their valuable comments on the paper (sustainability-1791175) entitled “Identifying Exposure of Urban Area to Certain Seismic Hazard Using Machine Learning and GIS: A Case Study of Greater Cairo”. We have revised the manuscript according to the reviewers' comments, which helped improve the quality and presentation of the paper. To facilitate our reply, the amendments are highlighted in the revised manuscript. We hope that the revised version addresses the reviewers' comments. Our detailed replies to the comments are given below.

Reviewer 2 comments:

1- Abstract: This section needs revision. I suggest adding one sentence about your key problem and its significance.

Reply: Thank you for this important comment. Following your comment, the abstract has been modified as follows:

Abstract: The 1992 Cairo earthquake, with a moment magnitude of 5.8, is the most catastrophic earthquake to strike Greater Cairo (GC) in recent decades. Owing to the very limited number of seismological stations at that time, the peak ground acceleration (PGA) caused by this event was not recorded. Calculating PGA requires identifying the nature of the earthquake source, the geologic setting of the path between the source and the site under investigation, and the dynamic properties of the soil at the site. Soil dynamic properties are acquired through geotechnical investigations and/or geophysical surveys; both are costly and are not applicable in regional studies. This study presents an adaptive and reliable PGA prediction model using machine learning (ML), along with six standard geographic information system (GIS) interpolation methods (IDW, Kriging, Natural, Spline, TopoToR, and Trend), to predict the spatial distribution of PGA caused by this event over GC. The model is employed to estimate the exposure of the urban area and population in GC based on the available geotechnical and geophysical investigations. The exposure data (urban area and population) are taken from freely accessible sources, e.g., Landsat images and the Global Human Settlement Population Grid (GHS-POP). The results show that the Natural, Spline, and Trend methods are not suitable GIS interpolation techniques for generating seismic hazard maps (SHMs), whereas the Kriging method provides sufficient predictions. Notably, the ML model outperforms the classical GIS methodologies, with an accuracy of 96%.

2- Introduction: The Introduction needs major revision to convey the significance of the paper. This section is not written in a professional scientific manner. I suggest rewriting and shortening it as described below; in its current form, it is merely a literature review. It should be arranged in the following order:

a)      Key Problem

b)      Review of Research History (This part is okay)

c)      Significance/ Scope

d)      My Progress/ Development

e)      Advantages 

Reply. Thanks for the important comment. Following your comment, the Introduction has been modified as follows:

Natural disasters are among the most common problems that human settlements, particularly megacities throughout the globe, have to deal with [1], and they pose a significant threat to megacities. According to statistics published by UN-Habitat (the United Nations Human Settlements Programme), all megacities are prone to natural disasters ranging from geological hazards (earthquake ground shaking and mass movements) to meteorological hazards (flash flooding and storms), extreme weather events (extreme heat and cold), and wildfires, indicating the need to develop different risk reduction strategies for the various conditions in megacities [2]. Earthquakes can trigger liquefaction, landslides, fires, and tsunamis, all of which can greatly increase damage and losses [3].

Seismic hazard maps (SHMs) represent the regional distribution of the hazard caused by earthquake ground motion in an area. Ground-motion intensity parameters such as peak ground acceleration (PGA) and peak ground velocity (PGV) are therefore usually used as hazard indicators in conventional SHMs. Many SHMs have been proposed and developed around the world [4]. SHMs are commonly used by planners, engineers, and developers to save lives and money, and they also support building codes, seismic risk assessment, and disaster management. Because tsunamis, landslides, and liquefaction are all possible secondary effects of earthquakes, combining SHMs with information compiled from tectonic maps and geological, geodetic, and geophysical data sets can help identify the potential sites of these effects. Kim et al. (2020) used GIS interpolation methods to map soil classification and produce SHMs based on remote sensing and geotechnical information in Daejeon, South Korea. They confirmed the applicability of GIS interpolation methods for producing SHMs and strongly recommended integrating remote-sensing-based and in-situ geotechnical information when producing seismic zonation maps.

The proliferation of major cities around the world, together with expanding urbanization, population growth, and rising income, contributes to increasing exposure in hazard-prone regions [5]. Therefore, urban planning based on reliable hazard maps helps safeguard inhabitants from the consequences of natural disasters [12,13]. Producing SHMs with the earthquake scenario approach requires knowledge of the faulting mechanism, the earthquake source parameters, the crustal structure between the earthquake source and the site under investigation, and the dynamic properties of the soil at that site. The faulting mechanism and source parameters can be inferred directly from digital earthquake records, and the nature of the crustal structure can be found in previous geophysical and seismological studies. However, the dynamic properties of the soil at the site of interest must be obtained from in-situ geotechnical boreholes and/or geophysical measurements, which are costly, time-consuming, and not applicable to regional studies. Producing regional-scale SHMs that account for local site conditions is therefore a great challenge, the main obstacle being the limited number of available geotechnical data and geophysical measurements. Consequently, new approaches that use recent methodologies (i.e., remote sensing, GIS, and machine learning (ML)) with freely and easily accessible data offer a quick, low-cost way to overcome the data limitations and deficiencies of traditional techniques.

Remote sensing technology has advanced dramatically in the last decade, enabling more precise urban monitoring. Remote sensing data offer several benefits and play an important role in the inventory evaluation and monitoring of environmental assets based on spatial data; hence, the use of this technology is increasing across a wide range of industries [6,7]. Remote sensing applications are particularly crucial for developing countries, where it is difficult for governments to update their databases using standard methods because of the time and expense involved [8]. Remote sensing data may also be used to determine land use in urban areas [9]. Many sources of satellite imagery exist, for example, Landsat, IKONOS-2, and OrbView-3; Landsat was selected as the best choice for monitoring spatial details given the availability of its spectral satellite data over reasonably long timespans and with suitable accuracy [10]. Landsat data are freely available and easy to download from the United States Geological Survey website [11], and such data may provide findings that are close to real-life situations [12].

The Joint Research Centre (JRC) has published the most recent worldwide gridded population dataset, the Global Human Settlement Population Grid (GHS-POP) [13]. This geographic raster dataset, published in 2018, gives the population of the city as the number of people per cell [14,15]. These estimates were taken from the CIESIN GPWv4.10 datasets and disaggregated from census or administrative units to grid cells based on the distribution and density of built-up areas as depicted in the Global Human Settlement Layer (GHSL) global layer for each epoch [14]. Global population raster maps are crucial for a variety of policy-making evaluations, from environmental assessment through disaster risk studies to city planning and management, so accurate and up-to-date population statistics are critical. Simple GIS statistics and analyses are used to check the correctness of the collected data. According to the reported error values, GHS-POP statistics are highly reliable regardless of the study topic [16].
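To illustrate how a per-cell population grid such as GHS-POP can be combined with a hazard map, the sketch below sums population within PGA classes. It assumes that the population raster and the PGA raster have already been resampled to the same cell alignment and are held as plain nested lists; the function name `exposure_by_pga_class`, the class breaks, and the cell values are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def exposure_by_pga_class(pop_grid, pga_grid, breaks):
    """Sum population per PGA class for two aligned rasters (lists of rows).

    breaks: ascending upper bounds (in g) of each class; cells whose PGA
    exceeds the last bound fall into a final, open-ended class.
    """
    totals = defaultdict(float)
    for pop_row, pga_row in zip(pop_grid, pga_grid):
        for pop, pga in zip(pop_row, pga_row):
            if pop is None or pga is None:  # skip no-data cells
                continue
            # index of the first class whose upper bound is not exceeded
            cls = next((i for i, b in enumerate(breaks) if pga <= b), len(breaks))
            totals[cls] += pop
    return dict(totals)


# Hypothetical 2 x 2 grids: people per cell and PGA (g) per cell
pop = [[120, 80], [0, 300]]
pga = [[0.05, 0.12], [0.20, 0.25]]
print(exposure_by_pga_class(pop, pga, breaks=[0.1, 0.2]))
# {0: 120.0, 1: 80.0, 2: 300.0}
```

The same summation applied to a binary urban mask, multiplied by the cell area, would give the exposed urban area per PGA class instead of the exposed population.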

A geographic information system (GIS) commonly stores spatial data as discrete points or partitioned data in its spatial database. Gathering comprehensive geographic data would therefore require surveying samples across the entire geographic area and measuring the attribute value at every location, which wastes time and money [17,18]. GIS spatial interpolation methods provide an effective way of predicting the geographic distribution of data, increasing data density, filling in missing data, and producing a dense data distribution from a small observational data set. Spatial interpolation is a way of making informed estimates, combining the investigator's judgment with the GIS, of the value of a continuous field at places where it has not been measured [19].

Machine learning (ML) algorithms have recently emerged as tools for tackling many research problems [20,21]. This trend started with the success of convolutional neural networks (CNNs) in image recognition, which in turn raised interest in applying other ML algorithms to a wide range of challenging research problems. ML tools can learn very complex relational models that classical approaches might not capture because of their model restrictions. ML has been applied successfully to many challenging problems, ranging from recommendation systems to autonomous vehicles [22]. In this regard, ML has proved beneficial in both classification and regression problems for calculating the ground motion parameters that are directly used in seismic hazard assessment and human safety [23–25]. Moreover, it can be used to create effective seismic zonation maps [26].

The Greater Cairo (GC) region is one of the most densely populated locations in the world (about 13 million inhabitants), with densely populated suburbs. Historical and recent earthquake catalogs show that this megacity has experienced severe earthquakes that destroyed many historical and archaeological structures [27,28]. The Dahshour seismic source generates the most catastrophic earthquakes affecting GC. On October 12, 1992, it generated the most significant natural hazard in this region in more than a decade, causing a disproportionate amount of destruction and loss of life [6] in the GC region, the Nile Valley, and the Nile Delta [29]. This event caused 561 deaths and 9832 injuries, left more than 20,000 people homeless, and damaged or destroyed more than 8,000 structures; 50,000 people were displaced in the Cairo region alone, and the damage bill exceeded $35 million [30]. Studying the present-day impact of such an event is very important for urban planning, seismic risk assessment, seismic risk reduction, and disaster management. The purpose of this research is threefold. First, the paper compares six GIS interpolation methods, in addition to the suggested ML method, to recommend the most accurate method for predicting PGA values and creating SHMs. Second, it determines the contribution of free and easily accessible data sources to estimating the exposed urban area and population. Finally, it examines the exposure of the urban area to the shaking caused by the October 12, 1992 Dahshour earthquake in GC using all tested interpolation methods and the free and easily accessible data sources.

 

 

3- Figure 2. Distribution of Population in GC: It is suggested to produce a color figure and to mention the proper data source of this figure.

Reply: We thank the reviewer for this valuable suggestion; we have done so.

 

 

4- Table 2. Description of the different interpolation methods implemented in GIS [76]. This is very generic information; I suggest removing it.

Reply: Thanks for your comment; we have removed it.

5- 3.5. Machine Learning Methods and Evaluation Metrics: It is suggested to shorten this section and remove irrelevant information from it. Discuss only the key parameters, in summarized form.

Reply. Thanks for the important comment. Following your comment, the machine learning section has been modified as follows:

In the proposed approach, we utilized seven linear and eight non-linear ML models, as depicted in Figure 3, and carried out extensive experiments with them. The linear models are logistic regression (LR), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), linear support vector machine (LSVM), Lasso regressor (LasR), Ridge regressor (RR), and ordinary least squares regression (OLS) [31,76–78]. The non-linear models are AdaBoost (AB), Gradient Boosting (GB), Light Gradient Boosting (LGB), Extreme Gradient Boosting (XGB), random forest (RF), extra-trees (ET), decision tree (DT), and k-nearest neighbors (KNN) [77,79–81].

 

Figure 3. ML performance evaluation and best-model determination.

3.5.2. Learning and testing process

•           We started by determining the features and targets (labels) of the input dataset (800 samples). The features are latitude (Lat) and longitude (Long), and the labels are the calculated PGA values. The input data are then divided into three sets: training, test, and validation. The training and test sets together comprise 600 samples, and the validation set comprises 200 samples. First, the 600 samples are used to train and test the models with four training/test split ratios: 60%/40%, 70%/30%, 80%/20%, and 90%/10%. Second, the models are validated on the 200 validation samples; based on the evaluation metrics used, the results show that the ET model achieves the best PGA predictions. Accordingly, the ET model was deployed to predict the PGA values at 48,000 locations.
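The data-handling protocol described above (800 samples, 200 held out for validation, and the remaining 600 split into training and test sets at four ratios) can be sketched with the Python standard library as follows. The random shuffle, the seed, and the helper name `make_splits` are assumptions, since the text does not state how samples were assigned to each set; fitting the models themselves (for example with scikit-learn's `ExtraTreesRegressor`) is omitted.

```python
import random


def make_splits(n_samples=800, n_validation=200,
                train_fractions=(0.6, 0.7, 0.8, 0.9), seed=0):
    """Return (validation_idx, {fraction: (train_idx, test_idx)})."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # assumed random assignment
    validation = idx[:n_validation]   # held out; never used for training
    pool = idx[n_validation:]         # remaining samples for train/test
    splits = {}
    for frac in train_fractions:
        n_train = round(frac * len(pool))
        splits[frac] = (pool[:n_train], pool[n_train:])
    return validation, splits


validation, splits = make_splits()
for frac, (train, test) in splits.items():
    print(f"{frac:.0%} train: {len(train)} / test: {len(test)}")
```

With 600 pooled samples, the four ratios give 360/240, 420/180, 480/120, and 540/60 training/test samples, while the same 200 validation samples are kept aside for every ratio.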

 

6- Table 3. Results of Landsat classification: It is suggested to use a single format to present the data set, either a table or a pie chart.

Reply: We thank the reviewer for this valuable comment. We have used a table to present the data set.

 

 

Class Name    Area (ha)    Area (%)
Desert        290843       66.63
Green         85279        19.54
Urban         55085        12.62
Water         5287         1.21
Total         436494       100
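As a consistency check on Table 3, the percentage column can be recomputed from the hectare column; the short snippet below uses only the values reported in the table.

```python
# Class areas (ha) from the Landsat classification results in Table 3
areas_ha = {"Desert": 290843, "Green": 85279, "Urban": 55085, "Water": 5287}

total_ha = sum(areas_ha.values())
shares = {name: round(100 * area / total_ha, 2) for name, area in areas_ha.items()}

print(total_ha)  # 436494
print(shares)    # {'Desert': 66.63, 'Green': 19.54, 'Urban': 12.62, 'Water': 1.21}
```

The recomputed total and percentages match the table, and the rounded shares sum to 100%.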

7-       Conclusions: This section needs to be shortened. It is suggested to discuss only your key findings and remove irrelevant sentences.

Reply. Many thanks for the important comment. We have modified the Conclusions section as follows:

Reliable SHMs play a significant role in urban planning, seismic risk assessment, and seismic risk reduction in megacities. Obtaining such maps requires detailed geotechnical investigations, which are costly and time-consuming. In this paper, the efficiency of six common GIS-based interpolation approaches and several linear and non-linear ML methods in producing SHMs (i.e., PGA maps) from a limited geotechnical dataset is examined. These methods were applied to the geotechnical data available for GC. The developed model also relied on open-access data such as Landsat and GHS-POP to identify the distribution of exposure on the SHMs. The results demonstrated that the Kriging method delivered the best performance among the GIS approaches, with an accuracy of 95%, while the ET ML model achieved the best overall performance in predicting PGA values, with an accuracy of 96%. Finally, we recommend that decision makers in developing countries such as Egypt use the proposed methodologies for accurate estimation of SHMs, urban areas, and populations based on location and perceived shaking.

 

Author Response File: Author Response.pdf

Reviewer 3 Report

The authors investigated the identification of seismic hazard exposure in an urban area using machine learning (ML) techniques and geographical information system (GIS) models. Their two main purposes are:

·         the comparison, in terms of performance accuracy, among the different employed models;

·         the estimation of the contribution of freely accessible datasets to determining the level of exposure of urban areas and population to seismic hazard.

This area of research is important and the topic is timely. The methodology applied is very promising and the results of the paper are very interesting and well supported by the provided analysis. However, the structure of the paper and the presentation of the results need improvements.

The main issues are reported below:

1.       The introduction paragraph needs to be reorganized. There is an excessive mix of generic considerations with case-study and research content. For better readability of the paper, it would be advisable to separate generic aspects from those strictly related to the proposed research. For example, on pages 1 and 2, the section from line 37 to line 55 could be moved to the last part of the paragraph, before the main aims, reinforcing the importance of the research. Furthermore, the description of the dataset properties could be moved to the Materials and Methods paragraph, in a proper subsection.

2.       The Materials and Methods paragraph could be better organized, separating the study area description from the dataset description and methodology.

3.       Page 8, Table 2, line 293: it is preferable to describe the interpolation methodologies in the body of the text, rather than in this tabular form, like, for example, in the following paragraph.

4.       Page 14, Table 4, line 467: What about the Kappa (K) coefficient value? In Table 4, the K row contains zero values in all cells except the last, where K reaches 0.87. How has this value been determined?

5.       Page 15, Figure 7, line 480: regarding the Trend interpolation, are the ground-truth points displaced with respect to the measured values?

What type of regression has been used in the Trend interpolation? Polynomial, logistic…?

Are the Val Points also present after the interpolation? (Question of graphic overlay?)

6.       Page 20, Figure 10, line 542: for the readability of the histograms, it is better to specify the variable represented on the ordinate axis. It is not clear what the three panels (a), (b), and (c) refer to.

7.       Page 20, Table 8, line 555: If the results reported in Table 8 refer to inhabitants, why is ha reported as the unit of measure?

8.       Page 21, Figure 4 (maybe 11), line 563: the same considerations as in point 6 apply. The figures are not numbered progressively: the caption “Figure 11” must replace “Figure 4”.

Author Response

Reply to the Editor and Reviewers’ Comments

Paper ID: sustainability-1791175

Paper title: Identifying Exposure of Urban Area to Certain Seismic Hazard Using Machine Learning and GIS: A Case Study of Greater Cairo

We would like to thank the editor and the reviewers for their valuable comments on the paper (sustainability-1791175) entitled “Identifying Exposure of Urban Area to Certain Seismic Hazard Using Machine Learning and GIS: A Case Study of Greater Cairo”. We have revised the manuscript according to the reviewers' comments, which helped improve the quality and presentation of the paper. To facilitate our reply, the amendments are clearly highlighted in the revised manuscript. We hope that the revised version has addressed the reviewers' comments. Our detailed replies to the comments we received are given next.

Reviewer 3 comments:

The introduction paragraph needs to be reorganized. There is an excessive mix of generic considerations with case-study and research content. For better readability of the paper, it would be advisable to separate generic aspects from those strictly related to the proposed research. For example, on pages 1 and 2, the section from line 37 to line 55 could be moved to the last part of the paragraph, before the main aims, reinforcing the importance of the research. Furthermore, the description of the dataset properties could be moved to the Materials and Methods paragraph, in a proper subsection.

Reply. Thanks for the important comment. Following your comment, the Introduction has been modified as follows:

Natural disasters are among the most common problems that human settlements, particularly megacities throughout the globe, have to deal with [1], and they pose a significant threat to megacities. According to statistics published by UN-Habitat (the United Nations Human Settlements Programme), all megacities are prone to natural disasters ranging from geological hazards (earthquake ground shaking and mass movements) to meteorological hazards (flash flooding and storms), extreme weather events (extreme heat and cold), and wildfires, indicating the need to develop different risk reduction strategies for the various conditions in megacities [2]. Earthquakes can trigger liquefaction, landslides, fires, and tsunamis, all of which can greatly increase damage and losses [3].

Seismic hazard maps (SHMs) represent the regional distribution of the hazard caused by earthquake ground motion in an area. Ground-motion intensity parameters such as peak ground acceleration (PGA) and peak ground velocity (PGV) are therefore usually used as hazard indicators in conventional SHMs. Many SHMs have been proposed and developed around the world [4]. SHMs are commonly used by planners, engineers, and developers to save lives and money, and they also support building codes, seismic risk assessment, and disaster management. Because tsunamis, landslides, and liquefaction are all possible secondary effects of earthquakes, combining SHMs with information compiled from tectonic maps and geological, geodetic, and geophysical data sets can help identify the potential sites of these effects. Kim et al. (2020) used GIS interpolation methods to map soil classification and produce SHMs based on remote sensing and geotechnical information in Daejeon, South Korea. They confirmed the applicability of GIS interpolation methods for producing SHMs and strongly recommended integrating remote-sensing-based and in-situ geotechnical information when producing seismic zonation maps.

The proliferation of major cities around the world, together with expanding urbanization, population growth, and rising income, contributes to increasing exposure in hazard-prone regions [5]. Therefore, urban planning based on reliable hazard maps helps safeguard inhabitants from the consequences of natural disasters [12,13]. Producing SHMs with the earthquake scenario approach requires knowledge of the faulting mechanism, the earthquake source parameters, the crustal structure between the earthquake source and the site under investigation, and the dynamic properties of the soil at that site. The faulting mechanism and source parameters can be inferred directly from digital earthquake records, and the nature of the crustal structure can be found in previous geophysical and seismological studies. However, the dynamic properties of the soil at the site of interest must be obtained from in-situ geotechnical boreholes and/or geophysical measurements, which are costly, time-consuming, and not applicable to regional studies. Producing regional-scale SHMs that account for local site conditions is therefore a great challenge, the main obstacle being the limited number of available geotechnical data and geophysical measurements. Consequently, new approaches that use recent methodologies (i.e., remote sensing, GIS, and machine learning (ML)) with freely and easily accessible data offer a quick, low-cost way to overcome the data limitations and deficiencies of traditional techniques.

Remote sensing technology has advanced dramatically in the last decade, enabling more precise urban monitoring. Remote sensing data offer several benefits and play an important role in the inventory evaluation and monitoring of environmental assets based on spatial data; hence, the use of this technology is increasing across a wide range of industries [6,7]. Remote sensing applications are particularly crucial for developing countries, where it is difficult for governments to update their databases using standard methods because of the time and expense involved [8]. Remote sensing data may also be used to determine land use in urban areas [9]. Many sources of satellite imagery exist, for example, Landsat, IKONOS-2, and OrbView-3; Landsat was selected as the best choice for monitoring spatial details given the availability of its spectral satellite data over reasonably long timespans and with suitable accuracy [10]. Landsat data are freely available and easy to download from the United States Geological Survey website [11], and such data may provide findings that are close to real-life situations [12].

The Joint Research Centre (JRC) has published the most recent worldwide gridded population dataset, the Global Human Settlement Population Grid (GHS-POP) [13]. This geographic raster dataset, published in 2018, gives the population of the city as the number of people per cell [14,15]. These estimates were taken from the CIESIN GPWv4.10 datasets and disaggregated from census or administrative units to grid cells based on the distribution and density of built-up areas as depicted in the Global Human Settlement Layer (GHSL) global layer for each epoch [14]. Global population raster maps are crucial for a variety of policy-making evaluations, from environmental assessment through disaster risk studies to city planning and management, so accurate and up-to-date population statistics are critical. Simple GIS statistics and analyses are used to check the correctness of the collected data. According to the reported error values, GHS-POP statistics are highly reliable regardless of the study topic [16].

A geographic information system (GIS) commonly stores spatial data as discrete points or partitioned data in its spatial database. Gathering comprehensive geographic data would therefore require surveying samples across the entire geographic area and measuring the attribute value at every location, which wastes time and money [17,18]. GIS spatial interpolation methods provide an effective way of predicting the geographic distribution of data, increasing data density, filling in missing data, and producing a dense data distribution from a small observational data set. Spatial interpolation is a way of making informed estimates, combining the investigator's judgment with the GIS, of the value of a continuous field at places where it has not been measured [19].

Machine learning (ML) algorithms have recently emerged as tools for tackling many research problems [20,21]. This trend started with the success of convolutional neural networks (CNNs) in image recognition, which in turn raised interest in applying other ML algorithms to a wide range of challenging research problems. ML tools can learn very complex relational models that classical approaches might not capture because of their model restrictions. ML has been applied successfully to many challenging problems, ranging from recommendation systems to autonomous vehicles [22]. In this regard, ML has proved beneficial in both classification and regression problems for calculating the ground motion parameters that are directly used in seismic hazard assessment and human safety [23–25]. Moreover, it can be used to create effective seismic zonation maps [26].

The Greater Cairo (GC) region is one of the most densely populated locations in the world (about 13 million inhabitants), with densely populated suburbs. Historical and recent earthquake catalogs show that this megacity has experienced severe earthquakes that destroyed many historical and archaeological structures [27,28]. The Dahshour seismic source generates the most catastrophic earthquakes affecting GC. On October 12, 1992, it generated the most significant natural hazard in this region in more than a decade, causing a disproportionate amount of destruction and loss of life [6] in the GC region, the Nile Valley, and the Nile Delta [29]. This event caused 561 deaths and 9832 injuries, left more than 20,000 people homeless, and damaged or destroyed more than 8,000 structures; 50,000 people were displaced in the Cairo region alone, and the damage bill exceeded $35 million [30]. Studying the present-day impact of such an event is very important for urban planning, seismic risk assessment, seismic risk reduction, and disaster management. The purpose of this research is threefold. First, the paper compares six GIS interpolation methods, in addition to the suggested ML method, to recommend the most accurate method for predicting PGA values and creating SHMs. Second, it determines the contribution of free and easily accessible data sources to estimating the exposed urban area and population. Finally, it examines the exposure of the urban area to the shaking caused by the October 12, 1992 Dahshour earthquake in GC using all tested interpolation methods and the free and easily accessible data sources.

 

2- The Materials and Methods paragraph could be better organized, separating the study area description from the dataset description and methodology.

Reply: Many thanks for the valuable comment; we have reorganized this section.

3- Page 8, Table 2, line 293: it is preferable to describe the interpolation methodologies in the body of the text, rather than in this tabular form, like, for example, in the following paragraph.

Reply: This study examined six GIS interpolation methods (IDW, Kriging, Natural Neighbor, Spline, Topo to Raster, and Trend). IDW (Inverse Distance Weighted) interpolates cell values [39] as a distance-weighted average of the sample data points in the neighborhood of each processing cell; the weight of each point decreases as its distance from the cell center increases. Kriging is an effective geostatistical approach for estimating a surface from a scattered set of points; more than with other interpolation methods, the spatial behavior of the phenomenon represented by the z-values should be thoroughly investigated before the final surface is generated. Natural Neighbor finds the closest subset of input samples to a query point and applies weights to them based on proportionate areas to interpolate a value; it is also known as "Sibson" or "area-stealing" interpolation. The Spline method interpolates data using a mathematical function that minimizes overall surface curvature, resulting in a smooth surface. Topo to Raster uses an interpolation method designed to generate a surface that more closely represents a natural drainage surface while preserving ridgelines and stream networks from the input contour data; it is based on the ANUDEM program developed by Hutchinson and colleagues at the Australian National University. Trend is a global polynomial interpolation method that fits a smooth surface, defined by a polynomial, to the input sample points; the trend surface changes gradually and captures coarse-scale patterns in the data [76].
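The distance weighting behind IDW can be written in a few lines. The sketch below is a plain inverse-distance-power estimator with an assumed power of 2 and no search radius or neighbor limit; it is not the ArcGIS implementation used in the study, and the function name `idw` and the sample values are hypothetical.

```python
import math


def idw(x, y, samples, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    samples: iterable of (xi, yi, zi). Each weight is 1 / d**power, so
    nearer samples receive larger weights.
    """
    num = den = 0.0
    for xi, yi, zi in samples:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            return zi  # query coincides with a sample: return its value
        w = 1.0 / d ** power
        num += w * zi
        den += w
    return num / den


# Hypothetical PGA samples (g) at two points; estimate a quarter of the way between
pts = [(0.0, 0.0, 0.10), (1.0, 0.0, 0.20)]
print(round(idw(0.25, 0.0, pts), 4))  # 0.11
```

Raising `power` makes the estimate depend more strongly on the nearest samples; a search radius or a maximum number of neighbors, as offered by GIS tools, would restrict which samples enter the sums.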

 

4- Page 14, Table 4, line 467: What about the Kappa (K) coefficient value? In Table 4, the K row contains zero values in all cells except the last, where K reaches 0.87. How has this value been determined?

Reply: We thank the reviewer for this valuable comment.

1-    The Kappa coefficient is a common measure for validating a supervised classification. It is described in the paragraph beginning “In order to confirm that the classified Landsat” (line 96).

2-    The value of 0.87 is the overall Kappa coefficient. Class-level accuracy varied from 0.79 for urban areas to 1 for water bodies, giving a total accuracy of 93 percent and a Kappa value of 0.87, which shows that the classification is highly reliable.
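For readers unfamiliar with the statistic, the sketch below computes overall accuracy and Cohen's Kappa from a classification confusion matrix (rows: reference classes, columns: mapped classes). The function name and the 2 x 2 matrix in the example are hypothetical and are not the study's Table 4.

```python
def overall_accuracy_and_kappa(cm):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    cm[i][j]: number of reference samples of class i mapped as class j.
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(k)) / n  # overall accuracy
    row_tot = [sum(cm[i]) for i in range(k)]
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    # agreement expected by chance from the marginal totals
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


print(overall_accuracy_and_kappa([[45, 5], [5, 45]]))  # (0.9, 0.8)
```

Kappa discounts the agreement expected by chance, which is why it is lower than the overall accuracy computed from the same matrix.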

5- Page 15, Figure 7, line 480: regarding the Trend interpolation, are the ground-truth points displaced with respect to the measured values?

What type of regression has been used in the Trend interpolation? Polynomial, logistic…?

Are the Val Points also present after the interpolation? (Question of graphic overlay?)

Reply: Many thanks for the valuable comments.

1-    We used the same points in all methods.

2-    The regression type is Linear (the default type).

3-      The validation (Val) points were not used in the interpolation process.
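The reply states that the Trend interpolation used the Linear regression type. Assuming a first-order polynomial, such a trend surface is the least-squares plane z = a + b*x + c*y through the sample points; the sketch below fits it through the normal equations in pure Python. The function names `fit_linear_trend` and `eval_trend` and the sample points are hypothetical, and this is not the ArcGIS Trend tool itself.

```python
def fit_linear_trend(points):
    """Least-squares fit of the first-order trend surface z = a + b*x + c*y.

    points: iterable of (x, y, z); returns the coefficients (a, b, c).
    """
    # Accumulate the normal equations (A^T A) p = A^T z for rows [1, x, y]
    ata = [[0.0] * 3 for _ in range(3)]
    atz = [0.0] * 3
    for x, y, z in points:
        row = (1.0, x, y)
        for i in range(3):
            atz[i] += row[i] * z
            for j in range(3):
                ata[i][j] += row[i] * row[j]

    # Solve the 3 x 3 system by Gaussian elimination with partial pivoting
    m = [ata[i] + [atz[i]] for i in range(3)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]

    # Back substitution
    coef = [0.0] * 3
    for i in range(2, -1, -1):
        s = sum(m[i][j] * coef[j] for j in range(i + 1, 3))
        coef[i] = (m[i][3] - s) / m[i][i]
    return tuple(coef)


def eval_trend(coef, x, y):
    """Evaluate the fitted plane at (x, y)."""
    a, b, c = coef
    return a + b * x + c * y


# Hypothetical samples lying exactly on z = 0.1 + 0.02x - 0.01y
pts = [(0, 0, 0.10), (1, 0, 0.12), (0, 1, 0.09), (1, 1, 0.11), (2, 1, 0.13)]
a, b, c = fit_linear_trend(pts)
print(round(a, 4), round(b, 4), round(c, 4))  # 0.1 0.02 -0.01
```

Because a first-order surface can only tilt, it captures the regional gradient of the data and smooths out local highs and lows, which is consistent with Trend performing poorly for a locally varying quantity such as PGA.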

6- Page 20, Figure 10, line 542: for the readability of the histograms, it is better to specify the variable represented on the ordinate axis. It is not clear what the three panels (a), (b), and (c) refer to.

Reply: We have done so; thank you.

7- Page 20, Table 8, line 555: If the results reported in Table 8 refer to inhabitants, why is ha reported as the unit of measure?

Reply: We apologize; this was a mistake (the unit is capita, not ha). We have corrected it.

 

8- Page 21, Figure 4 (maybe 11), line 563: the same considerations as in point 6 apply. The figures are not numbered progressively: the caption “Figure 11” must replace “Figure 4”.

Reply: We apologize; this was a mistake (the caption should read Figure 11, not Figure 4). We have corrected it.

 

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

I think it is now okay for publication.

Reviewer 3 Report

In my opinion the authors have fully replied to the observations made in the previous review.
