Proposal of a Disrupted Road Detection Method in a Tsunami Event Using Deep Learning and Spatial Data

Abstract: Tsunamis generated by undersea earthquakes can cause severe damage. It is essential to quickly assess tsunami-damaged areas to take emergency measures. In this study, I employ deep learning and develop a model using aerial photographs and road segment data. I obtained data from aerial photographs taken after the Great East Japan Earthquake; the deep learning model used was YOLOv5. The proposed method based on YOLOv5 can determine damaged roads from aerial pictures taken after a disaster. The key feature of the proposed method is that it uses training data consisting of images divided by a specific range, each labeled for the presence or absence of tsunami-related damage. The results show that the proposed method is more accurate than a comparable traditional method, which is constructed by labeling and learning the damaged areas. The highest F1 score of the traditional method was 60-78%, while the highest F1 score of the proposed method was 72-83%. The traditional method could not detect locations where it is difficult to determine the damage status from aerial photographs, such as where houses are not completely damaged. However, the proposed method was able to detect them.


Introduction
The Great East Japan Earthquake that occurred on 11 March 2011 caused severe damage over a wide area. The municipalities damaged by the tsunami could not assess, report, and transmit information because of the disruption of communication systems and the collapse of government buildings; in addition, the safety of their leaders and employees was threatened [1]. The more severely areas were damaged, the more difficult it was to transmit and collect information; hence, it was difficult to know whom to contact to have countermeasures taken. The Nankai Trough earthquake, which has a 0.7-0.8 probability of occurring within the next 30 years, is expected to cause a massive tsunami of more than 10 m in height over a wide area along the Pacific coast from the Kanto region to the Kyushu region [2]. Methods are therefore required for early tsunami warning and quick assessment of the damage caused by tsunamis.
Early tsunami warnings in coastal areas enable timely evacuation. Accurate and rapid prediction of impending tsunamis is essential to mitigate damage to human life and property [3,4].
Assessing the damage after a natural disaster provides essential information for determining rescue priorities, guiding victims to safe locations, and estimating the amount of damage [5]. Aerial photographs can provide a broader range of damage information from fewer samples than ground photographs [6,7]. Previous studies have considered many methods to identify damage from remote sensing images. These methods can be classified into multi-temporal and single-temporal assessment methods.
Multi-temporal assessment methods identify damage by detecting changes. The authors of [8] extracted earthquake damage information using high-resolution remote sensing images before and after the 2010 Yushu earthquake in Qinghai. The results showed that the object-oriented change detection method could extract damage conditions with high accuracy. The authors of [9] compared the roofs of buildings before and after the 2021 earthquake in Yangbi County, Dali Prefecture, Yunnan Province. It was found that the investigation time to detect damage was significantly shorter than that of a manual investigation. However, such evaluations are limited by the angle and time constraints of capturing images [10].
Single-temporal assessment methods are less data-constrained because they analyze damage only from post-earthquake remote sensing images. The authors of [11] identified landslides by integrating nighttime light, multi-seasonal, and elevation data and by using neural networks to classify satellite imagery. However, factors such as noise and illumination in remote sensing images seriously affect detection accuracy [12], resulting in the inability to extract building information accurately.
Computer vision methods have been widely used to investigate the extent of disasters; they are mainly categorized into two types: stereotype and deep learning (DL) methods [13]. Stereotype methods usually rely on manually designed models from features such as color, texture, contours, and edges. However, these are highly subjective and often vary widely from scene to scene, which limits their applicability. Hence, DL has been brought into the spotlight [14]. Convolutional neural network (CNN) models can eliminate many processes involved in determining disaster damage. CNN can process low-level characteristics through deep structures to obtain high-level semantic information. Compared with handcrafted features, the high-level information is more abstract and robust. Several studies have been conducted on post-disaster building damage detection using remote sensing images and DL. The authors of [15] proposed a method to extract damage information from a group of buildings in post-earthquake remote sensing images by combining CNN and geographic information system (GIS) data. The authors of [16] proposed a method to detect post-disaster building damage using only pre-disaster images of buildings. The authors of [17] proposed a method to detect objects on building roofs, vehicles, debris, and flooded areas from post-disaster aerial video footage. In addition, a building damage detection method has been proposed and demonstrated using a ground-based imagery dataset [18].
Despite numerous trials, automatic disaster detection still needs improvement for a number of reasons. First, methods that do not use aerial photographs can determine which areas in the photographs are affected but cannot decide which regions on the map are affected. Second, methods using aerial photographs cannot distinguish a road disruption if it is covered by water because it is unknown whether a road exists on the map. Finally, there are limited publicly available image datasets depicting structural damage from disasters [19]; moreover, the damage caused by a tsunami is more complex than a house collapsing, making it challenging to study.
In this study, I use aerial photographs to determine which roads were damaged by the Great East Japan Earthquake. I apply a learning model developed using aerial photographs of a tsunami-damaged area to aerial photographs of other sites and verify the model's fit. Then, I visualize the road disruption by determining the presence or absence of damage on a mesh-by-mesh basis. In other words, the features of this study are as follows: The first is to add road information to the disaster area by overlaying aerial photographs and road segment data. The second is to attempt to identify tsunami damage with high accuracy by learning and applying mesh-based learning to complex tsunami disasters.

Materials and Methods
The training data for a traditional object detection model comprise definitions of where objects exist in each image. However, when there are various types of damage, such as damage caused by a tsunami, it may be challenging to identify the presence or absence of damage from such training data. The method proposed in this study identifies tsunami damage by developing a learning model using data from segmented photographs classified according to whether they were damaged by the tsunami. I demonstrated the significance of the proposed method by comparing its detection accuracy with that of a traditional method. Figure 1 shows the flowchart of this study. Both the traditional and proposed methods use the same training and test images. After collecting aerial photographs of the study site (Step 1), the tsunami inundation status for the training data was defined by referring to the inundation estimation map published by the Geospatial Information Authority of Japan [20] (Step 2). The traditional method uses training photographs to label which areas are affected by the tsunami (Figure 2). Meanwhile, the proposed method divides the training photographs into 100 m image units and classifies each image as having tsunami damage or not.

Method Flowchart
Before setting the unit to 100 m, I examined the relationship between unit size and computation time and found that a smaller range increases computation time but improves accuracy. For example, on a Core i7 1195G7 (Tiger Lake)/2.9 GHz/4-core computer with 16 GB of memory, the computation times to handle 100 m mesh (2108 images) and 500 m mesh (84 images) images were 14 min 40 s and 30 s, respectively. The F1 scores of the 100 m mesh (2108 images) and 500 m mesh (84 images) images were 85% and 50%, respectively.
The proposed method creates text files corresponding to image files to input the classified training images into training models. Each text file has five pieces of information: the classified flag, the x coordinate of the center, the y coordinate of the center, the width of the bounding box, and the length of the bounding box (Figure 3) [21]. This means that each text file classifies the entire area of each image as inundated or non-inundated. This process can determine an inundated image rather than detecting multiple objects in each image. The traditional method also provides a text file corresponding to the image; this text file records the rectangle of the tsunami damage location.
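The label format described above can be sketched as follows. This is an illustrative reconstruction, not the study's actual script: the class encoding (0 = non-inundated, 1 = inundated) and the file naming are assumptions. Because each 100 m image is classified as a whole, its single YOLO-format box is the normalized full-image box with center (0.5, 0.5) and width/height 1.0.

```python
from pathlib import Path

def write_whole_image_label(image_path: str, inundated: bool, label_dir: str) -> Path:
    """Write a YOLO-format label marking the entire image as one object.

    YOLO label coordinates are normalized to [0, 1], so a box covering
    the whole image has center (0.5, 0.5) and width/height (1.0, 1.0).
    Class encoding (0 = non-inundated, 1 = inundated) is an assumption.
    """
    class_flag = 1 if inundated else 0
    label_path = Path(label_dir) / (Path(image_path).stem + ".txt")
    # One line per object: class x_center y_center width height
    label_path.write_text(f"{class_flag} 0.5 0.5 1.0 1.0\n")
    return label_path
```

A traditional-method label file would instead contain one line per damage rectangle, with box coordinates smaller than the full image.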
Step 3 was building a model of the You Only Look Once (YOLO) framework using the training data. Considering the computational time for training, I chose three variants: YOLOv5n, YOLOv5s, and YOLOv5m. These models were developed using Google Colaboratory [22].
Step 4 was preparing the test image dataset. Because I aimed to identify road damage caused by the tsunami, I matched the test images and road segment data. The tsunami inundation status is also defined for the test data by referring to the inundation estimation map published by the Geospatial Information Authority of Japan.
Step 5 was the calculation of model accuracies of the three models (YOLOv5n, YOLOv5s, and YOLOv5m) developed by each method on the test dataset. As shown in the figure in Step 5, the traditional method labels the tsunami damage area in the aerial photographs, whereas the proposed method indicates tsunami damage by classifying 100 m unit images.
Finally, the best YOLO model was selected based on the calculated accuracies (Step 6); the comparison of the accuracies of both methods indicated the superiority of the proposed method.

Outline of YOLOv5 Model
Joseph Redmon [23] proposed the YOLO object detection algorithm in 2015. It is an end-to-end network model that directly predicts a target's bounding box and category. YOLO treats object detection as a single regression problem, mapping image pixels directly to bounding box coordinates and class probabilities. Hence, one only needs to look at an image once to predict what objects are present and where they are.
In this study, I selected YOLOv5, which was released in 2020 [24]. It is lightweight and offers good accuracy and speed for detecting small objects. In addition, because it integrates the anchor box selection process, it can automatically learn the best anchor boxes for a given dataset and use them during training, without requiring them as input. An anchor box is one of a list of predefined boxes that best match the desired objects; the YOLOv5 network predicts bounding boxes as deviations from this list of anchor box dimensions [25]. YOLOv5 outperforms YOLOv4 and YOLOv3 in terms of accuracy [26]. Figure 4 shows the YOLOv5 architecture; it comprises a backbone, neck, and head [27]. The backbone extracts the essential features from the input images. The CSP1-x structure is incorporated into DarkNet to create CSPDarknet, the backbone of YOLOv5, which extracts features from images through CSP1-x networks. This design reduces the floating-point operations per second (FLOPS) and the model size while ensuring inference speed and accuracy. The neck is a series of network layers that mix and combine image features. The head predicts image features, generates bounding boxes for detection, and predicts the target object type. The CSP2-x structure used here enhances network feature fusion capabilities. For multiscale prediction, the head generates feature maps of three different sizes: 80 × 80, 40 × 40, and 20 × 20 grid cells. Detection results include class, score, location, and size [28].
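The three head resolutions above follow directly from YOLOv5's detection strides of 8, 16, and 32. Assuming the default 640 × 640 input size, a quick check:

```python
def head_grid_sizes(input_size: int = 640, strides=(8, 16, 32)):
    """Grid sizes of YOLOv5's three detection heads for a square input.

    Each head downsamples the input by its stride, so a 640x640 image
    yields 80x80, 40x40, and 20x20 grids of prediction cells.
    """
    return [input_size // s for s in strides]

print(head_grid_sizes())  # [80, 40, 20]
```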
YOLOv5 provides various types of models, e.g., YOLOv5n (nano), YOLOv5s (small), YOLOv5m (medium), YOLOv5l (large), and YOLOv5x (extra large) [29]. As described later in Section 3, YOLOv5m took about 8.8 h to train using the proposed method. Prior attempts using YOLOv5l and YOLOv5x failed because the Google Colaboratory session timed out during the calculation. Given these trials, I compare the training results of YOLOv5n, YOLOv5s, and YOLOv5m and then implement the best model on the test data.

Outline of Google Colaboratory
Google Colaboratory is a service designed to educate and promote machine learning. Colaboratory notebooks are based on Jupyter and run as objects in Google Docs. The notebooks can be saved to the user's Google Drive or imported from GitHub, and users can share Colaboratory notebooks just as they share Google Docs or Google Spreadsheets. The runtime stops after a particular time, and all user data and settings are lost [30]. However, users can save their notebooks and transfer the files to their Google Drive.
The authors of [31] summarized the advantages and disadvantages of Google Colaboratory as follows. The advantages include fast computation; training a CNN is faster with Colaboratory's accelerated runtime than with 20 physical cores on a Linux server. Meanwhile, the disadvantages include the lack of CPU cores.

Training and Test Images
It is essential to select training data similar to the test data to achieve suitable model accuracy. Similarity conditions include the time of year (elapsed time since the disaster), weather, and scale of the disaster. I selected aerial photographs in line with this policy.
The aerial photographs used for the training images are of Yamada Town, Iwate Prefecture; Miyako City, Iwate Prefecture; Minamisanriku Town, Miyagi Prefecture; and Watari Town, Miyagi Prefecture. The test images are of Rikuzentakata City, Iwate Prefecture and Kesennuma City, Miyagi Prefecture. The tsunami caused by the Great East Japan Earthquake in 2011 damaged these cities. Figure 5 shows the locations of the training and test images, and Table 1 shows the number of fatalities, etc., and damage to residential properties in the Great East Japan Earthquake.
The photographs were obtained from Google Earth [33]; they were captured a few days after the earthquake. They were collected by selecting the area to include inundated/non-inundated areas and divided into 100 m image units for the proposed method. Then, as shown in Figure 6, each 100 m mesh image was classified into "inundated image" and "non-inundated image" by referring to the inundation estimation map published by the Geospatial Information Authority of Japan. I made this classification manually on a 100 m mesh, which can be challenging to determine.
For example, if 20% of a mesh contains inundation, it is comprehensively classified as inundated or non-inundated based on the photographs and the inundation estimation map. This is a limitation of the accuracy verification of this study. Table 2 shows the number of training and test images. Figures 7 and 8 show examples of inundated/non-inundated images in 100 m units for the proposed method. The traditional method develops a model after combining the images of the same area and predicts a merged test image of the same place.
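The partially inundated case above suggests a thresholded rule. The study made this judgment manually, so the following is only a sketch of how an automatic variant could look; the 20% threshold is a hypothetical choice for illustration, not the study's criterion.

```python
def classify_mesh(inundated_fraction: float, threshold: float = 0.2) -> str:
    """Label a 100 m mesh from the fraction of its area that is inundated.

    The 0.2 threshold is hypothetical; the study classified borderline
    meshes manually from photographs and the inundation estimation map.
    """
    if not 0.0 <= inundated_fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    return "inundated" if inundated_fraction >= threshold else "non-inundated"
```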

Road Segment
The road segment data for the test images are from the Conservation GIS-consortium Japan [34]. The road data were constructed based on the situation as of 2006 (before the earthquake). The road segment data range from small roads to arterial roads; their road classification consists of national roads, prefectural roads, municipal roads, national expressways, etc.
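Matching road segments to 100 m mesh units can be sketched as follows. This is an illustrative approach under stated assumptions, not the study's actual GIS procedure: coordinates are assumed to be meters in a local projected system, and the segment is simply sampled densely enough that no crossed cell is skipped.

```python
import math

def meshes_crossed(x1, y1, x2, y2, mesh_size=100.0, samples_per_mesh=4):
    """Return the set of (col, row) mesh cells a road segment touches.

    Coordinates are in meters in a local projected system (an assumption;
    the study's road data use their own coordinate reference system).
    """
    length = math.hypot(x2 - x1, y2 - y1)
    n = max(1, int(samples_per_mesh * length / mesh_size))
    cells = set()
    for i in range(n + 1):
        t = i / n
        x = x1 + t * (x2 - x1)
        y = y1 + t * (y2 - y1)
        cells.add((int(x // mesh_size), int(y // mesh_size)))
    return cells
```

With cells labeled inundated or not by the model, every road segment whose cell set intersects an inundated cell can then be flagged as potentially disrupted.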

Evaluation Indicators
In the traditional method, when a test image is input to the trained model, the locations with a certain probability of being inundated are marked. In this study, I set this probability threshold to 0.5.
Moreover, in the proposed method, when a set of test images is input to the trained model, the probabilities of "inundation" and "non-inundation" are output for each image. The class with the higher probability is chosen as the decision result. For instance, if an image has a probability of 0.5 for "inundation" and 0.6 for "non-inundation," the decision for this image is "non-inundation." The probability here is the predicted probability of the target defined via intersection over union (IoU) [35], a standard indicator in object detection. Its primary function is to determine positive and negative samples and to evaluate the distance between the output box and the correct label.
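The per-image decision rule of the proposed method reduces to a comparison of the two class probabilities, as a minimal sketch (the tie-breaking choice is arbitrary, an assumption of this sketch):

```python
def decide(p_inundation: float, p_non_inundation: float) -> str:
    """Pick the class with the higher predicted probability.

    E.g., 0.5 for "inundation" vs 0.6 for "non-inundation" yields
    "non-inundation", matching the example in the text. Ties fall to
    "non-inundation" here (an arbitrary choice for this sketch).
    """
    return "inundation" if p_inundation > p_non_inundation else "non-inundation"
```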
In this study, I evaluate models using the following indicators: Precision, Recall, Specificity, and F1-score [36]. True Positive (TP) and True Negative (TN) are the numbers of objects correctly detected and correctly rejected by a model, respectively. False Positive (FP) and False Negative (FN) are the numbers of objects wrongly detected and wrongly missed by the model, respectively.
Precision is the proportion of detected objects that are correct, calculated in Equation (1):

Precision = TP / (TP + FP). (1)

Recall is the proportion of actual objects that are correctly detected:

Recall = TP / (TP + FN). (2)

Specificity is the proportion of true negatives correctly classified by a model:

Specificity = TN / (TN + FP). (3)

F1-score is a measure of a model's overall accuracy considering Precision and Recall. It is the harmonic mean of Precision and Recall, which have contrasting characteristics:

F1-score = 2 × Precision × Recall / (Precision + Recall). (4)
In this study, the above decisions (TP, TN, FP, FN) are made on units of 100 m segments for both the traditional and proposed methods. Figure 9 shows an example of this distinction, taking a location with 25 100 m image units. The mesh numbers of the inundation are 9, 11, and 13-24 (second from the left in the figure). The mesh numbers predicted by the traditional method are 15, 18, 19, 20, 23, and 24 (second from the right in the figure). The red boxes indicate the inundation zones, and the 100 m meshes overlapping these zones are the inundation meshes. The inundation meshes predicted by the proposed method are 2, 5, 7, 9-12, and 14-24 (first from the right in the figure). The discriminant results of the traditional method are TP = 6, TN = 11, FP = 0, and FN = 8. The discriminant results of the proposed method are TP = 13, TN = 16, FP = 5, and FN = 1.
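Applying the standard Precision/Recall/Specificity/F1 definitions to the Figure 9 counts can be sketched as follows; the function names are illustrative, and the counts are those quoted in the text.

```python
def f1_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Precision, Recall, Specificity, and F1-score from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}

# Counts from the Figure 9 example in the text
traditional = f1_metrics(tp=6, tn=11, fp=0, fn=8)   # F1 ≈ 0.60
proposed = f1_metrics(tp=13, tn=16, fp=5, fn=1)     # F1 ≈ 0.81
```

The traditional method scores perfect Precision but low Recall (it misses 8 of 14 inundated meshes), while the proposed method trades a few false positives for much higher Recall, which lifts its F1-score.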


Training Result
To accurately ascertain the models' accuracies, I evaluated the models based on the loss function curve (train/box_loss) and average accuracy value (metrics/mAP_0.5) [37]. In the learning process, the loss function curve can intuitively reflect whether the network model converges stably with respect to the number of iterations.
The upper graph of Figure 10 shows the specific changes in the models' loss functions. The horizontal axis is the number of learning epochs, 1000 for the traditional models and 200 for the proposed models. The number of training epochs differs for each model to account for the computational time required for training. As described later, the proposed model structure took a long time to train, and Google Colaboratory timed out in the middle of the training. The figure shows that as the number of training cycles increases, the curves for both model structures gradually converge and the loss values decrease. The loss values of the proposed models are significantly smaller than those of the traditional models, indicating the higher accuracy of the proposed method.
The mAP measures the quality of a defect detection model. The higher the mAP value, the higher the average detection accuracy and the better the performance. The lower graph of Figure 10 shows the training epoch trend with respect to mAP for all models; the mAP increases with the number of epochs.
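Curves like those in Figure 10 can be extracted from the per-epoch log YOLOv5 writes during training. The sketch below assumes a `results.csv` with (whitespace-padded) column names such as `train/box_loss` and `metrics/mAP_0.5`; exact column names can differ between YOLOv5 releases, so this is a best-effort reconstruction rather than the study's actual plotting code.

```python
import csv

def training_curves(results_csv: str):
    """Read per-epoch box loss and mAP@0.5 from a YOLOv5-style results.csv.

    Column names are stripped of padding whitespace before lookup; the
    assumed names ('train/box_loss', 'metrics/mAP_0.5') match common
    YOLOv5 releases but may vary.
    """
    losses, maps = [], []
    with open(results_csv, newline="") as f:
        for row in csv.DictReader(f):
            row = {k.strip(): v.strip() for k, v in row.items()}
            losses.append(float(row["train/box_loss"]))
            maps.append(float(row["metrics/mAP_0.5"]))
    return losses, maps
```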


Comparative Analysis of Models
Model accuracy is verified by implementing the training models on the test images. Table 3 shows the accuracy results using the three types of YOLO training models. Rikuzentakata City is more accurate for both methods than Kesennuma City. For the traditional method, the highest F1-score for Rikuzentakata City is 78% (for YOLOv5s), whereas the highest for Kesennuma City is 60% (for YOLOv5s). For the proposed method, the highest F1-score for Rikuzentakata City is 83% (for YOLOv5m), and the highest F1-score for Kesennuma City is 72% (for YOLOv5s). The accuracy of the proposed method is better than that of the traditional method. Focusing on F1-score, the traditional models have values in the range of 59-78%, whereas the proposed models have values in the range of 66-83%, indicating an accuracy improvement.
However, the time to build a model is longer for the proposed method than for the traditional method. As stated earlier, I developed three training models each for the traditional and proposed methods. The model with the longest calculation time for the traditional method was YOLOv5m at 0.633 h, whereas that for the proposed method was YOLOv5m at 8.800 h. Figures 11 and 12 depict the results for Rikuzentakata City and Kesennuma City using the traditional and proposed methods (YOLOv5s), respectively. Figure 11 shows that the traditional method has good detection accuracy for non-damaged areas, such as inland areas (TN), but does not correctly detect the damaged areas in coastal locations. In particular, the FN in Kesennuma City stands out. Figure 12 shows that the proposed method can detect damaged coastal areas with high accuracy; however, many wrong detections (FP) exist in the inland regions. The coastal areas of Rikuzentakata City and Kesennuma City were devastated, and the tsunami ran up rivers and caused extensive damage. The proposed method designated these locations as TPs.
Figure 13 shows an enlarged view of a location in Kesennuma City where FNs are particularly abundant. The red boxes in the figure indicate the areas extracted as disaster-stricken by the traditional method. The figure makes clear that the traditional method fails to detect most places where houses were not completely destroyed as disaster-stricken areas.

Accuracy Verification Focusing on the Number of Samples
When applying the proposed method in practice, it is necessary to understand how much training data is needed to identify the disaster situation accurately. Table 4 shows the results of validating the proposed method (YOLOv5s) by the percentage of samples drawn from the training images. A 100% sample rate corresponds to all 3108 training images mentioned above; the 75%, 50%, and 25% sample rates correspond to subsets of those sizes extracted at random from the full set. A feature common to both districts is that Precision, Specificity, and F1-score improved as the number of samples increased; all three metrics were highest in both areas at the 100% sample rate.
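The sample-rate experiment can be sketched as a random subset draw over the 3108 training images. The file-name pattern and the fixed seed below are assumptions for illustration, not details from the study.

```python
# Sketch of the Table 4 sample-rate experiment: draw random subsets of the
# training images at 25/50/75/100% rates. File names and seed are assumed.
import random

def sample_training_images(image_paths, rate, seed=0):
    """Randomly extract a fraction `rate` (0 < rate <= 1) of the images."""
    rng = random.Random(seed)
    k = round(len(image_paths) * rate)
    return rng.sample(image_paths, k)

all_images = [f"train_{i:04d}.jpg" for i in range(3108)]  # 3108 images, as in the text
for rate in (0.25, 0.50, 0.75, 1.00):
    subset = sample_training_images(all_images, rate)
    print(rate, len(subset))
```

Fixing the seed makes each subset reproducible, so accuracy differences between sample rates can be attributed to the amount of data rather than to which images happened to be drawn.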

Conclusions
Deep learning (DL) is a novel technique for quickly assessing damage situations. For complex damage such as that caused by tsunamis, developing learning models takes considerable work. In this study, I used YOLOv5 to develop a learning model based on data obtained by subdividing images and classifying them into tsunami-affected and unaffected areas, instead of labeling tsunami-affected regions within a single image. The proposed method can quickly identify damaged areas after a tsunami disaster. Once analyzers prepare the training model and road sections in 100 m mesh units in advance, all that is required is to upload aerial photographs to Google Colaboratory for identification. In this study, 7092 aerial pictures were uploaded, including those without road segments, and the upload took only a few seconds. In addition, classifying inundation/non-inundation using the training model took approximately three minutes.
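The final classification step can be sketched as follows: after the trained model has run over the mesh images, a 100 m mesh is flagged as damaged if any detection exceeds a confidence threshold. The detection format, the `mesh_id` naming, and the threshold value are assumptions for illustration, not the study's exact pipeline.

```python
# Sketch of the per-mesh damage classification after YOLOv5 inference.
# `detections` maps a mesh ID to the confidences of boxes detected in that
# mesh image; the format and the 0.5 threshold are illustrative assumptions.

def flag_damaged_meshes(detections, conf_threshold=0.5):
    """Return the set of mesh IDs classified as tsunami-damaged."""
    return {mesh_id for mesh_id, confs in detections.items()
            if any(c >= conf_threshold for c in confs)}

results = {"mesh_001": [0.92, 0.41], "mesh_002": [], "mesh_003": [0.38]}
print(sorted(flag_damaged_meshes(results)))  # → ['mesh_001']
```

Joining the flagged mesh IDs to the pre-prepared road-segment data then yields the list of road sections presumed disrupted.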
In addition, the proposed method could automatically identify the damaged areas more accurately than the traditional method. Therefore, if a road administrator prepares road sections per mesh and a learning model before a disaster, it will be possible to detect which road sections are damaged simply by applying the model to aerial photographs taken after the disaster occurs.
Nevertheless, this study has limitations in terms of aerial photographs and detection accuracy. Regarding the selection of aerial photographs, I used training images similar to the test images in this study, but clouds and brightness might have reduced the accuracy. It is necessary to examine whether eliminating such factors improves accuracy.
Regarding the detection results, the proposed method determines damaged sections in mesh units. Thus, roads of different types within the same mesh, such as high-standard arterial highways and local roads, cannot be considered separately. It is necessary to enhance the method's distinguishing ability, for example by associating elevation data with each section, thereby improving accuracy.