UAVs in Disaster Management: Application of Integrated Aerial Imagery and Convolutional Neural Network for Flood Detection

Floods have been a major cause of destruction, causing fatalities and massive damage to the infrastructure and overall economy of the affected country. Flood-related devastation results in the loss of homes, buildings, and critical infrastructure, leaving people stranded in such disasters with no means of communication or travel. Thus, it is essential to develop systems that can detect floods in a region to provide timely aid and relief to stranded people, save their livelihoods, homes, and buildings, and protect key city infrastructure. Flood prediction and warning systems have been implemented in developed countries, but the manufacturing cost of such systems is too high for developing countries. Remote sensing, satellite imagery, global positioning systems, and geographical information systems are currently used for flood detection and for assessing flood-related damage. These techniques use neural networks, machine learning, or deep learning methods. However, unmanned aerial vehicles (UAVs) coupled with convolutional neural networks have not been explored in these contexts to instigate a swift disaster management response that minimizes damage to infrastructure. Accordingly, this paper uses UAV-based aerial imagery as a flood detection method based on a Convolutional Neural Network (CNN) that extracts flood-related features from images of the disaster zone. This method is effective in assessing the damage to local infrastructure in disaster zones. The study area is a flood-prone region of the Indus River in Pakistan, where both pre- and post-disaster images are collected through UAVs. For the training phase, 2150 image patches are created by resizing and cropping the source images. These patches train the CNN model to detect and extract the regions where a flood-related change has occurred. The model is validated on both pre- and post-disaster images and detects floods with an accuracy of 91%.
Disaster management organizations worldwide can use this model to assess damage to critical city infrastructure and other assets, instigate proper disaster responses, and minimize damage. This can support the smart governance of cities, in which all emergent disasters are addressed promptly.


Introduction
Floods are characterized by large volumes of water flowing onto dry land from water bodies such as lakes, rivers, and oceans [1]. Generally, water-level changes in these water bodies are not closely monitored, especially in developing countries, and are usually neglected until they result in flood-related disasters [2]. On average, 60,000 lives are lost globally to natural disasters every year [3]. Furthermore, floods are the most frequently occurring natural disasters globally, representing 40% of all natural disasters [4,5]. explored Pakistan for UAV-based disaster management systems. Such approaches can be adopted by countries with similar economies, as well as by developed countries, to develop UAV-based disaster management systems. Furthermore, damage assessment can be conducted for critical infrastructure such as bridges, buildings, and smart grids. The main goals achieved by this research align well with the United Nations International Strategy for Disaster Reduction and the Sendai Framework for Disaster Risk Reduction 2015-2030.
The rest of the paper is organized as follows: Section 2 presents the state-of-the-art methods adopted for flood prediction, detection, and mapping, including the deep learning models and image processing techniques that inform the current study. Section 3 discusses the methodology followed in this research, including selection of the study area, dataset collection, CNN architecture design, experimental setup, and the model training and testing phases. Section 4 presents the experimental results, illustrating the flood detection output on input test images; it also discusses the results and highlights the findings and achievements of this research. Section 5 concludes the paper.

Literature Review
In recent years, various researchers and disaster planners have emphasized developing effective flood warning and forecasting systems to mitigate disasters and plan efficient responses. Table 1 presents some of the state-of-the-art methods proposed for flood management over the past decade. Artificial Neural Networks (ANNs) are frequently used for flood prediction and forecasting in the literature [40]. For example, Sankaranarayanan et al. [41] used a deep neural network for flood forecasting, training a multilayer ANN model on weather data such as rainfall intensity and temperature to warn of a possible flood event. Such models can assess potential damage to infrastructure, homes, and buildings in case of heavy rainfall. Similarly, Doubleday [42] used multiple hidden layers instead of a single one in an ANN model, which yielded much higher accuracy than a traditional ANN model. Their proposed floodplain mapping system used SVM to segment cloud, land, and water regions from satellite imagery, achieving an overall classification accuracy of 92%. However, their study is limited by its use of remotely sensed images, which are prone to noise and weather-dependent. Further, such images cannot be acquired for dark or concealed spaces such as building basements and areas under bridges. The weather dependency restricts their use for instigating immediate disaster responses, as clouds are usually present during rain-related floods.
Elsafi [43] utilized an ANN model to predict flood events from upstream flow data, using the River Nile in Sudan as a case study. Using the flow data at upstream locations, the flows at certain river locations were simulated with a trained ANN model to predict flood events. Anusha and Bharathi [44] studied flood detection and mapping and used image processing techniques such as thresholding to identify water regions in input images. The dataset consisted of Synthetic Aperture Radar (SAR) images of the target area. Pre-processing removed speckle noise by applying a median filter, followed by image rectification. A pixel-based threshold was subsequently applied to classify each pixel as "water" or "non-water". Similarly, a study by Li et al. [46] shows encouraging results, where the proposed CNN framework improved the κ statistic from 0.614 to 0.686 (about 7 percentage points) compared to its supervised counterpart. Improved precision for flooded open areas (FO) (from 0.506 to 0.684) is obtained at the cost of a slight decrease in recall (from 0.842 to 0.824), while both precision and recall for flooded built-up areas (FB) increase. In another study by Zhao et al. [47], CNN performed better than SVM and random forests (RF), with an accuracy of 0.90 for CNN and 0.88 for LeNet-5 in the testing period. Similarly, Hosseiny et al. [45] used RF for flood detection, coupled with a multilayer perceptron (MLP) model for estimating flood depth in the target regions. The RF method yielded an accuracy of 98.5% in detecting flooded regions, and the MLP model achieved a regression coefficient of 0.88 for depth. Thus, the expense of simulations in large-scale hydraulic models can be reduced using this much simpler machine learning approach.
Other studies focused on warning systems during flood situations and emphasized "warning communities" as a solution to overcome the disastrous effects of floods [48]. Alfieri et al. [49] developed a flash flood warning system based on a probabilistic framework that uses a threshold exceedance method during heavy rainfall. An index was also defined to measure the severity of forecasted rainfall. Narayanan et al. [50] employed computer vision to measure and estimate the flood level from captured images. These images can be captured by people in the disaster-hit area and uploaded to a server. The authors used the Scale-Invariant Feature Transform (SIFT) method for image feature processing. The geographical location of the region was acquired through geo-tagging information to plan an effective disaster response. Mousa et al. [51] calculated the flood water level using flood sensors, namely an ultrasonic range finder and temperature sensors.
The ANN approach was used to process the sensor data and estimate the water level in key infrastructure, homes, and smart structures (Lin and Hu, 2018). Artificial Intelligence (AI) systems that support disaster control by mining data from social media platforms such as Facebook, Twitter, and Instagram have also been introduced by various studies [52,53]. Status updates, picture uploads, and comments feed AI algorithms trained to recognize relevant images and comments, which supports relief responses to the affected regions. For example, Facebook uses CNNs for tagged face detection, and Google uses them for speech recognition and photo search [54].
UAVs have been used and explored in many fields, such as smart real estate [9,15], smart cities [55], and healthcare [56]. In the disaster management field, UAVs have been explored for post-disaster management using IoT [57], energy-efficient task scheduling and physiological assessment [58], public safety in disaster and crisis management [59], flood management, and other applications [4]. In addition, various machine learning techniques have been explored for efficient flood disaster management through UAVs, including ANN, CNN, SVM, and image segmentation [60][61][62]. The latest research focuses on CNN applications in flood disaster management using UAVs [63]. A key advantage of UAVs is their ability to move into almost any space and collect data, including pictures and real-time insights into the damage to key infrastructure. CNNs have been applied to UAV-captured images and videos to assess flood-induced damage to structures, homes, and buildings, to detect leakages such as pipeline leakages, and for real-time flood assessment [64,65]. The current study aligns with the latest research and explores CNN applications for post-flood disaster management, devising a swift response to initiate rescue operations and assess damage to critical infrastructure.
CNNs have been used in multiple flood-related studies. In a study led by Mason et al. [66], flood images shared on social media were combined with their embedded metadata to detect flood patterns. A CNN pre-trained on ImageNet was used to extract visual information from the images uploaded to social media, while word embeddings of the metadata fed the textual information to a bidirectional RNN. The word embeddings were initialized with GloVe vectors, and the image and text representations were combined to produce an output containing flood-related information. Martino [67] developed an AI system that detected flood events from social media, retrieved them, and derived their visual properties and metadata with a multi-modal approach. The images were pre-processed by cropping and pre-filtering based on color and textual metadata. Lopez-Fuentes [68] used a CNN with a Relation Network to study disaster image retrieval and detect floods in satellite images. Zha et al. [69] used a CNN model trained on images to analyze flood videos of surface structures, key infrastructure, smart grids, homes, and buildings. The authors explored the videos systematically and used the CNN model to capture their spatio-temporal features. Various studies use the 3D-CNN technique, whose 3D kernels convolve over stacks of adjacent frames of an input video [70]. Simonyan [71] handled the spatial and temporal components of a video using a two-stream approach in which two different CNN models with different inputs were utilized. One input consisted of a Red-Green-Blue (RGB) frame for capturing spatial features, while the other was composed of dense optical flow stacked over multiple frames to capture temporal features in the video. Fusing the class scores of both streams provides the final output. Ye et al. [72] used the same two-stream neural network approach and evaluated different parameters as well as fusion and prediction methods.
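The two-stream approach of Simonyan [71] ends with a fusion step that combines the outputs of the spatial and temporal streams. A minimal Python sketch, assuming the fusion is a weighted average of the two streams' softmax probabilities; the weight `w_spatial` and the example scores are hypothetical, and the cited works may fuse the scores differently (for example, by training a classifier on them).

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_two_streams(spatial_logits, temporal_logits, w_spatial=0.5):
    """Weighted average of the class probabilities of a spatial and a temporal stream.

    w_spatial is a hypothetical tuning parameter; the temporal stream
    receives the complementary weight (1 - w_spatial).
    """
    if not 0.0 <= w_spatial <= 1.0:
        raise ValueError("w_spatial must lie in [0, 1]")
    p_spatial = softmax(spatial_logits)
    p_temporal = softmax(temporal_logits)
    return [w_spatial * a + (1.0 - w_spatial) * b
            for a, b in zip(p_spatial, p_temporal)]


# Hypothetical scores for two classes (0 = non-flood, 1 = flood).
fused = fuse_two_streams([0.2, 0.1], [0.0, 2.0], w_spatial=0.4)
predicted_class = max(range(len(fused)), key=fused.__getitem__)
print(fused, predicted_class)
```

Because both inputs are probability vectors, the fused output also sums to one, so the class with the highest fused probability can be taken directly as the prediction.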
Various image segmentation methods have been presented during the past few decades. These methods use context and neighborhood information for image classification. For this purpose, pixels are first grouped into larger, meaningful image objects, as a single pixel cannot provide precise semantic information [73]. The pixels are then expanded into objects by considering the texture, size, and distance properties of the image [74]. Fully Convolutional Network (FCN) models have been applied to traditional aerial imagery for semantic segmentation [75,76]. Fu et al. [77] used an FCN model to classify high-resolution satellite images and reported precision, recall, and kappa coefficient values of 0.81, 0.78, and 0.83, respectively. Nguyen et al. [78] used a five-layer network to classify satellite images and obtained an average classification accuracy of 83%. Another experiment achieved 91% accuracy using the CaffeNet and GoogLeNet CNN architectures, integrated with three different learning modalities, to classify land-use images obtained by remote sensing (RS) [79].
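Several results above (Li et al. [46], Fu et al. [77]) are reported as a kappa (κ) statistic. Assuming the standard Cohen's kappa computed from a confusion matrix, a short Python sketch of the calculation follows; the counts are hypothetical, and the cited studies may compute κ over their own class definitions.

```python
def cohens_kappa(confusion):
    """Cohen's kappa for a square confusion matrix.

    Rows are reference (ground-truth) classes and columns are predicted classes.
    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the agreement expected by chance from the row and column totals.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    if n == 0:
        raise ValueError("confusion matrix is empty")
    p_o = sum(confusion[i][i] for i in range(k)) / n
    row_totals = [sum(confusion[i]) for i in range(k)]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    if p_e == 1.0:
        # Only one class occurs in both reference and prediction: agreement is perfect.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical counts: rows = true (non-flood, flood), columns = predicted.
kappa = cohens_kappa([[40, 10], [5, 45]])
print(round(kappa, 3))  # 0.7
```

Here the observed agreement is 0.85 and the chance agreement is 0.5, so κ = (0.85 - 0.5) / (1 - 0.5) = 0.7, which is why κ is usually lower than plain accuracy.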
Furthermore, deep learning has also been applied to hyperspectral images, with classification results better than those of SVM [80]. Similarly, Sankaranarayanan et al. [41] compared SVM and deep neural networks for flood prediction, training on monsoon parameters such as rainfall intensity and temperature. The deep neural network achieved an accuracy of 91.18%, while the SVM yielded 85.57%. In another study, deep learning models such as CNN outperformed SVM and random forests for land-use classification from RGB images [81].
The literature shows that deep learning models have been used for image retrieval, classification, and segmentation, with results superior to those of traditional image processing and machine learning models. Thus, the current study focuses on using deep learning, specifically CNN, to detect floods from aerial imagery captured through UAVs. The aim is to improve performance and explore the application of deep learning for flood detection in order to devise proper and swift disaster response plans. State-of-the-art UAVs have built-in FPGAs and parallel processors that can carry out feature extraction after capturing an image. Since the pooling layers (PL) of a CNN shrink the feature maps and hence reduce the computational load, CNN has been used in the current study to simplify image segmentation, feature classification, or texture classification. Furthermore, the CNN method is robust and fast, making it the preferred method in the current study.

Materials and Methods
A detailed literature study was conducted to understand the current technologies used for flood detection, their utilization, and the existing gaps in the literature. The literature was retrieved from the Google Scholar, Emerald Insight, Springer, Taylor and Francis, ASCE, ACM, MDPI, Web of Science, and Scopus repositories. The search was restricted to literature published after 2010 to maintain a recent focus. The search strings included "flood detection using machine learning", "flood using deep learning", and "SVM for flood detection", coupled with the terms CNN, RNN, and LSTM. Restricting the results to English-language research articles yielded 148 articles. Similar methods have been used in recent studies to retrieve relevant literature [5,7,82].
According to the extensive literature review in the current study, machine learning has emerged as a promising domain for disaster prediction and management. Deep learning is a subset of machine learning that uses a multilayer network of neurons to automatically extract features from an image, learn them, and classify them. In the current study, CNN and other techniques have been used to detect floods from multispectral aerial images captured along the Indus River in Pakistan. Figure 1 shows the overall methodology followed for flood detection from aerial images. Aerial images used as input to the CNN model are collected through UAVs from the target region. Images can also be retrieved from online sources such as Google Earth or social media. These images belong to two classes: pre- and post-disaster. Both sets cover the same spatial extent but were captured at different times; for example, one image may be captured in 2011 and the other in 2016. After acquiring images of the target location, the system goes through two phases: training and testing. Pairs of pre-disaster and post-disaster images are used to form the input training patches.
The goal is to observe and identify flood-related changes in the post-disaster images. The model learns the differences between pre- and post-disaster images as image features for detecting change. In previous studies, such as that of Joshi et al. [31], only pre-disaster imagery was used for flood detection.
In contrast, other studies utilized either pre- or post-disaster images, which makes change detection difficult and leads to errors at the initial stage. Therefore, in this research, the learning phase is improved and made less error-prone by incorporating both pre- and post-disaster imagery. The disaster-related changes identified in the output reveal the occurrence of disasters. Both pre- and post-disaster images are three-channel RGB images, so stacking them yields a six-channel training patch. This results in enhanced precision and higher accuracy. A similar study utilized four-channel input images, i.e., RGB and infrared (IR), to support their inner layer [83]. Figure 2 represents the complete architecture of the current study's model for fast and accurate detection of flooded regions. The CNN architecture proposed in this research is inspired by the AlexNet architecture. It consists of three convolutional layers (CL), three pooling layers (PL), and two fully connected layers (FC), represented in the architecture diagram by C, P, and FC, respectively. The CLs extract features such as edges and texture from the images. These features are learned from the pre- and post-flood images so that the differences between both types of images can be recognized and classified accurately. Each CL is followed by a ReLU activation function. The PLs downsample the feature maps of large images, reducing their dimensions and hence the number of parameters in subsequent layers, which makes the learning process less computationally intensive. Finally, the output of the previous layers is flattened and fed to the FC layers, the last of which applies a softmax activation function to generate the final classification output.
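The text specifies the layer counts (three convolutional, three pooling, two fully connected) and a six-channel 32 × 32 input patch, but not the kernel sizes or channel widths. The following Python sketch traces feature-map shapes and learnable-parameter counts through such a stack; every hyperparameter in `LAYERS` is a hypothetical choice for illustration, not the authors' configuration.

```python
def out_size(size, kernel, stride=1, pad=0):
    """Output length along one spatial axis (floor rounding).

    Caffe's pooling layers round up instead, but with the even sizes used
    here both conventions give the same result.
    """
    return (size + 2 * pad - kernel) // stride + 1


# Hypothetical hyperparameters: the text gives only the layer counts.
LAYERS = [
    ("conv1", "conv", {"out": 32, "k": 5, "s": 1, "p": 2}),
    ("pool1", "pool", {"k": 2, "s": 2}),
    ("conv2", "conv", {"out": 32, "k": 5, "s": 1, "p": 2}),
    ("pool2", "pool", {"k": 2, "s": 2}),
    ("conv3", "conv", {"out": 64, "k": 5, "s": 1, "p": 2}),
    ("pool3", "pool", {"k": 2, "s": 2}),
    ("fc1", "fc", {"out": 64}),
    ("fc2", "fc", {"out": 2}),  # two classes: non-flooded / flooded (softmax follows)
]


def summarize(height, width, channels, layers):
    """Return (name, output shape, learnable parameters) per layer and the total.

    ReLU and softmax add no parameters, so they are omitted from the list.
    """
    rows, total, flat = [], 0, None
    for name, kind, cfg in layers:
        if kind == "conv":
            # weights k*k*C_in*C_out plus one bias per output channel
            params = cfg["k"] * cfg["k"] * channels * cfg["out"] + cfg["out"]
            height = out_size(height, cfg["k"], cfg["s"], cfg["p"])
            width = out_size(width, cfg["k"], cfg["s"], cfg["p"])
            channels = cfg["out"]
            shape = (channels, height, width)
        elif kind == "pool":
            params = 0
            height = out_size(height, cfg["k"], cfg["s"])
            width = out_size(width, cfg["k"], cfg["s"])
            shape = (channels, height, width)
        else:
            # fully connected: the first FC layer flattens the last feature map
            fan_in = flat if flat is not None else channels * height * width
            params = fan_in * cfg["out"] + cfg["out"]
            flat = cfg["out"]
            shape = (flat,)
        total += params
        rows.append((name, shape, params))
    return rows, total


# Six-channel 32 x 32 patch: stacked pre- and post-disaster RGB.
rows, total = summarize(32, 32, 6, LAYERS)
for name, shape, params in rows:
    print(f"{name:6s} {str(shape):14s} {params}")
print("total parameters:", total)  # 147458
```

With these assumed settings, each pooling layer halves the spatial size (32 → 16 → 8 → 4), and the first fully connected layer receives 64 × 4 × 4 = 1024 inputs, which is where most of the parameters end up.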
Sustainability 2021, 13, x FOR PEER REVIEW 8 of 22

Data Collection and Target Area
The target area for this study is the Indus River in Pakistan, one of the longest rivers in Asia, which flows through China, India, and Pakistan. The current study's target area is the Indus Basin in Punjab, Pakistan, as shown in Figure 3. According to a United Nations report, severe flooding of the Indus River caused 1,980 fatalities and affected 18 million people in Pakistan. In addition, approximately 1.6 million families were displaced during the floods due to damage to or destruction of their homes [84]. Furthermore, most of the population is directly affected by damage to homes, real estate, properties, and buildings. Given this history, the Indus Basin, one of the most flood-prone regions of Pakistan, is selected for the current study to examine the effects of flood disasters in this region and accurately map the flood-affected regions in case of any future disaster.
Image data for the dataset are collected from UAV-based images and online sources such as Google Earth, and comprise both pre- and post-flood images. As images in the collected dataset had varying sizes and resolutions, each image was resized to 256 × 256 pixels to obtain uniform imagery for training. Two sets of images are created for the experiments: a training set and a testing set. The training set consists of 10 subsets and the testing set of 14 subsets, each containing 2150 patches. All images in the training dataset have the same size and position. In addition, the pre-disaster images, post-disaster images, and ground-truth imagery are identically aligned, because the system requires inputs with constant dimensionality. Training images can vary in intensity. Query images for testing can also differ in intensity, alignment, and location so that system performance can be tested on varying inputs.
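The resizing of every image to 256 × 256 pixels and the pairing of co-registered pre- and post-flood images into the six-channel input can be sketched in plain Python. Nearest-neighbour interpolation is an assumption, since the text does not name the resampling method, and images are represented as nested lists of RGB tuples purely for illustration; the function names are hypothetical.

```python
def resize_nearest(image, out_h=256, out_w=256):
    """Nearest-neighbour resize of an image stored as rows of (R, G, B) tuples."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def stack_pre_post(pre, post):
    """Concatenate co-registered pre- and post-flood RGB pixels into 6-channel pixels."""
    if len(pre) != len(post) or len(pre[0]) != len(post[0]):
        raise ValueError("pre- and post-disaster images must have identical dimensions")
    return [
        [tuple(p) + tuple(q) for p, q in zip(row_pre, row_post)]
        for row_pre, row_post in zip(pre, post)
    ]


# Hypothetical inputs of different source sizes: uniform green before, blue after.
pre_image = [[(0, 128, 0)] * 400 for _ in range(300)]
post_image = [[(0, 0, 255)] * 240 for _ in range(180)]
stacked = stack_pre_post(resize_nearest(pre_image), resize_nearest(post_image))
print(len(stacked), len(stacked[0]), stacked[0][0])  # 256 256 (0, 128, 0, 0, 0, 255)
```

Resizing both images before stacking is what makes the dimension check pass; in practice the two images must also show the same ground area, which plain resizing does not guarantee.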

Experimental Setup
Experiments are conducted on an HP Envy x360 laptop with an Intel Core™ i7 processor (2.7 GHz base clock speed), 8 GB RAM, and an AMD Radeon R7 M265 graphics processor, running Windows 10. The deep learning framework Convolutional Architecture for Fast Feature Embedding (Caffe) and the AlexNet CNN architecture were utilized in the study.
On this platform, the Caffe framework is used to build the deep learning model. Caffe is a deep learning framework written in C++ with a Python interface. It was developed specifically for deep learning models, with a focus on image classification and segmentation tasks, and offers high processing speed on large numbers of images [85]. Figure 4 shows the implementation of the CNN-based deep learning model in the Caffe framework. First, the flood data patches obtained from both pre- and post-flood images are used to train the CNN model. Second, after the learning phase, the trained CNN is retained in Caffe as the deep learning model. Finally, in the testing phase, this model classifies disaster and non-disaster regions in a query image, so that a new image provided to the model is classified as either flooded or non-flooded.

Training Phase
The learning of the system is carried out in the training phase (shown in Figure 5), in which the features and characteristics of geographical changes, or of flood damage to homes, buildings, or critical infrastructure, are learned. The CNN approach is used at this stage to detect disasters. The first step of this phase is creating training patches by cropping the pre- and post-disaster images and the corresponding ground-truth images of a particular disaster scene. In the current study, the 256 × 256 pixel images of the flooded land area are cropped into 32 × 32 pixel patches. By cropping the images, 2150 patches are obtained from a single aerial image captured by UAV over the land surface, homes, and buildings. Next, the training patches are aligned from top to bottom, as represented in Figure 5. The process is repeated for both pre- and post-disaster image patches. After alignment, the combined disaster training patches are compared with the corresponding ground-truth patches and labeled 0 or 1. Label 0 is assigned when the change rate, i.e., the proportion of white (changed) region in the ground-truth patch, is less than or equal to 15%, whereas label 1 is assigned when the change rate is greater than 15%. Thus, 1 represents a significant change in the image, whereas 0 represents no significant change; in other words, 1 indicates the occurrence of a disaster, whereas 0 indicates no disaster. After labeling, the training patches are saved and used to train the CNN model. The CNN is trained for more than 250,000 iterations, and a training log is maintained.
Figure 5. The training phase of the proposed method.
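Patch cropping and change-rate labeling can be illustrated with a small Python sketch. It assumes binary ground-truth masks (1 = changed, white), a 15% threshold as stated in the text, and the convention that label 1 marks a patch whose change rate exceeds the threshold; the stride is a free parameter because the text does not state the stride used to obtain its patch counts. All function names are hypothetical.

```python
def extract_patches(grid, size=32, stride=32):
    """Crop size x size patches from a 2-D grid, scanning top-to-bottom, left-to-right."""
    h, w = len(grid), len(grid[0])
    patches = []
    for top in range(0, h - size + 1, stride):
        for left in range(0, w - size + 1, stride):
            patches.append([row[left:left + size] for row in grid[top:top + size]])
    return patches


def change_rate(mask_patch):
    """Fraction of changed ('white', value 1) pixels in a binary ground-truth patch."""
    total = sum(len(row) for row in mask_patch)
    return sum(sum(row) for row in mask_patch) / total


def label_patch(mask_patch, threshold=0.15):
    """Assumed convention: 1 if the change rate exceeds the threshold, otherwise 0."""
    return 1 if change_rate(mask_patch) > threshold else 0


# Hypothetical 256 x 256 ground-truth mask: the top 64 rows changed (flooded).
mask = [[1] * 256 if y < 64 else [0] * 256 for y in range(256)]
labels = [label_patch(p) for p in extract_patches(mask)]
print(len(labels), sum(labels))  # 64 16
```

Applying the same `extract_patches` call to the pre-disaster, post-disaster, and ground-truth images keeps the three sets of patches aligned, since every patch is taken at the same positions in the same order. With `stride=16`, the same 256 × 256 image yields 225 overlapping patches.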

Testing Phase
The test phase detects and evaluates the disaster-affected regions so that an effective disaster response can be planned. As shown in Figure 6, the disaster detection results are extracted and prepared for the national disaster agency to act on. For efficient operation and faster delivery of output to the operator, the testing phase first merges the RGB channels of the pre- and post-disaster test images into a single image. The prediction value for disaster incidence is then calculated by raster-scanning a window over the image with a stride of 16 pixels. Using the model learned in the training phase, the patches predicted as "1" are extracted and assembled as 32 × 32 pixel rectangles that represent the detected disaster region. For enhanced accuracy, the results are compared with the actual on-ground (ground-truth) images, which underwent a raster scan with the same 32 × 32 pixel region of interest. Afterward, the performance is evaluated using three metrics: precision, recall, and F1-score. Precision is the fraction of positive predictions that actually belong to the positive class. Recall is the fraction of all positive samples that are predicted as positive. The F1-score, the harmonic mean of precision and recall, provides a single value that balances the two. The performance measures are calculated as shown in Equations (1)-(3):

$$P = \frac{TP}{TP + FP} \qquad (1)$$

$$R = \frac{TP}{TP + FN} \qquad (2)$$

$$F = \frac{2PR}{P + R} \qquad (3)$$

where TP, FP, and FN are the numbers of true-positive, false-positive, and false-negative detections, respectively. Again, the test log is maintained for reference.
Data acquisition by UAVs through digital aerial photography has certain features.
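The raster scan of the testing phase can be illustrated with a short sketch. It assumes the merged input is the six-channel stack of pre- and post-disaster RGB, a 32 × 32 window moved with a stride of 16 pixels, and a classifier that returns 0 or 1 per window. `toy_classifier` is a hypothetical stand-in for the trained CNN (a simple mean colour-difference rule), used only so the example runs.

```python
import numpy as np

WIN, STRIDE = 32, 16  # window size and stride from the text

def raster_scan(pre, post, classify, win=WIN, stride=STRIDE):
    """Slide a win x win window over the merged 6-channel image with the given stride
    and mark every window that `classify` labels 1 in a binary detection mask."""
    merged = np.concatenate([pre, post], axis=-1)  # H x W x 6
    h, w = merged.shape[:2]
    mask = np.zeros((h, w), dtype=bool)
    n_windows = 0
    for r in range(0, h - win + 1, stride):
        for c in range(0, w - win + 1, stride):
            n_windows += 1
            if classify(merged[r:r + win, c:c + win]) == 1:
                mask[r:r + win, c:c + win] = True  # draw the 32 x 32 detection rectangle
    return mask, n_windows

def toy_classifier(window):
    """Hypothetical stand-in for the trained CNN: label 1 when the pre- and
    post-disaster RGB values differ strongly on average."""
    pre_rgb = window[..., :3].astype(float)
    post_rgb = window[..., 3:].astype(float)
    return int(np.abs(post_rgb - pre_rgb).mean() > 30)

# Usage: a blank scene with one simulated 40 x 40 flooded area.
pre_img = np.zeros((256, 256, 3), dtype=np.uint8)
post_img = pre_img.copy()
post_img[100:140, 100:140] = 200
mask, n = raster_scan(pre_img, post_img, toy_classifier)
print(n)  # 225 windows for a 256 x 256 image
```

With a 16-pixel stride, neighbouring windows overlap by half, so a flooded area that straddles a window boundary is still covered by at least one window.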
Day and night surveillance of the target area is carried out using infrared thermal imaging. The drone requires two-axis gyro stabilization, together with target-location calculation and target image tracking. By combining thermal imaging and visible light, the location can be monitored in real time during the day or night. The target can be captured and located automatically with the help of image features or a lock table. Overall, the UAV used in such studies should have the characteristics shown in Table 2.

Results and Discussions
For each input image in the dataset, five panels of the UAV aerial images captured in the affected areas are shown in Figures 7 and 8: the pre- or post-disaster images, the training inputs to the model, the ground-truth images, the segmentation outputs, and the final flood detection results illustrating the performance metrics. Accuracy is important, as a high value indicates a reliable system. It is evaluated through true positives (shown in red in Figures 7 and 8), true negatives (blue), false positives (green), and false negatives (yellow). These quantities are required to calculate precision, recall, and F1-score, the performance metrics for disaster detection.
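The colour-coded comparison and the metrics derived from it can be sketched as below. This assumes a pixel-level comparison between a binary prediction mask and a binary ground-truth mask and pure RGB values for the four colours; the text does not state whether the counts are taken per pixel or per patch. `evaluate` is a hypothetical helper name.

```python
import numpy as np

# Colour convention from the text: TP red, TN blue, FP green, FN yellow (RGB values assumed).
COLOURS = {"tp": (255, 0, 0), "tn": (0, 0, 255), "fp": (0, 255, 0), "fn": (255, 255, 0)}

def evaluate(pred, truth):
    """Compare binary prediction and ground-truth masks; return a colour-coded map
    plus precision, recall and F1-score computed from the pixel counts."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    masks = {
        "tp": pred & truth,
        "tn": ~pred & ~truth,
        "fp": pred & ~truth,
        "fn": ~pred & truth,
    }
    vis = np.zeros(pred.shape + (3,), dtype=np.uint8)
    for key, m in masks.items():
        vis[m] = COLOURS[key]  # paint each outcome class in its colour
    tp, fp, fn = (int(masks[k].sum()) for k in ("tp", "fp", "fn"))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return vis, precision, recall, f1

# Usage: 40 true positives, 10 false positives, 10 false negatives.
pred = np.zeros((10, 10), dtype=bool)
pred[:5] = True
truth = np.zeros((10, 10), dtype=bool)
truth[:4] = True
truth[9] = True
vis, p, r, f = evaluate(pred, truth)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.8 0.8 0.8
```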
Floods are detected in the test images with high accuracy and robustness. The two sets of flood detection results in Figures 7 and 8 show images captured in the Indus River region of Pakistan. Figure 7 shows the results for an input test image captured in Indus River basin region I; nearly all the flooded regions are successfully detected by the system. Figure 8 shows the results for an image captured in Indus River region II. Some small regions on rooftops are falsely highlighted as flooded, while some regions containing shadows are falsely perceived as land by the system, as shown in output image (d). Such errors could be reduced by appropriate pre-processing steps, such as shadow and noise removal. Apart from these minor false detections, the system correctly recognized all major flooded regions.
Sustainability 2021, 13, x FOR PEER REVIEW 14 of 22

Training and Testing Rates
The system demonstrates a good training rate, reaching a loss of 0.4 at 250,000 iterations with the AlexNet model. The loss function measures how well a specific algorithm models the given data: the more the predictions deviate from the actual results, the higher the loss, and vice versa. Equation (4) presents the loss used in the current study.
where n is the number of training examples, i denotes the ith training example in the data set, y_i is the ground-truth label of the ith training example, and z_i is the prediction for the ith training example.
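The exact form of Equation (4) is not reproduced in this text. For binary 0/1 labels y_i and predictions z_i in (0, 1), a common choice is the mean binary cross-entropy; the sketch below assumes that form purely for illustration and should not be read as the authors' equation. It shows the stated behaviour that predictions far from the labels produce a higher loss.

```python
import numpy as np

def binary_cross_entropy(y, z, eps=1e-12):
    """Mean binary cross-entropy over n examples (assumed form of the loss):
    L = -(1/n) * sum_i [ y_i*log(z_i) + (1 - y_i)*log(1 - z_i) ]."""
    y = np.asarray(y, dtype=float)
    z = np.clip(np.asarray(z, dtype=float), eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(y * np.log(z) + (1 - y) * np.log(1 - z)))

good = binary_cross_entropy([1, 0, 1], [0.9, 0.1, 0.8])  # predictions close to labels
bad = binary_cross_entropy([1, 0, 1], [0.2, 0.7, 0.3])   # predictions far from labels
print(good < bad)  # True
```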

A stable training rate is achieved after 10,000 iterations, as depicted in Figure 9. Training for 250,000 iterations requires 9 min of computing time, as the GPU accelerates learning. For a stable testing rate, 10,000 iterations yield an accuracy of 0.82, as shown in Figure 10, whereas 250,000 iterations yield an accuracy of 0.91. Testing a 256 × 256 pixel image takes 6 s. Table 3 summarizes the flood detection results. The proposed model achieved an overall accuracy of 91% across all patches of the available dataset. Lopez-Fuentes [68] developed a system to detect floods from social media posts, using a deep learning-based CNN model to retrieve visual features and a bidirectional LSTM model; that system achieved an accuracy of 84%. In comparison, the current study achieved a higher accuracy, suggesting that a single CNN model can yield higher accuracy than an approach combining CNN and LSTM.
At the end of the research, a test was conducted in a non-disaster region to check the system's reliability. The test illustrates the system's ability to distinguish between disaster and non-disaster regions, as no changes (no color differences) are seen in the output image.
Thus, the system gave accurate results and detected no disaster, as presented in Figure 11. This demonstrates that the system can distinguish between flooded regions and the river: although the river and the floodwater share the same color, the system was precise enough to detect the difference between the two.

Region | Precision (P) | Recall (R) | F1-Score (F)
Indus River I (Figure 7) | 0.84 | 0.91 | 0.87
Indus River II (Figure 8) | 0.93 | 0.75 | 0.83

In this research, floods were detected from multispectral aerial images captured by UAV, using a CNN model and automatic difference extraction. The method applies to all types of images, whether of buildings or other infrastructure, provided the image quality is consistent. For the two image sets, F1-scores of 0.87 and 0.83 are achieved. The method detects disaster regions with high accuracy for both surface structures and agricultural lands, showing notable improvements in F1-score, precision, and recall. A similar study reported 90% accuracy using a CNN to classify flood incidence [86]. Li et al. [46] used an active, self-learning CNN to detect flooded areas from SAR imagery; this model was designed to overcome the lack of annotated training data and recorded an accuracy of 86.91%.
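As a quick arithmetic check, the F1-scores in the table follow from its precision and recall values when F1 is taken as their harmonic mean, 2PR/(P + R). The snippet below uses only the numbers reported in the table.

```python
# Precision and recall as reported in the table; F1 recomputed as their harmonic mean.
table = {
    "Indus River I (Figure 7)": (0.84, 0.91),
    "Indus River II (Figure 8)": (0.93, 0.75),
}

def f1(p, r):
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r)

for region, (p, r) in table.items():
    print(f"{region}: F1 = {f1(p, r):.2f}")
# Indus River I (Figure 7): F1 = 0.87
# Indus River II (Figure 8): F1 = 0.83
```

The two regions trade off differently: region I has higher recall (fewer missed floods), while region II has higher precision (fewer false alarms) but misses more flooded area, which lowers its F1-score.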
In comparison, the method proposed in the current study achieves better outcomes than other studies that use CNNs for disaster management. Beyond accuracy, the current study combines all six channels (the RGB channels of both the pre- and post-disaster images) to obtain the disaster regions with the CNN model, so no color information is lost in the process. Previous work used only two channels, a single R, G, or B channel from the pre-disaster image and the same channel from the post-disaster image, combined by simple subtraction. By contrast, the current six-channel method retains the original color content of both the pre- and post-disaster phases.
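The difference between the two input designs can be made concrete with a small sketch. It assumes H × W × 3 RGB arrays; `six_channel_input` and `single_channel_difference` are hypothetical helper names. The example constructs a colour change that leaves the red channel untouched, so a red-only subtraction sees no change, while the six-channel stack still carries the full information.

```python
import numpy as np

def six_channel_input(pre, post):
    """Stack pre- and post-disaster RGB images into one H x W x 6 array (no colour lost)."""
    return np.concatenate([pre, post], axis=-1)

def single_channel_difference(pre, post, channel=0):
    """Two-channel style input: subtract one colour channel (0=R, 1=G, 2=B) of the
    pre-disaster image from the same channel of the post-disaster image."""
    return post[..., channel].astype(np.int16) - pre[..., channel].astype(np.int16)

# Usage: green vegetation turns into blue-ish water; the red channel does not change.
pre = np.full((4, 4, 3), (10, 200, 30), dtype=np.uint8)
post = np.full((4, 4, 3), (10, 50, 180), dtype=np.uint8)
x = six_channel_input(pre, post)
d = single_channel_difference(pre, post, channel=0)
print(x.shape, x[0, 0].tolist(), bool(d.any()))  # (4, 4, 6) [10, 200, 30, 10, 50, 180] False
```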
The accuracy of flood detection in the current study is further improved by eliminating ambiguity and misalignment in the results. For this purpose, the input dataset, composed of pre- and post-disaster imagery and the ground truth, is first subjected to an alignment process, and the resulting data are then passed on to the training phase. The data used in the presented model have uniform color characteristics, as all images were taken on a sunny day. Varying color conditions would add complexity: images taken in cloudy weather, in darker regions, or during rainfall can have color variations that make detection more difficult. Therefore, before the images are passed to the training phase, it is important to pre-process them and increase their variation so that the model performs well in all weather conditions. Blaschke [29] utilized three different training strategies to train a CNN model for flood detection. Accordingly, in the current study, the RGB and IR channels are normalized before training to remove location bias caused by environmental changes that distort the image.
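The text does not specify which normalization is applied to the RGB and IR channels. A common choice, assumed here for illustration, is per-channel z-score normalization with statistics fitted on the training set and reused at test time. `fit_channel_stats` and `normalize` are hypothetical names.

```python
import numpy as np

def fit_channel_stats(images):
    """Per-channel mean and standard deviation over an N x H x W x C stack
    (e.g. C = 4 for R, G, B and IR)."""
    data = np.asarray(images, dtype=np.float64)
    mean = data.mean(axis=(0, 1, 2))
    std = data.std(axis=(0, 1, 2))
    return mean, np.where(std > 0, std, 1.0)  # guard against constant channels

def normalize(images, mean, std):
    """Z-score each channel using statistics fitted on the training set."""
    return (np.asarray(images, dtype=np.float64) - mean) / std

# Usage: five random 32 x 32 patches with R, G, B and IR channels.
rng = np.random.default_rng(1)
imgs = rng.uniform(0, 255, (5, 32, 32, 4))
mean, std = fit_channel_stats(imgs)
z = normalize(imgs, mean, std)
print(z.mean(axis=(0, 1, 2)).round(6), z.std(axis=(0, 1, 2)).round(6))
```

Fitting the statistics once on the training data and reusing them for test images keeps both sets on the same scale, which is what allows normalization to reduce illumination-related differences between locations.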
For flood detection and damage assessment, both the diversity of image patterns and the number of training patches are considered, as these factors strongly affect the accuracy of the final output; accuracy tends to increase with both. For example, Ahmad et al. used a large number of training samples, increased through augmentation operations that include horizontal and vertical flipping and brightness changes within ±40% [87]. Thus, diverse patterns and a large number of training patches, as adopted in the current study, ultimately result in higher accuracy and improved results.
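The augmentation operations attributed to Ahmad et al. [87] can be sketched as follows. The interpretation of "±40%" as multiplying pixel values by 0.6 and 1.4 (with clipping to the 8-bit range) is an assumption, and `augment` is a hypothetical helper; the text does not state that the current study applied these exact operations.

```python
import numpy as np

def augment(img, brightness=0.4):
    """Return the patch plus horizontally and vertically flipped copies and two
    brightness-scaled copies (factors 1 - brightness and 1 + brightness)."""
    variants = [img, np.fliplr(img), np.flipud(img)]
    for factor in (1.0 - brightness, 1.0 + brightness):
        scaled = np.rint(img.astype(np.float64) * factor)      # round to nearest level
        variants.append(np.clip(scaled, 0, 255).astype(np.uint8))  # keep 8-bit range
    return variants

patch = (np.arange(16).reshape(4, 4) * 10).astype(np.uint8)  # values 0..150
variants = augment(patch)
print(len(variants), variants[3].max(), variants[4].max())  # 5 90 210
```

In a change-detection setting, a flip must be applied identically to the pre-disaster image, the post-disaster image, and the ground truth before the patches are stacked; flipping an already stacked patch vertically would swap its pre- and post-disaster halves.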

Conclusions
The current study uses a deep learning approach to detect flooded areas in city infrastructure from UAV-captured images, achieving an accuracy of 91%. The results were compared with those of existing methods, showing a significant improvement over previous flood detection models. The proposed method is useful for detecting and extracting the flooded regions in input images. It can be implemented in disaster management situations and adopted by global disaster response and relief organizations. The model can help identify the disaster area in an emergency and narrow the search, so that relief routes can be determined immediately and support can be provided to the people affected in the disaster region. Hence, casualties can be reduced, and rescue operations can be initiated promptly by the responsible organization. The method is equally applicable to structures and agricultural land wherever images of the flooded region can be captured by UAVs. It can also serve as a cost-effective solution for flood detection and therefore assist developing countries in post-flood relief activities by quickly identifying and locating inundated areas. Other countries frequently affected by floods can also benefit from this system.
One limitation of the system is that some minor false detections occur when shadows and other noisy pixels are classified as flooded regions. This is particularly evident in darker areas, which may resemble water and be misclassified as flooded. This could be mitigated by using more sophisticated UAV cameras with built-in flashes, together with high-performance image enhancement and editing tools. In further work on the proposed system, noise and shadow removal techniques can be applied in the pre-processing phase to improve performance. In the future, the dataset can be enhanced by adding more training images; expanding the training dataset should noticeably improve the accuracy of the system. This can be done by capturing more images that vary in scale, color, illumination, and other physical properties, so that the system works precisely on a diverse collection of test images. This will add to the versatility of the system and improve its performance.

Data Availability Statement:
The data are available from the first author and can be shared upon reasonable request.