Flood Detection Using Multi-Modal and Multi-Temporal Images: A Comparative Study

Abstract: Natural disasters such as flooding can severely affect human life and property. To provide rescue through an emergency response team, an accurate assessment of the flooded area is needed soon after the event. Traditionally, obtaining an accurate estimate of a flooded area requires substantial human resources. In this paper, we compared several traditional machine-learning approaches for flood detection, including the multi-layer perceptron (MLP), support vector machine (SVM), and deep convolutional neural network (DCNN), with recent domain adaptation-based approaches on a multi-modal and multi-temporal image dataset. Specifically, we used SPOT-5 and radar images from the flood event that occurred in November 2000 in Gloucester, UK. Experimental results show that the domain adaptation-based approach, semi-supervised domain adaptation (SSDA) with 20 labeled data samples, achieved a slightly better area under the precision-recall (PR) curve (AUC) of 0.9173 and F1 score of 0.8846 than the traditional machine-learning approaches. Moreover, SSDA required much less labor for ground-truth labeling and is therefore recommended in practice.


Introduction
Flooding is one of the most severe natural disasters. It affects regions all over the world and has caused millions of human deaths [1,2]. Flooding is the second deadliest natural disaster in the United States; hurricanes Katrina, Matthew, Florence and Sandy caused severe property damage and thousands of lost lives [3]. To mitigate the damage of flooding, an early and accurate estimate of the flooded area is a necessary step for a rescue team to help people in the affected area.
Over the years, researchers have developed automatic methods to monitor flood events using remote sensing data. Compared with traditional methods, which require labor, in-person visits and longer processing times, automatic methods are quick, cost-effective, and allow monitoring from a distance. In the remote sensing domain, a wide range of sensors is available: some provide better spatial resolution and others provide more frequent temporal revisits. Remote sensing-based flood-detection methods can be divided into two categories: optical and radar. Optical data such as WorldView-2 [4], Landsat [5][6][7], SPOT-5 [8], Sentinel [9] and Moderate-Resolution Imaging Spectroradiometer (MODIS) [7,10], as well as radar data [8,[11][12][13]], have been explored for flood detection. Meanwhile, machine-learning algorithms including decision trees [4,14], SVM [6,8] and MLP [8] have been developed for flood detection.
The data fusion technical committee (DFTC) of the IEEE Geoscience and Remote Sensing Society organized a contest to perform flood detection using a multi-temporal and multi-modal dataset in 2009 [8]. This dataset contains SPOT-5 satellite images and radar images from both before and after the flooding event that occurred in 2000 in Gloucester, UK. In the contest, traditional machine-learning methods such as MLP and SVM achieved the best performance. The main contributions of this paper are as follows:
• Our experimental results showed that domain adaptation-based methods can achieve competitive performance for flood detection while requiring many fewer labeled samples from the post-event images for model fine-tuning.
• Our recommendation for the community is that domain adaptation methods require less labor and are better tools for flood detection.
The paper is structured as follows: Section 2 describes data and experiment design. Sections 3 and 4 present results and discussions, respectively, and Section 5 summarizes the paper.

Dataset
We used images from two heterogeneous satellite sensors, SPOT-5 (optical) and the European Remote Sensing 1 (ERS-1) SAR, from the competition [8], as shown in Figure 1. The images were captured around the flooding event that occurred in November 2000 in Gloucester, UK. The SPOT-5 optical image contains three bands, Green, Red, and Near-Infrared-1 (NIR), while the radar image contains only the SAR channel. The green, red and near-IR channels have spectral ranges of 0.50-0.60 µm, 0.61-0.68 µm, and 0.78-0.89 µm, respectively, with 20 m spatial resolution. The SAR sensor uses the C-band at 5 GHz frequency with linear VV polarization. The SAR images were collected in wave mode with a spatial resolution of 10 m and a swath width of 5 km.
The pre-event and post-event images are shown in Figure 1. The two temporal SPOT-5 images were taken in September 2000 and November 2000, as shown in Figure 1a,c. The pre-event and post-event SAR images were taken in October 2000 and November 2000, as displayed in Figure 1b,d, respectively. The original ground truth provided by the organizers was found to be inaccurate, so we used the updated ground truth [15] in our study, as shown in Figure 2b. For the SAR image, we took the logarithm of the pixel values and applied a de-speckle filter with a 7 × 7 window to improve image quality.

Data Augmentation with Morphological Operation
Some winners of the competition augmented the data with morphological operations. For a fair comparison, we also created four additional bands using morphological operations as described in [8]. We applied the opening by reconstruction (OR) operation to the NIR band with circular structuring elements of radius 40 and 100. We then applied the closing by reconstruction (CR) operation to the images resulting from the previous step with circular structuring elements of radius 60 and 90. In total, we created four additional bands from the original images. The four bands created for the pre-event and post-event images are shown in Figure 3.
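The two reconstruction operations above can be sketched with scikit-image. This is an illustrative sketch, not the authors' implementation: the function names and the toy image are ours, and the radii are scaled down to suit a small array (the paper uses radii of 40/100 and 60/90 on full-size bands).

```python
import numpy as np
from skimage.morphology import disk, erosion, dilation, reconstruction

def opening_by_reconstruction(band, radius):
    """Erode, then rebuild by dilation: removes bright structures smaller than the disk."""
    seed = erosion(band, disk(radius))          # seed <= band, as reconstruction requires
    return reconstruction(seed, band, method='dilation')

def closing_by_reconstruction(band, radius):
    """Dilate, then rebuild by erosion: removes dark structures smaller than the disk."""
    seed = dilation(band, disk(radius))         # seed >= band for method='erosion'
    return reconstruction(seed, band, method='erosion')

# Hypothetical NIR band standing in for the real SPOT-5 data.
nir = np.random.rand(64, 64)
or_band = opening_by_reconstruction(nir, 4)     # radius scaled down for the toy image
cr_band = closing_by_reconstruction(or_band, 6)
```

Opening by reconstruction can only lower pixel values, and closing by reconstruction can only raise them, which is a quick sanity check on the output.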

Multi-Layer Perceptron (MLP)
The multi-layer perceptron (MLP), also known as a feed-forward artificial neural network (ANN), uses the perceptron algorithm to perform its tasks. Geoffrey Hinton et al. introduced the back-propagation algorithm [37], which made the MLP effective and widely popular among researchers. The back-propagation algorithm adjusts the weights of the MLP model to minimize the difference between the actual output and the desired output [37]. An MLP typically uses three layers to perform classification and regression tasks. The MLP algorithm was used effectively for flood detection by one of the winners of the competition [8].
We re-implemented the MLP classifier that achieved the best performance in the data fusion contest [8]. This model stacked the SPOT-5 and radar images from both the pre-event (t1) and post-event (t2) for flood detection, so the stacked data consist of 8 bands (four bands per date). The model was trained using the labeled regions of interest (ROIs) in both the t1 and t2 images and then applied to the whole images for flood detection. The architecture of the MLP model is shown in Figure 4.
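A per-pixel MLP classifier of this kind might be sketched with scikit-learn as follows; the synthetic 8-band data and labels are placeholders for the real stacked ROIs, and the two hidden layers of 40 units follow the hyper-parameter section later in the paper.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
# Toy stand-in for 8-band stacked pixel vectors (4 pre-event + 4 post-event bands).
X_train = rng.random((500, 8))
# Fake "flood" label driven by two bands, just to have something learnable.
y_train = (X_train[:, 3] + X_train[:, 7] > 1.0).astype(int)

# Two hidden layers of 40 units, as described in the hyper-parameter section.
mlp = MLPClassifier(hidden_layer_sizes=(40, 40), max_iter=500, random_state=0)
mlp.fit(X_train, y_train)
proba = mlp.predict_proba(X_train)[:, 1]  # per-pixel flood probability
```

In practice the trained model would be applied to every pixel of the whole t1/t2 image stack to produce a flood-probability map.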

Support Vector Machine (SVM)
The support vector machine (SVM) [38] is another popular machine-learning method that has been widely used for classification and is considered a baseline for image classification. SVM finds hyper-planes among data points to serve as classification boundaries. Linear, polynomial, Gaussian radial basis function, and hyperbolic tangent kernels are commonly used for SVM classification. LIBSVM [39] is a popular implementation of SVM models.
We followed the settings of another winner of the competition, who used an SVM classifier for flood detection [8]. This model added the four bands created by the morphological operation on the NIR band of the post-event SPOT image, as described in Section 2.2, making the input data 10 bands. As before, the model was trained using the labeled ROIs in both the t1 and t2 images and then applied to the whole images for flood detection.

Source Only (SO)
To compare with the domain adaptation-based flood-detection methods, we treated the pre-event images as the source domain and the post-event images as the target domain. The SO method is the baseline model, in which we trained a classifier (CNN/MLP/SVM) using only the ROIs labeled in the pre-event images. We then directly applied the trained classifier to the post-event images for flood detection.
In the source only models, we used only the pre-event image ROIs to learn the difference between flood and non-flood data, without using any labeled or unlabeled data from the post-event images. The CNN-SO model's architecture is shown in Figure 5. Here, the DCNN model has two convolutional layers and two fully connected layers. The convolutional layers use filters to search for task-specific features in the images: when a filter matches a local pattern, it produces a high activation. Initially, the filters are random, but after a few iterations they are optimized to detect task-specific features. After the two convolutional layers, we used a fully connected layer, as shown in Figure 5, which works similarly to the MLP model. The last layer performs the classification task and predicts the probability of the flood and non-flood classes. We stacked the images as [X_t1] for flood detection. We normalized the data between 0 and 1 in each band and used the labeled ROIs in the pre-event images for training; no label information from the post-event images was used during training. All subsequent methods described in Sections 2.3.4-2.3.7 used the same data composition.
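The per-band 0-1 normalization and patch-based input preparation described above can be sketched in NumPy; the array names, tile size, and 5 × 5 patch size (the value chosen in the hyper-parameter section) are illustrative.

```python
import numpy as np

def normalize_bands(img):
    """Scale each band of an (H, W, B) image to [0, 1] independently."""
    mn = img.min(axis=(0, 1), keepdims=True)
    mx = img.max(axis=(0, 1), keepdims=True)
    return (img - mn) / (mx - mn + 1e-12)

def extract_patches(img, size=5):
    """Return one size x size patch centred on every interior pixel."""
    h, w, b = img.shape
    r = size // 2
    patches = [
        img[i - r:i + r + 1, j - r:j + r + 1, :]
        for i in range(r, h - r)
        for j in range(r, w - r)
    ]
    return np.stack(patches)

# Toy 8-band pre-event tile standing in for [X_t1].
x_t1 = np.random.rand(20, 20, 8)
patches = extract_patches(normalize_bands(x_t1), size=5)
print(patches.shape)  # (256, 5, 5, 8): one patch per 16 x 16 interior pixel
```

Each patch would then be fed to the CNN, with the label of its centre pixel taken from the ROIs.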

Unsupervised Domain Adaptation (UDA)
Unsupervised domain adaptation based on the generative adversarial network (GAN) [35] became popular after the GAN was introduced in 2014 [34], and we implemented this approach for flood detection. First, we trained a deep convolutional neural network (DCNN) model in the source domain using all the labeled ROIs. After training, the DCNN model had learned the source embedding function G_s and the source classifier C_s for flood detection (Figure 6a). In the second step, we used the unlabeled data from both the source and target domains to match the marginal distributions of the source (P(G_s(X_s))) and the target (P(G_t(X_t))) domains based on the GAN loss (Figure 6b). The target embedding was trained to mimic the source embedding so that the classifier trained on the source domain could be used in the target domain for flood detection.
Let the source and target domain datasets be denoted as $D_s = \{X_s, Y_s\}$ and $D_t = \{X_t\}$, respectively. First, we train a DCNN model using the labeled samples from the source domain (Figure 6a) by minimizing

$\min_{f_s} \mathbb{E}\left[\, l\big(f_s(X_s), Y_s\big) \,\right]$, (1)

where $f_s$ is the DCNN model, $\mathbb{E}$ is the expectation and $l$ is any suitable classification loss. $f_s$ can be decomposed into two functions: the source embedding function $G_s$ and the source classifier $C_s$. We use the trained source embedding function $G_s$ to find an optimum target embedding function $G_t$ using the unlabeled data from both domains. We modify the target embedding $G_t$ with a GAN loss so that it matches the source embedding $G_s$. The discriminator $D$ in Figure 6b is trained using the following GAN loss,

$L_D = -\mathbb{E}\left[\log D\big(G_s(X_s)\big)\right] - \mathbb{E}\left[\log\big(1 - D(G_t(X_t))\big)\right]$, (2)

while the target embedding function $G_t$ uses the following generator loss to mimic the source domain,

$L_G = -\mathbb{E}\left[\log D\big(G_t(X_t)\big)\right]$. (3)

Using this feedback, the target embedding function $G_t$ updates its parameters so that the source domain classifier can be applied to the target domain.
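A minimal NumPy sketch of the adversarial objectives, assuming the discriminator D outputs the probability that an embedding came from the source domain; the arrays below are hypothetical discriminator outputs, not values from the paper's experiments.

```python
import numpy as np

def bce(p, target):
    """Binary cross-entropy of probabilities p against a constant 0/1 target."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return -np.mean(target * np.log(p) + (1 - target) * np.log(1 - p))

def discriminator_loss(d_source, d_target):
    """D is trained to output 1 on source embeddings and 0 on target embeddings."""
    return bce(d_source, 1.0) + bce(d_target, 0.0)

def generator_loss(d_target):
    """G_t is trained so that D labels target embeddings as source (1)."""
    return bce(d_target, 1.0)

# Hypothetical discriminator outputs on a batch of embeddings.
d_s = np.array([0.9, 0.8, 0.95])   # D(G_s(x_s))
d_t = np.array([0.2, 0.1, 0.3])    # D(G_t(x_t))
print(discriminator_loss(d_s, d_t), generator_loss(d_t))
```

Note how the generator loss shrinks as D is fooled: the better G_t mimics the source embedding, the higher D(G_t(x_t)) and the lower L_G.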

Semi-Supervised Domain Adaptation (SSDA)
We modified our previous work [18] for flood detection, as shown in Figure 6. First, a DCNN model was trained using the labeled ROIs in the source domain. After training, the source embedding function was adapted to the target domain using the UDA procedure described in Section 2.3.4. Then, a few labeled samples from the target domain were used to align the class-specific distributions between the source and the target domain based on the contrastive semantic alignment and classification losses (Figure 6c). The CSA loss pulls samples of the same class from different domains together and pushes samples of different classes from different domains as far apart as possible (Figure 6d).
To reduce the class-wise distribution shift between the source and target domains, we used a few labeled samples from the target domain, $D_t = \{X_t, Y_t\}$, with the classification and contrastive semantic alignment (CCSA) loss [18,36]. The classification loss was defined as

$L_C(G_t, C_t) = \mathbb{E}\left[\, l\big(C_t(G_t(X_t)), Y_t\big) \,\right]$. (4)

The contrastive semantic alignment (CSA) loss consists of a semantic alignment loss and a class separation loss,

$L_{CSA}(G_t) = L_{SA}(G_t) + L_{CS}(G_t)$, (5)

where $L_{SA}(G_t)$ is the semantic alignment loss and $L_{CS}(G_t)$ is the class separation loss. $L_{SA}(G_t)$ is computed as

$L_{SA}(G_t) = \sum_{a=1}^{N_c} d\big(p(G_s(X_s^a)),\, p(G_t(X_t^a))\big)$, (6)

where $N_c$ is the number of classes, $X_s^a = X_s \mid \{Y = a\}$ and $X_t^a = X_t \mid \{Y = a\}$ are conditional random variables, and $d$ is a distance metric between the source $X_s^a$ and target $X_t^a$ distributions. We used the semantic alignment loss to map samples of the same class from the source and target domains as close as possible. We also used the following class separation loss to map samples carrying different class labels from different domains as far apart as possible in the embedding space,

$L_{CS}(G_t) = \sum_{a \neq b} k\big(p(G_s(X_s^a)),\, p(G_t(X_t^b))\big)$, (7)

where $k$ denotes a similarity metric. If the distributions of $X_s^a$ and $X_t^b$ are close to each other, the loss function applies a penalty to keep them separate. Figure 6d illustrates the CSA loss: the semantic alignment, class separation and classification losses are represented by the orange arrows, red dashed lines and blue solid lines, respectively. During testing, we used the trained target mapping function, $G_t$, to find the new representation. The overall classification and contrastive semantic alignment loss becomes

$L_{CCSA} = L_C(G_t, C_t) + L_{SA}(G_t) + L_{CS}(G_t)$. (8)

First, we used the labeled ROIs in the pre-event images to train the classifiers (SVM/MLP/DCNN). Then, we used 1, 3, 5, 10, and 20 labeled samples from the post-event images to fine-tune the classifiers. The fine-tuned classifiers were applied to classify the post-event images.
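The CSA idea can be illustrated with a small NumPy sketch on sample embeddings. This is a simplified pairwise formulation under our own assumptions (squared Euclidean distances, a hinge on a unit margin for class separation), not the exact loss of [18,36]:

```python
import numpy as np

def pairwise_sq_dist(a, b):
    """Squared Euclidean distance between every row of a and every row of b."""
    return np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)

def csa_loss(emb_s, y_s, emb_t, y_t, margin=1.0):
    """Semantic alignment pulls same-class cross-domain pairs together;
    class separation pushes different-class pairs beyond a margin."""
    d = pairwise_sq_dist(emb_s, emb_t)
    same = (y_s[:, None] == y_t[None, :])
    l_sa = d[same].mean() if same.any() else 0.0
    l_cs = np.clip(margin - np.sqrt(d[~same]), 0, None).mean() if (~same).any() else 0.0
    return l_sa + l_cs

# Toy 2-D embeddings: class 0 near the origin, class 1 near (5, 5).
emb_s = np.array([[0.0, 0.0], [5.0, 5.0]])
y_s = np.array([0, 1])
emb_t_aligned = np.array([[0.1, 0.0], [5.0, 4.9]])   # target classes land near source classes
emb_t_swapped = np.array([[5.0, 5.0], [0.0, 0.0]])   # target classes land on the wrong clusters
y_t = np.array([0, 1])
print(csa_loss(emb_s, y_s, emb_t_aligned, y_t),
      csa_loss(emb_s, y_s, emb_t_swapped, y_t))
```

The aligned embedding yields a much smaller loss than the swapped one, which is exactly the gradient signal that drives G_t during adaptation.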

MLP and SVM with 20 Samples from Post-Event Images
We combined the ROIs in the pre-event images with 20 labeled samples from the ROIs in the post-event images to train the SVM and MLP classifiers. After training, we used the trained classifiers for flood detection. In the domain adaptation methods, by contrast, the pre-event image ROIs were used for training and a few labeled samples from the post-event images were used to adapt the trained model to the target domain. This experiment was conducted to test whether the sequential use of pre- and post-event labeled samples provides any advantage.

Evaluation Metrics
Our dataset contains far more non-flooded pixels than flooded ones and is therefore highly imbalanced. For binary imbalanced datasets, the precision-recall (PR) curve has been shown to be a good performance metric for assessing the effectiveness of classification models [40,41]. The area under the PR curve (AUC) is also computed as a performance metric. In addition, we applied a threshold of 0.5 to the flood-detection probability maps to obtain the final detections, which were used to compute the precision, recall and F1 score for each model as follows.

Precision = (# of true flood pixels detected) / (# of total pixels detected) (9)

Recall = (# of true flood pixels detected) / (# of total labeled flood pixels) (10)
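These metrics can be computed directly from a probability map and a ground-truth mask. This NumPy sketch assumes binary masks, the 0.5 threshold from the text, and the standard F1 definition (harmonic mean of precision and recall); the small arrays are illustrative.

```python
import numpy as np

def flood_metrics(prob_map, gt_map, threshold=0.5):
    """Precision, recall and F1 from a flood-probability map and ground truth."""
    pred = prob_map >= threshold
    gt = gt_map.astype(bool)
    tp = np.sum(pred & gt)        # true flood pixels detected
    detected = np.sum(pred)       # total pixels detected as flood
    labeled = np.sum(gt)          # total labeled flood pixels
    precision = tp / detected if detected else 0.0
    recall = tp / labeled if labeled else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Toy 2 x 2 probability map with one labeled flood pixel.
prob = np.array([[0.9, 0.2], [0.7, 0.1]])
gt = np.array([[1, 0], [0, 0]])
print(flood_metrics(prob, gt))  # (0.5, 1.0, 0.666...)
```

The guards against empty detections matter on imbalanced maps, where a conservative model may detect no flood pixels at all.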

Hyper-Parameter Determination
We selected a patch size of 5 × 5 in our experiments after considering the trade-off between performance and computational time. In the SSDA method, 10,000 labeled samples from each class in the source domain were randomly selected to pair with a few labeled samples in the target domain for adaptation. For the embedding functions in the source domain, G_s, and target domain, G_t, we used two convolutional layers followed by a flatten layer. The first layer had 25 filters of size 2 × 2 × 8 and the second layer had 25 filters of size 4 × 4 × 20. All layers used the ReLU activation function. Both classifiers, C_s and C_t, in the CNN models contained a fully connected layer with 25 hidden units. The output layer had 2 units with the SoftMax activation function for classification. We trained the source CNN models for 200 epochs with a batch size of 128. We trained the UDA step for 300 epochs and the SSDA step for 240 epochs in all experiments. Following the settings of the competition [8], we used an SVM classifier from the scikit-learn library [42] with a linear kernel; this implementation is based on libsvm [39]. We used a grid search to optimize the SVM classifier's parameters, searching over linear and RBF kernels, C values between 1 and 1000, and gamma values of 1 × 10^-3 and 1 × 10^-4. Similarly, the MLP classifier had two hidden layers with 40 hidden units each.
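The SVM grid search described above might look like the following scikit-learn sketch; the synthetic 10-band data is a placeholder for the real stacked ROIs, and the exact C grid points are an assumption since the text only gives the range 1 to 1000.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((200, 10))              # stand-in for 10-band pixel vectors
y = (X[:, 0] > 0.5).astype(int)        # toy separable labels

# Grid from the text: linear/RBF kernels, C in 1..1000, gamma 1e-3 or 1e-4.
param_grid = [
    {'kernel': ['linear'], 'C': [1, 10, 100, 1000]},
    {'kernel': ['rbf'], 'C': [1, 10, 100, 1000], 'gamma': [1e-3, 1e-4]},
]
search = GridSearchCV(SVC(), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```

GridSearchCV refits the best configuration on the full training set, so `search` can be used directly as the final classifier.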

Results
In Table 1, the classification performance of the SVM source only (SO), MLP SO, and DCNN SO methods is shown in the first three rows. The unsupervised domain adaptation (UDA) results are shown in the fourth row. Rows 5 to 23 show results for 1-shot, 3-shot, 5-shot, 10-shot, and 20-shot fine-tuning with samples from the post-event images, using MLP fine-tuning (FT), SVM FT, DCNN FT, and semi-supervised domain adaptation (SSDA). The last two rows show the results of the two competition winners: MLP [8] and SVM [8].

Original Predicted Probability (OPP) Results
Each method produced a flood-probability map, and we computed the precision, recall and F1 score on the thresholded original maps for comparison, reported as 'OPP' in Table 1. We also compared the PR-AUC values to evaluate the flood-detection performance of each method. The OPP maps for the different models are shown in Figure 7, displayed with the 'jet' color-map (Matlab), where blue represents probability '0' and red represents probability '1'; a color-bar indicates the intermediate probability values. The PR curves for the last six flood-detection scenarios in Table 1 are shown in Figure 8a. To remove false positive points from the OPP maps, we applied a morphological operation with a structuring element of size 5. The resulting classification maps for the different models are shown in Figure 9, displayed with the same 'jet' color-map, and the corresponding performance metrics are listed as 'MO-DS' in Table 1. The PR curves for the last six flood-detection scenarios in Table 1 after this post-processing are shown in Figure 8b.

Discussions
First, we compare the source only models in Table 1. In this setting, the CNN-SO method achieved the best F1 score of 0.5638 compared with MLP-SO and SVM-SO (0.3466). In terms of PR-AUC, CNN-SO (0.8070) and SVM-SO (0.8132) showed similar results, while the MLP-SO method failed to detect any flood samples in the post-event images. The UDA method, which used only unlabeled samples from the target domain, performed better than all the source only models. It achieved results (F1 score of 0.8652 and PR-AUC of 0.8833) similar to those of the few-shot methods, which used a few labeled samples from the post-event images. For this dataset, UDA performed remarkably well without any labeled samples from the target domain; the predicted flooding map is shown in Figure 9d. Comparing the UDA result with the ground truth shown in Figure 2b, we can see that UDA produced very few false positives, which were further reduced by the post-processing step, as shown in Figure 7d (before) and Figure 9d (after).
Next, we compare the performance of the SSDA method with the SVM, MLP and DCNN models with fine-tuning, as described in Section 2.3.6. All the methods performed similarly, except that the SVM 1-shot FT method failed the task with a recall below 1% (Table 1). The prediction maps shown in Figures 7e and 9e corroborate this result, with probabilities close to '0' in the flood regions. In addition, the SSDA method is slightly better than the other three methods: in terms of PR-AUC and F1 score, SSDA achieved the highest scores three times and four times, respectively, as shown in the prediction maps (Figures 7 and 9). Comparing the fine-tuning results of SVM, MLP and DCNN in Figures 7 and 9, MLP and DCNN performed similarly, whereas SVM fine-tuning with 1 shot was much worse. With 20 labeled samples, SVM improved by a large margin, as shown in Figure 9f, compared with the 1-shot result in Figure 9e. In Table 1, all the methods improved with an increasing number of shots, although the improvements from 10-shot to 20-shot were marginal. Therefore, we chose 20 labeled samples from the post-event images for fine-tuning, with which all the fine-tuning methods produced stable performance on the post-event images.
The flooding map predicted by the SVM classifier [8] had many false positives, as shown in Figure 7m. After the morphological post-processing step, the false positives were reduced by a large margin, as shown in Figure 9m. The other competition winner, the MLP classifier [8], showed similar behavior. Comparing the PR-AUC metrics in Table 1, SSDA 20-shot achieved the best PR-AUC value after post-processing, whereas SSDA 3-shot achieved the best PR-AUC value on the original predicted probability maps. After applying the morphological operation to the predicted flood maps in Figure 8a, all approaches improved their detection performance, as shown in Figure 8b. In both Figure 8a,b, SSDA 20-shot achieved the best performance.
It is worth noting that both the SVM and MLP classifiers used 48,684 labeled samples from the pre- and post-event images for training. In contrast, the domain adaptation method, SSDA, used the 48,684 labeled samples only from the pre-event images for training, plus a few labeled samples from the post-event images for adaptation. Although the competition winner methods used more samples for training, they performed slightly worse than the SSDA 20-shot approach (PR-AUC of 0.8727 (SVM) and 0.8949 (MLP) vs. 0.9173). In addition, the SVM and MLP models [8] would likely fail if the pre- and post-event images were not registered. The MLP and SVM trained with 20 post-event samples, described in Section 2.3.7, performed poorly compared with the other methods in Table 1. Our experiments suggest that the domain adaptation-based method, SSDA, is preferable for flood detection.

Conclusions
In this paper, we conducted a comparative study of various methods for flood detection, including SVM, MLP and deep CNN models, as well as domain adaptation methods combined with those models. The comparison was conducted on the multi-temporal optical (SPOT-5) and radar (SAR) image dataset used for the flood-detection competition organized by the DFTC in 2009. We found that all the compared methods performed similarly in flood detection, and that the SSDA method, a combination of a deep CNN with a semi-supervised domain adaptation strategy, achieved slightly better performance using far fewer training samples than the competition winner methods. Our study suggests that the SSDA method is an effective algorithm for flood detection and is preferable in practice. In future work, we will apply domain adaptation methods to more datasets across different locations to further validate our findings.