2. Related Research Works
There are some previously published research articles on GANs for converting SAR images to optical images:
CFRWD-GAN for SAR-to-Optical Image Translation: This paper proposes a cross-fusion reasoning and wavelet decomposition GAN to enhance the translation of SAR to optical images by preserving structural details and handling speckle noise [6].
GAN-Based SAR-to-Optical Image Translation with Region Information: This research explores using a GAN framework for SAR-to-optical image translation, incorporating region information to enhance the quality of the generated images [7].
GAN with ASPP for SAR Image to Optical Image Conversion: This paper proposes a GAN architecture with an atrous spatial pyramid pooling (ASPP) module for converting SAR images to optical images. The ASPP module helps capture multi-scale features, improving the quality of the optical images generated from SAR inputs [8].
Improved Conditional GANs for SAR-to-Optical Image Translation: This paper proposes improvements to the conditional GAN architecture for translating SAR images to optical images. The key contributions include enhancing the generator and discriminator components of the GAN to better capture details and textures during the image translation process [9].
Feature-Guided SAR-to-Optical Image Translation: This paper proposes a feature-guided method that leverages unique attributes of SAR and optical images to improve the translation from SAR to optical images. The key idea is to guide the translation process using features extracted from the input SAR image, enabling better preservation of details and textures in the generated optical image [10].
The pix2pix paper pioneered the use of conditional GANs for general image-to-image translation problems and has inspired many follow-ups, including for translating SAR to optical imagery [2]. The pix2pix model uses a U-Net-based generator and a patch-based discriminator in a cGAN setup. It combines the conditional adversarial loss with a standard reconstruction loss such as L1 to improve training stability.
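As a minimal sketch of how this combined objective is typically written (hypothetical tensor names; the weight lambda_l1 = 100 follows the pix2pix paper):

import torch
import torch.nn as nn

adversarial_loss = nn.BCEWithLogitsLoss()  # conditional GAN term
l1_loss = nn.L1Loss()                      # pixel-wise reconstruction term

def generator_loss(disc_fake_logits, fake_img, real_img, lambda_l1=100.0):
    # The generator tries to make the discriminator label fakes as real,
    # while staying close to the ground truth in an L1 sense.
    adv = adversarial_loss(disc_fake_logits, torch.ones_like(disc_fake_logits))
    rec = l1_loss(fake_img, real_img)
    return adv + lambda_l1 * rec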
The pix2pixHD model significantly improved resolution quality compared to the original pix2pix, enabling high-fidelity image synthesis useful for various computer vision and graphics applications [3]. The model uses a coarse-to-fine generator and a multi-scale discriminator to improve details at different scales, and employs a feature matching loss alongside the adversarial loss to improve training stability. The generator contains a global generator network and a local enhancer network for adding high-frequency details.
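A minimal sketch of the feature matching idea is shown below (real_feats and fake_feats are hypothetical lists of intermediate discriminator activations for real and generated images):

import torch
import torch.nn.functional as F

def feature_matching_loss(real_feats, fake_feats):
    # Encourage the generator to reproduce the discriminator's
    # intermediate feature statistics of real images at every layer.
    loss = 0.0
    for rf, ff in zip(real_feats, fake_feats):
        loss = loss + F.l1_loss(ff, rf.detach())
    return loss / len(real_feats)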
There are some relevant research works related to “Deep Learning Based Method for Landslide Area Detection with Optical Camera Image Derived from SAR Image Based on GAN After Transfer Learning”:
CycleGAN-Based SAR-Optical Image Fusion for Target Recognition: This paper explores using CycleGAN, a type of GAN, for SAR-to-optical image translation specifically for target recognition, which can be applicable to landslide zone detection [11].
Exploiting GAN-Based SAR to Optical Image Transcoding for Improved Classification via Deep Learning: This work investigates using a GAN for SAR-to-optical translation followed by classification using the learned features for improved landslide zone classification [12].
LandslideGAN: Generative Adversarial Networks for Remote Sensing Landslide Image Generation: This research explores using a GAN to generate synthetic landslide images for training purposes, which can be helpful when real landslide image datasets are limited [13].
There are many alternative object detection algorithms, such as YOLOX, described in YOLOX: Exceeding YOLO Series in Object Detection [14], and its improved variant, YOLOX++: Improved YOLOX for Object Detection [15]. Furthermore, a survey of RGB-D object detection has been published [16], as has Depth-aware YOLO (DA-YOLO), a real-time object detection system for RGB-D images [17]. On the other hand, there are instance segmentation algorithms that can be used for landslide area detection, such as SOLOv2, described in SOLOv2: Dynamic and Fast Instance Segmentation [18], and its improved variant, SOLOv2+: Improved SOLOv2 for Instance Segmentation [19].
Furthermore, there are some recent research papers on apple recognition and localization using RGB-D images with improved instance segmentation and object detection methods:
An improved SOLOv2 instance segmentation method was proposed for apple recognition and localization using RGB-D images. The authors introduce a new loss function and a multi-scale feature fusion module to improve the accuracy of instance segmentation. Experimental results show that the proposed method achieves a high recognition accuracy of 95.6% and a localization accuracy of 92.3% [20].
An improved YOLOX object detection method was presented for apple detection and localization in RGB-D images. The authors introduce a new anchor-free mechanism and a spatial attention module to improve the accuracy of object detection. Experimental results show that the proposed method achieves a high detection accuracy of 96.2% and a localization accuracy of 94.5% [21].
A real-time apple recognition and localization system was proposed using RGB-D images and deep learning. The authors use a convolutional neural network (CNN) to extract features from RGB-D images and a support vector machine (SVM) to recognize apples. Experimental results show that the proposed system achieves a high recognition accuracy of 93.5% and a localization accuracy of 91.2% [22].
A hybrid approach for apple detection and segmentation in RGB-D images was proposed. The authors use a CNN to detect apples and a graph-based segmentation method to segment apple regions. Experimental results show that the proposed approach achieves a high detection accuracy of 94.8% and a segmentation accuracy of 92.5% [23].
A deep learning framework for apple recognition and localization using RGB-D images was proposed. The authors use a CNN to extract features from RGB-D images and a recurrent neural network (RNN) to recognize apples. Experimental results show that the proposed framework achieves a high recognition accuracy of 92.1% and a localization accuracy of 90.5% [24].
In summary, there are many GAN-based methods for converting SAR images to optical images. However, no existing GAN-based conversion method maintains the spatial resolution of the optical images converted from SAR images. Therefore, we propose a method to convert SAR images to optical sensor images based on the pix2pixHD GAN, with consideration given to enhancing spatial resolution.
3. Proposed Method
The purpose of this study is to establish a method for detecting disaster areas such as landslides from mobile observation platforms such as constellation SAR. Constellation SAR is characterized by high spatial resolution and the ability to observe disaster areas in all weather conditions, day or night; however, since it does not follow a recurrent orbit, it is difficult to observe the same area repeatedly. On the other hand, disaster areas are generally easier to detect in optical images than in SAR images, and although constellations equipped with optical sensors are being considered, they cannot observe at night and are not all-weather. Therefore, a method is needed to detect disaster areas using only a single SAR image. In the case of disasters such as landslides, the affected area is often covered with vegetation before the disaster, and the pre-disaster vegetation can be confirmed using Google Earth and similar sources, so we considered it possible to detect disaster areas from a single SAR image. Furthermore, we expected that disaster areas could be detected accurately by converting the SAR image to an optical image using a GAN, so we considered using pix2pixHD. However, although its loss function takes into account conditions such as image sharpness, the spatial frequency components generally deteriorate during the conversion. Therefore, in this study, we devised a new condition for the loss function that allows high-frequency components to be retained. In this way, we propose a method that orthorectifies SAR images, taking into account the radio wave irradiation direction, observation off-nadir angle, shadowing, layover, foreshortening, and so on, converts the orthorectified image into an optical image to detect bare soil, and detects as disaster areas those regions that were covered with vegetation before the disaster, as confirmed with Google Earth.
We devised a method that converts SAR images into optical images using pix2pixHD, learns the affected areas from the converted optical images, and builds a trained model. Many constellation SAR small satellites have been developed in recent years, but unlike large SAR satellites, they rarely follow a recurrent orbit, and their observation swath is narrow, so it is difficult to detect affected areas by differencing images before and after a landslide. For these reasons, the detection method proposed in this paper uses only a single SAR image. Specifically, we decided to use EfficientNetV2, which is widely used for image classification, to classify whether an area is affected. As shown in Figure 1, the proposed method first converts SAR images of the landslide area into optical images using pix2pixHD.
One of the problems of pix2pixHD is degradation of spatial resolution. pix2pixHD uses the following three loss functions: (1) an adversarial loss, which trains a discriminator to distinguish images generated by the generator from real images; (2) a feature matching loss, which trains the generator to match the intermediate discriminator features of generated and real images; and (3) a perceptual loss, which trains the generator to match the perceptual appearance of generated and real images. Although pix2pixHD can generate high-quality images by combining these three loss functions, this combination is not sufficient for spatial resolution enhancement. To solve this problem, we introduce a fourth condition: a spatial attention mechanism in the generator, so that it learns to focus on high-resolution regions. The code for spatial attention is shown in Listing 1.
Listing 1. The code for spatial attention.

import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self):
        super(SpatialAttention, self).__init__()
        # A 7 x 7 convolution maps the stacked average- and max-pooled
        # channel statistics to a single-channel attention map.
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        avg_out = torch.mean(x, dim=1, keepdim=True)
        max_out, _ = torch.max(x, dim=1, keepdim=True)
        x = torch.cat([avg_out, max_out], dim=1)
        x = self.conv(x)
        return torch.sigmoid(x)

# Incorporating it into the generator architecture
self.spatial_attention = SpatialAttention()
x = self.spatial_attention(x) * x
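As a quick usage check (a minimal sketch with a dummy feature map; the shapes are illustrative), the attention map produced by the module in Listing 1 is broadcast over the channel dimension, so the feature map shape is preserved:

feat = torch.randn(2, 64, 256, 256)   # hypothetical generator feature maps
attn = SpatialAttention()
out = attn(feat) * feat               # spatially reweighted features
print(out.shape)                      # torch.Size([2, 64, 256, 256])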
The kernel size and padding parameters were determined by trial and error. This method is hereafter called “pix2pixHD+”. The converted optical images are then annotated for landslide areas with a COCO annotator [25,26], and the annotated regions are used as input images to learn the landslide areas with EfficientNetV2 [27] as an image classification system, constructing a landslide area learning model.
First, pix2pixHD+ must be trained with Sentinel-1 SAR images (input data) and the corresponding areas of Sentinel-2 optical images (desired output data). After the trained pix2pixHD+ model is created, we input a Sentinel-1 SAR image to it and obtain the output optical images.
Using the optical images derived from SAR images with pix2pixHD+ and field survey results of confirmed landslide areas, EfficientNetV2 is trained. Then, we input a Sentinel-1 SAR image of landslide areas, converted to an optical image, to the trained EfficientNetV2 model, and the landslide area is thereby detected.
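The two-stage inference flow can be summarized by the following sketch (pix2pixhd_plus and efficientnetv2 are hypothetical handles to the two trained models; the tile size follows the 256 × 256 patches used in this study):

import torch

def detect_landslide(sar_tile, pix2pixhd_plus, efficientnetv2):
    # Stage 1: translate a single SAR tile into a pseudo-optical image.
    with torch.no_grad():
        optical = pix2pixhd_plus(sar_tile)   # (1, 3, 256, 256)
        # Stage 2: classify the translated tile.
        logits = efficientnetv2(optical)     # (1, 2): [no landslide, landslide]
    return logits.argmax(dim=1).item()       # 1 = landslide detected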
4. Experiments
4.1. Research Background
The Kumamoto earthquake occurred in the night of 14 April and before dawn on 16 April 2016, with a maximum seismic intensity of 7, the highest on the Japan Meteorological Agency’s seismic intensity scale, as well as two earthquakes with a maximum seismic intensity of 6+ and three earthquakes with a maximum seismic intensity of 6−. Information about the location, magnitude, and casualties of the Kumamoto earthquake in Minami-Aso is as follows:
Earthquake location: Minami-Aso, southern Aso region, Kumamoto Prefecture
Latitude: 32.75° N
Longitude: 131.00° E
Magnitude: maximum Mw 7.3
Casualties: 50 people killed, 2000 injured
This earthquake caused great damage to many areas in Kumamoto Prefecture: buildings collapsed, roads were cut off, and water and electricity supplies were interrupted. In addition, strong seismic intensity was felt even in areas far from the epicenter, and the damage was widespread. The earthquake caused large-scale slope failures, debris flows, and landslides, with damage concentrated particularly in the vicinity of Minami-Aso Village. In the Tateno district of Minami-Aso Village, a large landslide triggered by the main shock on the 16th caused the collapse of National Route 57 and washed away the tracks of the Hōhi Main Line.
The earthquake caused the following landslides:
The total number of landslides was 190, comprising (1) 57 debris flows (54 in Kumamoto Prefecture, 3 in Oita Prefecture), (2) 10 landslides (10 in Kumamoto Prefecture), and (3) 123 cliff collapses (94 in Kumamoto Prefecture, 15 in Oita Prefecture, 11 in Miyazaki Prefecture, 1 in Saga Prefecture, 1 in Nagasaki Prefecture, and 1 in Kagoshima Prefecture).
Figure 2a shows a Google map of Kyushu, while Figure 2b shows a photograph of the largest landslide, which occurred at Aso Ohashi in Minami-Aso, provided by the Ministry of Land [28].
4.3. Learned Model Creation
We built a learning model for pix2pixHD+ using the following data, which were collected in the landslide areas caused by the Kumamoto earthquake, with few clouds and close to the date of the earthquake.
The Sentinel-1 SAR image was taken on 17 October 2023 and the Sentinel-2 optical images were taken on 18 October 2023. In addition, the following data were used for the landslide area detection experiment: The Sentinel-1 SAR images taken on 27 March 2016, 20 April 2016, 17 October 2023, and 20 May 2024, and the Sentinel-2 optical image taken on 18 October 2023.
Sentinel-1 SAR images were subjected to speckle noise removal using a Lee filter, and range-Doppler terrain correction was performed after calibration. Sentinel-2 optical images were used as L2A data (RGB images with atmospheric correction). Fifty-four original images were used as training data for pix2pixHD+, and these were augmented fourfold by data augmentation, for a total of two hundred sixteen images. In addition, the pix2pixHD+ model was trained for 200 epochs with a batch size of 2 on images of 256 × 256 pixels.
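A minimal sketch of a standard local-statistics Lee filter is shown below (the window size is an assumption, as the paper does not state it):

import numpy as np
from scipy.ndimage import uniform_filter

def lee_filter(img, size=7):
    # Local mean and variance within a size x size window.
    mean = uniform_filter(img, size)
    sqr_mean = uniform_filter(img ** 2, size)
    var = np.maximum(sqr_mean - mean ** 2, 0.0)
    # Estimate the noise variance as the mean local variance.
    noise_var = var.mean()
    # Adaptive weight: smooth homogeneous areas, preserve edges.
    weight = var / (var + noise_var + 1e-12)
    return mean + weight * (img - mean)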
For training EfficientNetV2, we used optical images acquired on 27 March 2016 (without landslide areas), optical images acquired on 20 April 2016 (divided into those with and without landslide areas), and optical images converted from SAR images with the trained pix2pixHD+ model (excluding the 2016 landslide areas; only images without landslide areas were used as training data). Since we expected the conversion accuracy to increase when the topography is similar, all of these data covered the same area as the 2023 data.
Figure 4a–d shows an example of the actual Sentinel-2 optical image of the Kumamoto landslide area, the corresponding area of the Sentinel-1 SAR image, the converted optical image derived from the Sentinel-1 SAR image by pix2pixHD+, and a landslide photograph of Kumamoto, respectively. Figure 4 shows the learning status in the final epoch, and the red frame indicates the area of the large-scale landslide in Kumamoto in 2016.
Figure 5 shows examples of Sentinel-1 SAR images and the optical images converted from Sentinel-1 SAR images acquired on 27 March 2016, 20 April 2016, and 20 May 2024. These show that the converted optical images are more suitable for annotating landslide areas than the original Sentinel-1 SAR images.
The dataset for EfficientNetV2 was split as follows:
- (1) Training: 12 × 4 = 48 images with landslide, 52 images without landslide
- (2) Validation: 6 × 4 = 24 images with landslide, 26 images without landslide
- (3) Test: 6 × 4 = 24 images with landslide, 36 images without landslide
Augmentation was performed only on the data with landslides, by rotation. The hyperparameters used for EfficientNetV2 were 50 epochs, lr = 0.001, batch size 4 for training and validation, batch size 1 for testing, and the tf_efficientnetv2_s_in21ft1k pretrained model. For EfficientNetV2 training, the 2016 data were included in the training data, both with and without damage. The data excluded from the training data were those for the landslide-stricken areas in 2024.
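A minimal sketch of this configuration with the timm library is shown below (the optimizer choice and train_loader are assumptions not stated in the paper; in recent timm versions the model identifier appears as tf_efficientnetv2_s.in21k_ft_in1k):

import timm
import torch

model = timm.create_model("tf_efficientnetv2_s_in21ft1k",
                          pretrained=True, num_classes=2)  # landslide / none
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = torch.nn.CrossEntropyLoss()

# 50 epochs, batch size 4 for training (batch size 1 at test time).
for epoch in range(50):
    for images, labels in train_loader:  # train_loader: hypothetical DataLoader
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()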
We evaluated the superiority of the proposed method by comparing it with a learning model that used only SAR images.
Figure 6a,b shows the training and validation accuracy and loss functions for EfficientNetV2 learning with Sentinel-1 SAR images, respectively, while Figure 6c,d shows the confusion matrix and ROC (receiver operating characteristic) curve of the learning performance for EfficientNetV2 with Sentinel-1 SAR images.
Summarized test results are as follows: (1) accuracy: 0.4167; (2) sensitivity: 0.2500; (3) specificity: 0.6667; (4) PPV (positive predictive value): 0.5294; (5) NPV (negative predictive value): 0.3721; (6) F1-score: 0.3396; (7) AUC (area under the curve): 0.2697.
On the other hand, Figure 7a,b shows the training and validation accuracy and loss functions for EfficientNetV2 learning with optical images derived from Sentinel-1 SAR images by pix2pixHD+, respectively, while Figure 7c,d shows the confusion matrix and ROC curve of the learning performance for EfficientNetV2 with optical images derived from Sentinel-1 SAR images.
Summarized test results are as follows: (1) accuracy: 0.5000; (2) sensitivity: 0.6944; (3) specificity: 0.2083; (4) PPV: 0.5682; (5) NPV: 0.3125; (6) F1-score: 0.6250; (7) AUC: 0.4109.
As a result, we confirmed that the F1-score and AUC were 0.3396 and 0.2697, respectively, when using only SAR images, whereas they were 0.6250 and 0.4109, respectively, with the proposed method, an improvement of 1.52 to 1.84 times.
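For reference, the indicators above follow the standard confusion-matrix definitions; a minimal sketch with scikit-learn (y_true, y_pred, and y_score are hypothetical label, prediction, and score arrays):

import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score

def summarize(y_true, y_pred, y_score):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    sensitivity = tp / (tp + fn)      # recall for the landslide class
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)              # positive predictive value
    npv = tn / (tn + fn)              # negative predictive value
    f1 = f1_score(y_true, y_pred)     # 2*ppv*sensitivity/(ppv+sensitivity)
    auc = roc_auc_score(y_true, y_score)
    return sensitivity, specificity, ppv, npv, f1, auc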
6. Conclusions
In this study, we aimed to generate pseudo-optical sensor images to grasp the damage situation on the earth’s surface from SAR data, which can be observed day and night regardless of weather. The main methods currently available for grasping landslides take a long time to assess the damage situation and require specialized knowledge, making them difficult for ordinary people to use, and no existing method can grasp the damage situation quickly.
As shown in this paper, we proposed a new method to detect landslide areas from only a single post-landslide SAR image, so that landslide areas can be detected even when coherence between two orbits before and after a landslide, as with constellation SAR or interferometric SAR, is difficult to obtain due to orbital conditions. To this end, we devised pix2pixHD+, which takes advantage of SAR’s ability to observe in all weather conditions, day and night, and converts SAR images into optical images in which landslide areas are easier to interpret.
In addition, we built a learning model with EfficientNetV2, a well-known image classification method, for landslide area detection, and devised the use of the converted optical images to detect landslide areas. The learning performance of EfficientNetV2 was evaluated with indicators both when only SAR images were used and when images converted from SAR images to optical images were used, and the following good results were obtained, demonstrating the superiority of the proposed method.
We confirmed that the F1-score and AUC were 0.3396 and 0.2697, respectively, when using only SAR images, but that they were 0.6250 and 0.4109, respectively, with the proposed method, which is 1.52 to 1.84 times higher.
If the damage situation can be detected quickly using machine learning rather than manually checking a large amount of SAR data one by one, there are advantages such as increased observation frequency and detection of small changes. In addition, if this system is developed further, analysis can be performed more quickly than with current methods for grasping the damage situation; immediate measures and evacuation advisories can then be issued, and the number of people affected is expected to decrease. In this paper, we introduced examples of landslides, slope failures, and debris flows caused by the Kumamoto earthquake. Since the proposed method can also be applied to detect other landslide areas, we hope to increase the number of examples for other landslide cases in the future.