With the development of remote sensing technology, a large number of high-resolution satellite data have been obtained, which can be applied to land cover monitoring, marine pollution monitoring, crop yield assessment, and other fields [1
]. However, the presence of clouds, especially very thick ones, can contaminate captured satellite images and cause interference in the observation and identification of ground objects [5
]. As a result, clouds create a lot of difficulties for tasks such as target identification, trajectory tracking, etc. On the other hand, clouds are the most uncertain factor in Earth’s climate system. It is estimated that clouds cover about 67% of the Earth’s surface [8
]. Clouds can affect the energy and water cycles of global ecosystems at multiple scales by influencing solar irradiation transmission and precipitation, and thus have a significant impact on climate change on Earth [10
]. At the same time, cloud observations and forecasts are important for the management of the power sector, as cloud coverage affects the use of solar energy [11
]. Therefore, cloud detection is of great research significance, both in various remote sensing applications and in fields such as earth science research.
Over the years, many cloud detection methods [12
] have been proposed for satellite remote sensing data. Among the many methods, the threshold method is a relatively well-established cloud detection method. It accomplishes cloud detection task based on the radiative difference between cloud and non-cloud pixels. For example, the function of mask (Fmask) [15
] is a typical threshold-based cloud detection method that uses a decision tree to mark each pixel as cloud or non-cloud. In each branch of the decision tree, decisions are given according to a threshold function. In practice, threshold-based methods are often not used in isolation and often rely on multi-spectral and multi-method combinations, such as adding other information (e.g., surface temperature, etc.) or combining other techniques (e.g., superpixel segmentation [18
] and texture analysis [19
], etc.) to improve the detection accuracy of threshold methods. Most of these methods are specific to specific bands of remote sensing data, and the selection of thresholds is often difficult and time-consuming. Although various threshold methods have achieved some success in their respective applications, most methods are not generalizable and require constant adjustment and optimization of the threshold selection.
In recent years, deep learning has made great breakthroughs in the field of computer vision [20
], with remarkable results in areas such as face recognition, object detection and medical image analysis. Inspired by deep learning algorithms, many scholars have developed several approaches to cloud detection using deep learning. Convolutional neural network (CNN) is a widely used deep learning method that has unique advantages in the field of image processing. Shi et al. [5
] utilized superpixel segmentation and deep Convolutional Neural Networks (CNNs) to mine the deep features of cloud. The experimental results show that their proposed model works well for both thin and thick clouds, and has good stability in complex scenarios. Chen et al. [23
] implemented a multilevel cloud detection task for high-resolution remote sensing images based on Multiple Convolutional Neural Network s(MCNNs). Specifically, MCNNs architecture is used to extract multiscale information from each superpixel, next superpixels are classified as thin clouds, thick clouds, cloud shadows, and non-clouds. The results show that the proposed method has an excellent performance in the task of detecting multilevel cloud detection. Segal-Rozenhaimer et al. [24
] proposed a novel domain adaptation CNN-based method, which utilizes the spectral and spatial information inherent in remote satellite imagery to extract the depth invariant features for cloud detection. The method can be better adapted to different satellite platforms in the prediction step without the need to train separately for each platform, improving the robustness of predictions from multiple remote sensing platforms. Ozkan [25
] et al. proposed an efficient neural network model based on a deep pyramid network. In the task of cloud detection, the model can obtain very good classification results from a set of noisy labeled RGB color remote sensing images with accuracy up to pixel level. Francis [26
] et al. proposed a CloudFCN model for cloud detection tasks based on the Fully Convolutional Network architecture. The experimental results show that this model works well in cloud detection, illustrating the great potential of the Fully Convolutional Network architecture for applications in satellite remote sensing data. These studies all show that deep learning methods have great advantages in satellite remote sensing data processing, and cloud detection methods that introduce convolutional neural networks are often superior to traditional cloud detection methods. U-Net [27
] is a very effective image segmentation method that has a remarkable performance in many image segmentation tasks, especially medical image segmentation tasks. Many studies have found that models based on the U-Net architecture also show excellent performance in satellite remote sensing image segmentation [30
]. For instance, Jeppesen [33
] et al. propose a cloud detection algorithm (RS-Net) based on the U-net architecture and use it to detect clouds in satellite images. From the experimental results, it was found that the RS-Net performed better than the conventional method. Wieland et al. [34
] presented an improved U-Net convolutional neural network for the task of multi-sensor cloud and cloud shadow segmentation. Their experimental results show that the model achieves great results on multiple satellite sensors with excellent generalization performance. Besides, adding shortwave-infrared bands can improve the accuracy of the semantic segmentation task of cloud and cloud shadow. In cloud detection tasks, the number and distribution of clouds tend to present very complex randomness. In order to achieve accurate cloud detection, attention should be focused on the areas with clouds during the cloud detection process. In the field of medical image classification, the object detection, etc., attention mechanism is a very effective method [35
] that can allocate more processing resources to the target. Attention mechanism originates from human beings visual cognitive science. When reading text or looking at objects, humans tend to pay more attention to detailed information about the target and suppress other useless information. Similarly, the basic idea of the attention mechanism is that the model learns to focus on important information and ignore the unimportant information. Some studies have shown that attention mechanisms can improve classification effects [39
]. Therefore, attentional mechanisms in computer vision should be introduced into the construction of cloud detection models. In this study, the Cloud-AttU model is proposed on the basis of U-Net, which introduces the attention mechanism. In the experiment, it was found that the results of cloud detection were significantly improved as the attention mechanism guided the model to learn more cloud-related features to detect clouds.