The world’s tropical forests are essential for environmental sustainability and the future of our planet. They combine high biodiversity and significant carbon storage with ecological services [1]. Since the 1980s, the world’s tropical forests have undergone substantial change: agricultural expansion worldwide has come at the expense of tropical forest areas [3]. In particular, the Brazilian Amazon rain forest has suffered significant deforestation. According to Brazil’s National Institute for Space Research (INPE), deforestation has reached 20% of the Amazon rain forest in the country [4]. Producing qualified assessments of land-use and land-cover change in Amazonia is essential for evidence-based policies that can protect the forest [5].
Earth observation data is the primary source of assessments of deforestation in Amazonia. Since the late 1980s, INPE has produced annual estimates of clear-cut forest areas with the PRODES project and daily deforestation alerts with the DETER initiative [7]. PRODES and DETER are the authoritative sources of information that support Brazil’s actions to protect Amazonia [8]. As a companion to the monitoring of clear-cut areas, INPE and its partners also produce periodic maps of land-use change in deforested areas of Amazonia with the TerraClass project [10]. Universities and research groups complement INPE’s work [11]. Altogether, there is a substantial amount of land-use change information on the Brazilian Amazonia derived from remote sensing data.
Despite the widespread data availability, the complexity of land-use transitions in Amazonia requires continuous improvements in image classification. Law enforcement actions by the Brazilian federal government reduced deforestation from 2.7 Mha (million hectares) in 2004 to an average of 0.6 Mha between 2009 and 2018. Despite this reduction, deforestation in Amazonia remains at a relatively high level. Understanding the complex interplay between crop production, cattle ranching, and land speculation requires ever more detailed data.
To improve monitoring of the Amazon forest from satellites, researchers are investigating the use of big data analytics [13]. Such methods rely on the increased volume of data provided by the new generation of satellites, such as Sentinel-2 [15]. However, to use these large data sets in tropical forest areas, researchers need suitable methods for automated cloud detection in optical imagery.
Traditional alternatives for dealing with cloud cover include combining information from various dates and selecting a “best pixel” for an extended period [16]. These methods lose the temporal information required to identify crop types [14] and pasture management [18]. To preserve temporal information, many researchers prefer methods that identify cloud-covered pixels and replace them with interpolated estimates [20]. When different satellites are combined to produce a denser time series [21], replacing cloudy pixels with interpolated values becomes feasible. For this reason, automated cloud detection algorithms are a necessary complement to big Earth observation data analytics.
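To make the interpolation step concrete, the following is a minimal per-pixel sketch using numpy. It assumes a single pixel's time series and a boolean cloud mask; the function and variable names are illustrative and do not come from any of the packages discussed here.

```python
import numpy as np

def interpolate_cloudy(values, cloudy, dates):
    """Replace cloud-flagged observations in a pixel's time series
    by linear interpolation over the remaining clear observations.

    values : reflectances for one pixel over time
    cloudy : booleans, True where a cloud mask flagged the pixel
    dates  : acquisition times (e.g., day-of-year numbers)
    """
    values = np.asarray(values, dtype=float)
    clear = ~np.asarray(cloudy)
    if clear.sum() < 2:
        raise ValueError("need at least two clear observations")
    # np.interp fills the cloudy dates from the surrounding clear dates
    return np.interp(dates, np.asarray(dates)[clear], values[clear])

# Hypothetical 6-date series with two cloud-covered acquisitions
series = interpolate_cloudy(
    values=[0.30, 0.85, 0.32, 0.34, 0.90, 0.38],
    cloudy=[False, True, False, False, True, False],
    dates=[0, 10, 20, 30, 40, 50],
)
```

The two flagged observations (0.85 and 0.90, typical of bright cloud tops) are replaced by values interpolated from their clear neighbors, preserving the temporal trajectory of the pixel.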
Cloud detection algorithms are an active research field [20]. Each algorithm has specific characteristics and ad hoc techniques; thus, comparing them on a theoretical basis is hard. In practice, performance assessment is done by selecting representative images and evaluating how well each algorithm performs on each of them. As an example, Baetens et al. [26] compared three cloud detection methods (MAJA, Sen2Cor, and Fmask) using 32 images from 10 different locations. As a reference for comparison, the authors used a machine learning method to identify clouds in Sentinel-2 images. Given the different land covers, the diversity of sensors, and the advances in detection methods, such comparisons serve as general guidance only.
In this paper, we approach the problem of comparing different cloud detection algorithms from a regional viewpoint. Given the importance of monitoring land change in Amazonia, we consider cloud detection methods for Sentinel-2 MSI images of this region. We consider four cloud detection algorithms: Fmask 4 [25], MAJA [23], Sen2Cor 2.8 [22], and s2cloudless [27]. Cloud formation in Amazonia is distinct from that of most continental areas [28]. The forest produces its own rain [29]; the rainforest generates the aerosols that make up the cloud condensation nuclei of the region [30]. The probability of cloud coverage in satellite imagery depends not only on the month of the year but also on the location inside Amazonia [31]. Cloud types are heterogeneous in the Amazon biome; the southern region of Amazonia has high aerosol concentration, whereas the northern and northwestern regions have low aerosol concentration and high precipitation [30]. These characteristics indicate different cloud formation processes in subregions of the Amazon biome.
During the wet season, the precipitating clouds in the Amazon basin are either low-level stratus-type clouds (up to 2–5 km altitude) or high-level convective systems (more than 6 km altitude) [30]. The different land covers influence the amount and type of clouds: deep clouds are commonly found over the forest, while shallow clouds are frequent over deforested areas [33]. Water bodies absorb visible and near-infrared radiation, diminishing the reflectivity of thin clouds above them [35]. The high reflectivity of artificial surfaces induces commission errors in cloud detection over urban areas [36]. Such differences pose a challenge for cloud detection algorithms in Amazonia, which need to handle many types of cloud formations and associated shadows. These singular characteristics suggest that images over Amazonia are a useful study case when comparing cloud detection methods.
The rest of this article is organized as follows. We first introduce the study area and the sample regions. Then we present the cloud detection algorithms and how we configured them, and show how the classes produced by each cloud detection algorithm compare to the others. Finally, we present our results and discuss some implications of this work.
In our experiments, Fmask 4 has the best overall accuracy, followed by Sen2Cor, MAJA, and S2cloudless (see Table 4). Fmask 4 consistently outperforms the other algorithms in overall, user’s, and producer’s accuracies for all classes. It also has the best results for individual tiles. For cloud shadow detection, Fmask 4 performs better than Sen2Cor. Although MAJA is able to detect cloud shadows, in practice the method extends its cloud mask to include them. MAJA is a conservative method that uses dilation operators to improve the user’s accuracy of the clear-sky class; therefore, no shadows are reported by MAJA. This is observable both in the images themselves and in Figure 2, in which MAJA consistently detects more cloud pixels than the other methods. Furthermore, Fmask 4 and Sen2Cor also distinguish cirrus clouds from other clouds, as shown in Figure 2. For the accuracy assessment, we merged both types of clouds when computing the information in Table 4 and Table 5, so that Fmask 4 and Sen2Cor could be compared with the other methods.
When comparing the overall accuracy of Sen2Cor with that of MAJA, their design choices stand out. MAJA has a better user’s accuracy for clear-sky pixels than Sen2Cor; for the producer’s accuracy of this class, Sen2Cor is superior. Conversely, MAJA has a better producer’s accuracy than Sen2Cor for cloud pixels; for the user’s accuracy, the situation is inverted. This also holds for individual tiles (see Table 5). The designers of MAJA have chosen to maximize the probability that pixels labeled as clear sky are correct.
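The trade-off between these two accuracies can be made concrete with a small sketch over a two-class confusion matrix. The counts below are invented for illustration; they are not the confusion matrices behind Table 4 or Table 5.

```python
import numpy as np

# Rows: reference class, columns: predicted class (clear, cloud);
# illustrative counts only, not the matrices reported in this paper.
cm = np.array([[900,  50],    # reference clear
               [ 80, 970]])   # reference cloud

overall = np.trace(cm) / cm.sum()
# Producer's accuracy: correct / reference totals (per row)
producers = np.diag(cm) / cm.sum(axis=1)
# User's accuracy: correct / predicted totals (per column)
users = np.diag(cm) / cm.sum(axis=0)
```

Dilating the cloud mask, as MAJA does, moves counts from the clear column to the cloud column: the clear-sky user's accuracy and the cloud producer's accuracy rise, while the clear-sky producer's accuracy and the cloud user's accuracy fall.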
The behavior of S2cloudless is erratic; sometimes it produces results visually similar to those of Fmask 4 or Sen2Cor (see Figure 3), while on other occasions it misclassifies clear pixels as clouds (see Figure 2). For example, for tile T21LXH on 28 March 2017 and tile T22MCA on 28 June 2018, S2cloudless performs particularly poorly. Figure 4 shows tile T21LXH on 28 March 2017, a case where S2cloudless performs differently from the other methods.
These results show that the four algorithms produce their best results on images with few well-defined (crisp) clouds. Except for s2cloudless, the algorithms agree on the shape and the number of areas classified as either cloud or clear. However, they cannot adequately approximate the shape of clouds and their shadows for thin semi-transparent cirrus or tightly packed clouds (see Figure 5). The accurate detection of cloud shadows is challenging because dark surfaces, such as wetlands, burned areas, and terrain shadows, can easily be confused with cloud shadows [25].
The pictures shown in Figure 4 confirm the results discussed above. While Fmask 4 has the best performance, it is interesting to compare MAJA with Sen2Cor. MAJA uses squircles (i.e., shapes between a circle and a rectangle) to fill in the cloud shape, ensuring the total coverage of each cloud at the expense of cloud shadows. Thus, MAJA sometimes incorporates clear pixels into its cloud mask, as also reported by [26]. By contrast, Sen2Cor approximates the shape of the cloud from the inside, filling in the clouds’ boundaries with saturated labels—particularly with thin cirrus clouds—which produces rough borders (see Figure 5). Furthermore, Sen2Cor cannot detect small clouds that are correctly identified by Fmask 4 and MAJA (e.g., see the small clouds at the center of Figure 3).
The shadow masks produced by Fmask 4 are displaced relative to the shadows visible in the images. This is a consequence of the coarse spatial resolution of the digital elevation model used by the method. As for their shapes, Fmask 4 matches the cloud shadows well with respect to the clouds producing them, while in Sen2Cor the cloud shadows have smoother boundaries that differ from those of their clouds (see Figure 5). Our results confirm the work of Qiu et al. [25] and Baetens et al. [26], who report that Fmask 4 detects clouds and cloud shadows better than Sen2Cor for Sentinel-2A images.
We could not compute the accuracies for cloud shadow detection for MAJA and S2cloudless. This is expected for s2cloudless, but it comes as a surprise in the case of MAJA. One explanation is MAJA’s greedy behavior regarding clouds; it tends to tag pixels as clouds regardless of their shadows (see Figure 2). An alternative explanation lies in our interpretation of MAJA’s bit mask, in which we prioritized clouds over cloud shadows (see Section 2.6).
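The prioritization applied when decoding a bit mask can be sketched as follows. The bit positions and label codes here are placeholders chosen for illustration; they are not MAJA's actual bit layout, which is documented with the product.

```python
import numpy as np

# Hypothetical bit layout for illustration only (MAJA's real mask
# uses its own bit assignments; see the product documentation).
CLOUD_BIT = 0b01
SHADOW_BIT = 0b10

def recode(mask):
    """Collapse a bit mask to single labels, giving clouds priority
    over cloud shadows when both bits are set."""
    mask = np.asarray(mask)
    labels = np.zeros(mask.shape, dtype=np.uint8)  # 0 = clear
    labels[(mask & SHADOW_BIT) != 0] = 2           # 2 = shadow
    labels[(mask & CLOUD_BIT) != 0] = 1            # 1 = cloud (wins)
    return labels

out = recode([0b00, 0b01, 0b10, 0b11])
```

Because the cloud bit is written last, any pixel flagged as both cloud and shadow ends up labeled as cloud; if the algorithm sets both bits over dilated cloud edges, no pixels remain in the shadow class.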
Sen2Cor tags many pixels as saturated, defective, or unclassified, which we labelled as other (see Table 3). A visual inspection reveals that most of the saturated pixels lie on the external borders of clouds (see Figure 3). On the other hand, in tile T21LXH for 08 March 2017 (see Figure 4), Sen2Cor mistakenly labels cloud and cloud shadow pixels along the riverbank, almost perfectly profiling the whole river; this could be caused by suspended matter in the water. Sen2Cor’s problems in detecting cloud cover over water were also reported by [25]. As discussed above, the shapes of clouds in the Sen2Cor mask are rougher than the smoother results of the other algorithms. Despite these issues, Sen2Cor is a reliable method for cloud detection. Its producer’s accuracies for clear sky and clouds are 89% and 88%, respectively. If its errors in detecting cloud shadows can be tolerated by the user, its efficiency and ease of use may justify its choice for bulk processing.
S2cloudless erratically mixes land features and clouds, particularly in images with few clouds, and it does not detect cloud shadows. We could not confirm the claims made by the authors of this method [27] about its good performance. One explanation is that clouds over tropical forests such as Amazonia are not adequately represented in the S2cloudless training set.
The results of this study show that the Fmask 4 algorithm consistently performs better than the alternatives for Sentinel-2 images of the Amazon rain forest. Fmask 4 had an overall accuracy of 90%, followed by Sen2Cor (79%), MAJA (69%), and s2cloudless (52%). Our results differ from those of Baetens et al. [26], who concluded that MAJA and Fmask 4 perform similarly, with overall accuracies around 90%, while Sen2Cor had an overall accuracy of 84%. We now consider some hypotheses that could account for such significant differences.
As noted by Baetens et al. [26], cloud detection methods for satellites without thermal bands rely on thresholds. Different thresholds are set for the visible bands, other spectral bands, and the Normalized Difference Snow Index. These approximations address important challenges for cloud detection methods: distinguishing clouds from snow, mountain tops, bright deserts, and large built-up objects. Since each cloud detection method relies on different ad hoc hypotheses, its usefulness varies from scene to scene. For this reason, no single study can provide definitive guidance. Studies that target specific regions, such as the current paper, provide valuable advice even though their results cannot be generalized to non-forest areas.
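As a caricature of such threshold tests, the sketch below flags pixels that are bright in the blue band but not snow-like according to the NDSI. The thresholds and band choices are invented for illustration and do not correspond to any published method.

```python
import numpy as np

def naive_cloud_test(blue, green, swir, blue_thresh=0.25, ndsi_thresh=0.4):
    """Toy threshold test: bright in the blue band and not snow-like
    according to NDSI = (green - swir) / (green + swir).
    Thresholds here are illustrative, not those of any real algorithm."""
    blue, green, swir = map(np.asarray, (blue, green, swir))
    ndsi = (green - swir) / (green + swir)
    return (blue > blue_thresh) & (ndsi < ndsi_thresh)

# Three hypothetical pixels: dark forest, bright cloud, snow-like surface
flags = naive_cloud_test(blue=[0.10, 0.40, 0.45],
                         green=[0.12, 0.50, 0.70],
                         swir=[0.10, 0.45, 0.05])
```

The snow-like pixel is bright in the blue band but is rejected by its high NDSI, illustrating how each extra threshold encodes one ad hoc hypothesis about what clouds look like.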
The comparison done by Baetens et al. [26] uses 10 different sites, including equatorial forests, deserts and semi-deserts, agricultural areas, mountains, and snowy areas. Their results provide a balance between different targets that could be confused with clouds. By contrast, our study deals only with forest and agricultural areas; the images tested contain no deserts, mountains, or snow. By focusing on the Amazon biome, our results are intended as guidance for experts interested in measuring land change in the region. Given this focus, our results cannot be generalized to non-forest regions.
A further consideration that could explain part of the differences between our work and that of Baetens et al. [26] is the choice of training data sets. While we use random sampling, those authors preferred to rely on active learning. An active learning model uses a few good-quality samples instead of a large ensemble of random points. These good-quality samples are used to train a machine learning model (random forest) whose output provides labels for a large set of points used in classification. In theory, this method has the advantage of providing a larger number of points to test the algorithms. However, machine learning models have a tendency to overfit their training data, which could cause wrong predictions [52]. The alternative is to use random samples, which rely on standard statistical assumptions. However, random samples can miss some cloud properties. Clouds come in different shapes, sizes, and transparencies, and it is often hard to distinguish overlapping clouds at different heights in images. Random samples can also misrepresent minority labels such as cloud shadows. Therefore, both random sampling and active learning have advantages and shortcomings for evaluating cloud detection algorithms; further testing and comparison are required to evaluate these approaches.
Another source of divergence between our results and those of Baetens et al. [26] is class relabeling. Cloud detection algorithms codify their results using different levels of detail. To enable comparisons, we had to recode them to the same set of labels. This process implies a loss of information, in particular for MAJA, which provides the most detailed data about its detection process. Thus, our recoding could have had a negative impact on our evaluation of MAJA.
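The recoding step can be sketched as a lookup from each algorithm's output codes to a shared legend. The code table below is illustrative only; it is not the exact legend of any of the four algorithms.

```python
import numpy as np

# Shared legend used for the comparison
COMMON = {"clear": 0, "cloud": 1, "shadow": 2, "other": 3}

# Hypothetical algorithm-specific codes, for illustration only
ALGO_CODES = {4: "clear", 5: "clear", 8: "cloud", 9: "cloud",
              3: "shadow", 1: "other", 0: "other"}

def to_common(mask, table):
    """Recode an algorithm-specific mask to the shared label set,
    sending any unknown code to 'other'."""
    mask = np.asarray(mask)
    out = np.full(mask.shape, COMMON["other"], dtype=np.uint8)
    for code, name in table.items():
        out[mask == code] = COMMON[name]
    return out

recoded = to_common([4, 8, 3, 7], ALGO_CODES)
```

Any many-to-one mapping of this kind discards distinctions the original mask made (e.g., two cloud codes collapse into one), which is the loss of information noted above.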
Despite the differences discussed above, there are points of convergence between our work and earlier papers such as Baetens et al. [26] and Qiu et al. [25] regarding the performance of Fmask 4. The overall, user’s, and producer’s accuracy values for Fmask 4 are broadly consistent across the three studies. Qiu et al. [25] report producer’s accuracies for clouds, shadows, and clear pixels of 93%, 70%, and 97%, while our results are 96%, 75%, and 90%. Thus, we consider Fmask 4 to be a reliable method and recommend it for cloud detection in Sentinel-2 images of the Amazon rain forest.
In this work, we compared four cloud detection algorithms on Sentinel-2A images of the Amazon tropical forest and found that Fmask 4 performs the best. We tested four cloud detection algorithms—Fmask 4, Sen2Cor, MAJA, and S2cloudless—on 20 images with different amounts of cloud coverage, spread over five regions of Amazonia. We validated the results of the cloud detection algorithms using the judgment of remote sensing experts, who classified approximately 400 random points in each image. To determine the best algorithm, we computed the F1 score and the overall, user’s, and producer’s accuracies. We found that Fmask 4 has an overall accuracy of 90% in detecting clouds, while Sen2Cor’s OA is 79%, MAJA’s is 69%, and S2cloudless’s is 52%. Based on these results, we recommend Fmask 4 for cloud detection in Sentinel-2 images of the Amazon region.
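For reference, the per-class F1 score is the harmonic mean of the user's accuracy (precision) and the producer's accuracy (recall). A minimal sketch, with illustrative accuracy values rather than those of any specific algorithm in this study:

```python
def f1_from_accuracies(users_acc, producers_acc):
    """Per-class F1 as the harmonic mean of user's accuracy
    (precision) and producer's accuracy (recall)."""
    return 2 * users_acc * producers_acc / (users_acc + producers_acc)

# Illustrative values only
f1_example = f1_from_accuracies(0.8, 0.9)
```

Because the harmonic mean is dominated by the smaller of the two accuracies, F1 penalizes algorithms that trade one accuracy sharply against the other.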
The choice of method depends on the intended use, and users should weigh the benefits of each method before making their choice. Since MAJA by design reduces the number of cloud pixels mistakenly labeled as clear sky, users who aim to improve the producer’s accuracy for clouds should consider its use. These characteristics could make MAJA suitable, for example, for building cloud-free monthly mosaics. Despite the poor performance of S2cloudless in our study, we consider the use of machine learning methods for cloud detection a promising way forward; as more good-quality samples become available, their performance should improve. Finally, Sen2Cor is an efficient method for detecting clouds in Sentinel-2 images. Despite not having the best performance, its ease of use may appeal to those who need fast processing of large data sets.
We expect our work to have an impact on the building of data cubes of analysis-ready satellite imagery, such as those currently under construction by the Brazil Data Cube project (http://brazildatacube.org/). Another application is improving the time series analysis of land use and land cover change in deforested areas, which is particularly hard because of cloud coverage. Given the performance of Fmask 4, space agencies and committees such as CEOS should consider the value of working together to develop a standardized, high-quality cloud detection method that could be shared and used for optical remote sensing imagery.