Article

Cloud Detection: An Assessment Study from the ESA Round Robin Exercise for PROBA-V

by Umberto Amato 1, Anestis Antoniadis 2 and Maria Francesca Carfora 3,*

1 Istituto di Scienze Applicate e Sistemi Intelligenti ‘E. Caianiello’ CNR, 80131 Napoli, Italy
2 Laboratoire Jean Kuntzmann, Department of Statistics, Université Joseph Fourier, 38000 Grenoble, France
3 Istituto per le Applicazioni del Calcolo ‘Mauro Picone’ CNR, 80100 Napoli, Italy
* Author to whom correspondence should be addressed.
Sensors 2020, 20(7), 2090; https://doi.org/10.3390/s20072090
Submission received: 28 February 2020 / Revised: 4 April 2020 / Accepted: 6 April 2020 / Published: 8 April 2020
(This article belongs to the Special Issue Advanced Hyper-Spectral Imaging, Sounding and Applications from Space)

Abstract

A Round Robin exercise was implemented by ESA to compare different classification methods for detecting clouds in images taken by the PROBA-V sensor. A high-quality dataset of 1350 reflectances with corresponding Clear/Cloudy labels was prepared by ESA in the framework of the exercise. Motivated both by the experience acquired by one of the authors in this exercise and by the availability of such a reliable annotated dataset, we present a full assessment of the methodology proposed therein. Our objective is also to investigate specific issues related to cloud detection when remotely sensed images comprise only a few spectral bands in the visible and near-infrared. For this purpose, we consider a set of well-known classification methods. First, we demonstrate the feasibility of using a training dataset semi-automatically obtained from other accurate algorithms. In addition, we investigate the effect of ancillary information, e.g., surface type or climate, on accuracy. Then we compare the different classification methods using the same training dataset under different configurations. We also perform a consensus analysis aimed at estimating the degree of mutual agreement among the classification methods in detecting Clear or Cloudy sky conditions.

1. Introduction

Despite the large existing literature, cloud detection from images taken by sensors onboard satellites is still an area of very active research. This is essentially due to three main reasons: (i) cloud detection is an important preliminary step in processing remotely sensed images, because clouds affect sensor measurements of the radiance emitted by the surface to the point of making the data unreliable for a wide range of remote-sensing applications that use optical satellite images; (ii) cloud detection is by itself a difficult problem (even for experts attempting to visually detect clouds from signatures and/or images) in some conditions, such as transparent or semi-transparent clouds and, in general, whenever the contrast between the cloud and the underlying surface is poor; and (iii) despite consolidated guidelines for cloud detection algorithms (e.g., use of infrared bands, preliminary removal of noninformative bands), the development of new sensors with different hardware capabilities in terms of spatial, spectral and temporal resolution calls for specific algorithms or the adaptation of existing ones (e.g., re-estimation of thresholds). In particular, a dramatic technological advance was the availability of hyper-spectral sensors able to take images with up to 8K bands [1,2]. While these instruments promise to gather unprecedented information on the surface and atmosphere, they challenge conventional low-dimensional algorithms for cloud detection, not so much because of scalability and the computational resources required as because of the physical and theoretical implications of hyper-spectrality. In particular, truly informative bands must be selected in advance to face the curse of dimensionality inherent in statistical estimation (dimensionality reduction). A manual selection of spectral bands and estimation of thresholds for individual bands, or for pairs of them, is no longer conceivable; therefore, innovative methods for automatic feature extraction from hyper-spectral images are sought.
In the low-dimensional case, as for PROBA-V considered in the present paper, dimension reduction is generally not a concern and all spectral bands are used (experiments not reported in this paper confirm that the best accuracy is achieved when all PROBA-V spectral bands are considered). However, given the limited amount of information carried by such a small number of bands, it is important to extract features relevant to cloud detection as effectively as possible.
In all cases, and especially in view of the recent explosion of emerging methods, a problem of validating the methods arises; such validation can help not only in comparing their accuracy but also in understanding the strengths and weaknesses of the general cloud detection problem.
Classification exercises are sometimes organized in which different algorithms are challenged to estimate a cloud mask from the radiance detected by a specific sensor. The radiance is endowed with Clear or Cloudy labels accurately assigned by experts who are blind to the algorithms, so that the labels can serve to validate the algorithms themselves. While the main purpose of such exercises is to develop accurate operational algorithms for specific sensors onboard satellites, an important side effect is the comparison of state-of-the-art methods on the same, very accurate dataset. In this respect we mention the Landsat comparison exercise [3] and the ESA Round Robin exercise for the PROBA-V sensor [4]. Such comparisons are an exceptional way not only to compare algorithms, but especially to discover their weaknesses in particular climatic/surface conditions and, finally, to advance knowledge of cloud mask detection.
One of the authors participated in the ESA Cloud Detection Round Robin exercise (https://earth.esa.int/web/sppa/activities/instrument-characterization-studies/pv-cdrr). The exercise was intended for the PROBA-V sensor onboard the PROBA ESA platforms, which is suited for land use and classification applications, including vegetation and crop monitoring, food security and scarcity prediction, and disaster and biosphere monitoring. PROBA-V has a small number of spectral bands (Blue, Red, NIR and SWIR); in particular, it lacks a Thermal Infrared band that would have been useful to detect cirrus clouds. This makes cloud detection from its images challenging. ESA and the Belgian Science Policy Office organized a dedicated Round Robin exercise to inter-compare the performance of different cloud detection algorithms for PROBA-V. The exercise provided the participants with a large dataset of PROBA-V images (331 images, for almost 8 billion scenes) covering all seasons, most surface types, different world zones and most cloud types. (In this paper, we refer to a pixel as the single element of the image matrix provided by the sensor with its field of view, including the corresponding geographical coordinates, and to a scene as the set of corresponding information for that pixel, namely spectral radiance, sky condition, surface and climatic zone.) The key data of the exercise is a set of 1350 scenes, blind to participants, that were manually labelled by experts and claimed to sample the most important types of clouds (gold standard). A more detailed description of the exercise and its main conclusions is reported in [4].
The framework proposed by the authors for the Round Robin exercise includes a statistical classification method (Cumulative Discriminant Analysis, CDA [5]), a training set semi-automatically obtained from cloud masks estimated for concurrent sensors, and a grouping of the data into nearly homogeneous surface types. In particular, our framework was the only one within the exercise that did not train the classification on a manual dataset obtained by expert annotation. Instead, it relied on a semi-automatic training set derived from consolidated cloud masks, acknowledged as reliable and with a spatial resolution comparable to that of the target cloud mask (MODIS and SEVIRI). The only intervention required is the spatial and temporal co-registration of the training cloud mask with the target cloud mask. Since this training cloud mask is not obtained by expert judgement but by another algorithm, we refer to it as a silver standard. On the one hand, the quality of the silver standard cannot match the accuracy of a gold standard; in this respect, however, we also recall that even very accurate cloud masks annotated by experts are affected by judgement error, estimated in the best cases at around 4–7% [3]. On the other hand, the much larger size of the training dataset and its wide coverage can represent a much larger number of cloud and surface conditions. This cannot be achieved by manual training, which is naturally limited by human resources. This strategy appears, in our opinion, as a natural path when methods requiring large training datasets are involved, as is mainly the case for deep learning algorithms. In this respect we mention [6], who use the results of an algorithm (CFMask [3]) to train their deep RS-Net model for Landsat 8 images, and [7], based on an AVIRIS cloud mask. To our knowledge, massive use of such a silver standard dataset for cloud detection was pioneered in [8].
Another qualifying part of our framework was the grouping of scenes into homogeneous zones selected on the basis of the surface type. Algorithms for cloud detection are frequently trained separately for different types of underlying surface (e.g., land or water); other approaches are possible, for example introducing climatic information as in [5].
The aim of the present paper is first to present a full and detailed analysis of our framework and of its results within the Round Robin exercise, assessing its performance under several cloud and surface conditions.
In addition, the availability of a very accurate gold standard allows one to quantitatively analyze the weaknesses and strengths of cloud detection in a framework where the same training dataset is shared among classification methods. In particular, we address the following questions: (a) comparing prototypes of selected cloud detection algorithms well known in the literature; (b) assessing the feasibility of a silver standard to train cloud detection; (c) assessing the role of surface and/or climatic information in the accuracy of cloud detection. Finally, a consensus analysis is performed, aimed at estimating the degree of mutual agreement among classification methods in detecting Clear or Cloudy sky conditions.

2. Data and Methods

2.1. PROBA-V Data

PROBA-V (PRoject for On-Board Autonomy-Vegetation) is a global vegetation monitoring mission [9], launched in 2013 to ensure the continuation of the Vegetation instruments onboard the French SPOT-4 and SPOT-5 Earth observation missions. The satellite follows a Sun-synchronous orbit at a height of 820 km, achieving daily global coverage, except for the equatorial region (within 35° of the Equator), where coverage is guaranteed every two days. The optical instrument onboard provides data products at resolutions from 1/3 km to 1 km. It captures a Blue band (centered at 463 nm), a Red band (centered at 655 nm), a Near-Infrared band (centered at 845 nm), and a Short-Wave Infrared band (centered at 1600 nm). The data of the traditional Vegetation products, as provided by PROBA-V, are freely accessible to all users. The new, higher-resolution PROBA-V products older than 1 month share the same full, free and open data policy. Details on the technical characteristics of PROBA-V relevant to the present work are reported in Table 1.
We consider as input data the 331 images released by ESA and provided by the organizers of the Round Robin exercise. These images are PROBA-V Level 2A products with Top-of-Atmosphere reflectance (the four PROBA-V bands, radiometrically and geometrically corrected and resampled at 333 m). They form a complete global acquisition from four different dates covering the four seasons of 2014. PROBA-V scenes are endowed with a sea/land mask and an algorithm for snow/ice detection. The total number of valid scenes available in the 331 files is 7,731,538,861, the remaining ones being off the sensor view, affected by sun glint, or missing reflectance.

2.2. Validation Dataset (Gold Standard)

All participants in the Round Robin exercise for the PROBA-V cloud detection algorithm were provided with a small dataset consisting of 1350 scenes, manually collected, classified and labelled by an expert user. Since it was obtained by an expert, the dataset can properly be defined as a gold standard. The scenes are a subset of the full PROBA-V dataset collected from 4 different images and labelled with the following categories: Clouds (totally Cloudy, opaque clouds; semi-transparent clouds; other turbid atmosphere, e.g., dust, smoke); Clear sky (over water; snow/ice; other cases); spatially mixed clouds (over land; water; ice).
This small dataset comprises 30% totally Cloudy, 32% semi-transparent and 38% Clear cases. The ratio of land to water scenes is about 70:30. The detailed distribution of categories is reported in Table 2. Figure 1 shows the world distribution of the dataset.
We mention that a second dataset of 53,000 scenes was prepared during the ESA project. It served as the basis for comparing the algorithms participating in the Round Robin exercise [4]. However, this dataset was never revealed to participants, even after the end of the exercise [10].

2.3. Training and Validation Dataset

To produce the training and validation sets needed by the classification methods in the PROBA-V exercise, we rely on the cloud masks provided by consolidated algorithms operating on SEVIRI and MODIS data. We assign labels to all the PROBA-V scenes for which both algorithms can provide labels, as detailed in the following.

2.3.1. SEVIRI Cloud Mask

Processing of data from the SEVIRI (Spinning Enhanced Visible and Infrared Imager) sensor onboard the MSG satellites provides a cloud mask dataset at 15 min temporal resolution [11] with a spatial resolution of 3 km at the sub-satellite point, which degrades away from the equator and from the Greenwich meridian. The data are provided with regional coverage within a radius of about 60 degrees around the point at zero latitude and longitude. No data are provided for the hemisphere including the Americas, Oceania and most of Asia. The grid of the SEVIRI cloud mask is 3712 × 3712 pixels; the number of valid cloud mask pixels is about 12M.
The SEVIRI cloud mask provides four different labels: Clear over sea, Clear over land, Cloudy, and uncertain. We considered for the training set only Clear and Cloudy SEVIRI scenes, so as to represent the distribution of Clear and Cloudy conditions more accurately, without the influence of the other conditions. Moreover, since SEVIRI and PROBA-V scenes need to be co-located, we resampled the SEVIRI grid of pixels to a uniform grid in latitude and longitude. This choice preserves the original spatial resolution close to the center of the SEVIRI images. Of course, far from the center the finer resolution of the new grid is only nominal and the SEVIRI cloud mask is simply repeated within the coarser grid. Technically, the procedure is equivalent to a Nearest Neighbor interpolation (a minimal sketch is given below). Then we include in the training dataset only the PROBA-V scenes for which the closest SEVIRI pixel in time and space provides a certain (i.e., Clear or Cloudy) label. Figure 2 shows an example of the SEVIRI cloud mask.
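For illustration, the nearest-neighbour resampling can be sketched as follows; the array names are placeholders, and treating latitude/longitude as plane coordinates is an approximation that is adequate only near the image center.

```python
# A minimal nearest-neighbour regridding sketch, assuming 2-D latitude,
# longitude and cloud-mask arrays; names are illustrative placeholders.
import numpy as np
from scipy.spatial import cKDTree

def regrid_nearest(src_lat, src_lon, src_mask, dst_lat, dst_lon):
    """Assign to each destination pixel the cloud-mask value of the
    nearest source pixel (plain nearest-neighbour interpolation).
    Lat/lon are treated as plane coordinates: a rough approximation."""
    src_pts = np.column_stack([src_lat.ravel(), src_lon.ravel()])
    dst_pts = np.column_stack([dst_lat.ravel(), dst_lon.ravel()])
    _, idx = cKDTree(src_pts).query(dst_pts, k=1)
    return src_mask.ravel()[idx].reshape(dst_lat.shape)
```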

2.3.2. MODIS Cloud Mask

The MODIS (MODerate-resolution Imaging Spectroradiometer) sensor flies onboard the Earth Observation System Terra and Aqua satellites on polar orbits. There are two MODIS Cloud Mask L2 data product files: MOD35 [12], containing data collected from the Terra platform, and MYD35 [13], with data collected from the Aqua platform. The MODIS cloud mask provides fifteen classes (such as confident Clear over different surface types; possibly Clear; uncertain Clear; Cloudy) that provide basic surface and cloud type information.
We selected all the MODIS images for the four days of the PROBA-V exercise at 1 km spatial resolution. Their latitude and longitude grid is not uniform and is given at a coarser resolution than the PROBA-V grid. For this reason, the grid of pixels is first interpolated to the full resolution of the sought PROBA-V cloud mask. Moreover, since the grid of the MODIS images depends on the granule and is therefore not fixed in time or space, we co-located the coarser MODIS grid onto the uniform PROBA-V grid. Since MODIS orbits are polar, there is not always a good match in time with a PROBA-V pixel; we considered as simultaneous only pixels whose acquisition times differ by at most 30 min (see the sketch below). We also tested shorter time overlaps (15 min and 7 min) and obtained comparable accuracy in the results. Clearly, the interpolation procedure can generate mixed/uncertain labels on some scenes of the finer grid. However, as already done for the SEVIRI cloud mask, we included in the training and validation set only PROBA-V scenes co-located with MODIS scenes labelled as Cloudy or confident Clear. Figure 3 shows an example of the MODIS cloud mask.
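A hedged sketch of the ±30 min temporal matching follows; variable names are illustrative, and the brute-force search is only meant to convey the rule (at the full dataset scale a sorted search would be used instead).

```python
# Keep a PROBA-V scene only if the closest MODIS observation in time is
# within max_dt seconds; times are assumed to share a common epoch.
import numpy as np

def nearest_time_match(probav_t, modis_t, modis_label, max_dt=1800.0):
    """Returns, for each PROBA-V scene, the matched MODIS label,
    or None when no observation falls within max_dt seconds."""
    idx = np.abs(modis_t[None, :] - probav_t[:, None]).argmin(axis=1)
    ok = np.abs(modis_t[idx] - probav_t) <= max_dt
    return [modis_label[i] if keep else None for i, keep in zip(idx, ok)]
```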

2.4. Combined SEVIRI and MODIS Cloud Mask

A single cloud mask is obtained from the SEVIRI and MODIS cloud masks of Section 2.3.1 and Section 2.3.2 by selecting all scenes on which the two agree (a minimal sketch of the rule follows). This choice enhances the purity of the Clear and Cloudy classes for the purpose of classification. Since MSG points at 0° longitude, it can only take images of Europe, Africa and part of Asia; therefore, the merged database completely misses the Americas, most of Asia and Oceania, for which several scenes of the gold standard are present (see Figure 1). The full joint SEVIRI and MODIS dataset is composed of 65M scenes, of which 40% are labelled as Clear and 60% as Cloudy. This proportion is consistent with the gold standard in Table 2 (62% Cloudy scenes and 38% Clear ones). An extensive analysis through experiments analogous to the ones described in this paper (not reported for the sake of brevity) confirms that the accuracy of the combined cloud mask outperforms that of the single ones. In [4] some comparisons are shown for the gold standard dataset.
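The agreement rule itself is elementary; a minimal sketch, assuming label arrays already co-located on the PROBA-V grid, is:

```python
# A scene enters the silver standard only when the co-located SEVIRI and
# MODIS labels coincide on a certain (Clear or Cloudy) condition.
import numpy as np

def combine_masks(seviri_label, modis_label):
    """Inputs: 1-D arrays with entries 'clear', 'cloudy' or 'uncertain'.
    Returns the retained labels and a boolean mask of retained scenes."""
    agree = (seviri_label == modis_label) & np.isin(seviri_label,
                                                    ["clear", "cloudy"])
    return seviri_label[agree], agree
```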

2.5. Surface Type

Among the ancillary information that can improve cloud mask detection, a significant contribution can come from the characterization of the underlying surface. To sort scenes into regions homogeneous with respect to their spectral behavior, one can use the sea/land mask provided with the PROBA-V scenes and an algorithm for snow/ice detection. Moreover, other, more detailed surface classifications can be considered, such as the GlobCover surface map. GlobCover [14] is a 2005 ESA initiative carried out jointly with the Joint Research Center of the European Commission, the European Environment Agency, the Food and Agriculture Organization of the United Nations, the United Nations Environment Program, the Global Observation of Forest and Land Cover Dynamics and the International Geosphere-Biosphere Program. The aim of the project is to provide land cover maps from the 300 m MERIS sensor onboard the ENVISAT satellite mission. We consider the map covering the period January–December 2009. The GlobCover map is provided as a 55,800 × 129,600 pixel image on an equispaced grid with range [−65°, 80°] in latitude and [−180°, 180°] in longitude. The map sorts the surface into 22 different classes with corresponding codes; a sample map is shown in Figure 1. To retain the most significant surface characteristics while avoiding unnecessary detail, we group the GlobCover surface classes into the following five types: Water, Vegetation, Bare Land, Urban, Snow/Ice (a grouping sketch is given below). They are shown in Table 3 together with their sample sizes and the percentages of Clear and Cloudy scenes both in the full training dataset and in the gold standard one.
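The exact assignment of the 22 GlobCover classes to the five types is given in Table 3; the sketch below only illustrates the grouping procedure, using a few legend codes we believe are standard (210 water bodies, 220 permanent snow/ice, 190 artificial surfaces, 200 bare areas) and collapsing everything else into Vegetation.

```python
# Illustrative grouping of GlobCover legend codes into five surface types;
# this mapping is indicative only, not a reproduction of Table 3.
GROUPS = {
    "Water": {210},        # water bodies
    "Snow/Ice": {220},     # permanent snow and ice
    "Urban": {190},        # artificial surfaces
    "Bare Land": {200},    # bare areas
}

def surface_type(code: int) -> str:
    for name, codes in GROUPS.items():
        if code in codes:
            return name
    return "Vegetation"    # croplands, forests, shrublands, ...
```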

2.6. Climate

We also consider grouping the scenes into homogeneous climatic zones as in [5]. Specifically, we discriminate several regions based on latitude (Tropics, Mid and High Latitude), season (Winter, Summer) and Hemisphere. We also rely on information from Elevation and Surface Temperature maps, obtained from the NASA (ASTER Global Digital Elevation Model, [15]) and European Center for Medium-Range Weather Forecasts (ERA-5 land hourly data from the Copernicus Climate Data Store, [16]) public databases, respectively. The specification of the climatic regions we consider is reported in Table 4. Due to the limited world coverage of the SEVIRI data, the number of climatic zones is smaller than in [5], because some of them are empty. A simplified sketch of the zone assignment follows.
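In the sketch below, the latitude thresholds and the season rule are illustrative stand-ins for the specification of Table 4, and the elevation/surface-temperature criteria are omitted.

```python
# Assign a scene to a climatic zone from latitude and month only
# (a simplification of Table 4; thresholds are placeholders).
def climatic_zone(lat_deg: float, month: int) -> str:
    hemisphere = "NH" if lat_deg >= 0.0 else "SH"
    boreal_winter = month in (11, 12, 1, 2, 3, 4)  # illustrative split
    if hemisphere == "NH":
        season = "Winter" if boreal_winter else "Summer"
    else:
        season = "Summer" if boreal_winter else "Winter"
    a = abs(lat_deg)
    if a <= 23.5:
        return "Tropics"
    band = "Mid Latitude" if a <= 60.0 else "High Latitude"
    return f"{band} {season} {hemisphere}"
```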

3. Classification Methods

Cloud detection can be formally considered a binary supervised classification problem. As such, methods for its solution need a representative set of data with labels considered to be “certain” (the training dataset). They evaluate patterns in different features and assign the data to one of the two classes (Clear or Cloudy). The classification procedure also involves the collection and evaluation of a validation dataset. Once trained, each classifier applies a decision rule to determine whether validation data are more likely to have originated from one class or the other. This rule partitions the n-dimensional feature space into 2 regions corresponding to the Clear and Cloudy conditions.
Operational cloud-masking algorithms for low/moderate-resolution sensors such as AVHRR and MODIS were mainly based on empirically tuned thresholds applied to several spectral channels. For higher spatial resolution sensors (Landsat), too, spectral thresholding based on thermal channels, along with prior knowledge of land surface properties, has been the most common approach for automatic cloud detection. However, the payload of several recent sensors does not include the thermal channels on which cloud-masking strategies have previously relied, so that different approaches, often relying on complementary information, have been pursued. Among the many methods and implementations available in the recent literature, we mention a few significant ones here. Taravat et al. [17] propose a Multilayer Perceptron for the automatic classification of SEVIRI MSG images, trained on the cloud mask of the European Organization for the Exploitation of Meteorological Satellites; Chen et al. [18] implement a neural network classifier driven by extensive radiative transfer simulations and validate it through collocated CALIOP and MODIS data. In [19], a Support Vector Machine classifier is trained on the Gabor energy characteristics of cloud superpixels from GF-1 images, while in [20] algorithms for Sentinel-2 MSI based on Decision Trees and classical Bayesian classification are considered. Sedano et al. [21] propose a method based on the estimation of Clear/Cloudy radiance density distributions in a data fusion framework, followed by a region-growing process, and validate their results against both cloud masks generated by statistical methods and the Landsat operational cloud mask. Finally, deep learning methods are increasingly considered, with several different approaches: in [22] a multi-modal, pixel-level Convolutional Neural Network classifier is introduced for detecting clouds in medium- and high-resolution remote-sensing images, which relies on a large number of per-pixel cloud masks digitized by experts; Francis et al. [23] use multi-scale features, based on a Fully Convolutional Network architecture, and report results on manually annotated images from two high-resolution sensors. An image-based approach is described in [24], relying on multi-modal, high-resolution satellite imagery (PlanetScope, Sentinel-2) at the scene level. Many of these methods rely on expert intervention for labelling the training data. As mentioned in the Introduction, we focus instead on automatic means of assigning labels to the training and validation datasets (silver standard), allowing decision boundaries to be adjusted independently of subjective and costly human intervention. In addition, such labels can cover more general cases than manual labelling allows, and with a much larger extent, of course at the cost of decreased label accuracy.
Among the different approaches reported in the literature, we consider and compare in the present study seven supervised classifiers. They fall into the categories usually labelled as Statistical and Machine Learning and are based on different principles, such as Discriminant Analysis, Neural Networks and Nearest Neighbors. We mention that Neural Networks are the basis of the Artificial Intelligence methods currently of strong interest when the number of features is very high. In the following we briefly describe them.
  • Linear Discriminant Analysis (LDA). It applies the Bayes rule to each scene to select the Clear/Cloudy class that maximizes the posterior probability of the class given the actual reflectance in that scene. LDA assumes that reflectance follows Gaussian distributions for the Clear and Cloudy classes, sharing the same covariance matrix;
  • Quadratic Discriminant Analysis (QDA), which generalizes LDA by assuming that the covariance matrix also depends on the class (Clear or Cloudy);
  • Principal Component Discriminant Analysis (PCDA) [25]: the hypothesis of a Gaussian distribution of reflectance is relaxed in favor of a generic distribution estimated by nonparametric regression; in addition, the original reflectances are transformed into uncorrelated Principal Components before classification;
  • Independent Component Discriminant Analysis (ICDA) [25]: similar to PCDA, but with the original reflectances transformed into Independent Components before the nonparametric estimation of the densities; this makes such components independent also for non-Gaussian distributions;
  • Cumulative Discriminant Analysis (CDA) [5]: the decision rule for classification depends on a single threshold for each feature (spectral band), based on the empirical distribution function, which discriminates scenes belonging to the Clear and Cloudy classes; the threshold is estimated so as to simultaneously minimize the false positive and false negative rates on the training or on a validation dataset (a minimal sketch of this thresholding idea follows the list);
  • Artificial Neural Networks (ANN) [26,27,28]. We use a two-layer feed-forward network for pattern recognition, with sigmoid hidden neurons and SoftMax output neurons. The network is trained with scaled conjugate gradient backpropagation;
  • K-Nearest Neighbor (KNN) [28,29], which labels each scene based on a voting strategy among the labels assigned to the K closest neighboring scenes in the training dataset. We used K = 50 throughout this study.
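The single-band thresholding idea behind CDA can be sketched as follows, assuming that clouds are the brighter class; the full method in [5] combines per-band thresholds and is not reproduced here.

```python
# Equal-error threshold from empirical distribution functions: classify a
# scene as Cloudy when its reflectance exceeds t, and pick t where the
# false negative rate F_cloudy(t) and false positive rate 1 - F_clear(t)
# are closest.
import numpy as np

def equal_error_threshold(clear_refl, cloudy_refl):
    t = np.unique(np.concatenate([clear_refl, cloudy_refl]))
    fnr = np.searchsorted(np.sort(cloudy_refl), t, side="right") / cloudy_refl.size
    fpr = 1.0 - np.searchsorted(np.sort(clear_refl), t, side="right") / clear_refl.size
    return t[np.argmin(np.abs(fnr - fpr))]
```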
Methods LDA, QDA, PCDA, ICDA and CDA require an estimate of the statistical distribution of radiance. We mention that other methods are available in the literature; results for some of them are not reported because of poor accuracy on other sensors (Logistic Regression) or unfeasible computational time (Support Vector Machine).
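For concreteness, a minimal comparison harness in scikit-learn could look like the sketch below. LDA, QDA and KNN (K = 50) map directly onto the methods listed above; MLPClassifier is only an approximate stand-in for the paper's two-layer network, since scikit-learn does not offer scaled conjugate gradient training (and the hidden-layer size here is a placeholder); PCDA, ICDA and CDA have no off-the-shelf equivalent.

```python
# Hedged sketch: fit several classifiers on the same silver-standard
# training set and predict on a validation set.
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

def make_classifiers():
    return {
        "LDA": LinearDiscriminantAnalysis(),
        "QDA": QuadraticDiscriminantAnalysis(),
        "KNN": KNeighborsClassifier(n_neighbors=50),
        # Approximate stand-in for the two-layer sigmoid/SoftMax network.
        "ANN": MLPClassifier(hidden_layer_sizes=(10,), activation="logistic",
                             max_iter=500),
    }

# X_train: (n, 4) reflectances (Blue, Red, NIR, SWIR); y_train: 0 = Clear,
# 1 = Cloudy silver-standard labels. Names are illustrative.
# for name, clf in make_classifiers().items():
#     y_hat = clf.fit(X_train, y_train).predict(X_valid)
```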
All the above methods are pixel-wise, i.e., they treat pixels separately, without taking into account the spatial correlations among them or the local features that are typical of images. Among the classification methods that use spatial features of images we mention [30] (Markov chains), [31] (Discriminant Analysis) and [32] (relying on PCANet and SVM). We also mention the special case of Artificial Intelligence Deep Learning algorithms (e.g., [6,23]).
Finally, we mention that the method proposed for the Round Robin exercise was CDA [4].

4. Results

This section fully analyzes the accuracy of the classification methods introduced in Section 3. Different classification configurations are considered, depending on the use of ancillary information.

4.1. Basic Classification

We consider as input data the reflectances of the scenes in the four spectral bands of PROBA-V. We extract a random training set of 3M scenes from the training dataset described in Section 2.3, whose cloud mask is assigned by the agreement of two consolidated algorithms (the joint SEVIRI and MODIS silver standard). After the training phase, we classify the validation set composed of all the PROBA-V scenes for which a joint SEVIRI and MODIS cloud mask is available (65M scenes).
We remark that formally the training and validation datasets should be kept distinct, whereas here the former is a subset of the latter. However, due to the large number of available scenes, the features of the system are accurately learnt by the various methods in all conditions, and are the same whether the datasets are distinct or overlapping. From a practical point of view, all error indicators obtained keeping the training and validation datasets distinct or overlapping have the same values up to 3 decimal digits. For the same reason, a training set of 3M scenes is sufficient for the classification methods to accurately learn all the features of the data, so that no larger datasets are needed: in practice, the accuracy of the methods does not change when increasing the size of the training set, even up to the full 65M available scenes. Using a smaller dataset for the training phase has the advantage of reducing the computational time, which would otherwise be practically unfeasible for some methods such as ANN and KNN. Finally, we also mention that accuracy does not depend on the random choice of the 3M-scene subset, within 3 decimal digits.
To estimate the accuracy of the methods, we consider the following indicators: if $N_{\mathrm{Clear}}$ and $N_{\mathrm{Cloudy}}$ denote the numbers of Clear and Cloudy scenes in the validation dataset according to the joint MODIS and SEVIRI silver standard, and $\hat{N}_{\mathrm{Clear}}$ and $\hat{N}_{\mathrm{Cloudy}}$ the corresponding numbers of scenes correctly identified by a classification method, then we define the global success accuracy $A$, together with $A_{\mathrm{Clear}}$ and $A_{\mathrm{Cloudy}}$, as
$$A = \frac{\hat{N}_{\mathrm{Clear}} + \hat{N}_{\mathrm{Cloudy}}}{N_{\mathrm{Clear}} + N_{\mathrm{Cloudy}}}, \qquad A_{\mathrm{Clear}} = \frac{\hat{N}_{\mathrm{Clear}}}{N_{\mathrm{Clear}}}, \qquad A_{\mathrm{Cloudy}} = \frac{\hat{N}_{\mathrm{Cloudy}}}{N_{\mathrm{Cloudy}}}.$$
Taking the Cloudy condition as Positive, $A_{\mathrm{Cloudy}}$ is also known as Sensitivity (True Positive rate) and $A_{\mathrm{Clear}}$ as Specificity (True Negative rate).
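The three indicators translate directly into code; a minimal version, assuming 0 = Clear and 1 = Cloudy labels, is:

```python
import numpy as np

def accuracy_indicators(y_true, y_pred):
    clear, cloudy = (y_true == 0), (y_true == 1)
    n_clear_hat = np.sum(clear & (y_pred == 0))    # correctly detected Clear
    n_cloudy_hat = np.sum(cloudy & (y_pred == 1))  # correctly detected Cloudy
    A = (n_clear_hat + n_cloudy_hat) / y_true.size
    A_clear = n_clear_hat / clear.sum()            # specificity
    A_cloudy = n_cloudy_hat / cloudy.sum()         # sensitivity
    return A, A_clear, A_cloudy
```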
The accuracy indicators $A$, $A_{\mathrm{Clear}}$ and $A_{\mathrm{Cloudy}}$ are shown in Table 5 for the entire dataset and the classification methods of Section 3; they are also sorted by the surface types introduced in Section 2.5. We remark that in this experiment the training and validation phases are both performed without using any surface information; surface sorting is performed after classification, only to assess the accuracy level on different surface types.
As a complementary exercise, we compute the accuracy indicators when the validation dataset is the gold standard of Section 2.2. The performance of all the algorithms is reported in Table 6. We caution that the accuracy for some surfaces, namely Urban and Snow/Ice, is biased by the very low number of representative scenes and by the lack of Clear or Cloudy scenes (compare with Table 3).
Analysis of the results on the entire dataset (Table 5) shows first of all that ANN and KNN generally outperform the methods based on the direct estimation of distributions, being the only ones to reach accuracies beyond 90%, globally and in most cases. We also observe that CDA, which was used for the Round Robin exercise, is outperformed by ANN and KNN by a wide margin. This is mainly due to its very nature of forcing the same accuracy in both Clear and Cloudy conditions, so as to reach equal Type I and Type II errors globally (row Global in the table). The price is paid when accuracy is assessed at the finer scale of surface type after classification has been made. As an example, Bare Land has the most unbalanced proportion of Clear and Cloudy scenes, with the latter being 8% of the total (see Table 3); therefore Clear scenes, which globally have a lower frequency (40%), are strongly penalized. We also observe that accuracy is strongly dependent on the surface type for all methods; in particular, it is higher on Land than on Water. This result is consistent among all methods participating in the Round Robin exercise [4] and was justified there by a more accurate training. In this experiment, this corresponds to the fact that Land scenes make up 2/3 of the Total in the global training set, so that the classification naturally tends to represent them better. On the other hand, the main applications of the PROBA-V sensor concern Land and, particularly, vegetation.
The results are roughly consistent when we limit validation to the gold standard dataset, for which the true Clear/Cloudy condition is known (Table 6). Accuracy is lower than in the training set case by 7–8% for all methods. A noteworthy exception is CDA, whose global accuracy remains the same (actually, marginally better), even though it is no longer balanced between Clear and Cloudy conditions, the latter increasing to 87.1% at the expense of the former (74.4%). We again observe that the ANN and KNN algorithms outperform the methods based on the direct estimation of distributions, although their accuracy now rarely goes beyond 90%; in particular, KNN is globally 2% more accurate than ANN, mostly due to a better detection of the Cloudy condition (81.6% vs. 75.2%).

4.2. Use of Ancillary Information

The experiment of Section 4.1 considers the entire joint SEVIRI and MODIS dataset for the training (apart from the random subset selection). However, it is well known that the capability of detecting clouds heavily depends on the underlying surface, because of the varying contrast between clouds and surface. Therefore, a possibly useful strategy is to disaggregate the scenes into groups that are internally as homogeneous as possible, at a finer level than the land/water distinction often used in operational cloud detection methods. For this purpose, we consider two possible disaggregations, based on the type of surface and on the climate.

4.2.1. Surface as Ancillary Information

We train the algorithms separately on the different surface types. From the entire dataset we randomly extract a training set for each of the five surface types considered in Section 2.5. The size of each training set is fixed as the minimum between 3M scenes and the subset size. Validation is performed on all the scenes of the same surface type in the full dataset. Table 7 reports the accuracy indicators. Table 8 refers to the accuracy estimated using the gold standard as the validation dataset.
Comparison of Table 7 with Table 5 does not provide unique indications: most methods show some decrease in accuracy, around 1% globally (with LDA reaching −17%). Notable exceptions are ANN (+1%) and, especially, CDA (+5.8%). This is to be expected for CDA, since the constraint of equal Type I and Type II errors now applies separately to each surface type and adapts much better to the frequency of Clear/Cloudy conditions, which globally depends on the underlying type of surface. The improvement of CDA occurs for all surface types; in particular, it is remarkable for the sky condition (Clear or Cloudy) that had the worse accuracy when a unique classification was made without ancillary information; however, this improvement comes, to a lesser extent, at the expense of the other sky condition (Cloudy or Clear, respectively).
If we compare Table 8 with Table 6, we again observe a drop in accuracy of around 6–9% (now including CDA), somewhat smaller for KNN (4.4%).
Summarizing, these quantitative indicators provide no clear evidence that separate classification problems tailored to different surface types improve accuracy. We investigate this matter further in Section 4.3.1, through visual inspection of the cloud masks provided by the different algorithms on some specific images.

4.2.2. Climate as Ancillary Information

We train the classification methods separately on the different climatic zones, as described in Section 2.6 and introduced in [5]. From the entire dataset, a training sample is extracted for each climatic zone, with a size equal to the minimum between 3M and the subset size. As in Section 4.2.1, validation is performed on the entire dataset or on the gold standard one. Accuracy is shown in Table 9 and Table 10, respectively.
This experiment, too, does not provide a conclusive answer on the improvement in accuracy obtained by considering separate classifications for different climatic zones. The global increase in accuracy, when it occurs (ANN among the best-performing methods), is not spread over both Clear and Cloudy conditions; rather, a slight decrease in Clear conditions is observed. As with the separate classifications by surface type, we observe a mix of better (ICDA and KNN) and worse (CDA and ANN) accuracies, also spread over Clear and Cloudy conditions. The same conclusions can be drawn on the gold standard dataset, with the remarkable better performance of CDA, which is indeed the best-performing method there.

4.2.3. Ancillary Information with ANN

Neural Networks are claimed to successfully mix information of different natures (e.g., nonnumeric). Our framework includes reflectance as numeric information and surface type or climatic zone as a categorical variable. In Section 2.5 and Section 2.6 the categorical variable was handled separately, in the sense that separate cloud masks were obtained with different training sets for each surface type or climatic zone. By its nature, ANN instead easily allows one to introduce the type of surface and/or climatic zone directly within the training dataset, as a fifth variable besides the reflectances of the four spectral bands (a minimal encoding sketch is given below). The present Section aims at estimating whether and how accuracy improves or degrades compared with using ancillary information in separate classifications.
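In the sketch below, the categorical variable is one-hot encoded and concatenated to the four reflectances; array names and the hidden-layer size are placeholders, and MLPClassifier is again only an approximate stand-in for the network of Section 3.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import OneHotEncoder

def fit_ann_with_surface(refl, surface, labels):
    """refl: (n, 4) reflectances; surface: (n, 1) strings such as 'Water';
    labels: 0 = Clear, 1 = Cloudy."""
    onehot = OneHotEncoder().fit_transform(surface).toarray()
    features = np.hstack([refl, onehot])   # 4 bands + surface indicator
    return MLPClassifier(hidden_layer_sizes=(10,), activation="logistic",
                         max_iter=500).fit(features, labels)
```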
Results are reported in Table 11, where for the sake of brevity only Global values are shown, not disaggregated by type of surface or climate.
Comparison of Table 11 with Table 5, Table 7 and Table 9 shows that keeping separate classifications for the different types of ancillary information improves the accuracy of ANN.

4.3. Opaque and Semi-Transparent Clouds

This Section assesses the behavior of the considered classification methods with respect to opaque and semi-transparent clouds and border identification. For this purpose, we consider two images already used in [4] and a quantitative analysis based on the gold standard dataset.

4.3.1. Visual Analysis of Opaque and Semi-Transparent Clouds

The first image was acquired on 21 June 2014 at 14:42 UTC over Bolivia, overlying a land surface. It includes both opaque and semi-transparent clouds, as can be seen from the composite RGB image in the first panel of Figure 4. The other panels of the Figure show the cloud masks predicted by LDA, PCDA, CDA, ANN and KNN. All cases refer to the configuration of a unique classification independent of surface type or climate. An indirect comparison with the other methods of the PROBA-V Round Robin exercise as reported in [4] can be made, because the covered region is the same.
Figure 5 shows the same results obtained when classification is made separately for each type of surface.
The figures clearly show that KNN and ANN are more cloud-conservative than the other methods. Despite the inconclusive quantitative analysis on the role of separate classifications for different surface types or climatic zones, Figure 4 shows that a unique classification independent of surface or climate is prone to mistakes in interpreting the Clear condition over water. Indeed, some of the lakes in the Bolivia region (green in the RGB image of the Figure) are misrecognized as Cloudy by 3 classification methods out of 5, including one of the best performing (ANN). On the contrary, when classification is made separately by surface type, only PCDA still misrecognizes some of the lakes as Cloudy, with all other methods correctly detecting Clear sky conditions.
We also remark that KNN, though more conservative with respect to transparent clouds, is less prone to spurious isolated Cloudy scenes (see in particular the top-right part of Figure 5, bottom-right panel), while still preserving the sharpness of the clouds.
The second image, also considered in [4], was acquired on 21 December 2014 at 02:29 UTC over Northern Australia and includes both land and water surfaces. Figure 6 shows the RGB image and the cloud masks retrieved by LDA, PCDA, CDA, ANN and KNN (in order, from left to right and from top to bottom) when classification is made separately for each surface type.
The Figure confirms that ANN and KNN are more cloud-conservative, with semi-transparent clouds detected as Cloudy.

4.3.2. Quantitative Analysis of Semi-Transparent Clouds

To quantitatively analyze the behavior of the classification methods with respect to opaque and semi-transparent clouds, Table 12 reports the percentage of semi-transparent scenes identified as Clear and Cloudy for all the considered methods. Table 12 quantitatively confirms the more conservative character of ANN and KNN throughout all the different experimental settings.

4.4. Consensus Analysis

This Section presents a consensus analysis aimed at assessing the mutual agreement of the classification methods in estimating the cloud mask. It can provide a further accuracy indicator, independent of any gold standard that may be available, as well as advice on the reliability of the cloud masks obtained by the classification methods. Given a set of methods, we define the consensus on the Clear and Cloudy sky conditions as the percentage of scenes for which all methods predict Clear and Cloudy conditions, respectively. The global consensus is defined as the direct sum of the Clear and Cloudy ones, and represents the percentage of scenes for which all methods agree in predicting the same sky condition, irrespective of its type, Clear or Cloudy. The percentage is computed on the entire set of 65M scenes of the dataset. We consider a subset of the classification methods of Section 3 according to the following guidelines: (a) the methods have to be as independent as possible (however, some degree of dependence cannot be avoided, because all methods rely on the same training dataset); (b) the number of considered methods has to be low; in fact, the higher the number of methods, the higher the probability that any one of them predicts the status of the sky differently from the others, and therefore the lower the consensus. As a consequence, we select only one representative method from the Discriminant Analysis group, namely PCDA, which gives better accuracy in the analyses of Section 4.1, Section 4.2.1 and Section 4.2.2, keeping all the other methods (CDA, ANN, KNN). Should the number of methods be much larger, a more elaborate definition of consensus would have to be devised.
First, we compute the consensus according to the results of Section 4.1, where a unique training dataset is considered independently of surface or climatic zone. The four methods agree in identifying 33% of the scenes as Clear and 49% as Cloudy, resulting in full concordance on 82% of the scenes.
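The consensus figures reduce to a count over the predictions of the four methods; a minimal sketch, assuming an array of 0/1 predictions, is:

```python
import numpy as np

def consensus(preds):
    """preds: (n_methods, n_scenes) array of predictions, 0 = Clear,
    1 = Cloudy (here, rows for PCDA, CDA, ANN and KNN)."""
    all_clear = np.all(preds == 0, axis=0).mean()   # consensus on Clear
    all_cloudy = np.all(preds == 1, axis=0).mean()  # consensus on Cloudy
    return all_clear, all_cloudy, all_clear + all_cloudy
```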
Then we consider the case where different, separate classifications are performed for each type of surface (Section 2.5), whose results are reported in Table 13. When the additional information on surface type is given, the global concordance of the methods is slightly higher (83%), with a slightly better concordance on Clear scenes at the expense of Cloudy ones. Values disaggregated by surface type and by sky condition are reported in the same Table: the highest consensus is obtained on Vegetation (87%) and the lowest on Water (78%).
When separate classifications are made for each climatic zone, global consensus drops to 78%, mainly due to a lower consensus for Clear sky conditions (30.5%). Values for different climatic zones range from 73% (Tropical zone) to 89% (Mid Latitude Winter NH). Table 14 reports full results disaggregated by climatic zone and by sky condition.
Summarizing, we can state that the highest consensus is reached when separate classifications are made by surface type.
Finally, we investigate which of the selected classification methods deviates most from the consensus expressed by the other methods. For this purpose, we consider the scenes for which only three of the 4 methods agree on the Clear or Cloudy condition and calculate the frequency with which each method deviates from the other three (a counting sketch is given below).
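A counting sketch of this 3-versus-1 analysis, under the same 0/1 prediction convention as above, is:

```python
import numpy as np

def outlier_frequencies(preds, method_names):
    """Among scenes where exactly one of the four methods departs from the
    other three, tally how often each method is the outlier, separately
    for Cloudy and Clear majorities."""
    votes = preds.sum(axis=0)                        # 0..4 Cloudy votes
    freq = {}
    for majority, v in (("Cloudy", 3), ("Clear", 1)):
        cols = np.where(votes == v)[0]               # exactly one dissenter
        dissent = preds[:, cols] != (1 if majority == "Cloudy" else 0)
        freq[majority] = dict(zip(method_names,
                                  dissent.sum(axis=1) / max(cols.size, 1)))
    return freq
```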
In the basic classification (without additional information), the method that most often disagrees with the other three in predicting a Cloudy sky condition is PCDA. Specifically, when three methods out of four agree on Cloudy scenes, PCDA disagrees with them 41% of the time. When three methods agree in predicting a Clear sky condition, it is CDA that mostly disagrees (61% of scenes). The situation is similar when separate classifications are performed for each surface type (see row Global in Table 15): PCDA mostly disagrees with the other three methods (50.3% of scenes); this result is due to a very poor agreement in Cloudy sky conditions (disagreement on 87.9% of scenes), whereas in Clear sky conditions it is CDA that reaches the highest disagreement, with 52.7% of scenes. This also holds across the different surface types, with PCDA showing the highest disagreement for all types of surface but Bare Land, due to the highest disagreement in Cloudy conditions, whereas in Clear sky conditions CDA shows the highest disagreement on Vegetation and Bare Land, ANN on Water and Urban surfaces, and KNN on Snow/Ice.
When classification is trained separately on different climate zones, as reported in Table 16, the situation is quite similar for PCDA and CDA, with ANN showing a greater disagreement in more cases.
Summarizing, we can say that PCDA and CDA show the highest disagreement with respect to the other methods, while KNN shows the lowest.

5. Conclusions

The paper presents a detailed analysis of the method submitted by one of the authors to a Round Robin exercise organized by ESA for detecting clouds in images taken by the PROBA-V sensor. The availability of a common high-quality dataset of scenes labelled as Clear or Cloudy by experts (gold standard) is a unique benchmark for comparing different cloud detection methods and investigating questions that are still open.
We considered some prototype methods and compared them in different frameworks, using a common training dataset for all of them. We showed that CDA, chosen for participating in the Round Robin, was adequate, yielding good accuracy. However, ANN and, particularly, KNN can both improve accuracy and better detect scenes with semi-transparent clouds. In addition, a silver standard training dataset, semi-automatically obtained from algorithms developed for other sensors, proved effective in detecting clouds, yielding high accuracy. We stress that the silver standard dataset considered in this paper covers only a portion of the globe and that the majority of the gold standard pixels lie outside the region covered by the silver standard. Indeed, a silver standard dataset is the only feasible way to obtain the very large training datasets needed by Artificial Intelligence methods (see, e.g., [7]).
Then, even though it was not possible to give a conclusive quantitative answer as to whether separate classifications based on ancillary information such as surface types and/or climatic zones improve accuracy, a qualitative analysis shows that introducing such information reduces the probability of misinterpreting the Clear/Cloudy condition over Water.
Finally, we performed a consensus analysis aimed at estimating the degree of mutual agreement among classification methods in detecting Clear or Cloudy sky. The result was that a selection of 4 classification methods agrees on the status of Clear or Cloudy sky for about 83% of the scenes. PCDA and CDA show the highest disagreement with the consensus of the other 3 methods, and KNN the lowest. This result is consistent with the findings on accuracy.
The results shown in the paper strictly refer to sensors with a very low number of spectral bands. Other sensors, especially hyper-spectral ones, or sensors at higher/lower spatial resolutions, require a similar specific analysis, which will probably give very different results.
The results of the paper suggest possible future investigations:
  • The use of different methods for different surfaces or climatic zones: this study showed that some methods could be better suited to particular surfaces or climatic zones, whether we consider global accuracy or the specific accuracy for Clear or Cloudy conditions.
  • Balancing of Cloudy/Clear conditions: this largely depends on the training dataset and on the fraction of Clear/Cloudy scenes it includes. Classification methods will generally and naturally favor the most populated class when Clear and Cloudy features overlap, so as to improve global accuracy. However, remotely sensed images commonly refer to conditions that are prevalently Clear or prevalently Cloudy, which could benefit from a training dataset or a classification method that weights scenes according to the proportion of Clear/Cloudy conditions in the region. CDA is an attempt in this direction, even though a global one, and Discriminant Analysis naturally lends itself to including such weights, simulating a balance of Clear and Cloudy conditions different from that of the training dataset.
  • The use of images instead of independent pixels in the classification, so as to exploit the spatial correlation (which clouds indeed possess) and/or, equivalently, spatial features. In this respect, Artificial Intelligence methods, already available in the literature but not considered in this paper, become interesting also for a small number of spectral bands.

Author Contributions

Conceptualization, M.F.C. and U.A.; methodology, U.A. and A.A.; software, U.A. and M.F.C.; validation, M.F.C.; data curation, U.A. and M.F.C.; writing—original draft preparation, U.A. and M.F.C.; writing—review and editing, U.A., A.A. and M.F.C.; funding acquisition, U.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the European Space Agency grant number WP-3310-5-1.

Conflicts of Interest

The authors declare no conflict of interest. The funders designed the Round Robin exercise and collected the main data; they had no role in the analysis or interpretation of the data, in the writing of the manuscript, or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
ANN	Artificial Neural Network
ASTER	Advanced Spaceborne Thermal Emission and Reflection Radiometer
AVHRR	Advanced Very-High-Resolution Radiometer
AVIRIS	Airborne Visible/InfraRed Imaging Spectrometer
CALIOP	Cloud-Aerosol LiDAR with Orthogonal Polarization
CDA	Cumulative Discriminant Analysis
ERA-5	ECMWF Reanalysis 5th Generation
ESA	European Space Agency
ICDA	Independent Component Discriminant Analysis
KNN	K-Nearest Neighbor
LDA	Linear Discriminant Analysis
MERIS	MEdium Resolution Imaging Spectrometer
MODIS	MODerate-resolution Imaging Spectroradiometer
MSG	Meteosat Second Generation
MSI	MultiSpectral Instrument
NASA	National Aeronautics and Space Administration
NH	Northern Hemisphere
NIR	Near-InfraRed
PCDA	Principal Component Discriminant Analysis
PROBA-V	PRoject for On-Board Autonomy – Vegetation
QDA	Quadratic Discriminant Analysis
SEVIRI	Spinning Enhanced Visible and Infrared Imager
SH	Southern Hemisphere
SPOT	Satellite Pour l’Observation de la Terre
SWIR	Short-Wave InfraRed
UTC	Coordinated Universal Time

References

  1. Masiello, G.; Serio, C.; Venafra, S.; Liuzzi, G.; Poutier, L.; Goettsche, F. Physical retrieval of land surface emissivity spectra from hyper-spectra infrared observations and validation with in situ measurements. Remote Sens. 2018, 10, 976. [Google Scholar] [CrossRef] [Green Version]
  2. Serio, C.; Masiello, G.; Liuzzi, G. Demonstration of random projections applied to the retrieval problem of geophysical parameters from hyper-spectral infrared observations. Appl. Opt. 2016, 55, 6576–6587. [Google Scholar] [CrossRef]
  3. Foga, S.; Scaramuzza, P.L.; Guo, S.; Zhu, Z.; Dilley, R.D.; Beckmann, T.; Schmidt, G.L.; Dwyer, J.L.; Hughes, M.J.; Laue, B. Cloud detection algorithm comparison and validation for operational Landsat data products. Remote Sens. Environ. 2017, 194, 379–390. [Google Scholar] [CrossRef] [Green Version]
  4. Iannone, R.Q.; Niro, F.; Goryl, P.; Dransfeld, S.; Hoersch, B.; Stelzer, K.; Kirches, G.; Paperin, M.; Brockmann, C.; Gómez-Chova, L.; et al. Proba-V cloud detection Round Robin: Validation results and recommendations. In Proceedings of the 2017 9th International Workshop on the Analysis of Multitemporal Remote Sensing Images (MultiTemp), Brugge, Belgium, 27–29 June 2017; pp. 1–8. [Google Scholar] [CrossRef]
  5. Amato, U.; Lavanant, L.; Liuzzi, G.; Masiello, G.; Serio, C.; Stuhlmann, R.; Tjemkes, S.A. Cloud mask via cumulative discriminant analysis applied to satellite infrared observations: Scientific basis and initial evaluation. Atmos. Meas. Tech. 2014, 7, 3355–3372. [Google Scholar] [CrossRef]
6. Jeppesen, J.H.; Jacobsen, R.H.; Inceoglu, F.; Toftegaard, T.S. A cloud detection algorithm for satellite imagery based on deep learning. Remote Sens. Environ. 2019, 229, 247–259.
7. Sun, L.; Yang, X.; Jia, S.; Jia, C.; Wang, Q.; Liu, X.; Wei, J.; Zhou, X. Satellite data cloud detection using deep learning supported by hyperspectral data. Int. J. Remote Sens. 2019, 41, 1349–1371.
8. Amato, U.; Antoniadis, A.; Cuomo, V.; Cutillo, L.; Franzese, M.; Murino, L.; Serio, C. Statistical cloud detection from SEVIRI multispectral images. Remote Sens. Environ. 2008, 112, 750–766.
9. Dierckx, W.; Sterckx, S.; Benhadj, I.; Livens, S.; Duhoux, G.; Achteren, T.V.; Francois, M.; Mellab, K.; Saint, G. PROBA-V mission for global vegetation monitoring: Standard products and image quality. Int. J. Remote Sens. 2014, 35, 2589–2614.
10. Toté, C.; Swinnen, E.; Sterckx, S.; Adriaensen, S.; Benhadj, I.; Iordache, M.D.; Bertels, L.; Kirches, G.; Stelzer, K.; Dierckx, W.; et al. Evaluation of PROBA-V Collection 1: Refined Radiometry, Geometry, and Cloud Screening. Remote Sens. 2018, 10, 1375.
11. Derrien, M.; Gléau, H.L. MSG/SEVIRI cloud mask and type from SAFNWC. Int. J. Remote Sens. 2005, 26, 4707–4732.
12. Ackerman, S.A.; Frey, R. MODIS Atmosphere L2 Cloud Mask Product; NASA MODIS Adaptive Processing System, Goddard Space Flight Center: Greenbelt, MD, USA, 2015. Available online: http://dx.doi.org/10.5067/MODIS/MOD35_L2.006 (accessed on 25 March 2020).
13. Ackerman, S.A.; Frey, R. MODIS Atmosphere L2 Cloud Mask Product; NASA MODIS Adaptive Processing System, Goddard Space Flight Center: Greenbelt, MD, USA, 2015. Available online: http://dx.doi.org/10.5067/MODIS/MYD35_L2.006 (accessed on 25 March 2020).
14. Loveland, T.R.; Reed, B.C.; Brown, J.F.; Ohlen, D.O.; Zhu, Z.; Yang, L.; Merchant, J.W. Development of a global land cover characteristics database and IGBP DISCover from 1 km AVHRR data. Int. J. Remote Sens. 2000, 21, 1303–1330.
15. METI (Ministry of Economy Trade and Industry of Japan) and NASA (US National Aeronautics and Space Administration). ASTER Global Digital Elevation Map, v.2, 2011. Available online: https://asterweb.jpl.nasa.gov/gdem.asp (accessed on 25 March 2020).
16. Copernicus Climate Change Service. C3S ERA5-Land Reanalysis. Available online: https://cds.climate.copernicus.eu/cdsapp#!/home (accessed on 4 October 2019).
17. Taravat, A.; Proud, S.; Peronaci, S.; Del Frate, F.; Oppelt, N. Multilayer perceptron neural networks model for Meteosat Second Generation SEVIRI daytime cloud masking. Remote Sens. 2015, 7, 1529–1539.
18. Chen, N.; Li, W.; Gatebe, C.; Tanikawa, T.; Hori, M.; Shimada, R.; Aoki, T.; Stamnes, K. New neural network cloud mask algorithm based on radiative transfer simulations. Remote Sens. Environ. 2018, 219, 62–71.
19. Sui, Y.; He, B.; Fu, T. Energy-based cloud detection in multispectral images based on the SVM technique. Int. J. Remote Sens. 2019, 40, 5530–5543.
20. Hollstein, A.; Segl, K.; Guanter, L.; Brell, M.; Enesco, M. Ready-to-use methods for the detection of clouds, cirrus, snow, shadow, water and clear sky pixels in Sentinel-2 MSI images. Remote Sens. 2016, 8, 666.
21. Sedano, F.; Kempeneers, P.; Strobl, P.; Kucera, J.; Vogt, P.; Seebach, L.; San-Miguel-Ayanz, J. A cloud mask methodology for high resolution remote sensing data combining information from high and medium resolution optical sensors. ISPRS J. Photogramm. Remote Sens. 2011, 66, 588–596.
22. Li, Z.; Shen, H.; Cheng, Q.; Liu, Y.; You, S.; He, Z. Deep learning based cloud detection for medium and high resolution remote sensing images of different sensors. ISPRS J. Photogramm. Remote Sens. 2019, 150, 197–212.
23. Francis, A.; Sidiropoulos, P.; Muller, J.P. CloudFCN: Accurate and Robust Cloud Detection for Satellite Imagery with Deep Learning. Remote Sens. 2019, 11, 2312.
24. Shendryk, Y.; Rist, Y.; Ticehurst, C.; Thorburn, P. Deep learning for multi-modal classification of cloud, shadow and land cover scenes in PlanetScope and Sentinel-2 imagery. ISPRS J. Photogramm. Remote Sens. 2019, 157, 124–136.
25. Amato, U.; Antoniadis, A.; Grégoire, G. Independent component discriminant analysis. Int. J. Math. 2003, 3, 735–753.
26. McCulloch, W.S.; Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 1943, 5, 115–133.
27. Rosenblatt, F. The perceptron: A probabilistic model for information storage and organization in the brain. Psychol. Rev. 1958, 65, 386–408.
28. Duda, R.; Hart, P.; Stork, D. Pattern Classification; John Wiley & Sons: Hoboken, NJ, USA, 2012; p. 688.
29. Stone, C.J. Consistent Nonparametric Regression. Ann. Stat. 1977, 5, 595–620.
30. Besag, J. On the Statistical Analysis of Dirty Pictures. J. R. Stat. Soc. Ser. B (Methodol.) 1986, 48, 259–279.
31. Cutillo, L.; Amato, U. Localized empirical discriminant analysis. Comput. Stat. Data Anal. 2008, 52, 4966–4978.
32. Zi, Y.; Xie, F.; Jiang, Z. A Cloud Detection Method for Landsat 8 Images Based on PCANet. Remote Sens. 2018, 10, 877.
Figure 1. Global surface map as provided by GlobCover (22 classes). The superimposed black dots represent the pixels belonging to the gold standard dataset in the PROBA-V Round Robin exercise.
Figure 2. Example of SEVIRI cloud mask.
Figure 3. Example of MODIS cloud mask co-located to PROBA-V image PROBAV_L2A_20140321_000512_3_333M_V001 over South Australia.
Figure 4. RGB image acquired on 21 June 2014 at 14:42 UTC over Bolivia (top-left panel). The other panels show the cloud masks obtained by LDA, PCDA, CDA, ANN and KNN (in order, left to right and top to bottom). A single classification was made on the full joint SEVIRI and MODIS training dataset.
Figure 5. RGB image acquired on 21 June 2014 at 14:42 UTC over Bolivia (top-left panel). The other panels show the cloud masks obtained by LDA, PCDA, CDA, ANN and KNN (in order, left to right and top to bottom). Classification was made separately for each type of surface.
Figure 6. RGB image acquired on 21 December 2014 at 02:29 UTC over North Australia (top-left panel). The other panels show the cloud masks obtained by LDA, PCDA, CDA, ANN and KNN (in order, left to right and top to bottom). Classification was made separately for each type of surface.
Table 1. Technical characteristics of the PROBA-V sensor.

Characteristic | Value
Orbit | Sun-synchronous polar orbit, 98.73° inclination, 820 km altitude
Coverage | 35°–75° N and 35°–56° S daily; 35° N–35° S every 2 days
Field of view | 102°, 2250 km swath
Resolution | 100 m at nadir, 350 m at full field of view

Visual and infrared detector:
Blue band | 447–493 nm
Red band | 610–690 nm
Near-infrared band | 773–893 nm
Product resolution | 1 km, 1/3 km

Short-wave infrared detector:
Short-wave infrared band | 1570–1650 nm
Product resolution | 1 km, 2/3 km
Table 2. Distribution of the labelled scenes in the gold standard dataset provided to all the participants in the PROBA-V Round Robin exercise.

Category | # Scenes | Percentage
Totally Cloudy | 400 | 30%
Semi-transparent clouds | 438 | 32%
Total Cloudy scenes | 838 | 62%
Clear sky water | 192 | 14%
Clear sky land | 205 | 15%
Clear sky snow/ice | 67 | 5%
Turbid atmosphere (no cloud) cases | 23 | 2%
Other Clear cases | 25 | 2%
Total Clear scenes | 512 | 38%
Floating ice | 67 | 5%
Glint | 59 | 4%
Cloud shadow | 46 | 3%
Total | 1350 | 100%
Table 3. Surface types considered in the paper, their numerosity and the percentage of Clear and Cloudy scenes in both the full joint MODIS and SEVIRI dataset and the gold standard dataset.

Surface Type | Full Dataset: # Scenes | Clear | Cloudy | Gold Standard: # Scenes | Clear | Cloudy
Water | 20,746,553 (31.8%) | 19% | 81% | 326 (24.1%) | 61% | 39%
Vegetation | 33,978,411 (52.1%) | 37% | 63% | 921 (68.2%) | 28% | 72%
Bare Land | 9,927,876 (15.2%) | 92% | 8% | 83 (6.1%) | 51% | 49%
Urban | 235,546 (0.4%) | 24% | 76% | 9 (0.7%) | 100% | 0%
Snow/Ice | 41,570 (0.06%) | 39% | 61% | 10 (0.7%) | 0% | 100%
Total | 65,238,704 (100%) | 40% | 60% | 1350 (100%) | 38% | 62%
Table 4. Climatic zones considered in the paper and their numerosity in both the joint MODIS and SEVIRI dataset and the gold standard dataset.

Climatic Zone | Full Dataset: # Scenes | Clear | Cloudy | Gold Standard: # Scenes | Clear | Cloudy
Tropical | 34,808,034 (53.3%) | 55% | 45% | 728 (53.9%) | 34% | 66%
Mid-Latitude Summer NH | 18,610,636 (28.5%) | 29% | 71% | 285 (21.1%) | 42% | 58%
Mid-Latitude Summer SH | 270,858 (0.4%) | 1% | 99% | – | – | –
Mid-Latitude Winter NH | 8,173,055 (12.5%) | 12% | 88% | – | – | –
Mid-Latitude Winter SH | 573,416 (0.9%) | 8% | 92% | – | – | –
High-Latitude Summer NH | 2,240,391 (3.4%) | 7% | 93% | 226 (16.7%) | 39% | 61%
Ice on sea NH | 4950 (0.008%) | 3% | 97% | 17 (1.3%) | 24% | 76%
Ice on sea SH | 1804 (0.003%) | 0% | 100% | – | – | –
Ice over land NH | 343,163 (0.5%) | 6% | 94% | 90 (6.7%) | 60% | 40%
Ice over land SH | 1398 (0.002%) | 0% | 100% | – | – | –
Ice over elevated land NH | 210,958 (0.3%) | 22% | 78% | 4 (0.3%) | 0% | 100%
Total | 65,238,704 (100%) | 40% | 60% | 1350 (100%) | 38% | 62%
Table 5. Accuracy indicators A, A_Clear and A_Cloudy of the classification methods of Section 3 when validation is made on the full dataset and no ancillary information on surface or climate is used. Percentage values are reported for the entire dataset (rows Global) and for each surface type.

Surface | Indicator | LDA | QDA | PCDA | ICDA | CDA | ANN | KNN
Water | A_Clear | 70.7 | 96.4 | 91.5 | 94.2 | 95.0 | 81.5 | 87.1
Water | A_Cloudy | 81.5 | 68.0 | 77.3 | 74.2 | 75.9 | 94.2 | 96.1
Water | A | 79.5 | 73.4 | 80.0 | 78.0 | 79.5 | 91.8 | 94.3
Vegetation | A_Clear | 97.8 | 96.1 | 93.9 | 95.2 | 91.7 | 89.7 | 93.5
Vegetation | A_Cloudy | 69.8 | 79.2 | 85.2 | 83.4 | 87.2 | 91.3 | 91.7
Vegetation | A | 80.1 | 85.4 | 88.4 | 87.7 | 88.8 | 90.7 | 92.4
Bare Land | A_Clear | 99.8 | 99.5 | 98.8 | 99.1 | 63.5 | 97.6 | 99.2
Bare Land | A_Cloudy | 54.6 | 68.5 | 72.8 | 70.7 | 88.8 | 76.3 | 74.3
Bare Land | A | 96.3 | 97.2 | 96.8 | 96.9 | 65.5 | 96.0 | 97.3
Urban | A_Clear | 97.0 | 93.9 | 86.5 | 92.0 | 78.2 | 72.6 | 79.4
Urban | A_Cloudy | 76.7 | 83.7 | 88.8 | 86.8 | 90.5 | 94.1 | 93.6
Urban | A | 81.5 | 86.1 | 88.2 | 88.0 | 87.6 | 89.0 | 90.2
Snow/Ice | A_Clear | 96.0 | 80.2 | 80.0 | 86.3 | 94.1 | 93.4 | 88.8
Snow/Ice | A_Cloudy | 59.0 | 65.6 | 67.1 | 68.9 | 59.4 | 81.2 | 93.8
Snow/Ice | A | 73.6 | 71.3 | 72.2 | 75.7 | 73.1 | 86.0 | 91.9
Global | A_Clear | 94.4 | 97.4 | 95.3 | 96.4 | 82.1 | 91.2 | 94.5
Global | A_Cloudy | 74.3 | 74.0 | 81.3 | 79.0 | 82.1 | 92.3 | 93.2
Global | A | 82.2 | 83.2 | 86.9 | 85.9 | 82.1 | 91.9 | 93.8
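For concreteness, the indicators reported in Tables 5–10 can be reproduced from a set of reference and predicted labels as in the following minimal sketch; the 0/1 encoding of Clear/Cloudy and the function name are illustrative assumptions, not part of the original processing chain.

```python
import numpy as np

def accuracy_indicators(y_true, y_pred):
    """A_Clear, A_Cloudy and A (in %) for binary Clear/Cloudy labels.

    y_true, y_pred: arrays with 0 = Clear, 1 = Cloudy (illustrative encoding).
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    a_clear = 100.0 * np.mean(y_pred[y_true == 0] == 0)   # Clear scenes kept Clear
    a_cloudy = 100.0 * np.mean(y_pred[y_true == 1] == 1)  # Cloudy scenes kept Cloudy
    a = 100.0 * np.mean(y_pred == y_true)                 # overall agreement
    return a_clear, a_cloudy, a
```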
Table 6. Accuracy indicators A, A_Clear and A_Cloudy of the classification methods when validation is made on the gold standard dataset of Section 2.2 and no ancillary information on surface or climate is used. Percentage values are reported for the entire gold standard dataset (rows Global) and for each surface type. Since all Urban scenes in the gold standard dataset are Clear and all Snow/Ice scenes are Cloudy (Table 3), only the corresponding indicator is defined for these two surfaces.

Surface | Indicator | LDA | QDA | PCDA | ICDA | CDA | ANN | KNN
Water | A_Clear | 62.5 | 81.0 | 71.0 | 67.5 | 85.0 | 68.5 | 80.5
Water | A_Cloudy | 84.9 | 79.4 | 83.3 | 84.1 | 80.2 | 93.7 | 84.9
Water | A | 71.2 | 80.4 | 75.8 | 73.9 | 83.1 | 78.2 | 82.2
Vegetation | A_Clear | 68.6 | 68.2 | 65.5 | 64.8 | 64.8 | 77.4 | 80.8
Vegetation | A_Cloudy | 79.8 | 82.6 | 85.9 | 85.8 | 88.6 | 90.5 | 92.1
Vegetation | A | 76.7 | 78.5 | 80.1 | 79.8 | 81.9 | 86.8 | 88.9
Bare Land | A_Clear | 90.5 | 85.7 | 78.6 | 81.0 | 78.6 | 88.1 | 88.1
Bare Land | A_Cloudy | 34.1 | 48.8 | 48.8 | 41.5 | 80.5 | 56.1 | 56.1
Bare Land | A | 62.7 | 67.5 | 63.9 | 61.5 | 79.5 | 72.3 | 72.3
Urban | A_Clear | 100 | 100 | 100 | 100 | 100 | 100 | 100
Snow/Ice | A_Cloudy | 100 | 100 | 100 | 100 | 100 | 100 | 100
Global | A_Clear | 68.6 | 75.2 | 69.3 | 67.8 | 74.4 | 75.2 | 81.6
Global | A_Cloudy | 78.6 | 80.7 | 83.9 | 83.5 | 87.1 | 89.4 | 89.4
Global | A | 74.8 | 78.6 | 78.4 | 77.6 | 82.3 | 84.0 | 86.4
Table 7. Accuracy indicators A, A_Clear and A_Cloudy of the classification methods of Section 3 when validation is made on the full dataset and classifications are separate for each surface type. Percentage values are reported for the entire dataset (rows Global) and for each surface type.

Surface | Indicator | LDA | QDA | PCDA | ICDA | CDA | ANN | KNN
Water | A_Clear | 97.7 | 94.5 | 92.2 | 93.4 | 89.0 | 80.8 | 92.8
Water | A_Cloudy | 68.1 | 71.4 | 75.7 | 75.4 | 89.0 | 96.2 | 93.2
Water | A | 73.7 | 75.8 | 78.8 | 78.8 | 89.0 | 93.2 | 93.1
Vegetation | A_Clear | 99.7 | 97.3 | 95.5 | 95.7 | 88.8 | 90.7 | 95.0
Vegetation | A_Cloudy | 48.4 | 74.4 | 81.3 | 80.7 | 88.8 | 91.9 | 90.1
Vegetation | A | 58.1 | 82.8 | 86.5 | 86.2 | 88.8 | 91.5 | 92.1
Bare Land | A_Clear | 98.9 | 97.8 | 97.3 | 97.2 | 82.7 | 99.5 | 94.5
Bare Land | A_Cloudy | 62.1 | 76.2 | 78.2 | 79.4 | 82.7 | 71.3 | 89.5
Bare Land | A | 69.1 | 96.2 | 95.8 | 95.8 | 82.7 | 97.4 | 94.1
Urban | A_Clear | 99.6 | 96.9 | 96.3 | 96.3 | 87.7 | 80.3 | 94.7
Urban | A_Cloudy | 50.6 | 78.0 | 80.7 | 81.1 | 87.7 | 92.0 | 87.8
Urban | A | 59.9 | 82.5 | 84.4 | 84.7 | 87.7 | 89.2 | 89.5
Snow/Ice | A_Clear | 99.9 | 92.4 | 92.6 | 91.0 | 86.5 | 86.9 | 91.8
Snow/Ice | A_Cloudy | 32.8 | 74.1 | 74.4 | 84.6 | 86.5 | 89.1 | 94.4
Snow/Ice | A | 45.6 | 81.3 | 81.5 | 87.2 | 86.5 | 88.3 | 93.4
Global | A_Clear | 99.1 | 97.0 | 95.6 | 95.9 | 86.7 | 92.3 | 94.5
Global | A_Cloudy | 57.1 | 73.2 | 78.8 | 78.4 | 88.8 | 93.3 | 91.4
Global | A | 64.8 | 82.6 | 85.5 | 85.3 | 87.9 | 92.9 | 92.7
Table 8. Accuracy indicators A, A_Clear and A_Cloudy of the classification methods of Section 3 when validation is made on the gold standard dataset of Section 2.2 and classifications are separate for each surface type. Percentage values are reported for the entire dataset (rows Global) and for each surface type. As in Table 6, only one indicator is defined for the Urban and Snow/Ice surfaces.

Surface | Indicator | LDA | QDA | PCDA | ICDA | CDA | ANN | KNN
Water | A_Clear | 90.0 | 73.0 | 63.5 | 61.5 | 73.0 | 70.5 | 91.0
Water | A_Cloudy | 77.8 | 81.7 | 84.9 | 84.1 | 86.5 | 93.7 | 86.5
Water | A | 85.3 | 76.4 | 71.8 | 70.3 | 78.2 | 79.4 | 89.3
Vegetation | A_Clear | 74.7 | 68.6 | 65.5 | 64.0 | 63.6 | 77.4 | 83.1
Vegetation | A_Cloudy | 78.2 | 80.3 | 85.0 | 85.6 | 90.3 | 92.3 | 91.8
Vegetation | A | 77.2 | 77.0 | 79.5 | 79.5 | 82.7 | 88.1 | 89.4
Bare Land | A_Clear | 90.5 | 73.8 | 64.3 | 71.4 | 81.0 | 85.7 | 64.3
Bare Land | A_Cloudy | 48.8 | 58.5 | 58.5 | 58.5 | 65.9 | 56.1 | 63.4
Bare Land | A | 69.9 | 66.3 | 61.5 | 65.1 | 73.5 | 71.1 | 63.9
Urban | A_Clear | 100 | 100 | 100 | 100 | 100 | 100 | 100
Snow/Ice | A_Cloudy | 100 | 100 | 100 | 100 | 100 | 100 | 100
Global | A_Clear | 82.4 | 71.3 | 65.2 | 64.3 | 69.3 | 75.8 | 85.0
Global | A_Cloudy | 76.8 | 79.6 | 83.8 | 84.1 | 88.5 | 90.7 | 89.6
Global | A | 79.0 | 76.4 | 76.7 | 76.6 | 81.3 | 85.0 | 87.9
Table 9. Accuracy indicators A, A_Clear and A_Cloudy of the classification methods of Section 3 when validation is made on the full dataset and classifications are separate for each climatic zone of Section 2.6. Percentage values are reported for the entire dataset (rows Global) and for each climatic zone.

Climatic Zone | Indicator | LDA | QDA | PCDA | ICDA | CDA | ANN | KNN
Tropical | A_Clear | 90.6 | 98.0 | 95.6 | 96.1 | 75.7 | 93.0 | 95.0
Tropical | A_Cloudy | 80.3 | 70.6 | 81.3 | 78.3 | 76.0 | 89.1 | 93.6
Tropical | A | 85.9 | 85.5 | 89.1 | 88.0 | 75.8 | 91.3 | 94.3
Mid-Latitude Summer NH | A_Clear | 97.4 | 96.6 | 94.2 | 96.4 | 87.1 | 88.1 | 93.7
Mid-Latitude Summer NH | A_Cloudy | 66.4 | 72.5 | 77.5 | 75.9 | 87.1 | 93.5 | 91.9
Mid-Latitude Summer NH | A | 75.5 | 79.6 | 82.4 | 81.9 | 87.1 | 91.9 | 92.4
Mid-Latitude Summer SH | A_Clear | 100 | 98.9 | 98.6 | 98.6 | 94.9 | 95.3 | 98.3
Mid-Latitude Summer SH | A_Cloudy | 85.8 | 88.9 | 91.3 | 91.7 | 94.3 | 97.8 | 96.9
Mid-Latitude Summer SH | A | 85.9 | 89.0 | 91.3 | 91.8 | 94.3 | 97.8 | 96.9
Mid-Latitude Winter NH | A_Clear | 84.7 | 95.9 | 94.7 | 95.4 | 91.2 | 83.5 | 94.9
Mid-Latitude Winter NH | A_Cloudy | 92.0 | 86.7 | 88.3 | 90.5 | 91.3 | 97.7 | 95.4
Mid-Latitude Winter NH | A | 91.1 | 87.8 | 89.1 | 91.1 | 91.3 | 96.0 | 95.3
Mid-Latitude Winter SH | A_Clear | 17.2 | 99.1 | 99.0 | 98.8 | 98.2 | 80.9 | 93.8
Mid-Latitude Winter SH | A_Cloudy | 98.5 | 75.8 | 78.8 | 83.0 | 83.5 | 97.3 | 94.9
Mid-Latitude Winter SH | A | 92.2 | 77.6 | 80.3 | 84.2 | 84.6 | 96.0 | 94.8
High-Latitude Summer NH | A_Clear | 71.2 | 83.6 | 82.1 | 88.1 | 75.0 | 39.6 | 90.8
High-Latitude Summer NH | A_Cloudy | 88.9 | 77.4 | 80.0 | 81.4 | 82.8 | 96.1 | 89.5
High-Latitude Summer NH | A | 87.7 | 77.8 | 80.2 | 81.8 | 82.3 | 92.2 | 89.6
Ice on sea NH | A_Clear | 0 | 87.7 | 77.4 | 96.1 | 89.0 | 78.7 | 97.4
Ice on sea NH | A_Cloudy | 99.3 | 98.2 | 98.2 | 92.6 | 98.8 | 99.9 | 93.6
Ice on sea NH | A | 96.2 | 97.9 | 97.6 | 92.7 | 98.5 | 99.3 | 93.7
Ice on sea SH | A_Clear | 0 | 0 | 0 | 0 | 0 | 0 | 0
Ice on sea SH | A_Cloudy | 99.9 | 99.7 | 99.1 | 97.7 | 99.9 | 99.9 | 97.1
Ice on sea SH | A | 99.9 | 99.7 | 99.1 | 97.7 | 99.9 | 99.9 | 97.1
Ice over land NH | A_Clear | 19.0 | 26.0 | 24.1 | 35.0 | 20.0 | 25.9 | 74.6
Ice over land NH | A_Cloudy | 96.8 | 96.6 | 97.0 | 95.7 | 99.0 | 99.2 | 95.1
Ice over land NH | A | 92.1 | 92.3 | 92.6 | 92.0 | 94.2 | 94.8 | 93.9
Ice over land SH | A_Clear | 0 | 0 | 0 | 0 | 0 | 0 | 0
Ice over land SH | A_Cloudy | 100 | 100 | 100 | 99.0 | 100 | 100 | 99.6
Ice over land SH | A | 100 | 100 | 100 | 99.0 | 100 | 100 | 99.6
Ice over elevated land NH | A_Clear | 35.4 | 42.1 | 42.4 | 54.1 | 35.3 | 22.4 | 77.9
Ice over elevated land NH | A_Cloudy | 95.6 | 95.7 | 95.8 | 91.9 | 98.0 | 99.2 | 91.4
Ice over elevated land NH | A | 82.2 | 83.8 | 83.9 | 83.5 | 84.1 | 82.1 | 88.4
Ice over elevated land SH | A_Clear | 0 | 0 | 0 | 0 | 0 | 0 | 0
Ice over elevated land SH | A_Cloudy | 100 | 100 | 100 | 100 | 100 | 100 | 100
Ice over elevated land SH | A | 100 | 100 | 100 | 100 | 100 | 100 | 100
Global | A_Clear | 91.4 | 97.4 | 95.0 | 96.0 | 78.6 | 91.1 | 94.6
Global | A_Cloudy | 78.7 | 75.0 | 81.4 | 80.2 | 83.3 | 92.8 | 93.2
Global | A | 83.7 | 83.9 | 86.8 | 86.4 | 81.5 | 92.1 | 93.7
Table 10. Accuracy indicators A, A_Clear and A_Cloudy of the classification methods of Section 3 when validation is made on the gold standard dataset of Section 2.2 and classifications are separate for each climatic zone of Section 2.6. Percentage values are reported for the entire dataset (rows Global) and for each climatic zone.

Climatic Zone | Indicator | LDA | QDA | PCDA | ICDA | CDA | ANN | KNN
Tropical | A_Clear | 64.8 | 86.2 | 68.0 | 68.4 | 88.7 | 72.5 | 78.9
Tropical | A_Cloudy | 86.5 | 85.2 | 87.9 | 86.9 | 89.8 | 90.2 | 91.5
Tropical | A | 79.1 | 85.6 | 81.2 | 80.6 | 89.4 | 84.2 | 87.2
Mid-Latitude Summer NH | A_Clear | 85.7 | 70.6 | 67.2 | 68.1 | 66.4 | 62.2 | 74.8
Mid-Latitude Summer NH | A_Cloudy | 59.0 | 61.4 | 68.7 | 66.9 | 83.1 | 84.3 | 83.7
Mid-Latitude Summer NH | A | 70.2 | 65.3 | 68.1 | 67.4 | 76.1 | 75.1 | 80.0
High-Latitude Summer NH | A_Clear | 86.4 | 87.5 | 84.1 | 75.0 | 83.0 | 40.9 | 75.0
High-Latitude Summer NH | A_Cloudy | 89.1 | 95.7 | 95.7 | 90.6 | 89.9 | 100 | 89.9
High-Latitude Summer NH | A | 88.1 | 92.5 | 91.2 | 84.5 | 87.2 | 77.0 | 84.1
Ice on sea NH | A_Clear | 0 | 0 | 0 | 0 | 0 | 0 | 0
Ice on sea NH | A_Cloudy | 100 | 100 | 100 | 100 | 100 | 100 | 100
Ice on sea NH | A | 76.5 | 76.5 | 76.5 | 76.5 | 76.5 | 76.5 | 76.5
Ice over land NH | A_Clear | 90.7 | 100 | 100 | 94.4 | 94.4 | 74.1 | 90.7
Ice over land NH | A_Cloudy | 86.1 | 38.9 | 22.2 | 52.8 | 88.9 | 97.2 | 88.9
Ice over land NH | A | 88.9 | 75.6 | 68.9 | 77.8 | 92.2 | 83.3 | 90.0
Ice over elevated land NH | A_Clear | 0 | 0 | 0 | 0 | 0 | 0 | 0
Ice over elevated land NH | A_Cloudy | 100 | 100 | 100 | 100 | 75.0 | 100 | 100
Ice over elevated land NH | A | 100 | 100 | 100 | 100 | 75.0 | 100 | 100
Global | A_Clear | 75.6 | 83.6 | 73.4 | 71.7 | 82.4 | 64.3 | 77.9
Global | A_Cloudy | 81.7 | 80.5 | 82.8 | 82.3 | 88.5 | 91.2 | 89.7
Global | A | 79.4 | 81.7 | 79.3 | 78.3 | 86.2 | 81.0 | 85.3
Table 11. Accuracy of ANN when ancillary information is introduced as a fifth variable. Global accuracy is shown for the entire dataset (A) and separately for Clear (A_Clear) and Cloudy (A_Cloudy) scenes when no ancillary information is used and when surface type, climate or both are used.

Configuration | A_Clear | A_Cloudy | A
Only reflectances | 91.2 | 92.3 | 91.9
With surface information | 92.1 | 92.0 | 92.1
With climatic information | 89.0 | 91.0 | 90.2
With both surface and climatic information | 92.0 | 92.0 | 92.0
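As a sketch of the configuration behind Table 11, ancillary information can simply be appended to the four reflectances as a fifth input variable before training. The multilayer perceptron below is only a stand-in for the ANN used in the paper; the arrays, sizes and hyperparameters are placeholders, not the original ones.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((1000, 4))            # placeholder Blue/Red/NIR/SWIR reflectances
surface = rng.integers(0, 5, 1000)   # placeholder surface-type code (ancillary)
y = rng.integers(0, 2, 1000)         # placeholder labels: 0 = Clear, 1 = Cloudy

X5 = np.column_stack([X, surface])   # ancillary information as a fifth variable
ann = MLPClassifier(hidden_layer_sizes=(10,), max_iter=500, random_state=0)
ann.fit(X5, y)
```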
Table 12. Percentage of semi-transparent clouds detected as Clear or Cloudy by the classification methods in the three classification configurations (no ancillary information; with surface information; with climatic information). The total number of semi-transparent scenes in the gold standard dataset is 438.

 | No Ancillary Information | Surface | Climate
Method | Clear | Cloudy | Clear | Cloudy | Clear | Cloudy
LDA | 39 | 61 | 77 | 23 | 34 | 66
QDA | 37 | 63 | 39 | 61 | 35 | 65
PCDA | 31 | 69 | 31 | 69 | 32 | 68
ICDA | 32 | 68 | 30 | 70 | 32 | 68
CDA | 25 | 75 | 22 | 78 | 22 | 78
ANN | 20 | 80 | 18 | 82 | 17 | 83
KNN | 20 | 80 | 20 | 80 | 20 | 80
Table 13. Consensus (in %) of PCDA, CDA, ANN and KNN, globally (last row) and separately for each surface type, when separate classifications are performed for each surface type.

Surface Type | Clear | Cloudy | Total
Water | 17.26 | 60.78 | 78.04
Vegetation | 35.35 | 51.92 | 87.27
Bare Land | 72.86 | 5.85 | 78.71
Urban | 23.94 | 61.57 | 85.51
Snow/Ice | 34.30 | 40.51 | 74.81
Global | 35.59 | 47.49 | 83.08
Table 14. Consensus (in %) of PCDA, CDA, ANN and KNN, globally (last row) and separately for each climatic zone, when separate classifications are performed for each climatic zone. Only zones whose training set contains more than 1M scenes are shown.

Climatic Zone | Clear | Cloudy | Total
Tropical | 39.54 | 33.91 | 73.45
Mid-Latitude Summer NH | 27.12 | 54.84 | 81.96
Mid-Latitude Winter NH | 11.26 | 77.32 | 88.58
High-Latitude Summer NH | 5.79 | 71.75 | 77.54
Global | 30.55 | 47.61 | 78.16
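The consensus figures of Tables 13 and 14 admit a direct computation: a scene contributes to the Clear (respectively Cloudy) consensus when all four methods predict Clear (respectively Cloudy). A minimal sketch, assuming 0/1 predictions stacked per method (the function name and array layout are illustrative):

```python
import numpy as np

def consensus(preds):
    """Percentage of scenes on which all methods agree, split by predicted state.

    preds: (n_methods, n_scenes) array of 0/1 predictions (0 = Clear, 1 = Cloudy).
    """
    preds = np.asarray(preds)
    n = preds.shape[1]
    clear = np.all(preds == 0, axis=0).sum()   # unanimous Clear scenes
    cloudy = np.all(preds == 1, axis=0).sum()  # unanimous Cloudy scenes
    return 100.0 * clear / n, 100.0 * cloudy / n, 100.0 * (clear + cloudy) / n
```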
Table 15. Deviation of each classification method from the consensus expressed by the other ones when classification is made separately for each surface type. Figures report the fraction of times (in %) a method disagrees with the other three methods when those agree on the sky status. The higher the figure, the larger the deviation from the consensus. For each surface type and sky condition, the method showing the highest disagreement is marked with an asterisk.

 | Clear Sky | Cloudy Sky | Total
Surface | PCDA | CDA | ANN | KNN | PCDA | CDA | ANN | KNN | PCDA | CDA | ANN | KNN
Water | 9.62 | 3.02 | 84.42* | 2.93 | 93.23* | 2.27 | 0.19 | 4.31 | 73.41* | 2.45 | 20.16 | 3.98
Vegetation | 1.82 | 51.17* | 19.73 | 27.27 | 84.67* | 1.04 | 0.96 | 13.33 | 52.49* | 20.51 | 8.25 | 18.75
Bare Land | 6.30 | 75.75* | 0 | 17.95 | 0.81 | 5.93 | 91.83* | 1.43 | 6.06 | 72.71* | 4.00 | 17.23
Urban | 1.90 | 12.01 | 79.03* | 7.06 | 87.77* | 3.27 | 1.66 | 7.30 | 44.69* | 7.66 | 40.48 | 7.18
Snow/Ice | 18.72 | 26.45 | 19.89 | 34.94* | 67.54* | 21.21 | 4.19 | 7.06 | 59.09* | 22.12 | 6.91 | 11.88
Global | 5.57 | 52.72* | 23.96 | 17.75 | 87.92* | 1.86 | 2.19 | 8.03 | 50.31* | 25.09 | 12.13 | 12.47
Table 16. Deviation of each classification method from the consensus expressed by the other ones when classification is made separately for each climatic zone. Figures report the fraction of times (in %) a method disagrees with the other three methods when those agree on the sky status. The higher the figure, the larger the deviation from the consensus. For each climatic zone and sky condition, the method showing the highest disagreement is marked with an asterisk.

 | Clear Sky | Cloudy Sky | Total
Climatic Zone | PCDA | CDA | ANN | KNN | PCDA | CDA | ANN | KNN | PCDA | CDA | ANN | KNN
Tropical | 3.23 | 73.09* | 8.34 | 15.34 | 16.82 | 73.10* | 4.32 | 5.76 | 6.20 | 73.09* | 7.46 | 13.25
Mid-Latitude Summer NH | 9.42 | 40.47 | 41.04* | 9.07 | 94.46* | 0.44 | 0.80 | 5.30 | 40.44* | 16.59 | 17.03 | 6.82
Mid-Latitude Winter NH | 3.99 | 10.07 | 77.89* | 8.05 | 77.34* | 4.21 | 2.34 | 16.11 | 39.04 | 7.27 | 41.78* | 11.91
High-Latitude Summer NH | 0.05 | 1.24 | 91.03* | 7.69 | 39.42 | 10.02 | 0.27 | 50.29* | 19.88 | 5.66 | 45.31* | 29.15
Global | 3.96 | 66.01* | 16.92 | 14.10 | 52.24* | 35.73 | 2.57 | 9.45 | 19.56 | 55.56* | 12.29 | 12.60
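The deviation statistics of Tables 15 and 16 can be computed analogously: among the scenes where exactly one of the four methods disagrees with the other three, each method is charged with the fraction of cases in which it is the dissenter, so that each group of four columns sums to 100. A minimal sketch under the same illustrative 0/1 encoding as above:

```python
import numpy as np

def deviation_from_consensus(preds):
    """Per-method dissent fractions (in %) for four binary classifiers.

    preds: (4, n_scenes) array of 0/1 predictions (0 = Clear, 1 = Cloudy).
    Returns (dev_clear, dev_cloudy, dev_total); each triple sums to 100
    across the four methods, as in Tables 15 and 16.
    """
    preds = np.asarray(preds)
    votes = preds.sum(axis=0)        # number of Cloudy votes per scene
    clear_pool = votes == 1          # three methods say Clear, one says Cloudy
    cloudy_pool = votes == 3         # three methods say Cloudy, one says Clear
    dev_clear = 100.0 * preds[:, clear_pool].sum(axis=1) / clear_pool.sum()
    dev_cloudy = 100.0 * (1 - preds[:, cloudy_pool]).sum(axis=1) / cloudy_pool.sum()
    n_pool = clear_pool.sum() + cloudy_pool.sum()
    dev_total = 100.0 * (preds[:, clear_pool].sum(axis=1)
                         + (1 - preds[:, cloudy_pool]).sum(axis=1)) / n_pool
    return dev_clear, dev_cloudy, dev_total
```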
