Bayesian Cloud Detection over Land for Climate Data Records

Bulgin, Claire E.; Embury, Owen; Maidment, Ross I.; Merchant, Christopher J.

doi:10.3390/rs14092231

Open AccessArticle

Bayesian Cloud Detection over Land for Climate Data Records

¹

Department of Meteorology, University of Reading, Reading RG6 6AL, UK

²

National Centre for Earth Observation, Space Park Leicester, Leicester LE4 5SP, UK

^*

Author to whom correspondence should be addressed.

Remote Sens. 2022, 14(9), 2231; https://doi.org/10.3390/rs14092231

Submission received: 14 March 2022 / Revised: 29 April 2022 / Accepted: 3 May 2022 / Published: 6 May 2022

(This article belongs to the Section Earth Observation Data)

Download

Browse Figures

Versions Notes

Abstract

:

Cloud detection is a necessary step in the generation of land surface temperature (LST) climate data records (CDRs) and affects data quality and uncertainty. We present here a sensor-independent Bayesian cloud detection algorithm and show that it is suitable for use in the production of LST CDRs. We evaluate the performance of the cloud detection with reference to two manually masked datasets for the Advanced Along-Track Scanning Radiometer (AATSR) and find a 7.9% increase in the hit rate and 4.9% decrease in the false alarm rate when compared to the operational cloud mask. We then apply the algorithm to four instruments aboard polar-orbiting satellites, which together can produce a global, 25-year LST CDR: the second Along-Track Scanning Radiometer (ATSR-2), AATSR, the Moderate Resolution Spectroradiometer (MODIS Terra) and the Sea and Land Surface Temperature Radiometer (SLSTR-A). The Bayesian cloud detection hit rate is assessed with respect to in situ ceilometer measurements for periods of overlap between sensors. The consistency of the hit rate is assessed between sensors, with mean differences in the cloud hit rate of 4.5% for ATSR-2 vs. AATSR, 4.9% for AATSR vs. MODIS, and 2.5% for MODIS vs. SLSTR-A. This is important because consistent cloud detection performance is needed for the observational stability of a CDR. The application of a sensor-independent cloud detection scheme in the production of CDRs is thus shown to be a viable approach to achieving LST observational stability over time.

Keywords:

Bayesian cloud detection; climate data records; land surface temperature; stability; ATSR; MODIS; SLSTR

1. Introduction

Satellite data are used to produce climate data records (CDRs) for 21 essential climate variables (ECVs) [1] within the European Space Agency (ESA) Climate Change Initiative (CCI) Program [2,3]. These data records of 15–40 years enable the assessment of long-term trends in geophysical variables that can be attributed to changes in the Earth’s climate, e.g., [4,5,6]. For many CDRs, cloud detection is an essential pre-processing step in data production [7,8,9,10]. Clouds and aerosols modify the Earth’s radiance as viewed from space at thermal infrared and reflectance wavelengths. As such, many CDRs require the identification of data that are adequately ‘clear’ for valid retrieval of surface or atmospheric properties [9,10]. The desired endpoint of this process is retrieval-dependent: surface property retrievals will typically discard observations not classified as clear-sky, while the retrieval of cloud or aerosol properties will necessarily require identification of clouds. “Clear-sky” in this paper is shorthand for “containing negligible cloud and aerosols for the purpose of valid LST retrieval”.

Long data records are essential for assessing climate changes [3] and CDRs are typically constructed using data from a series of satellite instruments, with a typical lifetime of 3–8 years. In order to identify long-term signals, data stability is essential [11,12], which includes consistency in the cloud detection applied to different sensors within a CDR and consistency in the retrieval methods.

Here we focus on cloud detection to retrieve land surface temperature (LST). Bayesian methods have been successfully applied to the production of sea surface temperature (SST) CDRs [7,13] from many different sensors, with very good results compared to threshold-based techniques [14]. Bayesian methods are, therefore, an ideal candidate for cloud detection over land, where identification of appropriate cloud detection thresholds for threshold-based techniques is further complicated by underlying heterogeneous land cover types.

In this paper, we present the Bayesian cloud detection formulation used for SST retrieval [7] adapted for LST retrievals. Cloud detection algorithms typically work on the premise that clouds are both bright and cold [15,16], enabling them to be distinguished from the underlying Earth surface at thermal infrared and reflectance wavelengths. This premise does not always hold (even over the ocean, which is typically dark and relatively warm [17]) and over land is further complicated by heterogeneous land cover, potentially significant diurnal cycles in surface temperature, and surfaces that are themselves both cold and bright, for example, snow- and ice-covered ground [18]. These particular challenges make a fully Bayesian approach to calculating a probability of clear-sky well-suited to the problem, because of the dynamic aspect of the cloud detection which exploits prior information that accounts for spatiotemporal variability [19]. The Bayesian formulation is sensor-independent and directly accounts for the differences in instrumental spectral response functions through radiative transfer simulations. It can easily be applied to multiple sensors used to create a long-term CDR.

The remainder of this paper proceeds as follows: in Section 2, we describe the Bayesian cloud-detection algorithm and the algorithmic developments relevant to cloud screening over land. In Section 3, we evaluate the cloud detection performance with respect to manually masked satellite imagery using data from the Advanced Along-Track Scanning Radiometer. In Section 4, we discuss the application of Bayesian cloud detection to data from other instruments and the benefits of this approach to the generation of CDRs using data from multiple sensors. We conclude the paper in Section 5.

2. Bayesian Cloud Detection over Land

The Bayesian approach to cloud detection as applied to SST retrieval has been widely documented [7,13,14,17,19], so we present here only a brief overview. This overview covers the main points relevant to understanding the evolutions of the algorithm for use over land, which is the subject of this paper.

Bayes’ theorem, as applied to the cloud detection problem, can be used to calculate the probability that an observation corresponds to clear-sky (

P (c | y^{o}, x^{b})

), given knowledge of the observation vector (

y^{o}

) and the prior background state (

x^{b}

) as shown in Equation (1).

P (c | y^{o}, x^{b}) = {[1 + \frac{P (\bar{c}) P (y^{o} | x^{b}, \bar{c})}{P (c) P (y^{o} | x^{b}, c)}]}^{- 1}

(1)

The observation vector contains the satellite observations for all channels used in cloud detection. At night, this includes data from infrared wavelengths only, typically centered on 3.7, 11, and 12 μm (where these channels are available for a given instrument). During the day, reflectance channels can also be used, for example, wavelengths of 0.6, 0.8 and 1.6 μm, in addition to the 11 and 12 μm infrared wavelengths. The background state vector contains information on the prior surface and atmospheric conditions specified by the ERA-5 numerical weather prediction (NWP) analysis fields [20].

P (\bar{c})

and

P (c)

are the prior probabilities of not-clear and clear-sky conditions respectively. The ERA-5 NWP cloud fraction informs the prior probabilities. However, the cloud fraction is only a proxy for probability, and the probability of clear-sky is constrained to the range of 0.5–0.95, to ensure that the Bayes’ calculation is not fully pre-conditioned by the prior [14]. To put it another way, the probability of there being a cloud in an observation when the NWP cloud fraction is zero may well be low, but it is not zero.

P (y^{o} | x^{b}, c)

is the probability of the observations given the background state vector under clear-sky conditions. It is calculated from the difference between observations and the prior clear-sky simulations of the fast forward model RTTOV [21], in light of the expected uncertainties. For cloudy conditions,

P (y^{o} | x^{b}, \bar{c})

is determined using an empirical look-up table, developed as a pragmatic alternative to more computationally expensive cloudy-sky simulations [22].

The remainder of this section details the evolutions and settings particular to optimizing performance over land. All modifications are made with a view to producing multi-sensor CDRs. Sensor-specific dependencies are limited to the spectral response functions (accounted for in the radiative transfer simulations) and the uncertainty assumptions.

2.1. Radiative Transfer and Numerical Weather Prediction (NWP) Data

Radiative transfer simulations of clear-sky conditions use version 12.3 of the fast-forward model RTTOV [21], with the latest RTTOV coefficients (LBLRTM v12.8, v9 predictors) [23]. RTTOV is used to simulate observations at both reflectance and infrared wavelengths, with prior surface and atmospheric conditions constrained by the closest NWP analysis data from the hourly ERA-5 products [20]. For channels with solar reflectance (visible and near-infrared), a Rayleigh single-scattering approximation is used [24].

The radiative transfer model is run at atmospheric profile locations commensurate with the resolution of the NWP data (~30 km), with the outputs then bilinearly interpolated to the higher resolution of the satellite imagery (typically ~1 km at nadir). Higher-resolution surface data are used to specify the local conditions for a given satellite observation (Section 2.4).

2.2. Uncertainty Specification

The radiative transfer model outputs are used in the following equation to calculate the probability of observations given the background state vector for clear-sky conditions.

P (y_{s}^{o} | x^{b}, c) = \frac{e^{(- \frac{1}{2} Δ y^{t} {(H^{T} B H + R)}^{- 1} Δ y)}}{2 π {| H^{T} B H + R |}^{0.5}}

(2)

Uncertainties in the background state are propagated through the model by the

H^{T} B H + R

term.

B

contains the background error covariance matrix, calculated for a reduced state vector: surface temperature, total column water vapour (TCWV), and aerosol optical depth (AOD, discussed further in Section 2.6). The uncertainty in the surface temperature is given by the spread in the ERA5 skin temperature ensemble. TCWV uncertainty is described by an exponentially decaying curve from 45% to 5% over a range of 0–65 kg m^−2.. Tropospheric dust aerosol uses an uncertainty of 10% and stratospheric sulphate an uncertainty of 32%. Off-diagonal terms of the

B

matrix are set to zero. The

H

matrix contains the forward model tangent linears for each satellite observation channel used in the cloud detection and each element of the reduced state vector. The

R

matrix contains two components:

R^{m}

, which characterizes the error in the forward model and

R^{o}

, which characterizes the error in the satellite observations.

R^{o}

contains the noise equivalent differential temperature (NEdT) for each satellite observation used in the cloud detection. The model uncertainty,

R^{m}

, is of order 0.15–0.17 K for the infrared channels, 9% for the 1.6 and 0.8 μm reflectance wavelengths and 8% for the 0.6 μm wavelength. The reflectance wavelength uncertainties were calculated by assessing observation minus simulation differences for clear-sky MODIS scenes.

2.3. Gaussian Mixed Model

Over land, we found that the ERA5 NWP reanalysis data does not fully capture the extent of the diurnal warming over surfaces such as deserts. Our clear-sky PDFs as simulated using RTTOV have a Gaussian distribution and as such, cases where the observations were much warmer than the simulation (due to an underestimated prior surface temperature) could fall outside of the clear-sky PDF and be erroneously flagged as cloud. To address this, we use a Gaussian mixture model to extend the PDF on the warm side. This calculates a second Gaussian, centered on the sum of the prior surface temperature and the associated uncertainty. This second Gaussian has a sigma twice as large as the uncertainty on the original Gaussian. Adding the two Gaussians, as illustrated in Figure 1 (in a toy example), extends the clear-sky PDF on the warm side whilst having a small impact on the cold side of the original Gaussian distribution. This makes the Bayesian cloud detection more robust to the underestimation of the diurnal temperature cycle in the ERA5 data.

2.4. Surface Characterisation

Surface emissivity at the satellite image pixel location is specified using the combined ASTER MODIS Emissivity over Land (CAMEL) v002 monthly climatology compiled from observations made between 2000–2016 [25], which is specifically designed to be integrated with RTTOV v12.3 simulations. It has a spatial resolution of 0.05 × 0.05°, with an improved representation of fractional snow cover compared with v001 [25]. Surface reflectance is defined using the bidirectional reflectance distribution function (BRDF) atlas developed for RTTOV v11 [26]. The atlas is at 0.1° resolution, covering wavelengths of 0.4–2.5 μm and derived from the MODIS BRDF-kernel product [26].

Surface elevation is specified using the ASTER Global Digital Elevation Model (DEM) [27]. The native resolution of the DEM is 1 arc-second in latitude and longitude (~30 m) and this has been regridded globally to a resolution of 1/125 degrees (~900 m), commensurate with the resolution of observations from many polar-orbiting satellites used for the generation of LST CDRs (e.g., ATSR-2, AATSR, MODIS and SLSTR).

2.5. Cloudy Probability Distribution Functions (PDFs)

Empirical probability density function (PDF) look-up tables (LUTs) are used to specify

P (y^{o} | x^{b}, \bar{c})

, the probability of cloud (not clear) given the observations and background state vector. For cloud detection over land, new sets of PDF LUTs were defined using 11 years of data from the MODIS Terra instrument aboard the Earth Observing System (EOS). MODIS Terra is a good choice for deriving LUTs due to the long data record, a wide range of available channels, and an equator overpass time of 10:30, similar to other instrument series (e.g., ATSRs, SLSTRs). Thus, a single set of LUTs can be used to apply Bayesian cloud detection to a wide range of sensors.

Table 1 shows the three PDF LUTs used as applied to the ATSR-2, AATSR, MODIS and SLSTR sensors in this study. Infrared and reflectance channels are treated independently, ensuring an adequate population of the PDF observation space. There are two infrared channel PDFs, as the 3.7 μm channel is not used during the day due to solar contamination. The PDF LUTs are multidimensional, binned by observations, viewing geometry and surface characteristics. Over land, the radiative response of the Earth’s surface will be dependent on the land cover and as such, the PDFs are binned by biome, using the land cover classification from the European Space Agency (ESA) LST Climate Change Initiative (CCI). Land cover is stratified using the ESA Land Cover CCI classification [28], coupled with extended differentiation of bare soil types using the ATSR Land Biome Classification [29]. Table 2 summarizes which of the classes from each classification fall within the condensed classification used to generate the PDF LUTs.

Figure 2 demonstrates the biome dependence of the empirical PDFs. It shows the difference in the 11.0 μm minus NWP surface temperature against the 11.0–12.0 μm channels for daytime and near-nadir observations. The shape of the PDF is biome-dependent, with larger differences in the 11.0–12.0 μm channels for urban areas, cropland, and flooded ground than over bare soil and snow-and-ice surfaces. The largest positive differences in the 11.0 μm minus NWP surface temperature dimension are seen for snow and ice, bare soil and shrubland. As viewed from this slice, the cropland, flooded ground and urban PDFs were the most compact, the highest densities occurred where the 11.0–12.0 μm differences were close to zero and the 11.0 μm minus NWP surface temperatures were slightly negative.

Empirical PDFs derived from MODIS data can be used when processing other sensors by applying a spectral shift to the observations prior to the look-up in order to make them ‘look like’ MODIS [14]. The spectral shifts are calculated using RTTOV 137-level NWP profile data, including two sets of profiles: 5000 diverse in temperature and 5000 diverse in specific humidity [30]. Of the 10,000 profiles in these two datasets, approximately 2000 are located over land, and these are used to calculate the shifts. Each shift is piecewise linear in total column water vapour, with a dependency on atmospheric path length. Shifts are provided for each channel, with reflectance wavelength shifts having an additional dependence on the solar zenith angle. For infrared wavelengths, the shift is provided in the form of an offset, and for reflectance wavelengths, a scale factor. These spectral shifts are stored in a look-up table and can be applied prior to indexing the empirical PDFs.

2.6. Aerosol Characterisation

Aerosol information comes from two sources: tropospheric aerosol from the Copernicus Atmosphere Monitoring Service (CAMS) reanalysis [31] and the stratospheric sulphate aerosol dataset used in ESA SST CCI [7] for periods affected by major volcanic eruptions (1982–1984 and 1991–1993). The CAMS aerosol climatology [32,33] provides monthly mean profiles of aerosol mass mixing ratios for sea salt (three size bins), mineral dust (three size bins), organic matter and black carbon. These are interpolated to the location of the NWP analysis fields and the full profile is included in the background state vector passed to RTTOV, which includes support for all CAMS aerosol species. Total column dust mass and stratospheric sulphate are included in the reduced state vector, so their uncertainties (10% and 32%, respectively) will affect the shape of the clear-sky PDF (see Section 2.2).

When processing data that overlap the full CAMS analysis (2003 onwards), it is also possible to download the daily column integrated masses and scale the climatological aerosol profiles to match the daily data. This is currently done for the three mineral dust components, while other components use the climatological values.

3. Results

The performance of the Bayesian cloud mask can be assessed with reference to a manual cloud mask generated by expert inspection of satellite imagery. Two manually masked datasets are currently available for Advanced Along-Track Scanning Radiometer (AATSR) imagery. The first was generated as part of the SYNERGY project, which developed a cloud mask scheme using AATSR data in conjunction with observations from the Medium Resolution Imaging Spectrometer (MERIS) aboard the same satellite platform [34,35]. The dataset consists of 21 daytime scenes between 2002–2007 over five different regions: Abracos Hill (Brazil), Cart Site (North America), Ouagadougou (Burkino Faso), Mongu (Zambia) and Tomsk (Russia) [19]. The second dataset was developed within the European Space Agency (ESA) Data User Element (DUE) GlobTemperature project as part of a cloud-detection round-robin exercise [36]. This dataset consists of 10 scenes between 2007 and 2011, including both day and nighttime data, one scene over ice in Antarctica and a dust-affected case over Algeria. The details of the 31 scenes used to evaluate the cloud mask performance are given in Table 3, summarized from more detailed information provided in [19,36]. Collectively these datasets are designed to cover a wide range of atmospheric and surface conditions, different solar illuminations and differing cloud types.

Data from these 31 scenes were used to calculate performance metrics [20,32] for the Bayesian cloud detection in comparison with the operational Standard ATSR Cloud Detection (SADIST) cloud mask [37,38], which used a series of threshold tests to determine the presence of cloud-affected observations. The performance metrics are as follows: (1) the percentage of perfect classification (the percentage of pixels across the whole scene that are correctly classified), (2) the hit rate (the percentage of cloudy pixels that are correctly masked), (3) the false alarm rate (the percentage of clear-sky pixels falsely flagged as cloud) and (4) the true skill score (hit rate minus the false alarm rate). The results given in Table 4 show that the Bayesian cloud detection consistently improves on the operational cloud mask across all four metrics. The percentage of perfect classification across all scenes is 86.6% for the Bayesian mask compared with 81.0% for the operational mask. The hit rate is significantly higher for the Bayesian mask (93.9% compared with 86.0%), whilst the false alarm rate is lower (16.7% compared with 21.6%). As a result of the higher hit rate and lower false alarm rate, the Bayesian true skill score is 77.2%, compared with 64.7% for the operational mask.

Table 5 provides the confusion matrix for the Bayesian and operational cloud masks across the 31 scenes. The performance metrics were calculated using 4,958,105 observations with 1,530,831 cloudy pixels and 3,427,274 clear-sky pixels. The numbers in bold are the observations correctly classified by each algorithm, whilst the off-diagonal terms are the number of misclassified observations. Despite the larger number of clear-sky observations compared to cloud, which in terms of performance metrics would typically benefit an algorithm with a tendency to under-flag (missing some cloud, but with a low false alarm rate), the strength of the Bayesian mask is seen in the larger numbers of correctly classified pixels in both classes—cloud and clear-sky (total 4,294,347)—compared with the operational mask (4,014,623).

Any cloud detection algorithm may produce a poor cloud mask for particular scenes, but there are also characteristic differences between cloud detection algorithms, repeated across many images, that underly the overall performance metrics, particularly with respect to the false alarm rate. The examples in Figure 3 typify the performance of the Bayesian and operational cloud masks relative to the manually masked datasets. Here, we see the manual mask on the left of the image, the Bayesian mask in the centre and the operational mask on the right. Pixels masked as cloud are shown in white, clear-sky-over-land in green and water pixels (where the masks are not compared) are shown in blue.

The Bayesian cloud mask is often (but not exclusively) more conservative with respect to the edges of cloud features than the manual mask, extending the footprint of the cloud. This is true for both datasets, manually masked using different systems and by different experts. This will contribute to some false flagging of clear-sky pixels. In the operational cloud mask (Figure 3, right), there is a tendency to mistakenly flag surface features as clouds, as seen in the centre and on the left side of the image. This contributes in a large part to the higher false alarm rate in the operational cloud mask when compared to the Bayesian mask. The differences in hit rates arise from the operational mask missing some cloud features and underestimating the cloud extent for others.

4. Application to CDR Generation

The applicability of the Bayesian cloud detection to many sensors is advantageous in the generation of Climate Data Records (CDRs). Since sensor differences are represented by the forward model and a few uncertainty parameters, the Bayesian framework is not dependent on specific instrument characteristics, equator overpass time or channel availability at specific wavelengths, meaning that it can be readily applied to data from different sensors. This is often required in the generation of CDRs spanning 20–30+ years, constructed using observations from multiple sensors.

For CDR generation, a similar cloud mask performance over time and between sensors is needed for observational stability, i.e., to avoid apparent changes in LST statistics arising in reality from cloud masking discontinuities. Does consistency in the cloud masking technique (Bayesian) across sensors give consistent cloud mask performance?

We investigated the consistency of the Bayesian cloud detection algorithm across multiple sensors by considering an LST CDR constructed using data from the second Along-Track Scanning Radiometer (ATSR-2), AATSR, the Moderate Resolution Infrared Spectroradiometer (MODIS Terra) and the Sea and Land Surface Temperature Radiometer (SLSTR-A). These sensors are aboard polar-orbiting satellites with equator overpass times of 10:00 (AATSR and SLSTR) or 10:30 (ATSR-2 and MODIS) local time. Each sensor pair in this CDR (ATSR-2 and AATSR, AATSR and MODIS, MODIS and SLSTR) has some overlap, and in this overlap period we compared the cloud detection hit rate for each sensor using the Bayesian algorithm.

The hit rate was calculated with reference to a series of in situ ground stations with ceilometers, which measure cloud base height. The ceilometer locations are globally distributed, as shown in Table 6. Each ceilometer record has a different length, with three sites (North Slope, Ny Alesund, and Southern Great Plains [39,40,41]) covering all three satellite-overlap periods and the remainder covering only a subset of the overlaps. The satellite data were matched to the ceilometer observations with a spatial separation of 1 km and within a time window of 5 min. The temporal difference in the match-up is related to the frequency of the in situ observations. The mean and median time differences were typically within 30 s for all ground stations, with the exception of Ny Alesund and Sabana, where the in situ observations are less frequent. The ceilometer observation is matched to the satellite pixel in which the cloud base is observed, taking into account the cloud base height and viewing geometry of the sensor. The satellite zenith angle in the possible matches is limited to +/−22 degrees to ensure consistency in comparison between the narrow-swath ATSR sensors and the MODIS and SLSTR instruments. These matches were then used to calculate the cloud hit rate.

The hit rates for the sensor overlap periods are compared in Table 7, with the corresponding number of matches from which the hit rate was calculated shown in Table 8. For ATSR-2 and AATSR the overlap period is from 09/2002–02/2003, for AATSR and MODIS, 02/2011–03/2012 and for MODIS and SLSTR 06/2016–07/2017. The full overlap period is used for each location, where ceilometer data are available. The latter two comparison periods typically have more matches, as the overlap period between sensors is longer. The number of matches in Puerto Rico was small due to the relative infrequency of both the in situ observations and satellite overpasses (the location is closer to the equator than many of the other ground stations) in addition to the in situ data record finishing in 12/2016, partway through the overlap period.

The hit rates were assessed across all sensor pairs, over a wide range of geographical locations and surface biomes, and at different points in the CDR time series. The mean difference in hit rate is 4.5% between ATSR-2 and AATSR, 4.9% between AATSR and MODIS, and 2.5% between MODIS and SLSTR, the latter comparison benefitting from a greater number of available satellite to in situ matches. In comparison, the mean difference in the hit rate for the operational cloud mask between ATSR-2 and AATSR was greater at 12.1% (not shown), suggesting a greater difficulty in obtaining threshold-based algorithm consistency across different satellite sensors (this is the only overlapping pair sharing the same operational cloud-mask formulation). Differences in the absolute hit rate for both cloud masks did occur between sites (ranging between 70–100% for the Bayesian and 33.3–100% for the operational), reflecting some remaining challenges in optimising cloud detection across all land surfaces.

The largest inter-sensor differences for a single match-up location occurred for the North Slope of Alaska, as seen for the overlap between AATSR and MODIS. Here, we see a step-change in the performance of the cloud detection between the two ATSR instruments and the other sensors. From these data alone, it was difficult to determine the exact reason for the magnitude of the difference at this specific location, but some variation was expected as the hit rate was calculated from an independent set of matches for each sensor, and there was a 30 min time difference in the equator overpass time for consecutive sensors in the time series.

Figure 4 compares the hit rate difference across all sensor pairs and locations for the Bayesian and operational masks. All of the operational masks used threshold-based techniques: the SADIST mask [37,38] is used for ATSR-2 and AATSR, the MOD35_L2 products for MODIS Terra [46] and the summary cloud mask for SLSTR-A [47] (which is an evolution of the SADIST mask applied to the ATSR instruments). The median percentage difference was 2.7% for the Bayesian cloud detection compared with 10.3% for the operational masks. The interquartile range was smaller for the Bayesian mask (3.9% compared with 10.95% for the operational masks), with the 75th percentile of the Bayesian hit rate comparison (5.1%) falling below the 25th percentile for the operational masks (6.7%). The smaller percentage differences in the hit rate between overlapping sensors and the reduced data spread in the Bayesian mask comparison suggest that applying this cloud mask to all sensors in a CDR will give better observational stability than using the operational cloud masks for each sensor.

5. Conclusions

In this paper we present a Bayesian cloud detection scheme as applied to satellite observations over land for the purpose of generating LST CDRs. We assessed the Bayesian algorithm using two manually masked datasets for the AATSR sensor in comparison with the operational threshold-based algorithm. The Bayesian cloud detection hit rate was 7.9% greater, and the false alarm rate was 4.9% smaller than the operational cloud mask. This mean improvement in cloud detection skill was found across a wide range of conditions present in the validation dataset.

We also considered the suitability of the Bayesian algorithm for application to multiple satellite sensors when generating climate data records. The cloud detection hit rates between the sensors in overlapping time periods were consistent on average to about 5% using the Bayesian algorithm, compared to 22.5% for the operational cloud masks. Observational stability between sensors is important to avoid apparent changes in any CDR variable (not just LST) that results from the discontinuities in data processing over time, including cloud-masking. The assessment of cloud mask consistency between sensors as presented in this paper would be recommended when producing CDRs dependent on cloud masking as a pre-processing step. This could be achieved by using in situ ceilometer data for validation as presented here, or by using other forms of data as a reference, e.g., manually masked scenes, or sets of pre-classified pixels.

Some further work remains to ensure uniformity of cloud mask performance across different geographical locations and land surface types, as the absolute hit rate is variable between the validation sites (this is true of both the Bayesian and operational cloud masks). The Bayesian cloud mask performance can be increased over cold and/or bright surfaces, including snow, ice and deserts. More work on cloud detection in these regions is required to provide global consistency in cloud-masking skill.

Author Contributions

Author contributions to this article are as follows: Conceptualization, C.E.B. and C.J.M.; methodology, C.E.B., C.J.M. and O.E.; software, C.E.B. and O.E.; validation, C.E.B.; formal analysis, C.E.B.; investigation, C.E.B.; data curation, R.I.M.; writing—original draft preparation, C.E.B. and O.E.; writing—review and editing, C.E.B., C.J.M., O.E. and R.I.M.; project administration, C.E.B.; funding acquisition, C.E.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the European Space Agency within the framework of the Land Surface Temperature project under the Climate Change Initiative, grant number 4000123553/18/I-NB.

Data Availability Statement

The satellite cloud masks for matches to in situ cloudy ceilometer observations are available upon request (ceilometer data are not redistributed). 10.5281/zenodo.6351705.

Acknowledgments

Ceilometer data were provided courtesy of the Meteorology Department at the University of Reading. Bingley and Coleshill ceilometer data were provided courtesy of the Met Office, UK.

Conflicts of Interest

The authors declare no conflict of interest.

References

World Meteorological Organisation. The Global Observing System for Climate: Implementation Needs. GCOS-200. 2016. Available online: https://library.wmo.int/doc_num.php?explnum_id=3417 (accessed on 13 March 2022).
Plummer, S.; Lecomte, P.; Doherty, M. The ESA Climate Change Initiative (CCI): A European contribution to the generation of the Global Climate Observing System. Remote Sens. Environ. 2017, 203, 2–8. [Google Scholar] [CrossRef]
Hollmann, R.; Merchant, C.J.; Saunders, R.; Downy, C.; Buchwitz, M.; Cazenave, A.; Chuvieco, E.; Defourny, P.; de Leeuw, G.; Forsberg, R.; et al. THE ESA CLIMATE CHANGE INTIATIVE Satellite Data Records for Essential Climate Variables. Bull. Am. Meteorol. Soc. 2013, 94, 1541–2663. [Google Scholar] [CrossRef] [Green Version]
Legeais, J.F.; Ablain, M.; Zawadzki, L.; Zuo, H.; Johannessen, J.A.; Scharffenberg, M.G.; Fenoglio-Marc, L.; Fernandes, M.J.; Andersen, O.B.; Rudenko, S.; et al. An improved and homogeneous altimeter sea level record from the ESA Climate Change Initiative. Earth Syst. Sci. Data 2018, 10, 281–301. [Google Scholar] [CrossRef] [Green Version]
Mousivand, A.; Arsanjani, J.J. Insights on the historical and emerging global land cover changes: The case of ESA-CCI-LC datasets. Appl. Geogr. 2019, 106, 82–92. [Google Scholar] [CrossRef]
Popp, T.; de Leeuw, G.; Bingen, C.; Bruhl, C.; Capelle, V.; Chedin, A.; Clarisse, L.; Dubovik, O.; Grainger, R.; Heckel, A.; et al. Development, Production and Evaluation of Aerosol Climate Data Records from European Satellite Observations (Aerosol_cci). Remote Sens. 2016, 8, 421. [Google Scholar] [CrossRef] [Green Version]
Merchant, C.J.; Embury, O.; Bulgin, C.E.; Block, T.; Corlett, G.K.; Fiedler, E.; Good, S.A.; Mittaz, J.; Rayner, N.A.; Berry, D.; et al. Satellite-based time-series of sea-surface temperature since 1981 for climate applications. Sci. Data 2019, 6, 223. [Google Scholar] [CrossRef] [Green Version]
Sogacheva, L.; Kolmonen, P.; Virtanen, T.H.; Rodriguez, E.; Saponaro, G.; de Leeuw, G. Post-processing to remove residual clouds from aerosol optical depth retrieved using the Advanced Along Track Scanning Radiometer. Atmos. Meas. Tech. 2017, 10, 491–505. [Google Scholar] [CrossRef] [Green Version]
Lattanzio, A.; Fell, F.; Bennartz, R.; Trigo, I.F.; Schulz, J. Quality assessment and improvement of the EUMETSAT Meteosat Surface Albedo Climate Data Record. Atmos. Meas. Tech. 2015, 8, 4561–4571. [Google Scholar] [CrossRef] [Green Version]
Wilson, A.M.; Parmentier, B.; Jetz, W. Systematic land cover bias in Collection 5 MODIS cloud mask and derived products—A global overview. Remote Sens. Environ. 2014, 141, 149–154. [Google Scholar] [CrossRef]
Preimmesberger, W.; Scanlin, T.; Su, C.H.; Gruber, A.; Dorigo, W. Homogenization of Structural Breaks in the Global ESA CCI Soil Moisture Multisatellite Climate Data Record. IEEE Trans. Geosci. Remote Sens. 2021, 59, 2845–2862. [Google Scholar] [CrossRef]
Melin, F.; Vantrepotte, V.; CChuprin, A.; Grant, M.; Jackson, T.; Sathyendranath, S. Assessing the fitness-for-purpose of satellite multi-mission ocean color climate data record: A protocol to OC-CCI chlorophyll-a data. Remotes Sens. Environ. 2017, 203, 139–151. [Google Scholar] [CrossRef] [PubMed]
Merchant, C.J.; Harris, A.R.; Maturi, E.; MacCallum, S. Probabilistic physically-based cloud screening of satellite infra-red imagery for operational sea surface temperature retrieval. Q. J. R. Meteorol. Soc. 2005, 131, 2735–2755. [Google Scholar] [CrossRef] [Green Version]
Bulgin, C.E.; Mittaz, J.P.D.; Embury, O.; Eastwood, S.; Merchant, C.J. Bayesian Cloud Detection for 37 Years of Advanced Very High Resolution Radiometer (AVHRR) Global Area Coverage (GAC) Data. Remote Sens. 2018, 10, 97. [Google Scholar] [CrossRef] [Green Version]
Zavody, A.M.; Mutlow, C.T.; Llewellyn-Jones, D. Cloud Clearing over the Ocean in the Processing of Data from the Along-Track Scanning Radiometer (ATSR). J. Atmos. Ocean. Technol. 2000, 17, 595–615. [Google Scholar] [CrossRef]
Frey, R.A.; Ackerman, S.A.; Liu, Y.; Strabala, K.I.; Zhang, H.; Key, J.R.; Wang, Z. Cloud Detection with MODIS. Part I: Improvements in the MODIS Cloud Mask for Collection 5. J. Atmos. Ocean. Tech. 2008, 25, 1057–1072. [Google Scholar] [CrossRef]
Bulgin, C.E.; Eastwood, S.; Embury, O.; Merchant, C.J.; Donlon, C. Sea surface temperature climate change initiative: Alternative image classification algorithms for sea-ice affected oceans. Remote Sens. Environ. 2015, 162, 396–407. [Google Scholar] [CrossRef]
Simpson, J.J.; Gobat, J.I. Improved cloud detection for daytime AVHRR scenes over land. Remote Sens. Environ. 1996, 55, 21–49. [Google Scholar] [CrossRef]
Bulgin, C.E.; Sembhi, H.; Ghent, D.; Remedios, J.J.; Merchant, C.J. Cloud-clearing techniques over land for land-surface temperature retrieval from the Advanced Along-Track Scanning Radiometer. Int. J. Remote Sens. 2014, 35, 3594–3615. [Google Scholar] [CrossRef]
Herbach, H.; Bell, B.; Berrisford, P.; Biavati, G.; Horányi, A.; Muñoz Sabater, J.; Nicolas, J.; Peubey, C.; Radu, R.; Rozum, I.; et al. ERA5 hourly data on single levels from 1979 to present. Copernic. Clim. Chang. Serv. (C3S) Clim. Data Store (CDS) 2018. [Google Scholar] [CrossRef]
Saunders, R.; Hocking, J.; Turner, E.; Rayer, P.; Rundle, D.; Brunel, P.; Vidot, J.; Roquet, P.; Matricardi, M.; Geer, A.; et al. An update on the RTTOV fast radiative transfer model (currently at version 12). Geosci. Model Dev. 2018, 11, 2717–2737. [Google Scholar] [CrossRef] [Green Version]
Mackie, S.; Embury, O.; Old, C.; Merchant, C.J.; Francis, P. Generalized Bayesian Cloud Detection for Satellite Imagery: Part 1: Technique and Validation for Night-Time Imagery Over Land and Sea. Int. J. Remote Sens. 2010, 31, 2573–2594. [Google Scholar] [CrossRef]
ECMWF. RTTOV v12 Coefficient File Downloads. EUMETSAT NWP SAF. 2021. Available online: https://nwp-saf.eumetsat.int/site/software/rttov/download/coefficients/coefficient-download/ (accessed on 20 May 2019).
Hocking, J.; Rayer, P.; Rundle, D.; Saunders, R.; Matricardi, M.; Geer, A.; Brunel, P.; Vidot, J. RTTOV v12 Users Guide. NWP SAF. 2019. NWPSAF-MO-UD-037. Available online: https://nwp-saf.eumetsat.int/site/download/documentation/rtm/docs_rttov12/users_guide_rttov12_v1.3.pdf (accessed on 20 May 2019).
Borbas, E.; Feltz, M. The RTTOV CAMEL Module: Updating the CAMEL Surface Emissivity Atlas for RTTOV. EUMETSAT NWP SAF NWP-MO-VS-058. 2019. Available online: https://nwp-saf.eumetsat.int/publications/vs_reports/nwpsaf-mo-vs-058.pdf (accessed on 20 May 2019).
Vidot, J.; Borbas, E. Land surface VIS/NIR BRDF atlas for RTTOV-11: Model and Validation against SEVIRI Land SAF Albedo product. Q. J. R. Meteorol. Soc. 2013, 140, 2186–2196. [Google Scholar] [CrossRef]
Abrams, M.; Crippen, R.; Fujisada, H. ASTER Global Digital Elevation Model (GDEM) and ASTER Global Water Body Dataset (ASTWBD). Remote Sens. 2020, 12, 1156. [Google Scholar] [CrossRef] [Green Version]
Santoro, M.; Kirches, G.; Wevers, J.; Boettcher, M.; Brockmann, C.; Lamarche, C.; Bontemps, S. Land Cover CCI, Product User Guide Version 2.0. CCI-LC-PUGv2. 2017. Available online: https://www.esa-landcover-cci.org/?q=webfm_send/84 (accessed on 20 May 2019).
Ghent, D.J.; Corlett, G.K.; Göttsche, F.-M.; Remedios, J.J. Global land surface temperatures from the Along-Track Scanning Radiometers. J. Geophys. Res. Atmos. 2017, 122, 12167–12193. [Google Scholar] [CrossRef] [Green Version]
Eresmaa, R.; McNally, A.P. Diverse profile datasets from the ECMWF 137-level short-range forecasts. NWP SAF NWPSAF-EC-TR-017. 2014. Available online: https://nwpsaf.eu/site/download/documentation/rtm/nwpsaf-ec-tr-017.pdf (accessed on 31 March 2021).
Inness, A.; Ades, M.; Agustí-Panareda, A.; Barré, J.; Benedictow, A.; Blechschmidt, A.-M.; Dominguez, J.J.; Engelen, R.; Eskes, H.; Flemming, J.; et al. The CAMS reanalysis of atmospheric composition. Atmos. Chem. Phys. 2019, 19, 3515–3556. [Google Scholar] [CrossRef] [Green Version]
Bozzo, A.; Benedetti, A.; Flemming, J.; Kipling, Z.; Rémy, S. CAMS Monthly Mean Aerosol Climatology for Global Models. Available online: https://confluence.ecmwf.int/display/CKB/CAMS+Monthly+Mean+Aerosol+Climatology+for+global+models (accessed on 31 March 2021). [CrossRef]
Bozzo, A.; Benedetti, A.; Flemming, J.; Kipling, Z.; Rémy, S. An aerosol climatology for global models based on the tropospheric aerosol scheme in the Integrated Forecasting System of ECMWF. Geosci. Model Dev. 2020, 13, 1007–1034. [Google Scholar] [CrossRef]
Gomez-Chova, L.; Camps-Valls, G.; Calpe-Maravilla, J.; Guanter, L.; Moreno, J. Cloud-Screening Algorithm for ENVISAT/MERIS Multispectral Images. IEEE Trans. Geosci. Remote Sens. 2007, 45, 4105–4118. [Google Scholar] [CrossRef]
Gomez-Chova, L.; Munnoz-Mari, J.; Izquierdo-Verdiguier, E.; Camps-Valls, G.; Calpe, J.; Moreno, J. Cloud Screening with Combined MERIS and AATSR Images. In Proceedings of the 2009 IEEE International Geoscience and Remote Sensing Symposium, Cape Town, South Africa, 12–17 July 2009; p. 761. [Google Scholar] [CrossRef]
Bulgin, C.E.; Merchant, C.J.; Ghent, D.; Klüser, L.; Popp, T.; Poulsen, C.; Sogacheva, L. Quantifying Uncertainty in Satellite-Retrieved Land Surface Temperature from Cloud Detection Errors. Remote Sens. 2018, 10, 616. [Google Scholar] [CrossRef] [Green Version]
Birks, A.R. Improvements to the AATSR IPF Relating to Land Surface Temperature Retrieval and Cloud Clearing over Land; AATSR Technical Note; Rutherford Appleton Laboratory: Chilton, UK, 2007. [Google Scholar]
Závody, A.M.; Mutlow, C.T.; Llewellyn-Jones, D.T. A Radiative Transfer Model for Sea Surface Temperature Retrieval for the Along-Track Scanning Radiometer. J. Geophys. Res. Oceans 1995, 100, 937–952. [Google Scholar] [CrossRef]
Atmospheric Radiation Measurement (ARM) User Facility. Ceilometer (CEIL). In Southern Great Plains (SGP) Central Facility, Lamont, OK (C1); Morris, V., Zhang, D., Ermold, B., Eds.; 01-09-2002 to 30-07-2017; ARM Data Centre, 1997. Available online: https://www.osti.gov/dataexplorer/biblio/dataset/1181954 (accessed on 20 May 2019).
Maturilli, M.; Herber, A. Ceilometer Cloud Base Height from Station Ny-Alesund from August 1992 to July 2017, Reference List of 290 Datasets. PANGAEA. 2017. Available online: https://doi.pangaea.de/10.1594/PANGAEA.880300 (accessed on 13 March 2022). [CrossRef]
Atmospheric Radiation Measurement (ARM) User Facility. Ceilometer (CEIL). In North Slope Alaska (NSA) Central Facility, Barrow AK (C1); Morris, V., Zhang, D., Ermold, B., Eds.; 01-09-2002 to 30-07-2017; ARM Data Centre, 1997. Available online: https://www.osti.gov/dataexplorer/biblio/dataset/1181954 (accessed on 20 May 2019).
Atmospheric Radiation Measurement (ARM) User Facility. Ceilometer (CEIL). In ARM Mobile Facility (OLI) Oliktok Point, Alaska; AMF3 (M1, C1); Morris, V., Zhang, D., Ermold, B., Eds.; 01-06-2016 to 30-07-2017; ARM Data Centre, 2013. Available online: https://www.osti.gov/dataexplorer/biblio/dataset/1181954 (accessed on 20 May 2019).
Atmospheric Radiation Measurement (ARM) User Facility. Ceilometer (CEIL). In Tropical Wester Pacific (TWP) Central Facility, Manus I. PNG (C1); Morris, V., Zhang, D., Ermold, B., Eds.; 03-09-2002 to 28-03-2012; ARM Data Centre, 1996. Available online: https://www.osti.gov/dataexplorer/biblio/dataset/1181954 (accessed on 20 May 2019).
Haeffelin, M.; Barthós, L.; Bock, O.; Baitel, C.; Bony, S.; Bounial, D.; Chepfer, H.; Chiriaco, M.; Cuesta, J.; Delanoö, J.; et al. SIRTA, a ground-based atmospheric observatory for cloud and aerosol research. Ann. Geophys. 2005, 23, 253–275. [Google Scholar] [CrossRef] [Green Version]
González, G. LCZO—Metrology—ceilometer and meteorological data—Sabana, Bisley—(2000–2016). Hydroshare. 2019. Available online: https://www.hydroshare.org/resources/4a5f560d20004213acf3eb52cca93f67 (accessed on 17 April 2020).
Ackerman, S.A.; Menzel, P.; Frey, R.; Baum, B. MODIS Atmosphere L2 Cloud Mask Product; NASA MODIS Adaptive Processing System, Goddard Space Flight Centre: Greenbelt, MD, USA, 2017. [Google Scholar] [CrossRef]
ACRI-ST IPF Team. Product Data Format Specification—SLSTR Level 1 Products. In Preparation and Operations of the Mission Performance Centre (MPC) for the Copernicus Sentinel-3 Mission; ACRI-ST IPF Team, 2019; Available online: https://sentinels.copernicus.eu/web/sentinel/user-guides/sentinel-3-slstr/document-library/-/asset_publisher/fgeRuiak1n39/content/id/3402065 (accessed on 17 April 2020).

Figure 1. Gaussian mixture model illustration: the original PDF (blue) and additional PDF (red) are added to give a new PDF (black dashed line). The new PDF is centered on the sum of the mean and sigma of the original PDF and has a sigma twice that of the original PDF.

Figure 2. PDF 1 (Table 1) plots for different biome classifications. Plots show the 11.0 μm-NWP surface temperature (d110-surft) as a function of the 11.0–12.0 μm channel difference (d110–120). PDFs are for near-nadir observations (the first atmospheric path length bin) and summed over all NWP surface temperature bins. The colour bars use consistent shading between plots with different maxima dependent on the data density.

Figure 3. Cloud mask comparison for scene 15 (Table 3) between the manually masked dataset (left), the Bayesian cloud mask (center), and the operational cloud mask (right). The x and y axes labels are the column and row numbers of the cloud mask arrays, included as a reference for image comparison.

Figure 4. Box and whisker plot comparing the hit rate percentage difference between sensor pairs (ATSR2 vs. AATSR, AATSR vs. MODIS and MODIS vs. SLSTR) for all locations in Table 7 for the Bayesian cloud mask (bottom) and the various operational cloud masks (top). The red line shows the median difference and the edges of the boxes represent the 25th and 75th percentiles.

Table 1. PDF LUTs used to determine the probability of ‘not clear’ conditions given the observations and background state vector.

PDF	Channels (μm)	Dimension	Range	No. of Bins	Time
1	11.0, 12.0	Biome	1–8	7	Day
		Solar zenith angle	0–180°	2
		Atmospheric path length	1–2.4	4
		NWP surface temperature	240–330 K	90
		11.0–12.0 μm channel difference	−1–6 K	35
		11.0 μm-NWP surface temperature	−15–15 K	30
2	3.7, 11.0, 12.0	Biome	1–8	7	Night
		Solar zenith angle	90–180°	1
		Atmospheric path length	1–2.4	4
		NWP surface temperature	240–330 K	90
		3.7–11.0 μm channel difference	−6–10 K	80
		11.0–12.0 μm channel difference	−1–6 K	35
		11.0 μm-NWP surface temperature	−15–15 K	30
3	0.6, 0.8, 1.6	Biome	1–8	7	Day
		Solar zenith angle	0–95°	38
		Atmospheric path length	1–2.4	4
		1.6 μm channel	0–1	50
		0.8 μm channel	0–1	50
		0.6–0.8 μm channel difference	−0.5–0.2	35

Table 2. Biome definitions for PDF LUT binning over land surfaces as described by the Land Cover CCI Classes [28] and the ATSR LST Biome Classification (ALB2) classes [29].

Biome	Name	Land Cover CCI Classes	ALB2 Classes
1	Cropland	10, 11, 12, 20, 30, 40, 100, 110	-
2	Forest	50, 60, 61, 62, 70, 71, 72, 80, 81, 82, 90	-
3	Shrubland	120, 121, 122, 130, 140, 150, 151, 152, 153	-
4	Flooded	160, 170, 180	-
5	Urban	190	-
6	Bare Soil	200, 201, 202	20, 21, 22, 23, 24, 25
7	Snow and Ice	220	-

Table 3. SYNERGY and DUE GlobTemperature project manually masked scenes used to calculate cloud mask performance metrics (information collated from [19,36]).

Scene	Location	Date	Orbit Number	Day/Night	Project
1	China	21/05/2007	27304	Night	GlobT
2	Russia	22/05/2007	27314	Day	GlobT
3	Florida, USA	23/05/2007	27333	Day	GlobT
4	UK	23/08/2007	28647	Day	GlobT
5	Algeria	11/06/2010	43290	Day	GlobT
6	Ukraine	05/08/2010	44083	Night	GlobT
7	Antarctica	08/08/2010	44121	Night	GlobT
8	Mauritania	08/08/2010	44128	Night	GlobT
9	Canada	07/06/2011	48474	Day	GlobT
10	Uruguay	11/11/2009	40250	Night	GlobT
11	Brazil	06/06/2004	11858	Day	SYNERGY
12	Brazil	26/05/2005	17369	Day	SYNERGY
13	Oklahoma, USA	09/08/2004	12776	Day	SYNERGY
14	Oklahoma, USA	16/05/2005	16784	Day	SYNERGY
15	Oklahoma, USA	25/07/2005	17786	Day	SYNERGY
16	Oklahoma, USA	05/06/2006	22295	Day	SYNERGY
17	Oklahoma, USA	08/10/2007	29309	Day	SYNERGY
18	Burkina Faso	28/11/2002	03897	Day	SYNERGY
19	Burkina Faso	01/12/2002	03941	Day	SYNERGY
20	Burkina Faso	02/01/2003	04398	Day	SYNERGY
21	Burkina Faso	21/01/2003	04670	Day	SYNERGY
22	Burkina Faso	15/07/2003	07176	Day	SYNERGY
23	Zambia	15/01/2003	04583	Day	SYNERGY
24	Zambia	09/07/2003	07088	Day	SYNERGY
25	Zambia	17/11/2003	08090	Day	SYNERGY
26	Russia	03/06/2004	11810	Day	SYNERGY
27	Russia	12/08/2004	12812	Day	SYNERGY
28	Russia	23/06/2005	17321	Day	SYNERGY
29	Russia	06/10/2005	18824	Day	SYNERGY
30	Russia	21/09/2006	23834	Day	SYNERGY
31	Russia	02/08/2007	28343	Day	SYNERGY

Table 4. Cloud detection performance metrics for Bayesian and operational cloud masks compared to manually masked AATSR data from the SYNERGY and ESA DUE GlobTemperature projects.

Performance Metric	Bayesian Cloud Detection (%)	Operational Cloud Detection (%)
Percentage of Perfect Classification	86.6	81.0
Hit Rate	93.9	86.0
False Alarm Rate	16.7	21.6
True Skill Score	77.2	64.7

Table 5. Confusion matrices for Bayesian and operational cloud masks compared to manually masked AATSR data from the SYNERGY and ESA DUE GlobTemperature projects. The numbers in bold are the observations correctly classified by each algorithm.

Confusion Matrix	Manual Mask Cloud	Manual Mask Clear
Bayesian Cloud	1,437,732	570,659
Bayesian Clear	93,099	2,856,615
Operational Cloud	1,316,944	729,595
Operational Clear	213,887	2,697,679

Table 6. Ground station locations of in situ ceilometer data used to validate the Bayesian cloud detection applied to different satellite sensors [39,40,41,42,43,44,45]. The North Slope of Alaska station has two longitudes as the station location has moved within the data record.

Station	Country	Latitude (Degrees)	Longitude (Degrees)	Frequency of Observations
Bingley	UK	53.8	−1.9	1 min
Coleshill	UK	52.5	−1.7	1 min
North Slope	Alaska	71.3	−156.6/−156.7	35 s
Ny Alesund	Norway	78.9	11.9	5 min
Oliktok Point	Greenland	70.5	−149.9	30 s
Paris	France	48.7	2.2	3 s
Sabana	Puerto Rico	18.3	−65.7	1 h
Southern Great Plains	USA	36.6	−97.5	1 min
Tropical West Pacific	Papua New Guinea	−2.1	147.4	1 min
University of Reading	UK	51.4	0.9	1 min

Table 7. Bayesian cloud detection hit rate percentage with reference to ceilometer observations for overlapping data between different sensors: ATSR-2 and AATSR, AATSR and MODIS, MODIS and SLSTR.

Location	ATSR-2	AATSR	AATSR	MODIS	MODIS	SLSTR
Bingley, UK	-	-	94.4	92.8	93.3	96.1
Coleshill, UK	-	-	89.7	90.9	92.4	92.7
North Slope Alaska	77.5	83.3	78.8	95.2	95.0	91.4
Ny Alesund	86.1	87.2	88.9	95.2	94.6	96.0
Oliktok Point Alaska	-	-	-	-	95.7	94.5
Paris	-	-	-	-	96.2	98.5
Puerto Rico	-	-	-	-	100.0	100.0
Southern Great Plains	91.9	96.3	90.1	87.4	89.9	97.2
Tropical West Pacific	76.5	70.0	83.9	84.8	-	-
University of Reading, UK	-	-	-	-	91.9	95.5

Table 8. Number of cloudy ceilometer observations compared when calculating the hit rate for overlapping data between different sensors: ATSR-2 and AATSR, AATSR and MODIS, MODIS and SLSTR.

Location	ATSR-2	AATSR	AATSR	MODIS	MODIS	SLSTR
Bingley, UK	-	-	36	125	570	563
Coleshill, UK	-	-	29	110	542	547
North Slope Alaska	120	102	170	673	686	455
Ny Alesund	532	501	973	2147	2156	2231
Oliktok Point Alaska	-	-	-	-	515	327
Paris	-	-	-	-	292	326
Puerto Rico	-	-	-	-	10	11
Southern Great Plains	62	54	81	174	189	290
Tropical West Pacific	17	30	87	151	-	-
University of Reading, UK	-	-	-	-	405	445

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Bulgin, C.E.; Embury, O.; Maidment, R.I.; Merchant, C.J. Bayesian Cloud Detection over Land for Climate Data Records. Remote Sens. 2022, 14, 2231. https://doi.org/10.3390/rs14092231

AMA Style

Bulgin CE, Embury O, Maidment RI, Merchant CJ. Bayesian Cloud Detection over Land for Climate Data Records. Remote Sensing. 2022; 14(9):2231. https://doi.org/10.3390/rs14092231

Chicago/Turabian Style

Bulgin, Claire E., Owen Embury, Ross I. Maidment, and Christopher J. Merchant. 2022. "Bayesian Cloud Detection over Land for Climate Data Records" Remote Sensing 14, no. 9: 2231. https://doi.org/10.3390/rs14092231

APA Style

Bulgin, C. E., Embury, O., Maidment, R. I., & Merchant, C. J. (2022). Bayesian Cloud Detection over Land for Climate Data Records. Remote Sensing, 14(9), 2231. https://doi.org/10.3390/rs14092231

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Bayesian Cloud Detection over Land for Climate Data Records

Abstract

1. Introduction

2. Bayesian Cloud Detection over Land

2.1. Radiative Transfer and Numerical Weather Prediction (NWP) Data

2.2. Uncertainty Specification

2.3. Gaussian Mixed Model

2.4. Surface Characterisation

2.5. Cloudy Probability Distribution Functions (PDFs)

2.6. Aerosol Characterisation

3. Results

4. Application to CDR Generation

5. Conclusions

Author Contributions

Funding

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI