1. Introduction
Clouds have a major impact on the Earth’s radiation budget and thus play a crucial role in the terrestrial climate system [1]. They cool the atmosphere by reflecting incoming solar radiation. Concurrently, they warm the atmosphere by intercepting and re-emitting the radiation emitted by the Earth’s surface. The net radiative effect of a cloud depends on its physical properties [2], and cloud feedbacks are among the most uncertain components of climate models. Consistent and continuous cloud observations are required to better understand cloud-climate interactions. Therefore, as part of the United Nations Framework Convention on Climate Change (UNFCCC), the Global Climate Observing System (GCOS) has included cloud properties in the set of essential climate variables (ECVs) [3], with a special emphasis on satellite-based retrievals [4].
Polar-orbiting satellites provide global coverage of cloud information at sub-daily time resolution. This feature has been exploited for deriving cloud climatologies by, for example, the International Satellite Cloud Climatology Project (ISCCP) [5], Pathfinder Atmospheres Extended (PATMOS-x) [6,7], and the EUMETSAT Satellite Application Facility on Climate Monitoring (CM SAF) [8], which produced the CLoud, Albedo and RAdiation dataset (CLARA-A1) [9]. Recently, the European Space Agency (ESA) has initiated the ESA-Cloud-CCI project, focused on cloud studies in the frame of its Climate Change Initiative (CCI) and running over the period 2010 to 2016 [10]. The ESA-Cloud-CCI aims at adapting and developing state-of-the-art cloud retrieval schemes [11] to be applied to the longest existing time series of cloud observations available from polar-orbiting satellites with AVHRR and AVHRR-like sensors [12]. The cloud properties (i.e., cloud cover, cloud top height and temperature, cloud optical thickness, cloud effective radius, and liquid and ice water paths) are derived by means of an optimal-estimation-based retrieval framework [13] for: (1) the Advanced Very High Resolution Radiometer (AVHRR) heritage product (1982–2014), comprising the (Advanced) Along Track Scanning Radiometer ((A)ATSR), AVHRR, and the Moderate-Resolution Imaging Spectroradiometer (MODIS), and (2) the (A)ATSR-Medium Resolution Imaging Spectrometer (MERIS) product (2002–2012) [14].
In order to be useful for climate studies, satellite-based cloud datasets must fulfil the quality requirements defined by GCOS [4]. The quality assessment is based on a comparison with ground-based observations or other satellite-based datasets. The active sensors onboard CloudSat [15] and CALIPSO [16] have proved beneficial for the validation of passive radiometers owing to their ability to reveal the vertical cloud structure [17]. However, their scarce spatio-temporal coverage limits the number of possible collocations; thus, the active sensor data are more useful for cloud retrieval algorithm development than as a reference for cloud climatology datasets. Conventional surface observations (SYNOP) still remain a common reference for the validation of cloud cover from passive sensors [18,19,20,21,22,23,24].
There are several sources of uncertainty when validating satellite-derived cloud cover with ground-based synoptic observations [25]. The different viewing perspectives (the uppermost cloud layer seen by the satellite versus the lowest layer observed from the ground) can cause significant discrepancies in the case of multi-layer clouds [26,27,28]. Further uncertainty can be caused by the different spatial footprints (the passive sensor’s spatial resolution of 1–5 km versus synoptic observations limited by the typical range of vision of 30–50 km [29]), as well as by the different sensitivities of a satellite sensor and the human eye (a cloud of a certain optical thickness may be visible to the observer but remain transparent to the sensor, and vice versa). Moreover, the uncertainty increases towards the edges of the field of view for satellite observations, and towards the horizon for visual observations. Further, satellite-based cloud retrieval algorithms usually provide binary information (cloudy or cloudless), while ground observations report the part of the visible sky covered by clouds with an accuracy of 1/8 (okta). In addition, the okta scale is not linearly related to cloud cover. As soon as a cloud is visible, even covering less than 1/8 of the sky, at least 1 okta is reported. Similarly, a small discontinuity in the cloud cover (clear sky of less than 1/8 of the sky) is reported as 7 okta [30,31,32]. Between 2 and 6 okta, the synoptic observations should reflect the part of the sky covered by clouds. All of these differences between spaceborne and ground-based cloud cover observations can affect the validation results, even if both observations match perfectly in time.
However, satellite-based measurements and reference ground-based cloud cover observations are discrete and usually not performed at the same time. In particular, an observation time difference occurs when comparing cloud retrievals from polar-orbiting satellite data with irregular overpass times to 3- or 6-h SYNOP observations. A maximum collocation time difference between these two types of observations has to be chosen. It varies strongly among validation activities: e.g., 15 min [33], 1 h [18], or 4 h [24]. Fontana et al. [20] used an average of the synoptic observations at 9 UTC and 12 UTC, and of those at 12 UTC and 15 UTC, to validate cloud cover from the Terra-MODIS morning acquisition and the Aqua-MODIS afternoon acquisition, respectively. Kotarba [22] set the maximum time difference between the MODIS cloud mask and SYNOP to 30 min, but, in addition, normalized the SYNOP observations to the MODIS overpass times using a linear interpolation. To avoid this discrepancy, some authors perform the validation based only on daily or monthly averages [19].
The choice of a small time difference (e.g., 10 min) ensures that both observations (satellite and SYNOP) reflect the same cloud state. However, such a choice strongly limits the number of satellite overpasses that have a corresponding SYNOP observation. As a consequence, only a subsample of all satellite observations can be used for the validation, which introduces a sampling error. On the other hand, a maximum time shift of 90 min for 3-h SYNOP (180 min for 6-h SYNOP) allows the use of all satellite overpasses and minimizes the sampling error, but at the expense of introducing an error due to the incomparability of cloud states separated by up to 90 (or 180) min. In this context, defining the optimal maximum time difference between satellite and ground observations requires a compromise between the sampling and incomparability errors.
The main objective of this paper is to quantify and demonstrate the impact of this time difference on the validation results of satellite-derived cloud cover. This could be studied on actual satellite-derived cloud cover data (such as the ESA-Cloud-CCI). In that case, however, the assessment would be limited to the accuracy of the chosen satellite-based cloud cover; the validation of the ESA-Cloud-CCI is beyond the scope of this paper. In order to assess the impact for a range of possible accuracies (from perfect to low skill), an idealized study is performed. The validation dataset is composed of 10-min cloud amount estimates, the times of ground observations (SYNOP), and real NOAA/AVHRR overpass times. This allows analyzing the impact of the time difference with a 10-min step, which would not be possible using 3-h SYNOP instead. After quantifying the sampling and incomparability errors in the validation results, we introduce and evaluate a method for modeling the unbiased (true) validation results, as they would be derived without any time shift between satellite overpass and reference ground observation.
3. Methods
In this section we describe the main steps of the analysis following the flowchart shown in Figure 2.
Figure 2. Flowchart of the analysis performed in this study. The boxes represent: time series (grey), time settings used in the analysis (red) and contingency matrices (green). The ellipses indicate cloud cover skill scores. See text for details.
3.1. Creating a Synthetic Validation Data Set
For each BSRN site we transformed the three-year APCADA cloud amount at the 10-min resolution into binary cloud cover, classifying 0–3 okta as cloudless and 4–8 okta as cloudy conditions. This formed a reference cloud cover time series (ref) defined as:

ref_t = 0 (cloudless) for a cloud amount of 0–3 okta at time t, and ref_t = 1 (cloudy) for 4–8 okta, with t ∈ T10m,

where T10m is the time from 1 January 2007 to 31 December 2009 with 10-min intervals (157,824 elements).
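The okta-to-binary conversion can be sketched as follows (a minimal illustration with made-up okta values; the actual APCADA series has 157,824 elements per site):

```python
import numpy as np

# Hypothetical 10-min APCADA cloud amounts in okta (0-8), for illustration only.
okta = np.array([0, 2, 3, 4, 5, 8, 1, 7])

# 0-3 okta -> cloudless (0), 4-8 okta -> cloudy (1), as defined for ref.
ref = (okta >= 4).astype(int)
print(ref)  # [0 0 0 1 1 1 0 1]
```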
We used the APCADA-based binary cloud cover time series to mimic a satellite-based cloud cover retrieval of a specified accuracy. This was achieved by degrading ref through swapping p percent of the 10-min observations (the cloudy observations became cloudless, and cloudless became cloudy). We used 9 different values of p: from 0 to 40% with a 5% step. The upper range was chosen empirically, as 40% of swapped observations led to almost no skill. We presumed that cloud retrieval errors can occur either for isolated observations or, more likely, for several consecutive 10-min observations, since such errors can be related to specific weather conditions (such as fog, snow cover or sub-pixel convection). Hence, the distribution of the swapped observations was described by the swapping time span (s), which defined the length of the consecutive erroneous retrievals. We used six time spans (for p > 0%): 10, 30, and 60 min, and 3, 12 and 24 h. The swapping blocks were then randomly distributed along the ref time series. We defined the degraded time series as:

deg_p,s,

where p is the percentage of swapped observations and s is the swapping time span. Thus, for instance, deg_5%,3h indicates a degraded reference time series where 5% of the observations were swapped in 3-h blocks.
For each of the 10 sites, one reference time series (ref) and 49 degraded time series (deg) were generated: they combined the 8 p’s greater than 0% with the 6 s’s and were extended by deg_0% (equal to ref), which was the artificial cloud retrieval of perfect skill. The lowest skill was represented by the deg_40%,s time series.
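As an illustration, the block-wise swapping could be implemented along the following lines (a simplified sketch under our own assumptions; the paper does not specify, e.g., how overlapping blocks were handled, and the function name `degrade` is ours):

```python
import numpy as np

def degrade(ref, p, span, rng):
    """Swap about p percent of the observations in random consecutive blocks.

    ref  : binary array (0 = cloudless, 1 = cloudy)
    p    : percentage of observations to swap (e.g. 5 for 5%)
    span : block length in 10-min steps (e.g. 18 for a 3-h swapping time span)
    """
    deg = ref.copy()
    n_swap = int(round(len(ref) * p / 100))
    n_blocks = max(1, n_swap // span)
    # Random, distinct block start positions along the time series.
    starts = rng.choice(len(ref) - span + 1, size=n_blocks, replace=False)
    for s0 in starts:
        deg[s0:s0 + span] = 1 - deg[s0:s0 + span]  # flip cloudy <-> cloudless
    return deg

rng = np.random.default_rng(42)
ref = rng.integers(0, 2, size=1440)          # ten days of 10-min observations
deg = degrade(ref, p=5, span=18, rng=rng)    # 5% swapped in 3-h blocks
frac = (deg != ref).mean()
print(frac)  # fraction of flipped observations (< p/100 if blocks overlap)
```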
3.2. Validation Procedure
The reference (ref) and degraded (deg) time series were used to analyze the theoretical impact of time difference between satellite overpass and ground observation on the satellite-derived cloud cover performance. The exact times of the NOAA 15–18 overpasses were used, and the SYNOP observations were assumed to be carried out routinely every 3 or 6 h.
The performance of each deg was measured by a skill score commonly referred to as the Hanssen-Kuiper’s discriminant, formulated as [42]:

HK = (ad − bc)/((a + c)(b + d)),

where a (correct detections), b (false alarms), c (misses) and d (correct no-detections) build a contingency matrix (Table 2). HK can also be formulated as the difference between the hit rate, H = a/(a + c), and the false alarm rate, F = b/(b + d). We derived HK only for contingency matrices with a number of samples (a + b + c + d) equal to or greater than 10. A perfect cloud detection receives a score of one, a random retrieval a score of zero, and a retrieval inferior to random a negative score. HK equals zero for the constant detection of cloudy or cloudless conditions. Furthermore, the contribution made by a correct no-detection or a correct detection increases as the event becomes more or less likely, respectively [42]. Thus, HK also reflects the skill of detecting rare events, which makes it more robust than, e.g., H or F alone.
Table 2. A contingency matrix for the evaluation of the satellite-based cloud cover against reference observations.
|  |  | Ground Observation |  |
| --- | --- | --- | --- |
|  |  | Cloudy | Cloud-Free |
| Satellite | Cloudy | a | b |
|  | Cloud-Free | c | d |
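Given the contingency matrix entries, HK can be computed directly; a minimal sketch (the function name is ours):

```python
def hanssen_kuiper(a, b, c, d):
    """Hanssen-Kuiper discriminant: hit rate minus false alarm rate."""
    H = a / (a + c)  # hit rate
    F = b / (b + d)  # false alarm rate
    return H - F

# Perfect detection scores one.
print(hanssen_kuiper(50, 0, 0, 50))   # 1.0
# Constantly reporting "cloudy" scores zero (H = 1, F = 1).
print(hanssen_kuiper(50, 50, 0, 0))   # 0.0
```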
The validation procedure was performed for each station and each degraded time series (deg). First, the unbiased (true) skill score (HK0) was calculated assuming no time difference between deg (simulating the satellite image acquisition) and ref (simulating the reference SYNOP observation). Only a subset of the 10-min time steps, those closest to the actual satellite overpass times during the three years, was used to derive HK0. We write “HK0” for simplicity; however, the time difference for HK0 was not exactly zero, but did not exceed 5 min.
Next, we performed the validation of each deg assuming a time difference (Δt) between satellite overpass and SYNOP from 10 to 90 min (for 3-h SYNOP) or 10 to 180 min (for 6-h SYNOP) with a 10-min step. To assess the impact of Δt on HK, we validated deg at time i against ref at time i + Δt: the satellite-derived and reference observations were shifted relative to each other by Δt. For each Δt the number of satellite overpasses (n) corresponding to SYNOP observations with a time difference below or equal to Δt was determined. The sampling error for n lower than the total number of overpasses (N) was estimated with the bootstrap technique [43]: HK_Δt was derived 500 times from n observations randomly chosen from all satellite overpasses. As a result, for each combination of site, SYNOP frequency, percent of swapped observations (p), swapping time span (s), and time difference (Δt), 500 HK’s were derived. They were compared with HK0 to assess the impact of the time difference, sampling size and cloud regime of the site on the validated HK.
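The bootstrap estimate of the sampling error could look as follows (an illustrative sketch with synthetic data, not the authors’ R implementation; the function name and the 10% retrieval error rate are assumptions):

```python
import numpy as np

def bootstrap_hk(sat, ref, n, n_boot=500, rng=None):
    """Derive HK n_boot times from n randomly chosen collocations."""
    if rng is None:
        rng = np.random.default_rng(0)
    sat = np.asarray(sat)
    ref = np.asarray(ref)
    scores = []
    for _ in range(n_boot):
        idx = rng.choice(len(sat), size=n, replace=False)
        s, r = sat[idx], ref[idx]
        a = np.sum((s == 1) & (r == 1))  # correct detections
        b = np.sum((s == 1) & (r == 0))  # false alarms
        c = np.sum((s == 0) & (r == 1))  # misses
        d = np.sum((s == 0) & (r == 0))  # correct no-detections
        if a + c == 0 or b + d == 0:
            continue  # HK undefined for a degenerate contingency matrix
        scores.append(a / (a + c) - b / (b + d))
    return np.array(scores)

rng = np.random.default_rng(1)
ref = rng.integers(0, 2, size=5000)                    # reference cloud cover
sat = np.where(rng.random(5000) < 0.1, 1 - ref, ref)   # ~10% wrong retrievals
hk = bootstrap_hk(sat, ref, n=300, rng=rng)
print(hk.mean(), hk.std())  # the spread reflects the sampling error for n << N
```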
3.3. Modeling the Unbiased Skill Score
To examine how accurately the unbiased HK0 can be reconstructed from the HK’s affected by Δt, the validation procedure was modified and performed based on the actual collocations between the satellite overpasses and ground observations for certain Δt’s (while previously only the number of collocations n was used, and ref and deg were shifted by Δt). The APCADA-based cloud cover was extracted for each deg_p,s only at the satellite overpass times, and for each ref only at the SYNOP observation times. Then, HK_Δt was calculated for a range of different maximum Δt. This way the sampling error was not assessed (unlike before, by the bootstrap), but it impacted each HK derived with a given Δt. HK0 was derived with the method described in the previous section.
Next, we modeled the unbiased skill score (HKmod) from the 9 (for 3-h SYNOP) or 18 (for 6-h SYNOP) HK_Δt’s. First, using a least-squares regression, we fitted a linear function f to the HK_Δt’s:

f(Δt) = α·Δt + β.

Then HKmod was calculated as:

HKmod = f(Δt = 0) = β.
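The extrapolation to Δt = 0 can be sketched as follows (with made-up HK_Δt values for the nine 3-h SYNOP time differences):

```python
import numpy as np

# Hypothetical HK values for maximum time differences of 10..90 min (3-h SYNOP).
dt = np.arange(10, 100, 10)
hk_dt = np.array([0.82, 0.81, 0.80, 0.78, 0.77, 0.75, 0.74, 0.72, 0.71])

# Least-squares linear fit f(dt) = alpha*dt + beta; then HK_mod = f(0) = beta.
alpha, beta = np.polyfit(dt, hk_dt, 1)
hk_mod = np.polyval([alpha, beta], 0.0)
print(round(hk_mod, 3))  # 0.838
```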
To evaluate HKmod, commonly used performance statistics, such as the mean bias error, mean absolute error, and root mean square error, were calculated against the unbiased skill score HK0. The significance of the differences between the performances of the validation methods was tested with a two-sided t-test for unpaired samples.
6. Conclusions
This study reveals the often-disregarded impact of the time difference between satellite image acquisition and reference SYNOP observation on the validation accuracy of satellite-based cloud cover. Furthermore, a method is presented for reconstructing the unbiased skill score, as it would be derived from perfectly collocated satellite and reference data sets.
An increase of the maximum time difference between satellite observations and reference SYNOP introduces a collocation error due to the increasing incomparability of the two types of observations. The collocation error can degrade cloud cover performance statistics, such as the Hanssen-Kuiper’s discriminant (HK), by up to 45%. Concurrently, a decrease of this maximum time difference results in fewer satellite observations having a corresponding SYNOP observation and, consequently, being used for the validation. This introduces a sampling error, which depends on the length of the validated time series and the SYNOP frequency. The combination of the collocation and sampling errors with an increasing maximum time difference can either increase or decrease the validation accuracy.
We present a novel method for reconstructing the unbiased Hanssen-Kuiper’s skill score under perfect temporal correspondence between the satellite and reference observations. The improvement in the validation accuracy is statistically significant. Since this reconstruction only requires the satellite observations and SYNOP, it can easily be applied to any validation of cloud climatology data sets from polar-orbiting satellites. The method should further increase the comparability of validation results utilizing reference data with different time frequencies (e.g., APCADA and 3- or 6-h SYNOP). The R implementation of the method is available from the authors upon request.
We conclude that there is no generally applicable optimal time difference which guarantees the most realistic validation accuracy compared to SYNOP data. The validation error depends on the length of the validated time series and the SYNOP frequency, as well as on the site-dependent cloudiness variability. Validation of cloud climatologies derived from polar-orbiting satellites should ideally use cloud cover estimates of a high temporal resolution (such as APCADA) in order to minimize both the sampling error and the collocation difference. The availability of such estimates is currently limited, but, for instance, the BSRN sites cover a wide range of the global climatic zones and are thus suitable for global-scale cloud climatology validation. Alternatively, when SYNOP observations are used, the reconstruction method introduced in this paper can be employed to minimize the validation uncertainty.
We have not yet applied the proposed method to real satellite-derived cloud cover estimates. This is planned for the validation of the long-term ESA-Cloud-CCI cloud climatology dataset, which will become available at the end of the project’s second phase (www.esa-cloud-cci.org).