Global Evaluation of the Suitability of MODIS-Terra Detected Cloud Cover as a Proxy for Landsat 7 Cloud Conditions

Clouds limit the quality and availability of optical wavelength surface observations from Earth Observation (EO) satellites. This limitation is particularly relevant for the generation of systematic thematic products from EO medium spatial resolution polar orbiting sensors, such as Landsat, which have reduced temporal resolution compared to coarser resolution polar orbiting sensors such as the Moderate Resolution Imaging Spectroradiometer (MODIS). MODIS on the Terra satellite is in the same orbit as Landsat 7 with an approximately 30 minute overpass difference. In this study, one year of global Landsat 7 Enhanced Thematic Mapper Plus (ETM+) image cloud fractions over land are compared with collocated MODIS cloud fractions, generated by combining the MODIS-Terra global daily cloud mask product (MOD35) with the Landsat 7 ETM+ image footprints and acquisition calendar. The results show high correlation between the MODIS and Landsat 7 ETM+ cloud fractions (R2 = 0.83), negligible bias (median difference: <0.01) and low dispersion around the median (interquartile range: [−0.02, 0.06]). These results indicate that, globally, the cloud cover detected by MODIS-Terra data can be used as a proxy for Landsat 7 ETM+ cloud cover.


Introduction
Earth Observation (EO) data sensed in the optical portion of the electromagnetic spectrum are widely used for studying the Earth's biosphere, and its dynamics and disturbances from a regional to global scale. Cloud obscuration affects the quality and availability of EO optical data over land which presents a challenge for global land monitoring. Thanks to the systematic acquisition strategy and free data access policy of currently available moderate resolution optical data such as Landsat [1] and Sentinel-2 [2], global systematic land monitoring products at moderate resolution are being developed [3]. Landsat-8, Sentinel-2A, and Sentinel-2B provide 16-day, 10-day and 10-day global median average revisit intervals, respectively, and 2.9 days when combined together [4]. These revisit cycles are not particularly high, particularly for individual sensors, and consequently, cloud obscuration may remain an issue for monitoring applications using these data, for example, for monitoring features that evolve with fast temporal dynamics, such as certain burned areas [5][6][7], ephemeral water bodies [8], or agricultural crops [9].
The near daily global coverage 1 km cloud product derived from the Moderate Resolution Imaging Spectroradiometer (MODIS) [10] has been used in feasibility studies to assess the potential impact of cloud obscuration on other land monitoring satellites, using the MODIS time series to generate a spatially explicit climatology of cloud cover probabilities [11]. Moderate resolution data, such as Landsat, cannot be directly used for generating such a climatology because of the limited number of available acquisitions, due to the low revisit frequency and to the fragmentary state of the archive outside the United States [12]. For example, researchers have used the MODIS global cloud mask product as a proxy to estimate the effect of cloud cover on the HyspIRI mission design [13], and to define the observation requirements for operational cropland monitoring from medium resolution systems [14].
However, the use of MODIS cloud data as a proxy assumes that the MODIS 1 km cloud mask is representative of cloud conditions experienced by other satellites. A recent regional study on the probability of Landsat cloud-free over the Eastern United States found a strong correlation between the cloud cover observed by Landsat 7 ETM+ and MODIS, but the intercomparison was limited to three Landsat path/row locations [15].
Several factors can contribute to differences among satellite cloud products. Wind can displace clouds and other atmospheric constituents in the time between different satellite overpasses [16,17]. Clouds evolve over time, and in many parts of the world, cloud cover increases during the day over land [18]. The MODIS-Terra and Aqua satellites have 10:30 a.m. and 1:30 p.m. equatorial overpass times respectively, and although globally they observe similar cloud amounts, in many regions they reveal different cloud amounts at the time of overpass [18,19]. Observations from the afternoon-train (A-train) polar orbiting CloudSat and CALIPSO sensors, that are designed for cloud monitoring, show good agreement of total horizontal cloud fraction retrievals at 2 • resolution with the A-train MODIS-Aqua 1 km cloud mask product [20]. This is likely because of the small 3 min (CloudSat) and 1 min (CALIPSO) overpass time differences relative to MODIS-Aqua. Other differences are due to the size and spatial distribution of clouds relative to the sensor spatial resolutions. For example, the 1 km MODIS cloud product may not detect sub-pixel or spatially fragmented clouds that are detectable at 30 m Landsat resolution. Differences in sensor spectral resolution may also introduce cloud detection differences. For example Zhao and Di Girolamo [21] compared 15 m cloud detections from Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) data with the MODIS cloud product and found that MODIS systematically overestimated the cloud fraction relative to ASTER, with a mean difference of 0.18 and large variations across scenes associated with the spatial cloud patterns. As ASTER is onboard the same Terra platform as MODIS, these discrepancies were attributed to the differences in spatial and spectral resolution of the two instruments. Nominally, clouds have high reflectance at visible and SWIR wavelengths, and generally are cold, and so sensors with visible to SWIR bands and thermal bands provide more reliable cloud detection than sensors that have only a subset of these bands [22]. Finally, cloud detection algorithm differences may result in different cloud masks at coarse [23] and at medium resolution [24].
In this study, we evaluate the assumption that the MODIS-Terra 1 km cloud product can be used globally to predict Landsat 7 ETM+ image cloud fractions over land. This is a reasonable expectation as both sensors are in the same orbit and have an overpass time difference of approximately 30 minutes [25]. The fraction of cloud cover in every global Landsat image acquired in 2002 is compared to cloud cover fractions in the contemporaneous MODIS-Terra cloud cover product (MOD35). The relationship between the MODIS and Landsat 7 ETM+ cloud fractions is characterized using both linear and logistic models, and the corresponding regression coefficients and residual errors are examined.

Landsat Cloud Data
The Landsat 7 is in a polar sun-synchronous orbit with approximately a 705 km altitude, 98.2 • inclination, and 10:00 a.m. ± 15 min descending equatorial overpassing local time [26]. The Landsat 7 carries the Enhanced Thematic Mapper Plus (ETM+) instrument which senses multispectral data over a 15 • field of view. The orbit altitude and sensor field of view result in an approximately 185 km swath and 16-day nadiral overpass revisit frequency. Landsat data are distributed as 185 km × 170 km images defined in the Worldwide Reference System (WRS) that divides the globe into 233 paths (orbital ground-tracks) and 248 rows (latitude parallels) [27].
The most recent Landsat Collection 1 cloud mask information was used; it is generated using the Fmask algorithm [28,29] that was found to have better performance than other algorithms [22]. Fmask is a two-step object-based algorithm. First, a potential cloud layer is generated by combining spectral tests based on cloud physical properties, and a cloud probability mask based on temperature, brightness and spectral variability. The final cloud mask is subsequently generated through segmentation of the potential cloud layer. The percentage of clouds, over only the land pixels in each image defined by the CLOUD_COVER_LAND metadata, was used and scaled in the range 0 to 1. This is henceforth referred to as the "Landsat 7 cloud fraction."

MODIS Cloud Mask and Geolocation Product
The MODIS-Terra satellite is in the same descending orbit as Landsat 7 with a nominal period of 99 minutes, and a 16-days nadiral overpass revisit frequency, and trails Landsat 7 by approximately 30 minutes [25]. MODIS has a 110 • field of view and senses a 2330 km swath with overlapping swaths at latitudes >30 • [30]. The Collection 6 MODIS-Terra 1 km cloud mask product (MOD35) [10] was used. The MOD35 cloud detection algorithm involves five steps: (1) a series of spectral and spatial variability tests to detect the presence or absence of clouds are applied independently; (2) confidence scores are computed for each test applied; (3) the confidence scores of each test are combined into a preliminary overall confidence of the presence or absence of clouds for each pixel; (4) additional spatial and spectral tests ('restoral tests') are applied to reduce commission errors due to particular scene conditions (e.g., sun-glint, presence of bright surfaces); (5) a final cloud detection confidence level is generated, combining the results of all the cloud detection tests.
The MOD35 product classifies each 1 km MODIS observation into four possible states of decreasing confidence of cloud detection: "Cloudy" (confidence clear ≤ 66%), "Probably Cloudy" (66% < confidence clear ≤ 95%), "Probably Clear" (95% < confidence clear ≤ 99%), and "Clear" (confidence clear > 99%). In addition, ancillary data layers that describe the viewing and illumination geometry, and the land/water status of each 1 km observation are included in the product [10]. The MOD35 product is a Level 2 product and so is defined in the MODIS orbit swath geometry. The MODIS Geolocation product (MOD03) that defines the geographic location of the MODIS Level 2 swath products [31,32] was also used in this study.

Spatial and Temporal Extent of the Analysis
The analysis was performed using all available Landsat and MODIS data acquired from 1 January to 31 December 2002. This study period was selected because 2002 was the last complete year of Landsat 7 ETM+ acquisitions before the Scan Line Corrector (SLC) failure that resulted in a systematic failure to sense 22% of each image [33] and reduced the reliability of SLC-off Landsat image cloud metadata information [34]. In 2002, the MODIS-Terra data were globally available as MODIS-Terra was launched in 1999. The greater majority of the global Landsat image archive is acquired over land masses between 70 • S and 70 • N [12], and so the study was restricted to this latitudinal range.

Computation of MODIS Cloud Fractions for Each Landsat Image
The MOD35 1 km cloud and land/water masks for each orbit swath for each global day of 2002 were reprojected into the 1 km MODIS sinusoidal equal area projection [30] using the corresponding MOD03 swath geolocation files. The data were reprojected by nearest neighbor resampling to preserve the input 1 km data values. The Landsat 7 ETM+ ground swaths coincide with the central nadir-looking portion of the MODIS-Terra swath acquired on the same day from the same orbit. The footprint of each Landsat image was reprojected into the gridded 1 km MODIS sinusoidal equal area projection. The 1 km land pixels falling under each Landsat image for each orbit were derived from the reprojected 1 km MOD35 MODIS land/water mask, and the 1 km MODIS cloud mask values were summed to derive the MODIS land cloud fraction for each Landsat image.
The MOD35 land/water mask has four possible values, namely "Land", "Desert", "Coastal area", and "Water." To ensure consistency with the land pixel definition used to compute the Landsat 7 cloud fractions, only the MOD35 1 km pixels labelled as "Land", "Desert" or "Coastal area" in the MOD35 land/water mask were used to derive the MODIS cloud fractions.
The MODIS cloud fractions were defined for each Landsat image in three ways using different combinations of the MODIS cloud mask cloud detection confidence states (Table 1). For each definition, the MODIS cloud fraction was derived as the number of 1 km pixels labeled as cloud, divided by the total number of 1 km land pixels (labelled as "Land", "Desert" or "Coastal area" in the MOD35 land/water mask) encompassing the Landsat image. The three definitions include a conservative cloud definition (#1), an intermediated definition (#2) and a much less conservative definition (#3).

MODIS and Landsat Cloud Fraction Comparison Methodology
First, histograms of the Landsat cloud fractions and the MODIS cloud fractions were computed to verify that the whole range [0, 1] of cloud fractions was represented in the dataset. Subsequently, the cloud fraction difference, defined as the Landsat cloud fraction minus the MODIS cloud fraction, was computed for all the global 2002 study data. The distribution of the cloud differences was summarized through non-parametric summary statistics (median, interquartile range, and 5th-95th quantile range).
The relationship between the Landsat and MODIS cloud fractions was investigated using linear and logistic models, where the Landsat cloud fraction is the response variable and the MODIS cloud fraction is the predictor variable. A logistic regression was used because, in general, coarse resolution detection products, compared to higher resolution products, are affected by saturation issues. Thus, a logistic regression would be appropriate if the cloud fraction is overestimated at high cloud proportions and underestimated at low cloud proportions. The regression residuals were analyzed to investigate the presence of heteroscedasticity in the relationship between Landsat and MODIS cloud fractions.
Standard metrics and coefficients were reported. For the linear models, these include the linear regression β 0 (intercept) and β 1 (slope) coefficients, the standard error of the regression coefficients, the coefficient of determination of the linear regression (R 2 ), and the significance of the regression (p-value). In addition, the standard error of the residuals, i.e., the observed minus the predicted Landsat 7 cloud fraction, were derived. For the logistic models, these include the logistic regression coefficients β 0 (intercept term of the log-odds function) and β 1 (slope term of the log-odds function), the standard error of the regression coefficients, and the significance of the regression (p-value). Since there is no direct analog of R 2 in a logistic regression, the residual deviance was reported as a goodness-of-fit statistic for the model. In addition, the null deviance was reported, i.e., the deviance of a model when only a single coefficient (the intercept term of the log-odds function) is estimated. The difference between the null deviance and the residual deviance provides an indication of the overall performance of the model, with large differences indicating good performance and small differences indicating poor performance.
The presence of spatial patterns in the residuals were investigated by analyzing the distribution of the residuals at each Landsat WRS path/row location. Summary parameters of the distribution (median, interquartile range and 5th-95th quantile range) were calculated as a function of the land fraction, defined as the number of 1 km land pixels over the total footprint of the image, to verify whether the behavior of linear and logistic models was significantly different in coastal and inland areas. Finally, the distribution of the residuals was calculated as a function of the time of acquisition, using three-month seasonal intervals.

Landsat 7 ETM+ Image Availability and Global Cloud Histograms
A total of 109,814 Landsat 7 ETM+ images acquired in 2002 between 70 • S and 70 • N were considered in this study ( Figure 1). Nominally, the Landsat 7 ETM+ overpasses each WRS path/row 22 or 23 times per year, but in many regions, fewer images are available in the Landsat archive for a variety of reasons [32,35]. In 2002, most acquisitions occur over the Conterminous United States (CONUS), much of South America, Central and Southern Europe, Australia and Eastern Asia, whereas the coverage is reduced in parts of Africa, Central America, and Central and Northern Asia. (median, interquartile range and 5th-95th quantile range) were calculated as a function of the land fraction, defined as the number of 1 km land pixels over the total footprint of the image, to verify whether the behavior of linear and logistic models was significantly different in coastal and inland areas. Finally, the distribution of the residuals was calculated as a function of the time of acquisition, using three-month seasonal intervals.

Landsat 7 ETM+ Image Availability and Global Cloud Histograms
A total of 109,814 Landsat 7 ETM+ images acquired in 2002 between 70°S and 70°N were considered in this study ( Figure 1). Nominally, the Landsat 7 ETM+ overpasses each WRS path/row 22 or 23 times per year, but in many regions, fewer images are available in the Landsat archive for a variety of reasons [32,35]. In 2002, most acquisitions occur over the Conterminous United States (CONUS), much of South America, Central and Southern Europe, Australia and Eastern Asia, whereas the coverage is reduced in parts of Africa, Central America, and Central and Northern Asia.   Figure 1. The histograms exhibit a full range of cloud fraction values, i.e., from 0 to 1, and have bimodal distributions. Except for the least conservative MODIS cloud definition (#3), the first mode corresponds to images with 0% cloud fractions and the second mode corresponds to 100% cloud fractions. The MODIS cloud fraction histograms have progressively fewer cloud-free images for the less conservative cloud definitions.   Figure 1. The histograms exhibit a full range of cloud fraction values, i.e., from 0 to 1, and have bimodal distributions. Except for the least conservative MODIS cloud definition (#3), the first mode corresponds to images with 0% cloud fractions and the second mode corresponds to 100% cloud fractions. The MODIS cloud fraction histograms have progressively fewer cloud-free images for the less conservative cloud definitions.  (Table 1).   Table 1). The thick vertical lines show the median of the distribution, the boxes show the interquartile range, and the whiskers show the 5th to 95th quartile range. Figure 4 shows scatterplots comparing the Landsat 7 cloud fractions and the MODIS cloud fractions for the three MODIS cloud fraction definitions, and considers all the study data. The point density distribution, calculated using a 100 × 100 quantization of the axes, is plotted using a rainbow color scale. Linear and logistic regressions are shown plotted as dotted and continuous lines  (Table 1).  (Table 1).   Table 1). The thick vertical lines show the median of the distribution, the boxes show the interquartile range, and the whiskers show the 5th to 95th quartile range. Figure 4 shows scatterplots comparing the Landsat 7 cloud fractions and the MODIS cloud fractions for the three MODIS cloud fraction definitions, and considers all the study data. The point density distribution, calculated using a 100 × 100 quantization of the axes, is plotted using a rainbow color scale. Linear and logistic regressions are shown plotted as dotted and continuous lines  Table 1). The thick vertical lines show the median of the distribution, the boxes show the interquartile range, and the whiskers show the 5th to 95th quartile range. Figure 4 shows scatterplots comparing the Landsat 7 cloud fractions and the MODIS cloud fractions for the three MODIS cloud fraction definitions, and considers all the study data. The point density distribution, calculated using a 100 × 100 quantization of the axes, is plotted using a rainbow color scale. Linear and logistic regressions are shown plotted as dotted and continuous lines respectively. The correlation and regression coefficients are summarized in Table 2. Consistent with the boxplot results presented in Figure 3, the scatterplots indicate that the MODIS cloud fraction is closest to the Landsat cloud fractions when the most conservative MODIS cloud fraction definition is used (#1). Generally, MODIS overestimates the cloud fraction compared to Landsat and there is a progressive overestimation from definition #1 to #3, as shown by the majority of the point distribution (green to purple colors) positioned below the identity line (dashed line) for #2 and #3. This is also reflected by the linear regression lines: the regression lines of #2 and #3 are noticeably further apart from the identity line than the regression line for #1. respectively. The correlation and regression coefficients are summarized in Table 2. Consistent with the boxplot results presented in Figure 3, the scatterplots indicate that the MODIS cloud fraction is closest to the Landsat cloud fractions when the most conservative MODIS cloud fraction definition is used (#1). Generally, MODIS overestimates the cloud fraction compared to Landsat and there is a progressive overestimation from definition #1 to #3, as shown by the majority of the point distribution (green to purple colors) positioned below the identity line (dashed line) for #2 and #3. This is also reflected by the linear regression lines: the regression lines of #2 and #3 are noticeably further apart from the identity line than the regression line for #1.  Table 1). The logistic regression (solid), linear regression (dotted) and the identity line (dashed) are shown superimposed on point density distributions that are generated using a 100 × 100 quantization of the axes, and are displayed with a rainbow logarithmic color scale. Table 2. Regression results of the Landsat 7 and MODIS cloud fraction scatterplots shown in Figure  4. The logistic regression results are consistent with the results of the linear regression, with a similar residual deviance for definitions #1 (18,194) and #2 (18,199) and considerably higher residual deviance for #3 (22,146). For all three MODIS cloud definitions, the residual deviance is considerably smaller than the null deviance, indicating that the MODIS cloud fraction is an effective predictor for the Landsat 7 cloud fraction. For all three MODIS cloud definitions, the standard errors of the coefficients are smaller than the estimated values of the coefficients, and the regressions are significant (p < 0.05).

Definition
For all three MODIS cloud definitions, the linear regressions between the MODIS and the Landsat 7 cloud fractions are significant (p < 0.05). The highest coefficient of determination (R 2 ) is observed for MODIS cloud definition #1 (R 2 = 0.83), with a progressively lower coefficient of determination for definition #2 (R 2 = 0.82) and #3 (R 2 = 0.74). While the β1 (slope) coefficient is similar in the three definitions, with values of 0.89 (#1), 0.87 (#2) and 0.90 (#3), the β0 (intercept) coefficient decreases from 0.02 (#1), to −0.02 (#2) and −0.10 (#3), reflecting that the MODIS cloud fractions become systematically higher than the Landsat cloud fractions when the less conservative cloud definitions #2 and #3 are considered. The standard error of the residuals is also lower for definition #1 (0.149) compared to #2 (0.157) and #3 (0.186).  Table 1). The logistic regression (solid), linear regression (dotted) and the identity line (dashed) are shown superimposed on point density distributions that are generated using a 100 × 100 quantization of the axes, and are displayed with a rainbow logarithmic color scale. The logistic regression results are consistent with the results of the linear regression, with a similar residual deviance for definitions #1 (18,194) and #2 (18,199) and considerably higher residual deviance for #3 (22,146). For all three MODIS cloud definitions, the residual deviance is considerably smaller than the null deviance, indicating that the MODIS cloud fraction is an effective predictor for the Landsat 7 cloud fraction. For all three MODIS cloud definitions, the standard errors of the coefficients are smaller than the estimated values of the coefficients, and the regressions are significant (p < 0.05).
To further investigate the behavior of the linear and logistic models, the regression residuals, i.e., the observed minus the predicted Landsat 7 cloud fractions, are summarized in Figure 5 for the global year of study data. The scatter plots in Figure 5 were obtained by plotting the regression residuals as a function of the MODIS cloud fraction. The scatter plots separately consider the linear (top row) and logistic models (bottom row), for MODIS cloud definitions #1 (left column), #2 (center column) and #3 (right column). In all plots, the majority of the data points (green to purple colors) corresponds to low residuals (i.e., y-axes values close to 0), and only isolated data points (red colors) correspond to large residuals (up to 1.0 and −1.0). Consistent with the results of Figure 4 and Table 2, the most conservative MODIS cloud fraction definition (#1) has residuals that are clustered around 0 on the y-axes, whereas definitions #2 and #3 have more dispersed residuals. To further investigate the behavior of the linear and logistic models, the regression residuals, i.e., the observed minus the predicted Landsat 7 cloud fractions, are summarized in Figure 5 for the global year of study data. The scatter plots in Figure 5 were obtained by plotting the regression residuals as a function of the MODIS cloud fraction. The scatter plots separately consider the linear (top row) and logistic models (bottom row), for MODIS cloud definitions #1 (left column), #2 (center column) and #3 (right column). In all plots, the majority of the data points (green to purple colors) corresponds to low residuals (i.e., y-axes values close to 0), and only isolated data points (red colors) correspond to large residuals (up to 1.0 and −1.0). Consistent with the results of Figure 4 and Table 2, the most conservative MODIS cloud fraction definition (#1) has residuals that are clustered around 0 on the y-axes, whereas definitions #2 and #3 have more dispersed residuals.  Figure 5. These four ranges are used to broadly verify whether the distribution of the residuals changes as a function of the MODIS cloud fraction, i.e., to identify any potential heteroscedasticity of the data, which would violate some of the underlying assumptions of the regression analyses. The residual median and interquartile range values do not significantly change across the MODIS cloud fraction range, indicating that no marked heteroscedasticity is present for both the linear and logistic models, and for any of the three MODIS cloud mask definitions.  Figure 5. These four ranges are used to broadly verify whether the distribution of the residuals changes as a function of the MODIS cloud fraction, i.e., to identify any potential heteroscedasticity of the data, which would violate some of the underlying assumptions of the regression analyses. The residual median and interquartile range values do not significantly change across the MODIS cloud fraction range, indicating that no marked heteroscedasticity is present for both the linear and logistic models, and for any of the three MODIS cloud mask definitions.
The most conservative MODIS cloud fraction definition (#1) has the highest linear regression R 2 , and lowest logistic regression residual deviance (Table 2). Table 3 summarizes for this definition the median and interquartile range of the residuals for both regressions. The linear model has lower residuals than the logistic model for very low cloud fraction (<0.25), but slightly higher residuals at higher cloud fraction; overall, however, the two models have similar performance, with identical value of the median error (−0.003 linear and logistic) and comparable interquartile ranges ([−0.032, 0.085] linear model, [−0.058, 0.091] logistic model). Thus, the most conservative MODIS cloud fraction definition (#1) is an effective predictor of the Landsat 7 cloud fraction, both when fitting a linear and a logistic model.  Figure 6 shows the geographic distribution of the minimum (top row), median (middle row) and maximum (bottom row) residuals for the linear (left column) and logistic (right column) regression using the most conservative MODIS cloud definition (#1). Negative residuals (red, yellow colors) indicate that the linear and logistic models overestimate the observed Landsat cloud fraction, whereas positive residuals (cyan and blue colors) indicate the linear and logistic models underestimate the Landsat cloud fraction. Large minimum and maximum residuals (top and bottom row) are predominantly observed at WRS path/rows locations over coastlines, or at high latitude, or over snow prone mountain ranges. This is discussed further in Section 5.
The minimum and maximum residual values are driven by outliers whereas the median values correspond to the 50th percentile of the residual distribution and are robust to outliers; as indicated by the prevalence of green colors in the Figure 6 middle row figures, the median of the residuals is close to 0 at most locations. This is further illustrated by Figure 7, which summarizes the distribution of the median residuals shown in the middle row of Figure 6        Finally, in order to verify the presence of seasonal pattern, Figure 9 shows  Table 4.  Table 4. Finally, in order to verify the presence of seasonal pattern, Figure 9 shows  Table 4.  While no seasonal pattern is detected in the Southern Hemisphere, in the Northern Hemisphere the residuals exhibit a slight seasonal pattern, with higher residual in the winter months. The median of the residuals changes from 0.020 (linear) and 0.027 (logistic) in July-September to −0.019 (linear) and −0.030 (logistic) in January-March. Furthermore, while the 25th, 75th and 95th quantile are substantially stable, the 5th percentile shows the largest seasonal trend, with a variation from −0.161 (linear) and −0.150 (logistic) in July−September to −0.394 (linear) and −0.390 (logistic). These results indicate that the large minimum residuals at high latitude shown in Figure 6, (top row), are predominantly observed on images acquired in the Northern Hemisphere in January-March, i.e., in the winter months.

Discussion
Linear and logistic models were used to predict the Landsat 7 ETM+ cloud fraction from MODIS MOD35 observations, and a non-parametric analysis of the residuals was performed. Overall, the analysis indicates a high degree of correspondence between the Landsat and MODIS cloud fractions, with the best results observed when only the pixels confidently labeled as cloudy in the MOD35 product are used. This case resulted in the highest coefficient of determination and lowest residuals, when fitting the linear and the logistic models. The difference between the MODIS and Landsat cloud fractions increases when less conservative MOD35 cloud definitions are used, i.e., when MODIS pixels flagged as potentially cloudy and potentially clear are counted in the cloud fraction. Arguably, this can be explained considering that the MOD35 cloud masking algorithm makes use of the multiple thermal bands and the water vapor/CO 2 absorption bands which are available on the MODIS instrument. These bands are missing from the Landsat ETM+ instrument, resulting in low sensitivity to thin clouds [28].
The analysis of the residuals indicates that the performance of the linear and logistic models were substantially similar, with the linear model marginally better at low cloud fraction conditions, and the logistic model marginally better for higher cloud fraction conditions. In the case of the linear model, the intercept of the regression line is small (0.02), the slope is close to unity (0.87), and the median of the residuals is also small (−0.002); this small positive error indicates that the use of MODIS to predict the Landsat cloud fraction results in a small overestimation of the cloud cover. With either model, the MODIS-Terra cloud mask underestimates the Landsat cloud fraction at low cloud levels, and overestimates it at high cloud levels. Arguably this is due to the MODIS coarse resolution: small, sparse clouds are not detected, and similarly small gaps in cloud cover are missed. These findings are consistent with previous studies reporting that the MODIS cloud masking algorithm results in some degree of overestimation on fragmented and optically thin clouds [18,21]. Further, this is a well-known scale issue whereby coarse resolution detection products tend to over and underestimate the area detected, compared to higher resolution products, when the areal proportions being detected are high and low respectively. For example, this has been observed in comparison of MODIS and Landsat derived burned areas [36], snow cover [37], and comparison of AVHRR and Landsat derived forest maps [38].
The geographic analysis of the residuals revealed that the largest differences between the MODIS and Landsat cloud fraction occur along the coastlines, and at high latitude. In the case of coastal areas, further analysis indicated that the errors are mostly observed at WRS path/row locations with land fractions ≤0.05. In these cases, cloud movement in the 30 minutes between the MODIS-Terra and Landsat 7 overpasses may result in large discrepancies of the estimated cloud percentage over land. Furthermore, false positives in the MOD35 cloud mask over coastlines, rivers and lake regions are known issues of the MOD35 product [10,39]. Additionally, the temporal analysis of the residuals showed the presence of a seasonal effect in the Northern Hemisphere: the overestimation of the Landsat cloud fraction by the MODIS cloud mask is greater in winter than in summer. Visual inspection of the MODIS and Landsat images, where large discrepancies were observed, confirmed that, at high latitudes and over mountain ranges, ice and snow were flagged as clouds by the MODIS MOD35 product, also a known issue of the MOD35 product [10,40,41].
The results of this study indicate that the MODIS-Terra 1 km cloud product can be used to predict Landsat 7 ETM+ image cloud fractions over land. Thus, the consistent, long term (2000 onwards) daily cloud observations from the MODIS-Terra instrument can be used to assess the impact of cloud on moderate resolution land monitoring products from the Landsat satellites. Recommendations for the next (after Landsat 9) mission design are currently being developed [3]. Due to the high cost of thermal wavelength sensors, a possible design option may be to have separate satellites carrying thermal infrared (TIR) and visible to shortwave infrared wavelength (VSWIR) detectors. The degree of temporal overpass separation between the TIR and the VSWIR observations will be considered in application specific terms, but also to minimize overpass cloudiness differences. The results of this study indicate that, in the morning sun-synchronous orbit, an approximately 30 minute overpass time difference does not, globally on average, result in significant differences in observed cloud fraction over land. We finally note that some cautions must be used in considering these results, as high residuals of the linear and logistic regression models were observed on individual images, especially in the presence of snow or on coastal areas.

Conclusions
In this paper, a systematic global comparison between the Landsat 7 ETM+ image cloud fraction over land, and equivalent cloud fractions derived from contemporaneous MODIS-Terra (MOD35) cloud observations was undertaken. Three MODIS cloud fraction definitions were considered, including a conservative cloud definition, an intermediated definition and a much less conservative definition. Due to the similar acquisition orbits and overpass times, the MODIS MOD35 global cloud mask product is a good predictor of the Landsat 7 ETM+ image cloud fraction over land. The most conservative MODIS cloud fraction definition was the best predictor of the Landsat 7 ETM+ cloud fraction, resulting in high coefficient of determination (R 2 = 0.83), negligible bias (median difference: <0.01) and low dispersion around the median (interquartile range: [−0.02, 0.06]) of the estimated linear model. These results suggest that it is possible to use the daily probability of cloud cover, as observed by MODIS-Terra, as a proxy of the cloud cover observed by Landsat 7.