A One Year Landsat 8 Conterminous United States Study of Cirrus and Non-Cirrus Clouds

The first year of available Landsat 8 data over the conterminous United States (CONUS), composed of 11,296 acquisitions sensed over more than 11 thousand million 30 m pixel locations, was analyzed comparing the spatial and temporal incidence of 30 m cloud and cirrus states available in the standard Landsat 8 Level 1 product suite. This comprehensive data analysis revealed that on average over a year of CONUS observations (i) 35.9% were detected with high confidence cloud, with spatio-temporal patterns similar to those observed by previous Landsat 5 and 7 cloud analyses; (ii) 28.2% were high confidence cirrus; (iii) 20.1% were both high confidence cloud and high confidence cirrus; and (iv) 6.9% were detected as high confidence cirrus but low confidence cloud. The results illustrate the potential of the 30 m cloud and cirrus states available in the standard Landsat 8 Level 1 product suite but imply that the historical CONUS Landsat archive has about 7% of undetected cirrus contaminated pixels. Systematic cloud detection commission errors over a minority of highly reflective exposed soil/sand surfaces were found and it is recommended that caution be taken when using the currently available Landsat 8 cloud data over similar surfaces.


Introduction
At over 40 years, the Landsat series of satellites provides the longest temporal record of space-based land surface observations, and the successful February 2013 launch of the Landsat 8 satellite is continuing this legacy [1]. The Landsat 8 payload includes the Operational Land Imager (OLI), which has a new 1.375 μm band designed to monitor cirrus clouds, and the Thermal Infrared Sensor (TIRS), which together provide improved cloud detection capabilities [2]. In addition, the radiometric resolution and the dynamic range of the OLI are improved compared to previous Landsat sensors, reducing band saturation over highly reflective objects, such as snow or cloud [2]. Landsat 8 data are nominally processed into 185 km × 180 km scenes and, unlike for previous Landsat sensors, are provided with a spatially explicit 30 m quality assessment band that includes a cloud and cirrus cloud mask [3]. Landsat data are less appropriate for quantification of cloud properties and amount than data from near daily global coverage systems and certain active cloud remote sensing systems [4,5], although Landsat data have been valuable for cloud morphology analyses [6]. However, quantification of the incidence of cloudiness for Landsat terrestrial monitoring applications, especially in persistently cloudy regions [7,8], is important because optically thick clouds preclude optical and thermal wavelength remote sensing of the surface [9].
The Landsat sensors are in polar orbiting sun synchronous orbits with a 16 day repeat cycle, and there is considerable global spatial and temporal variability in cloudiness at the time of Landsat overpass [10]. Prior to Landsat 8, a number of cloud detection algorithms were developed. The standard algorithm was the automatic cloud cover assessment (ACCA) algorithm, used to compute the cloud state of each Landsat pixel and summarized as metadata made available in the standard Landsat 7 product. The ACCA takes advantage of known spectral properties of clouds, snow, bright soil, vegetation, and water, and consists of twenty-six rules applied to five of the Landsat 7 Enhanced Thematic Mapper Plus (ETM+) bands [11]. Over the conterminous United States (CONUS) every Landsat 7 overpass is acquired, and the annual mean Landsat 7 ETM+ cloud cover is reported as about 40% [12]. To date the incidence of cirrus clouds in Landsat data over large areas or long time periods has not been documented. Cirrus clouds are usually found in the upper atmosphere and are composed of irregular ice crystals that can be less opaque than non-cirrus clouds at reflective wavelengths but can scatter and absorb radiation significantly [13][14][15]. The magnitude of cirrus cloud impacts on reflective wavelength data depends primarily on the cirrus cloud optical depth [16], and studies have shown a relative impact of cirrus clouds on normalized difference vegetation index values of about 15% [15]. Globally, cirrus clouds are thought to cover on average about 17% of the surface [17], with more than 50% in the Intertropical Convergence Zone [17][18][19]. Low altitude clouds, typically non-cirrus, may co-exist with spatially overlapping cirrus clouds [20], and recent estimates made using active sensors suggest that globally about 30% of low clouds are overlapped by high clouds [21]. The inclusion of a 1.375 μm band on the Landsat 8 OLI, based on Moderate Resolution Imaging Spectroradiometer (MODIS) sensor design heritage [13], provides the opportunity to examine Landsat cirrus cloud incidence. Therefore this study seeks to examine in Landsat 8 data (i) the incidence of cirrus and also non-cirrus clouds; and (ii) the incidence of cirrus occurring without non-cirrus clouds. The latter is particularly important as it has implications for the magnitude of undetected cirrus clouds present in the historical Landsat archive. In order to provide a comprehensive analysis, the first year of available Landsat 8 data over the CONUS is analyzed.

Data
The Landsat 8 satellite achieved its nominal orbit in April 2013 and began regular data collection in May 2013 [1]. In this study all the Landsat 8 scenes sensed for the first year over the conterminous United States from 30 April 2013 to 29 April 2014 were used (a 53 week period). Version LPGS_2.3.0 data, which were reprocessed in February 2014 to address a significant change in the TIRS band calibration that also affected the cloud mask product, were used. Landsat 8 scenes are processed nominally to Level 1T with processing that includes radiometric correction, systematic geometric correction, precision correction using ground control, and the use of a digital elevation model to correct parallax error due to local topographic relief [22,23]. Certain acquisitions may not have sufficient ground control necessary for precision correction. In these cases, the best level of correction is applied, together with the correction for parallax error due to local topographic relief using a digital elevation model, to Level 1Gt systematic (L1Gt) [23,24]. Current assessments of Landsat 8 geolocation accuracy indicate a 90% circular error of approximately 18 m and a reduced L1Gt accuracy of approximately 20 m [25]. The Landsat 8 scenes are defined in 185 km × 180 km areas in a Worldwide Reference System (WRS) of path (groundtrack parallel) and row (latitude parallel) scene coordinates [26]. Every daytime sunlit Landsat 8 overpass of the CONUS is ingested into the U.S. Landsat archive, a total of 455 unique path/rows, with an annual maximum of 22 or 23 acquisitions per path/row. In this study, a total of 11,296 scenes, of which 9491 were Level 1T and 1805 were L1Gt, were used.
Landsat 8 cloud mask algorithms are being developed [27] but this paper documents the cloud mask available in the standard Level 1 product [28]. The spatially explicit 30 m cloud and cirrus cloud information stored in each Landsat 8 scene quality assessment band [3,28] was used. The cloud information defines for each 30 m pixel a value indicating if the pixel is either (a) high confidence cloud; (b) medium confidence cloud; (c) low confidence cloud; or (d) not observed (i.e., outside the sensor field of view). The cloud detection algorithm was developed pre-launch and uses a combination of the ACCA algorithm [11] adapted for Landsat 8 and a supervised classification algorithm [29]. The medium confidence cloud detections occur primarily over cloud edges. The separate per-pixel cloud algorithm result masks are merged together via a weighted voting mechanism that is used to attribute the confidence level [3]. Similarly, the cirrus cloud information defines for each 30 m pixel a value indicating if the pixel is either (a) high confidence cirrus cloud; (b) medium confidence cirrus cloud; (c) low confidence cirrus cloud; or (d) not observed. The medium confidence cirrus cloud state is not currently populated. The Landsat 8 cirrus cloud detection algorithm is simple: a pixel is labeled as high confidence cirrus if the OLI 1.375 μm reflectance > 0.02 and as low confidence cirrus otherwise. The cirrus algorithm is based on heritage algorithms developed using the MODIS 1.375 μm band but does not consider elevation effects or use other spectral bands [30][31][32].
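The cirrus rule stated above can be illustrated with a minimal sketch. It is not the operational Level 1 processing code; the function name, the string labels, and the use of `None` to represent a pixel that was not observed are assumptions made here for illustration. Only the 0.02 reflectance threshold is taken from the text.

```python
from typing import Optional

# Reflectance threshold for the OLI cirrus band, as stated in the text.
CIRRUS_REFLECTANCE_THRESHOLD = 0.02


def cirrus_confidence(cirrus_band_reflectance: Optional[float]) -> str:
    """Label one 30 m pixel from its cirrus band reflectance.

    None stands for a pixel outside the sensor field of view
    (a hypothetical convention used only in this sketch).
    """
    if cirrus_band_reflectance is None:
        return "not observed"
    if cirrus_band_reflectance > CIRRUS_REFLECTANCE_THRESHOLD:
        return "high"
    # The medium confidence state is not populated, so all other pixels are low.
    return "low"
```

Because the test is a strict inequality, a reflectance of exactly 0.02 is labeled low confidence in this sketch.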

Pre-Processing
The spatially explicit 30 m cloud and cirrus cloud information was extracted from each Landsat 8 scene. The data were grouped temporally into weekly acquisition periods defined by consecutive seven-day periods where the first week was defined as 1 to 7 January. Each week of cloud and cirrus cloud data was reprojected and nearest neighbor resampled into the Web Enabled Landsat Data (WELD) Albers equal area projection using the approach described in [24]. A weekly period was used because, at CONUS latitudes, adjacent Landsat 8 orbit paths are acquired seven days apart and so do not overlap spatially within a week, i.e., there is never more than one Landsat acquisition per 30 m Albers pixel location per week. In this way 53 weekly CONUS 30 m Landsat 8 images were generated spanning week 18 of 2013 to week 17 of 2014.
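The weekly grouping can be sketched as follows, assuming that the week number is derived from the day of year, with week 1 covering 1 to 7 January. The function names are hypothetical. Under this assumption the study period of 30 April 2013 to 29 April 2014 maps onto week 18 of 2013 through week 17 of 2014, which gives the 53 weekly periods reported.

```python
from datetime import date, timedelta


def acquisition_week(d: date) -> int:
    """Week of year where week 1 is 1 to 7 January (consecutive 7-day periods)."""
    day_of_year = d.timetuple().tm_yday
    return (day_of_year - 1) // 7 + 1


def weeks_spanned(start: date, end: date) -> int:
    """Number of distinct (year, week) periods touched between two dates, inclusive."""
    seen = set()
    current = start
    while current <= end:
        seen.add((current.year, acquisition_week(current)))
        current += timedelta(days=1)
    return len(seen)


if __name__ == "__main__":
    print(acquisition_week(date(2013, 4, 30)))                  # first study day
    print(acquisition_week(date(2014, 4, 29)))                  # last study day
    print(weeks_spanned(date(2013, 4, 30), date(2014, 4, 29)))  # weekly periods
```

Note that under this definition week 53 of a year contains only one day (two in a leap year).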

CONUS Weekly Cloud Analysis
For each week of CONUS data the total number of CONUS valid 30 m pixel observations (i.e., sensed by Landsat 8 regardless of cloud conditions) and the total number of 30 m pixels detected for each of the three cloud confidence states and the two cirrus cloud confidence states, and combinations of these states, were counted. The weekly CONUS counts were converted into weekly CONUS percentages by dividing by the total number of CONUS valid 30 m pixel observations and then multiplying by 100.
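A minimal sketch of the weekly counting, assuming each pixel is represented as a (cloud state, cirrus state) pair of strings and that `None` marks a pixel that was not observed. This representation and the function name are illustrative assumptions, not the data structures used in the study.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# (cloud state, cirrus state); None means the pixel was not observed.
Pixel = Tuple[Optional[str], Optional[str]]


def weekly_percentages(pixels: Iterable[Pixel]) -> Dict[str, float]:
    """Percentage of valid pixels in each cloud state, cirrus state, and combination."""
    valid = [(cloud, cirrus) for cloud, cirrus in pixels
             if cloud is not None and cirrus is not None]
    if not valid:
        return {}
    counts: Counter = Counter()
    for cloud, cirrus in valid:
        counts[f"{cloud} cloud"] += 1
        counts[f"{cirrus} cirrus"] += 1
        counts[f"{cirrus} cirrus & {cloud} cloud"] += 1
    # Divide by the number of valid observations and multiply by 100.
    return {state: 100.0 * n / len(valid) for state, n in counts.items()}
```

With three cloud states and two populated cirrus states, this yields up to six coincident combinations per week.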
The weekly cloud state percentages were plotted to examine any seasonal pattern. In addition, Pearson correlations and reduced major axis (RMA) linear regression between certain weekly percentages were derived. RMA regression was used as it allows for both the dependent and independent variables to have error [33]. The means of the 53 weekly CONUS percentages were computed to summarize the average weekly CONUS cloud state over the year of study data.
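The correlation and RMA fit can be sketched as below. This assumes the common textbook form of RMA regression, in which the slope is the ratio of the standard deviations of the two variables carrying the sign of the Pearson correlation, and the line passes through the means. The exact implementation used in the study is not given in the text, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rma_regression(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Reduced major axis fit y = slope * x + intercept.

    slope = sign(r) * sd(y) / sd(x); intercept = mean(y) - slope * mean(x).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    slope = math.copysign(math.sqrt(syy / sxx), pearson_r(x, y))
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

Unlike ordinary least squares, the RMA slope is the same (up to inversion) whichever variable is treated as dependent, which is why it suits the case where both weekly percentages carry error.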

CONUS Annual Pixel Level Cloud Analysis
The same cloud state counts as for the CONUS weekly analysis were derived but by counting the totals over the 53 weeks at each 30 m pixel location. The counts were converted into percentages by dividing each pixel count by the total number of valid observations (i.e., sensed by Landsat 8 regardless of cloud conditions) over the 53 weeks and then multiplying by 100. Pixel locations where there were no valid observations over the 53 weeks were assigned fill values. This provided 30 m CONUS maps of the annual (53 week) percentage of the different cloud states. Histograms of the CONUS percentage values, quantized to the nearest 1% for visual clarity, were derived for the different cloud states.
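The per-pixel calculation and the 1% histogram binning can be sketched as follows. The fill value of -1.0, the function names, and the round-half-up binning are assumptions made for illustration; the text does not specify them.

```python
from collections import Counter
from typing import Dict, Iterable

FILL_VALUE = -1.0  # hypothetical fill value for pixels with no valid observations


def annual_percentage(state_count: int, valid_count: int) -> float:
    """Percentage of valid observations at one pixel that were in a given state."""
    if valid_count == 0:
        return FILL_VALUE
    return 100.0 * state_count / valid_count


def percentage_histogram(percentages: Iterable[float]) -> Dict[int, int]:
    """Histogram of percentages quantized to the nearest 1%, skipping fill values."""
    histogram: Counter = Counter()
    for p in percentages:
        if p == FILL_VALUE:
            continue
        histogram[int(p + 0.5)] += 1  # round half up to the nearest whole percent
    return dict(histogram)
```

For example, a pixel flagged as high confidence cloud in 5 of 20 valid observations has an annual percentage of 25%.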
The annual percentage maps cannot be visualized at 30 m resolution as the CONUS is composed of more than 11 thousand million 30 m pixels [24]. Instead the maps were visualized at a reduced resolution by computing the median percentage in n × n km adjacent non-overlapping gridcells. The median rather than the mean was used as it is robust to outliers and because the percentage values in an n × n km gridcell may not be normally distributed. Landsat scenes sensed in adjacent orbit paths have increasing across-track overlap further polewards [10]. For the year of CONUS data the minimum across-track overlap between adjacent Landsat 8 paths occurred in southern Florida (31.3 km) and the maximum overlap occurred in northern Washington state (83.1 km). Consequently 15 × 15 km gridcells (composed of up to 250,000 30 m pixel values) were used as 15 km is less than half the minimum 31.3 km across-track overlap observed between adjacent Landsat paths.
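The gridcell size arithmetic and the block median can be sketched as below. A 15 km cell side holds 15,000 m / 30 m = 500 pixels, so a full cell holds 500 × 500 = 250,000 pixel values. The function names, the nested-list grid representation, and the choice to exclude fill values from the median are assumptions made for this sketch.

```python
from statistics import median
from typing import List


def pixels_per_cell_side(cell_km: float, pixel_m: float = 30.0) -> int:
    """Number of pixels along one side of a square gridcell."""
    return int(cell_km * 1000 // pixel_m)


def block_median(grid: List[List[float]], cell: int, fill: float = -1.0) -> List[List[float]]:
    """Median of each non-overlapping cell x cell block, ignoring fill values.

    Blocks on the right and bottom edges may hold fewer pixels than cell * cell.
    """
    rows, cols = len(grid), len(grid[0])
    out = []
    for r0 in range(0, rows, cell):
        out_row = []
        for c0 in range(0, cols, cell):
            values = [grid[r][c]
                      for r in range(r0, min(r0 + cell, rows))
                      for c in range(c0, min(c0 + cell, cols))
                      if grid[r][c] != fill]
            out_row.append(median(values) if values else fill)
        out.append(out_row)
    return out
```

The last block in the test below shows why the median was preferred: a single value of 100 among three values of 5 leaves the block median at 5, whereas the mean would be pulled to about 29.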

CONUS Weekly Cloud Analysis
Figure 1 shows for each of the 53 weeks the percentage of valid CONUS pixels that were high confidence cloud detections, high confidence cirrus detections, and high confidence cirrus detections that coincided with low confidence cloud detections. Greater percentages of high confidence cloud and high confidence cirrus detections occurred in the winter than in the summer weeks and are similar to previously observed seasonal CONUS Landsat cloud variations [12] and seasonal cirrus active satellite detection variations [17]. The majority of CONUS weeks (50 of 53 weeks) had a greater percentage of high confidence cloud than high confidence cirrus detections, and these data had a correlation of 0.896. RMA regression of these data provided a relation of the form high confidence cirrus weekly percentage = 1.135 × high confidence cloud weekly percentage − 12.14 (n = 53, R² = 0.804). The weekly percentages of high confidence cirrus detections that coincided with low confidence cloud detections were correlated with the weekly percentages of high confidence cirrus detections (0.809) and less correlated with the weekly high confidence cloud detections (0.583). The means of the 53 weekly CONUS percentages illustrated in Figure 1, and for other weekly CONUS cloud state percentages, are summarized in Table 1. The mean 53-week CONUS high confidence cloud percentage (36.5%) is similar to the 40% annual mean Landsat 7 CONUS reported cloud cover [12] and the mean 53-week CONUS high confidence cirrus percentage (29.3%) is higher but not dissimilar to the 17% reported global average cirrus cloud cover [17]. The mean 53-week incidence of high confidence clouds that coincided also with high confidence cirrus was on average 21.1%, which is a similar magnitude to the reported 30% global incidence of low clouds overlapped by high clouds [21]. The incidence of medium confidence cloud detections is low with only a 3.2% 53-week mean. In all weeks the percentage of high confidence cirrus detections that coincided with low confidence cloud detections was quite small (Figure 1) with a 53-week mean of 7.3% (Table 1) and the percentage of high confidence cirrus detections that coincided with medium confidence cloud detections was smaller with a 53-week mean of 1.0% (Table 1). This implies that about 7% of the pre-Landsat 8 CONUS data may be cirrus contaminated but are undetected as there was no cirrus cloud detection capability on earlier Landsat sensors.

CONUS Annual Pixel Level Cloud Analysis
Figures 2 and 3 show maps of the percentage of high confidence cloud and high confidence cirrus detections, respectively, derived at each 30 m conterminous US Albers projection pixel location over 53 weeks of Landsat 8 observations. The median percentage values in 15 × 15 km gridcells are shown using the same color map for ease of comparison. The geographic distribution of the high confidence cloud detections (Figure 2) reflects distributions observed in Landsat 5 and 7 CONUS cloud studies, with a greater percentage of high confidence cloud detections in the North East and North West and smaller percentages in the arid South West [12]. The geographic distribution of the high confidence cirrus detections (Figure 3) is similar to that of the high confidence cloud detections (Figure 2) and the correlation of the illustrated data is 0.57. The percentages of high confidence cirrus detections are generally lower than the percentages of high confidence cloud detections, as also seen in Figure 1, with a maximum 15 × 15 km median high confidence cirrus percentage of 68% occurring over the Rocky Mountains in Western Colorado. This maximum may be a commission error associated with snow cover and the simplicity of the cirrus detection algorithm, although similarly high cirrus percentages over other snow prone mountainous regions are not apparent. Parts of Salt Lake, Utah and White Sands, New Mexico were detected frequently as high confidence cloud and several of the 15 × 15 km gridcells had median high confidence cloud percentages of 100% (Figure 2). Visual comparison of individual Landsat 8 images and their cloud masks at these locations revealed that this is a cloud detection commission error, likely occurring because the locations are predominantly unvegetated with exposed soil and sand surfaces that have similar, highly reflective red, green and blue reflectance and are confused with clouds. This is illustrated in Figure 4, which clearly shows the commission error. In addition, there are some commission errors apparent over the highly reflective paved surfaces of the Holloman U.S. Air Force Base in the south central portion of the image.
Figures 5 and 6 summarize the CONUS annual pixel level cloud analysis showing histograms of the different cloud states and cloud state combinations and are equivalent to histograms of the data illustrated in Figures 2 and 3, respectively, but derived at 30 m resolution. Figure 5 shows histograms of the CONUS high, medium and low confidence cloud (top row) and the high and low confidence cirrus (bottom row) annual detections. Similarly, Figure 6 shows CONUS histograms for the six different possible combinations of coincident cloud and cirrus states. The histograms were computed considering the individual percentages from a total of 11,088,916,742 30 m pixel locations. The histograms are approximately normally distributed and the mean CONUS percentages derived from the histograms are very similar to the weekly CONUS mean values (within 1.5%) that are tabulated in Table 1. The considerable variation in CONUS Landsat 8 cloudiness is evident in the wide interquartile ranges of the high confidence cloud (25th percentile 28%, 75th percentile 46%, mean 35.9%, Figure 5a) and high confidence cirrus (25th percentile 22%, 75th percentile 36%, mean 28.2%, Figure 5d) detections. The incidence of medium confidence cloud detections (Figure 5b) is quite low with a mean of 2.9% and 25th and 75th percentiles of 1% and 5%, respectively. The relative frequency of the cloud detection commission error and the potential cirrus detection error, discussed above with respect to Figures 2 and 3, is evidently negligible at the CONUS level as only a minority of pixels have values close to 100% high cloud confidence (Figure 5a) and close to 68% high cirrus confidence (Figure 5d), respectively.
The histograms of the CONUS percentage values for the six different possible combinations of coincident cloud and cirrus states (Figure 6) have generally narrower distributions than the histograms considering the cloud and cirrus states on their own (Figure 5). In particular, the histograms of the high or low confidence cirrus with medium confidence cloud (Figure 6b,e) are narrow because of the low incidence of medium confidence cloud detections (Figure 5b). The high confidence cirrus and high confidence cloud histogram (Figure 6c) has a mean of 20.1% and 25th and 75th percentiles of 14% and 27%, respectively. Of primary interest is the high confidence cirrus and low confidence cloud histogram (Figure 6a) with a mean of 6.9% and 25th and 75th percentiles of 5% and 10%, respectively, reflecting the likely distribution of undetected cirrus in pre-Landsat 8 CONUS data.

Discussion
This comprehensive analysis considering the first year of available Landsat 8 data documented the typical degree of cloudiness in Landsat 8 CONUS data and revealed the same broad geographic and seasonal patterns of cloudiness as observed by previous Landsat CONUS cloud studies. The availability, for the first time, of a standard per pixel cloud and cirrus mask enabled this analysis to be undertaken at 30 m resolution rather than considering only summary image cloud metadata [10,12]. This also enabled quantification of the 30 m coincidence of cirrus and non-cirrus cloud, which has implications for the magnitude of undetected cirrus present in the historical Landsat archive. For the year of CONUS data considered (11,296 scenes; covering more than 11 thousand million 30 m pixels) about 7% of pixels were detected as high confidence cirrus and low confidence cloud. It is unknown if this 7% CONUS value is globally representative, although this is unlikely given the considerable global variations in cirrus state and in coincident high and low cloud occurrence [17,21]. Regardless, this finding has implications for processing the Landsat archive as there is not a reliable way to detect the per-pixel cirrus state of pre-Landsat 8 data. In particular, Landsat research concerning per-pixel time series analyses and research to temporally composite the historical Landsat data record or make long-term data records [1] should be cognizant of the likelihood of undetected cirrus contaminated pixels.
This study demonstrates that the at-launch Landsat 8 cloud and cirrus algorithms are broadly working. However, the results also indicate systematic cloud detection commission errors over highly reflective exposed soils/sands in White Sands, New Mexico and Salt Lake, Utah, and it is recommended that caution be taken when using the currently available Landsat 8 cloud data over similar surfaces. In this study it was not possible to study the incidence of cloud detection omission errors.
It is well established that cloud detection algorithms may be unreliable due to the considerable complexity and variability in cloud types and surface backgrounds [11,24,27,32,34,35,36]. The boundary between defining a pixel as cloudy or clear is sometimes ambiguous; for example, a pixel may be partly cloudy, or a pixel may appear as cloudy at one wavelength and appear cloud-free at a different wavelength [36]. Assessment of the accuracy, i.e., validation, of cloud mask products is difficult because of these issues and because of difficulties in defining unambiguous independent reference data. The Landsat 8 cloud mask products have not yet been validated. However, a pre-launch Landsat 8 cloud mask study [29] validated a number of Landsat cloud classification algorithms using 103 globally distributed Landsat 7 ETM+ cloud mask reference images generated by photointerpretation of the reflective and thermal bands. They found overall cloud classification accuracies of 79.9% for the established ACCA algorithm and 88.5% and 89.7% for improved algorithms that are expected to be comparable to the accuracy provided by the Landsat 8 cloud algorithm [3,29]. These accuracies are similar to those of the object-based F-mask cloud detection algorithm [37], for which classification accuracies ranging from 86% to 100% were found for 142 of the globally distributed Landsat 7 ETM+ cloud mask reference images. Cirrus clouds are in particular difficult to detect [30][31][32][34] and the Landsat 8 1.375 μm band that is sensitive to cirrus was specifically included for this purpose [2]. The Landsat 8 cirrus cloud detection algorithm has an established provenance but is simple and, for example, does not consider elevation effects or use other spectral bands that may be helpful [30][31][32]. In addition, the 1.375 μm reflectance threshold may introduce sensitivity to thick upper-level non-cirrus clouds [38].

Conclusions
This study provided an assessment of the standard Landsat 8 Level 1 cloud product and the results underscore the need for per-pixel cloud and cirrus data for improved Landsat 8 terrestrial monitoring. The specific aims of the study were to examine, using the first year of available Landsat 8 conterminous United States (CONUS) data, the incidence of (i) cirrus and also non-cirrus clouds; and (ii) cirrus occurring without non-cirrus clouds. On average over a year of CONUS observations (i) 35.9% were detected with high confidence cloud, with spatio-temporal patterns similar to those observed by previous Landsat 5 and 7 cloud analyses [12]; (ii) 28.2% were high confidence cirrus; (iii) 20.1% were both high confidence cloud and high confidence cirrus; and (iv) 6.9% were detected as high confidence cirrus but low confidence cloud. This implies that about 7% of the pre-Landsat 8 CONUS data may be cirrus contaminated but are undetected as there was no cirrus cloud detection capability on earlier Landsat sensors. Future research to validate and refine the Landsat 8 cloud mask algorithms used to populate the standard Landsat Level 1 product is recommended, including comparisons with cloud products from near-contemporaneous cloud remote sensing systems.

Figure 1. Weekly percentage of valid CONUS Landsat 8 pixels detected as high confidence cloud (black diamonds), high confidence cirrus (grey filled circles), and high confidence cirrus that coincided with low confidence cloud (grey open circles), for 53 weeks spanning week 18 of 2013 to week 17 of 2014.

Table 1. Mean weekly percentages of cloud and cirrus detection states for 53 weeks of Landsat 8 conterminous US (CONUS) data. The "&" denotes coincident detection of two cloud states at a pixel location. The High confidence cloud, High confidence cirrus, and High confidence cirrus & Low confidence cloud values reflect the means of the three data sets illustrated in Figure 1. The percentages for each exhaustive set of cloud states may not sum to 100% as CONUS mean values are tabulated.

Figure 2. Percentage of high confidence cloud detections over 53 weeks of Landsat 8 observations derived at each 30 m conterminous US Albers projection pixel location. For visualization purposes the median percentages in 43,593 non-overlapping 15 × 15 km gridcells are shown.

Figure 3. Percentage of high confidence cirrus detections over 53 weeks of Landsat 8 observations derived at each 30 m conterminous US Albers projection pixel location. For visualization purposes the median percentages in 43,593 non-overlapping 15 × 15 km gridcells are shown. Results are shown using the same color map as Figure 2.

Figure 5. Histograms of the conterminous US (CONUS) percentage values of the different (a-c) cloud and (d,e) cirrus annual percentages derived over 53 weeks of Landsat 8 observations at each 30 m conterminous US Albers projection pixel location. Percentages were derived considering 11,088,916,742 30 m CONUS pixel values. Thus, (a,d) reflect histograms of the data illustrated in Figures 2 and 3, respectively, but derived at 30 m resolution.

Figure 6. Histograms of the conterminous US (CONUS) percentage values of the (a-f) six different possible combinations of coincident cloud and cirrus states for the annual percentages derived over 53 weeks of Landsat 8 observations at each 30 m conterminous US Albers projection pixel location. Percentages were derived considering 11,088,916,742 30 m CONUS pixel values.