Analyzing the Probability of Acquiring Cloud-Free Imagery in China with AVHRR Cloud Mask Data

Abstract: Optical remote sensing data are used widely in many fields (such as agriculture, resource management and the environment), especially across the vast territory of China; however, the application of these data is usually limited by clouds. Although it is valuable to analyze the probability of acquiring cloud-free imagery (PACI), the PACI for different sensors at the pixel level across China has not been reported. In this study, the PACI of China was calculated with daily Advanced Very High Resolution Radiometer (AVHRR) cloud mask data from 1990 to 2019. The results showed that (1) the PACI varies dramatically among regions and months in China. The value was larger in autumn and winter, and the largest figure reached 49.55% in October in Inner Mongolia (NM). In contrast, relatively small values occurred in summer, and the minimum value (5.26%) occurred in June in South China (SC). (2) As the climate changes, the PACI has increased significantly throughout the country and in most regions, especially in North China (NC), with a growth rate of 1.9% per decade. The results can be used as a reference for selecting appropriate optical sensors and observation times in areas of interest.


Introduction
Remote sensing technology, with its wide monitoring range and strong timeliness, has been widely used for investigation, monitoring, analysis and prediction in many fields (such as agriculture, resource management and the environment), especially across China's large landmass [1-3]. Among these technologies, optical remote sensing imagery is the most commonly used data source [4]. Cloud cover is one of the most serious problems encountered worldwide when using optical remote sensing [5,6], as it prevents optical instruments from obtaining clear views of the planetary surface [7,8]. Under global climate change, cloud cover changes along with other climate parameters (e.g., temperature and precipitation) [9,10]. In addition, fused images from different satellite systems are needed to achieve a high spatiotemporal resolution for China, considering its vast territory and complex climate [11]. Therefore, it is necessary to determine the frequencies and spatiotemporal characteristics of cloud cover and their impacts on the acquisition of cloud-free imagery from different Earth observation systems in China. These factors are crucial for choosing appropriate remote sensing imagery in many studies (e.g., research on the crop growing season in agriculture, or the monitoring of resources and the environment), because optical imagery with as little cloud cover as possible must be selected for the period of interest in a given study area [12].
The frequencies and variations in cloud-free imagery have been studied around the world [13-15]. Two types of methods have been used in these studies. (1) Cloud cover information from the scene-based metadata of imagery has been used to calculate spatiotemporal patterns of the probability of acquiring cloud-free imagery (PACI). This method can calculate the PACI under different amounts of cloud cover, but the results are too coarse to resolve the spatial patterns of clouds within a scene [16-18]. (2) The PACI has been calculated using cloud mask data at the pixel level [11,15], which makes it possible to locate the clouds within a scene. However, most of these studies used specific sensors, so their results lack generality and cannot be applied to optical sensors with different revisit intervals (for example, Landsat or MODIS) [19,20]. When the data come from satellite sensors with a low revisit frequency (such as Landsat), the cloud cover data lack temporal coverage [6,13]. Furthermore, the study periods were limited by the length of the remote sensing data record (as with MODIS), which hampers the analysis of changes in cloud cover at the climate scale [15,21]. Therefore, we selected the Advanced Very High Resolution Radiometer (AVHRR), which offers both a high revisit frequency and long-term observations.
AVHRR is a sensor carried on the National Oceanic and Atmospheric Administration's (NOAA's) Polar Orbiting Environmental Satellites (POES), beginning with Television and Infra-Red Observation Satellite-N (TIROS-N) in 1978 [22]. AVHRR provides global imagery every day, permitting cloud cover monitoring around the globe on a daily basis [23]. Therefore, it provides an opportunity to explore the spatiotemporal characteristics of cloud cover and estimate the possibility of obtaining cloud-free images from sensors with different revisit frequencies.
In China, the spatial patterns in the PACI have been analyzed at the scene level using Landsat Operational Land Imager (OLI) images acquired from April 2013 to October 2016 [19]. However, what is the PACI with different revisit frequencies at the pixel level? How does the PACI in China change over time under global climate change? This study attempts to answer these questions. Based on AVHRR daily cloud mask data, the spatiotemporal characteristics of the PACI are analyzed, including its annual and monthly variation at national and regional scales, and the number of days per month on which cloud-free imagery can be acquired with different revisit frequencies is calculated. The results can be used to estimate the applicability and limitations of optical remote sensing images in China.

Study Area
China has a vast territory with a unique geographical location and complex landforms. Because of its geography and topography, China's climate is complicated and diversified from region to region [24] (Figure 1). China has four distinct seasons, namely spring (March-May), summer (June-August), autumn (September-November), and winter (December-February of the following year) [19]. In the winter, the climate is mainly cold and dry, while during the summer, precipitation generally increases from northwest to southeast [25]. Eastern China has a hot and humid climate [26]. Cloud cover is related to climate; the rainy season has high cloud cover. Cloud-free imagery is easily obtained in the cold-dry season.

AVHRR Cloud Mask
The daily cloud mask data used in this study were acquired from NOAA's National Climatic Data Center (http://www.ncdc.noaa.gov) and were generated using the cloud flag in the Quality Assessment (QA) layer of AVH09C1. The QA layer has a 15-bit binary value. The second bit indicates whether a pixel is cloudy (1) or clear (0). The data are gridded at a resolution of 5 km. The analysis period spans 30 years, from January 1990 to December 2019. The sensor collected data continuously during this period.
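As a minimal sketch of how a cloud flag can be read from a packed QA value, the snippet below tests a single bit with a shift and a mask. It assumes that "the second bit" means bit index 1 counted from zero at the least significant bit; this is an assumption that should be checked against the AVH09C1 product documentation. The function names and QA values are hypothetical.

```python
def is_cloudy(qa_value, cloud_bit=1):
    """Return True when the cloud flag of a packed QA integer is set.

    cloud_bit=1 is an ASSUMPTION: the "second bit", counting from 0 at the
    least significant bit. Adjust it if the product guide says otherwise.
    """
    return ((qa_value >> cloud_bit) & 1) == 1


def cloud_mask(qa_grid):
    """Map a 2-D list of QA integers to 1 (cloudy) or 0 (clear)."""
    return [[1 if is_cloudy(v) else 0 for v in row] for row in qa_grid]


# Made-up QA values, written in binary so the tested bit is visible.
qa_grid = [[0b0000, 0b0010],
           [0b0110, 0b0101]]
print(cloud_mask(qa_grid))
```

Only the tested bit matters here; the other QA bits are ignored.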

Data Processing
To calculate the PACI in China, the daily cloud mask data from AVHRR were statistically analyzed. The monthly PACI from 1990 to 2019 was calculated at the pixel level as follows, yielding 360 monthly PACI images in total:
$$\mathrm{PACI}_{\mathrm{monthly}} = \frac{\text{total days of clear sky}}{\text{total observation days}} \times 100\%$$
where the monthly PACI is the percentage of clear-sky days in each month, the total days of clear sky is the number of "clear (0)" days in the month, and the total observation days is the number of observed days accumulated within the month.
Abnormal images among the 360 monthly PACI images were then eliminated by applying the Pauta criterion to their national averages. The basic concept of the Pauta criterion is to set a confidence interval to detect outliers [27]. A value $x_i$ is treated as an outlier when
$$|x_i - \mu| > 3\sigma$$
where $\mu$ and $\sigma$ are the mean and standard deviation of the sample data, respectively. For normally distributed data, fewer than 1% of the values are expected to fall outside the range $(\mu - 3\sigma, \mu + 3\sigma)$, so values outside this range are regarded as outliers and should be removed. In this study, we set the confidence interval to $(\mu - 3\sigma, \mu + 3\sigma)$ to filter out the errors caused by the sensor measurement.
According to the Pauta criterion, we eliminated 11 abnormal images out of the 360 monthly PACI images: September to December 1994, May to July 1996, January 2014 and October to December 2019. The PACI in China was then analyzed based on the remaining eligible data.
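The two steps described above, the monthly clear-sky ratio and the three-sigma (Pauta) screening, can be sketched as follows. This is an illustrative sketch rather than the processing chain used in the study: the flag encoding (`None` for a day without a valid observation), the use of the sample standard deviation, and all function names and input values are assumptions.

```python
from math import sqrt


def monthly_paci(daily_flags):
    """Percentage of clear-sky days among the observed days of one month.

    daily_flags holds one entry per day for a single pixel:
    0 = clear, 1 = cloudy, None = no valid observation (assumed encoding).
    Returns None when the pixel was never observed in the month.
    """
    observed = [f for f in daily_flags if f is not None]
    if not observed:
        return None
    clear_days = sum(1 for f in observed if f == 0)
    return 100.0 * clear_days / len(observed)


def pauta_outliers(values):
    """Indices of values more than 3 standard deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    # Sample standard deviation (n - 1); the source does not state which form it used.
    std = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [i for i, v in enumerate(values) if abs(v - mean) > 3 * std]


# Shortened, made-up month for one pixel: 3 clear days out of 9 observed.
june = [0, 1, 1, None, 0, 1, 1, 1, 0, 1]
print(monthly_paci(june))

# Made-up national monthly averages with one implausible value at the end.
national_means = [19.0, 21.0] * 10 + [60.0]
print(pauta_outliers(national_means))
```

Note that the three-sigma rule needs a reasonably long series: with only a handful of values, no single value can lie more than three sample standard deviations from the mean.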

Statistics for the PACI in China
The PACI and the monthly PACI in China were calculated from the eligible data as the overall average and the average for each calendar month, respectively. Based on the monthly PACI, regional averages were calculated to analyze the characteristics of the monthly PACI. China was divided according to the Meteorological Geographical Division in China, acquired from the China Meteorological Administration (CMA). This division was established based on climate indicators and comprises 11 regions (Figure 1).
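A regional average of this kind can be sketched as a zonal mean over a grid of region codes. The snippet assumes an unweighted mean over pixels (pixel area is ignored) and uses `None` for pixels without data; both choices, the function name and the tiny grids are illustrative assumptions.

```python
def regional_means(paci_grid, region_grid):
    """Unweighted mean PACI per region code, skipping pixels without data."""
    sums, counts = {}, {}
    for paci_row, region_row in zip(paci_grid, region_grid):
        for paci, region in zip(paci_row, region_row):
            if paci is None or region is None:
                continue
            sums[region] = sums.get(region, 0.0) + paci
            counts[region] = counts.get(region, 0) + 1
    return {region: sums[region] / counts[region] for region in sums}


# Made-up 2 x 2 example: PACI in percent and the region code of each pixel.
paci_grid = [[10.0, 20.0],
             [30.0, None]]
region_grid = [["SC", "SC"],
               ["NM", "NM"]]
print(regional_means(paci_grid, region_grid))
```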
To show the annual variation characteristics of the PACI, the annual PACI and the monthly interannual standard deviation were calculated as the annual average and the monthly standard deviation of the eligible monthly PACI from 1990 to 2019, respectively. Subsequently, the annual time series of the PACI was calculated from the national and regional averages of the annual PACI. Furthermore, the tendency rates of the annual PACI over the study period were computed as linear trends fitted by least squares [26]. The significance of each trend was assessed by testing the linear correlation coefficient (R), a widely used approach [28].

Calculating the Availability of Acquiring Cloud-Free Imagery by Sensors with Various Revisit Intervals
For a sensor with an n-day revisit cycle, the number of days per month on which cloud-free imagery can be acquired is the number of cloud-free days among the days the sensor visits, that is, every n-th day of the month. Because the sensor's first visit can fall on any of the first n days of the month, there are n possible cases. As all n cases are possible, the number of cloud-free acquisition days is estimated as the average over the n cases in a month. Theoretically, n can be as large as the longest revisit interval among all optical sensors; in practice, we investigated n values of up to 16, in line with current optical Earth observation systems [15]. The number of days on which it is possible to acquire cloud-free imagery monthly with various revisit intervals is therefore a simple function of the revisit interval and the monthly PACI, with a 1-day revisit sensor as the basis:
$$N_n = \frac{\mathrm{PACI}_{\mathrm{monthly}} \times D}{n}$$
where $N_n$ is the number of days per month on which a sensor with an n-day revisit interval can acquire cloud-free imagery and $D$ is the number of days in the month corresponding to the monthly PACI. In this study, February was set to 28.25 days. Using this algorithm, we calculated the curve of the number of days on which cloud-free imagery can be acquired monthly with different revisit frequencies, for the months with the highest and lowest PACI throughout the country and in each region.
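The averaging over the n possible first-visit days can be checked with a short sketch. It counts the clear days seen for each possible first-visit day and averages the counts; because every day of the month belongs to exactly one of the n visit sequences, the average reduces to the total number of clear days divided by n, that is, D times the monthly PACI divided by n. The function names and the daily sequence are made up for illustration.

```python
def cloud_free_days(daily_clear, n):
    """Average number of cloud-free acquisitions in a month for an n-day revisit.

    daily_clear holds one boolean per day of the month (True = clear).
    """
    counts = []
    for first_visit in range(n):              # n possible first-visit days
        visits = daily_clear[first_visit::n]  # days seen by the sensor
        counts.append(sum(visits))            # clear days among those visits
    return sum(counts) / n


def cloud_free_days_closed_form(paci_percent, days_in_month, n):
    """Closed form: D * PACI / n, with the PACI given in percent."""
    return days_in_month * paci_percent / 100.0 / n


# Made-up 30-day month with 6 clear days, i.e. a monthly PACI of 20%.
daily_clear = [day % 5 == 0 for day in range(30)]
paci_percent = 100.0 * sum(daily_clear) / len(daily_clear)

for n in (1, 4, 16):
    print(n, cloud_free_days(daily_clear, n),
          cloud_free_days_closed_form(paci_percent, 30, n))
```

The two functions agree for every revisit interval, which is why the monthly PACI alone is enough to draw the availability curve.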

Validation of the Results
Because it is not possible to obtain independent cloud cover data with the same spatial resolution, validating the results is difficult. Therefore, we assessed the results by comparing the PACI with meteorological data sets. We obtained the monthly average total cloud cover of 848 weather stations in China for 2019 from the monthly dataset of basic meteorological elements of the China National Surface Weather Stations. The annual average total cloud cover in China was calculated by averaging the monthly average total cloud cover of 2019. The Kriging interpolation method was used to interpolate the point values onto a raster with the same spatial resolution as the PACI. Kriging is a powerful type of spatial interpolation that uses the spatial correlation between sampled points to interpolate the values in the spatial field. Then, we extracted the PACI at the weather station locations and fitted a linear regression model between the PACI and the annual average total cloud cover.

The PACI in China
The monthly PACI images showed that the PACI gradually increased from southeast to northwest in China (Figure 2). The lowest PACI was in the Sichuan Basin, and the value was extremely low throughout the year. In contrast, the highest PACI occurred in the Taklimakan Desert. The PACI in most of China was below 80%. The lowest PACI (15.61%) was in summer in China, while the highest PACI (22.19%) occurred in autumn. The overall national performance was best in November. Therefore, if optical remote sensing images are used to obtain information at the national scale, the best results will be achieved in November. The difference in the PACI was the most obvious in February, which is the best time to use optical remote sensing images for the east of Inner Mongolia (NM) and the southeast of northeast China (NEC).
There were noticeable regional differences in the monthly variations in the PACI (Figure 3). North China (NC), the Huang-Huai region (HHR), the Jianghan region (JhR), the Yangtze-Huaihe region (YHR), the Jiangnan region (JnR) and South China (SC) are located on the eastern coast of China (Figure 1). Their monthly variations of the PACI were similar: the PACI gradually decreased from winter to summer and gradually increased after autumn. From north to south, both the PACI and its variation gradually decreased. The highest PACI was 38.85% in February for NC, and the lowest was 16.71% in July, a difference of roughly 22 percentage points. The highest PACI was 17.54% in SC, and the lowest was 5.26%, a difference of approximately 12 percentage points. The monthly variation in southwest China (SWC) was smoother than that for the eastern coast. The monthly variations of the PACI in northwest China (NWC), NM and NEC were similar. The higher PACIs were in April, May and October, and the lower PACIs occurred in January, July and December. The variation in Tibet (XZ) differed from that in other areas. The highest PACI for XZ was 28.57% in November.
In China, the highest PACI was 49.55% in October for NM, while the lowest was 4.65% in July for SWC.

Interannual Variation of PACI
In this study, the interannual variability was represented by the standard deviation. As evident from the standard deviations of the PACI, the uncertainty varied among months and regions (Figure 4). A lower standard deviation for the PACI occurred in spring and summer, while a higher standard deviation occurred in autumn and winter. In general, the standard deviation was the largest in December. The national average was 13.93%.
The linear relationship between the year and the PACI confirmed the general growing trend of the PACI in China (Figure 5). The national average annual PACI increased by 1% per decade, and the increase was significant. In all regions other than HHR and NEC, a significant increase was recorded. The maximum growth rate was 1.9% per decade in NC, which was approximately twice the national rate. In NEC and HHR, slight downward trends were found, but the trends were not statistically significant. Notably, the PACI for XZ was extremely low in 1997. Furthermore, at the regional scale, regions with larger increasing trends also had larger correlation coefficients (R).
Figure 5. Annual time series for the probability of acquiring cloud-free imagery throughout the country and regions from 1990 to 2019. *** means that R is significant when the significance level α is set to 0.01; ** means that R is significant when the significance level α is set to 0.05; * means that R is significant when the significance level α is set to 0.1.

The Number of Days to Acquire Cloud-Free Imagery Monthly with Different Revisit Frequencies
Using the algorithm described in Section 2.4.2, the number of days to acquire cloud-free imagery monthly was simulated at national and regional scales with a revisit interval ranging from 1 to 16 days, which represents the revisit capability of the most popular optical observation sensors currently available (Figure 6). The simulation revealed some general patterns in the availability of cloud-free imagery with different revisit intervals across regions: the number of cloud-free images decreased, and the rate of decline decreased, as the revisit interval increased.
For the entire country, in the month with the most availability (October), a 1-day revisit resulted in approximately nine cloud-free images per month, and when the revisit interval increased to 9 days, the number of cloud-free images dropped to about one. In the month with the least availability (January), however, even the best case (a 1-day revisit) yielded only about five or six cloud-free images per month, and the number approached one as the revisit interval increased to 5 days or more. These results also demonstrated that it was possible to acquire cloud-free imagery within a month even as the revisit interval increased to 10 or 15 days, with the largest opportunities in NEC, NC, NWC, NM and XZ. However, in the cloudy regions (HHR, JhR, YHR, JnR, SC and SWC), in the month with the least availability, it was no longer possible to obtain cloud-free imagery within a month once the revisit interval increased to 3 days.

Validation
The spatial patterns of the annual average total cloud cover were opposite to those of the PACI (Figure 7). The northwest had a lower annual average total cloud cover and a higher PACI than the southeast. The higher annual average total cloud cover values were in JnR, SC and SWC, which is consistent with the distribution of the lower PACIs. The highest PACI and the lowest annual average total cloud cover occurred in NM. The PACI and the annual average total cloud cover were closely related (Figure 8), with a strong and significant negative correlation between the two data sets (R = −0.65, α < 0.01): the greater the PACI, the smaller the annual average total cloud cover. Therefore, the results we calculated were reasonable.
Figure 8. The relationship between the probability of acquiring cloud-free imagery and the annual average total cloud cover. *** means that R is significant when the significance level α is set to 0.01.

The PACI in China
The PACI is closely related to cloud cover. As an essential meteorological factor, cloud cover, like temperature and precipitation, is closely correlated with the regional climate. Therefore, the spatiotemporal characteristics of the PACI vary with the regional climate in China. Due to the monsoon's influence, cloud cover decreased gradually from the eastern coast to the northwestern inland area, so the PACI increased in the same direction. These results are similar to those discussed in previous studies [29,30]. Rainfall is high and the PACI is low in summer, while rainfall is low and the PACI is high in autumn and winter [31].
Cloud formation always requires adequate water vapor in the atmosphere. The eastern coast of China is impacted by monsoons [32]. Because of the significant difference in temperature between the ocean and land, summer monsoons that blow from the ocean toward the land bring moist air inland, which is beneficial for cloud formation [32]. In contrast, winter monsoons blow from the land toward the sea, bringing dry conditions. Therefore, for the eastern coast of China (NC, HHR, JhR, YHR, JnR and SC), the PACI was larger in winter and smaller in summer. Because the monsoon's influence decreases from south to north, the PACI gradually increased from south to north [32]. Notably, the monthly variability of the PACI is not significant in JnR or SC, which may be affected by large-scale circulation [33].
The Sichuan Basin is the intersection of the southwesterly airflow and the northwesterly airflow. A southwest vortex is prone to form here, making it easier for clouds to develop [34]. Moreover, because of the many rivers in the Sichuan Basin, water vapor is sufficient. The topography of the basin is closed, and water vapor is not easily diffused. Therefore, the PACI was very low throughout the year, and the monthly variance in SWC reflected the variance in the Yunnan Province. Because of the Asian monsoon [35], most of the cloud cover for Yunnan occurs in summer.
The high mountains block atmospheric circulation, and the long distance to the surrounding oceans causes a dry climate in inland China [36], so insufficient water vapor is the crucial factor leading to the lowest cloud cover in NWC and NM [37]. Part of NM is influenced by summer monsoons, resulting in a smaller PACI in summer. NEC is located in the monsoon area, where the PACI is smaller in summer. In theory, due to climate change, there should be fewer clouds in winter in NWC and NM. However, our results showed the opposite to be the case, which may be due to the area's topography [38]. This difference may also be caused by the recognition algorithm [39]. Because snow and clouds are difficult to separate, the recognition algorithm may present certain errors in winter.
When the Indian monsoon impacts the Tibetan Plateau, the relative humidity increases rapidly, as does the cloud coverage [33]. However, when the westerlies impact the plateau, the air is dry and the cloud coverage is low [33]. Therefore, the highest PACI of XZ occurred in winter.

Annual Variation of PACI
The standard deviation is a measure of the amount of variation or dispersion for a set of values. Therefore, the monthly interannual standard deviation can represent the interannual variability of the PACI. Based on the results, the PACI was more stable in summer and it was more likely to change in winter between different years.
Many studies have shown that the climate of China has changed significantly [40,41]. A negative trend for the time series of the total annual average cloud cover was derived from the station dataset [30,42], especially in north China [24]. In this context, a significant increase of the PACI throughout the country and most regions was found, and the largest increase occurred in NC ( Figure 5).
The increase of the PACI in the NEC is related to global warming [43]. Over the last few decades, the land and ocean have experienced continuous and extensive warming. North America, Europe and the middle and high latitudes of Asia had the most significant temperature increases [43]. The temperature increase in the middle and high latitudes was greater than that at the equator and in the low latitudes. Therefore, the meridional temperature gradient was reduced, which weakened the meridional circulation. As a result, the cloud cover in the updraft area decreased, and the cloud cover in the sinking area increased [43].
Under the influence of El Niño, a strong warm event occurred over the South China Sea in 1997, which led to a stronger summer monsoon and weaker winter monsoon [44]. The Qinghai-Tibet Plateau is one of the most sensitive regions to global climate change [45], and so the lowest PACI was found in XZ in 1997.

The Number of Days to Acquire Cloud-Free Imagery Per Month with Different Revisit Frequencies
When determining the PACI for specific months, monthly days of acquisition are an inverse function of the sensor's revisit interval. Therefore, as the sensor revisit interval increased, the number of days to acquire cloud-free imagery per month decreased and the decline rate decreased. Currently, many sensors observe land cover from space, which has advantages and disadvantages [21]. The number of days to acquire cloud-free imagery per month with different revisit frequencies is valuable for choosing the appropriate optical remote sensor in the study area and estimating the availability of composite images [15].
At the national level, to ensure that there is at least one cloud-free observation for each pixel within a month, the revisit interval of the selected sensor must be shorter than 9 days, or even shorter than 5 days. The lowest opportunity to acquire cloud-free imagery occurs in January, when five or six daily images can be composited into one cloud-free image. The revisit interval of Landsat is 16 days, which is suitable for acquiring cloud-free imagery in the regions with the highest PACI (NEC, NC, NWC, NM and XZ). It is difficult to obtain cloud-free images in cloudy regions (HHR, JhR, YHR, JnR, SC and SWC). In the month with the lowest PACI, it is possible to acquire no more than two cloud-free images in a month. Therefore, the availability of 16-day composite imagery is limited. If remote sensing is used for research in such regions, it is necessary to lengthen the acquisition period, combine data from multiple sensors, or even replace the optical remote sensor with a synthetic aperture radar (SAR).
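As a rough illustration of how such a revisit requirement follows from the inverse relationship with the revisit interval, the sketch below returns the longest whole-day revisit interval for which at least one cloud-free acquisition per month is still expected. The function name is hypothetical, and the first two PACI values are illustrative rather than results of this study; the third uses the 5.26% June minimum reported for SC.

```python
from math import floor


def max_revisit_interval(paci_percent, days_in_month):
    """Longest whole-day revisit interval with at least one expected
    cloud-free acquisition per month (0 means even daily revisits fall short).
    """
    expected_clear_days = days_in_month * paci_percent / 100.0
    return floor(expected_clear_days)


print(max_revisit_interval(30.0, 31))  # illustrative PACI of 30% in a 31-day month
print(max_revisit_interval(18.0, 31))  # illustrative PACI of 18% in a 31-day month
print(max_revisit_interval(5.26, 30))  # 5.26% in June, the minimum reported for SC
```

Under these assumptions, a pixel with a monthly PACI near 5% would need close to daily revisits to expect one cloud-free acquisition per month, far more frequent than a 16-day revisit.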

Conclusions
In this study, the PACI in China was calculated based on the daily AVHRR cloud mask data. The results are crucial for choosing the appropriate study area, observation time and optical sensor. The analysis suggests the following main conclusions.
(1) The PACI varied among months and regions in China, which was consistent with the spatiotemporal characteristics of the regional climate. The overall national performance of the PACI was best in November. For all of China, the PACI gradually increased from southeast to northwest.
(2) As the climate changes, the PACI in China changes. The PACI increased significantly throughout the nation and in most regions. The highest increase in all regions throughout the entire multiannual period occurred in NC.
(3) As the sensor revisit interval increased, the number of days to acquire cloud-free imagery monthly decreased and the decline rate decreased. The results are valuable for choosing the appropriate revisit frequency of optical remote sensors and estimating the availability of composite images.
Notably, the rules we set were strict regarding the calculation of the PACI. If the threshold of cloud coverage is changed, the PACI will increase. Therefore, when using the results of this study, it is necessary to pay attention to the threshold of cloud coverage. However, irrespective of the cloud coverage threshold, the patterns of the PACI in China are consistent with the results of this study. In the future, the PACI based on different cloud fractions can be calculated to obtain more detailed results.