Evaluation of Cloud Mask and Cloud Top Height from Fengyun-4A with MODIS Cloud Retrievals over the Tibetan Plateau

Abstract: The Tibetan Plateau (TP) has profound thermal and dynamic influences on the atmospheric circulation, energy, and water cycles of the climate system, which make the clouds over the TP the forefront of atmospheric and climate science. However, the highest altitude and most complex terrain of the TP make the retrieval of cloud properties challenging. In order to understand the performance and limitations of cloud retrievals over the TP derived from the state-of-the-art Advanced Geosynchronous Radiation Imager (AGRI) onboard the new generation of Chinese Geostationary (GEO) meteorological satellites Fengyun-4 (FY-4), a three-month comparison was conducted between FY-4A/AGRI and the Moderate Resolution Imaging Spectroradiometer (MODIS) for both cloud detection and cloud top height (CTH) pixel-level retrievals. For cloud detection, the AGRI and MODIS cloud mask retrievals showed a fractional agreement of 0.93 for cloudy conditions and 0.73 for clear scenes. AGRI tended to miss lower CTH clouds due to the lack of thermal contrast between the clouds and the surface of the TP. For cloud top height retrievals, the comparison showed that on average, AGRI underestimated the CTH relative to MODIS by 1.366 ± 2.235 km, and their differences presented a trend of increasing with height.
Author Contributions: Conceptualization, W.X. and D.L.; Methodology, W.X.; Software, W.X.; Validation, W.X.; Formal analysis, W.X.; Investigation, W.X.; Resources, D.L.; Data curation, W.X.; Writing - original draft preparation, W.X.; Writing - review and editing, W.X.; Visualization, W.X.; Supervision, D.L.; Project administration, D.L.; D.L.


Introduction
Clouds cover about 60~70% of the globe [1,2] and exert an enormous influence on weather and climate [3]. They are a prerequisite for precipitation, forming the intermediate stage between water vapor and precipitation. All moist processes in the atmosphere ultimately involve clouds, which play an important role in Earth's water cycle [4,5]. In addition, because clouds interact so strongly with sunlight and infrared radiation, they produce significant climate feedbacks. As a result, clouds play a critical role in regulating the radiation budget of the Earth-atmosphere system; it can be said that clouds dominate the energy budget of the Earth [3]. Although clouds play a fundamental role in weather and climate, much about them remains unknown. Inadequate representation of cloud processes has long been identified as a main source of uncertainty in climate projections [6]. The need to predict changes in weather and climate brings the subject of clouds back to the forefront of atmospheric and climate science [5].
The Tibetan Plateau (TP) exerts profound impacts on the continental to global-scale climate [7][8][9], and therefore, the bulk characteristics of clouds over the TP are important for studying cloud-climate feedback and their impacts on climate change and its dynamics [10,11]. As the highest and largest plateau on Earth with a mean altitude of more than 4000 m, the TP has profound thermal and dynamic influences on the atmospheric circulation, energy, and water cycles of the climate system. It is well known to regulate the Asian monsoon and the middle Asia dry climate by acting as a vast elevated heat source forecasting, and NWP services [30], these science products generated by FYGAT are also used in the research of cloud climatology and cloud and precipitation microphysics [33,34].
Both weather forecasting and climatology research require an understanding of the uncertainties of cloud records [35]. In order to recognize the uncertainties and limitations of the AHI, AGRI, and their cloud products based on the FYGAT algorithm in different geographical contexts, comparison and verification studies are needed. Some quantitative evaluations have been conducted. Wang et al. [31] employed MODIS as the benchmark for AGRI and AHI cloud mask validation over a 21-day period. Their results suggest that the products were highly consistent, and that cloudy scene identification was better than clear-sky identification for both AGRI and AHI. Lai et al. [36] compared the cloud properties from AGRI with MODIS cloud retrievals over four months. Their comparison indicates that the cloud mask and cloud phase of the two instruments were consistent, while clear differences were noticed for cloud optical thickness and cloud effective radius. Min et al. [37] used CALIPSO to verify the AHI cloud top height for one year. Their results indicated that samples with cloud top height higher than 12.0 km were significantly underestimated, while the average underestimation of high layer clouds was about 5 km. Wang et al. [38] compared the AGRI cloud fraction and cloud top pressure products with MODIS over East Asia for a month. The results showed that the two products from AGRI and MODIS generally agreed well. These verifications show that cloud products based on the FYGAT algorithm generally performed well and were consistent with the cloud products retrieved from MODIS and CALIPSO.
Although these comparisons have provided important overall insight, they have been limited by the lack of specific evaluation of the cloud products' performance in the TP region. Holz et al. [35] pointed out that the difficulty in characterizing cloud observation uncertainties is compounded by their strong regional dependence. The Tibetan Plateau has the highest altitude and most complex terrain in the world [18], which will make cloud retrieval more challenging, resulting in the overall evaluation results of these cloud products not applicable to TP regions.
In addition, considering the influence of observation geometry, FY-4/AGRI, located at 104.7° E, with the capability of providing a complete image of the TP, has an advantage over Himawari/AHI, located at 140.7° E, in observing the TP and clouds above it. The Himawari/AHI viewing zenith angle (VZA) of the pixels on the eastern flank of the TP reaches over 50°, and a large area in the western part of the TP is beyond its scanning range. Consequently, the cloud properties obtained from FY-4A/AGRI are very valuable for the observation and study of clouds in this area. So far, the performance of pixel-level cloud properties derived from FY-4A/AGRI over the TP has not been evaluated.
Therefore, it is necessary to verify and understand the performance and limitations of AGRI cloud products over the TP based on the FYGAT algorithm to ensure the correct interpretation of the existing and future cloud datasets obtained from the measurements of the FY-4 series GEO satellite. In this context, we investigated the performance of the AGRI cloud mask and cloud-top height over the TP by comparing it with the coincident cloud retrievals of MODIS. In this paper, an accurate and computationally efficient collocation process was developed to facilitate direct comparisons of the AGRI and MODIS level-2 cloud retrievals. We also discuss the impacts of the TP's complex topography on the comparison results.
The rest of this paper is organized as follows. A description of the AGRI and MODIS cloud retrievals is provided in Section 2. The collocation methodology and algorithm developed for comparison is described in Section 3, followed by a comparison method. The results of three months of the TP collocated AGRI and MODIS quantitative comparisons of cloud mask and cloud top height are presented in Section 4. A discussion about the cloud retrievals' performance distribution and the impacts of the TP's complex topography on the comparison results is given in Section 5. Conclusions are then presented in Section 6.

Moderate Resolution Imaging Spectroradiometer (MODIS) and Advanced Geosynchronous Radiation Imager (AGRI) Measurements
The primary objective of this study was to assess the performance of cloud retrievals over the TP region from AGRI flown on the FY-4A platform. We compared three months (from 1 June to 31 August 2020) of collocated AGRI and MODIS pixel-level cloud mask (CM) and cloud top height (CTH) products over the area 25° N~45° N, 70° E~105° E, following the definition of the TP region by Sato et al. [39]. Due to significant differences in the spatial and temporal sampling between AGRI and MODIS, it is necessary to collocate cloud products from these two instruments. Therefore, a collocation algorithm was designed to be computationally efficient and accurate, allowing for rapid identification of the coincident MODIS and AGRI observations. Cloud mask and cloud top height are crucially important and fundamental cloud products. Identifying cloud pixels against the background environment, known as cloud detection or cloud mask, is fundamental and critical for optical remote sensing and for subsequent accurate image analysis [31,40,41]. Generally, the cloud mask algorithm lies at the top of the data processing chain and must be versatile enough to satisfy the needs of many applications [42]. Regardless of how sophisticated the new remote sensors and their applications become, cloud mask is usually the first processing step required [43].
Cloud top height is made available to all subsequent algorithms that require knowledge of the vertical extent of the clouds. It plays a critical role in determining the cloud cover, cloud layers [44], cloud classification [4,19,45], and cloud thermodynamic phase determination [46]. The broadband radiative impact of clouds is largely determined by their height [39]. In addition, as a critical parameter, CTH products are assimilated into numerical weather prediction (NWP) models [47,48].
First, AGRI and MODIS instruments and their respective level-2 CM and CTH retrieval algorithms are presented.

Moderate Resolution Imaging Spectroradiometer (MODIS)
Since the National Aeronautics and Space Administration (NASA) launched the "Terra" satellite in 1999 and the "Aqua" satellite in 2002, MODIS has become a keystone instrument and one of the most widely used satellite remote sensing platforms in the Earth Observing System (EOS) [46,49]. MODIS measures reflected solar and emitted thermal radiation in 36 spectral channels ranging from the visible (VIS) to the infrared (IR), with a spatial resolution of 250 m (two bands), 500 m (five bands), and 1000 m (29 bands). These channels, essentially covering all of the key atmospheric bands located between 0.415 and 14.235 µm, have been carefully selected to enable advanced studies of land, ocean, and atmospheric features, which provides unique spectral and spatial capabilities for retrieving cloud properties [46,50]. MODIS cloud properties' retrieval algorithms have evolved over the past two decades [46,49,51-54]. MODIS cloud products have been validated by comparing them with active remote sensor observations and radiance simulations [35,41,53,55], and are widely used in studies of cloud macro-microphysical and optical properties [5,10,20,56-60]. In addition, MODIS cloud products are also used as benchmarks or truth values for the verification and evaluation of satellite cloud products and retrieval algorithms [29,31,36,61,62].
The newest MODIS reprocessing dataset of the products, Collection 6 (C6), was completed in 2014 and 2015 for Aqua and Terra, respectively. The C6 update activity has benefited greatly from being able to compare MODIS radiance data and derived products with various sensors that compose the A-Train [54]. MODIS pixel-level (level-2) cloud products are generated on a granule basis, known as MOD06 and MYD06 for Terra and Aqua, respectively. A granule consists of 5 min of data. MOD06 and MYD06 in C6 provide the cloud parameters at a 1 km spatial resolution.
At the time of this study, the MYD06 data (from June to August 2020) downloaded from the official MODIS website were incomplete; that is, a large amount of MYD06 data was missing for that period. In addition, to avoid comparison uncertainty due to the differences between Terra and Aqua, only the MOD06 cloud products in Collection 6 were used as the benchmark to compare and validate the level-2 cloud products of FY-4A/AGRI based on the FYGAT algorithm.
Cloud mask is based on the contrast between the cloud and background environment. Contrast can be defined by differing signals for individual spectral bands, spectral combinations, or temporal and spatial variations of these [49]. The MODIS cloud mask algorithm employs up to 19 spectral bands to maximize reliable cloud detection [35]. In fact, the Collection 6 cloud mask algorithm uses as many as 22 bands [42]. After a series of sequential tests on the passive reflected solar and infrared observations, a degree of clear-sky confidence is assigned to the result to assess the likelihood that clouds obstruct a given pixel. Then, final cloud mask confidence (Q) is based on three thresholds (i.e., 0.66, 0.95, and 0.99). Test results can classify cloud contamination in every pixel of data as either confident clear (Q > 0.99), probably clear (Q > 0.95), probably cloudy (Q > 0.66), or cloudy (Q ≤ 0.66) [49]. Based on the comparison with three years of coincident observations, an assessment of the performance of MODIS cloud mask algorithm showed that the MODIS algorithm agrees with the combined radar and lidar from ground about 85% of the time [41]. Globally, the MODIS 1-km cloud mask and the CALIOP 1-km averaged layer product agreement is 87% for cloudy conditions for both August and February; for clear-sky conditions, the agreement is 85% (86%) for August (February) [35].
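The mapping from the final clear-sky confidence Q to the four MODIS cloud mask classes can be illustrated with a minimal sketch. The thresholds (0.66, 0.95, 0.99) are those quoted above; the function name and the string labels are hypothetical and are not taken from the MODIS processing code.

```python
def modis_cloud_mask_class(q: float) -> str:
    """Map a clear-sky confidence Q in [0, 1] to one of four classes.

    Thresholds follow the text: Q > 0.99 confident clear, Q > 0.95 probably
    clear, Q > 0.66 probably cloudy, otherwise cloudy. The function name and
    labels are illustrative assumptions.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if q > 0.99:
        return "confident clear"
    if q > 0.95:
        return "probably clear"
    if q > 0.66:
        return "probably cloudy"
    return "cloudy"
```

Because the tests are applied from the highest threshold downward, each confidence value falls into exactly one class, and a value exactly equal to a threshold belongs to the lower (cloudier) class.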
MODIS cloud top pressure (CTP) is determined using five thermal infrared bands (both day and night). The 11 µm infrared window brightness temperature data are used to estimate low clouds. Mid- to high-level clouds are derived from radiance ratios of bands located within the broad 15 µm CO2 absorption region by applying the CO2 slicing technique [53]. The theoretical basis of the CO2 slicing technique is that as the wavelength increases from 13.3 to 15 µm, the atmosphere becomes more opaque due to the absorption of CO2, so that the radiation obtained from these spectral bands is sensitive to different layers in the atmosphere. CTP retrieval is derived from ratios of differences in radiances between cloudy and clear-sky regions at two nearby wavelengths [63]. Once the CTP is derived, it is converted to cloud-top height and temperature through gridded temperature profiles provided by the National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS). Differences between model-derived and measured clear-sky radiances are mitigated with a radiance bias adjustment to select the optimal CTP [53]. Globally, when compared to the CALIOP 1-km products, MODIS underestimates the CTH relative to CALIOP by 1.4 ± 2.9 km; when only high clouds above 5 km are considered, the differences are found to be greater than 4 km. Some differences in CTH between MODIS and CALIOP are expected since MODIS sees into the cloud to an optical thickness of approximately 1, while CALIOP senses the cloud top [35].

Advanced Geosynchronous Radiation Imager (AGRI)
AGRI is one of the most important payloads on the FY-4A GEO meteorological satellite platform. It measures radiances at 14 spectral bands covering the range of wavelengths from 0.47 to 13.5 µm. These spectral band locations are very similar to those planned for other greatly improved imagers on board the new generation of GEO meteorological satellites such as AHI on Himawari-8 [29], ABI on GOES-R [28], and the European MTG Flexible Combined Imager (FCI). The spatial resolution of AGRI at the sub-satellite point is 1 km, 2 km, and 4 km in the visible (VIS), near infrared (NIR), and other IR spectral bands, respectively [32]. FY-4A offers full-disk Earth-view coverage every 15 min with the option for more rapid regional and mesoscale observation modes. The information from AGRI will be used for many applications related to cloud and aerosol properties, severe weather, tropical cyclones and typhoons, aviation, natural hazards, land and ocean surfaces, etc. AGRI will offer the same products and information for applications over China and its adjacent regions [30,32]. Enhanced calibration accuracy of FY-4A is better than 1 K for the thermal infrared bands and 5% for the reflective solar bands, which can greatly benefit quantitative applications [32].
AGRI scientific products are generated by the product generation system (PGS) of NSMC/CMA based on FYGAT algorithms. FYGAT algorithms are partially inherited from the FY-2/VISSR algorithm and partially referenced from the GOES-R/ABI algorithm [30]. Operational cloud mask algorithm for AGRI also classifies each pixel as being either clear, probably clear, probably cloudy, or cloudy. This cloud mask algorithm combines 13 spectral and spatial uniformity tests, and two restore tests to produce a cloud mask product for every pixel of data. The restore tests refer to tests that "restore" probably cloudy pixels to clear pixels and "restore" cloudy pixels to probably cloudy pixels. The purpose of restore tests is to provide a conservative estimate on cloudiness. Specifically, the spectral tests include six infrared independent tests, three solar reflectance tests, two shortwave infrared tests, a reflectance uniformity test, and a thermal uniformity test. These tests have different thresholds over land, ocean, and snow/ice. Some of these thresholds are derived from the statistical analysis from the comparisons with collocated CALIPSO products, and some from simulated data [31].
The cloud top pressure retrieval algorithm for AGRI combines two IR window band (11 and 12 µm) observations with a single CO2 absorption band (13.5 µm) observation to estimate cloud height without large assumptions on cloud microphysics. This is the well-known CO2/Split-Window algorithm, originally designed for ABI flown on the GOES-R series [44,47]. The split-window approach is based on two window band (11 and 12 µm) observations. It can provide accurate measurements of cloud emissivity and its spectral variation, but lacks sensitivity to cloud height for optically thin cirrus [47,64]. The CO2/Split-Window method, however, benefits from combining the sensitivity to cloud height offered by a CO2 channel with the sensitivity to cloud microphysics offered by IR window channels [44]. First, the retrieval methodology employs the optimal estimation approach (also referred to as a 1DVAR approach) to retrieve the cloud top temperature at the pixel level; once the cloud top temperature is computed, Numerical Weather Prediction temperature profiles are used to interpolate the values of cloud top pressure and cloud top height. For pixels determined to be multi-layer clouds, information about the height of the surrounding low clouds is used to estimate the height of low clouds underneath higher clouds [37]. Table 1 lists the key bands used for cloud mask and cloud top height in MODIS and AGRI.
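The last step described above, converting a retrieved cloud top temperature to a height through a temperature profile, can be illustrated with a simple sketch. It assumes plain linear interpolation between adjacent profile levels and returns the lowest level pair that brackets the temperature; the operational FYGAT treatment of inversions or the tropopause is not described here, and the function and argument names are hypothetical.

```python
def height_from_temperature(ctt_k, profile_t_k, profile_z_km):
    """Interpolate a height (km) for cloud top temperature ctt_k (K).

    profile_t_k and profile_z_km are temperatures and heights of the same
    profile levels, ordered from the surface upward. The first pair of
    adjacent levels that brackets ctt_k is used (an assumption of this
    sketch). Returns None if the profile never reaches ctt_k.
    """
    if len(profile_t_k) != len(profile_z_km):
        raise ValueError("profile arrays must have the same length")
    for k in range(len(profile_t_k) - 1):
        t0, t1 = profile_t_k[k], profile_t_k[k + 1]
        z0, z1 = profile_z_km[k], profile_z_km[k + 1]
        if min(t0, t1) <= ctt_k <= max(t0, t1):
            if t0 == t1:
                return z0  # isothermal layer: return its base
            weight = (ctt_k - t0) / (t1 - t0)
            return z0 + weight * (z1 - z0)
    return None


# Illustrative (made-up) profile over elevated terrain.
temps = [280.0, 260.0, 240.0, 220.0]
heights = [4.0, 7.0, 10.0, 13.0]
cth = height_from_temperature(250.0, temps, heights)
```

With this made-up profile, a cloud top temperature of 250 K lies halfway between the 260 K and 240 K levels, so the interpolated height is halfway between 7 km and 10 km.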

Collocation and Evaluation Methodology
The focus of this paper was to quantify the performance of the cloud products over the TP from the recently launched AGRI onboard FY-4A by comparison with coincident MODIS cloud retrievals. However, AGRI and MODIS differ significantly in both temporal and spatial sampling. First, to avoid uncertainties caused by changes in cloud properties between different observation times, only cloud retrievals with exactly the same start scan time in the two systems were selected for comparison. Then, the spatial collocation rule was formulated as follows. For AGRI, the cloud mask and cloud top height are generated at a 4-km resolution. For MODIS, the CM and CTH products are retrieved at 1-km resolution. Therefore, multiple MODIS ground-projected instantaneous field-of-view (GIFOV) pixels will be collocated with a single AGRI retrieval; that is, the AGRI 4-km GIFOV is the "reference", with multiple MODIS measurements within the AGRI GIFOV. A MODIS pixel is considered to lie within the AGRI GIFOV when the distance between the AGRI pixel and the MODIS pixel is ≤2 km (i.e., half of the AGRI resolution). With a conventional method, the distance between every pair of pixels from the two sensors has to be calculated individually. Assuming that AGRI and MODIS each have N pixels at a certain observation time over a certain area, finding the MODIS pixels within 2 km of any single AGRI pixel requires N distance calculations. As AGRI has N pixels, the total number of computations is N × N, that is, a time complexity of O(N^2), which requires huge computational resources.
In order to solve this massive computation problem as well as minimize the uncertainties resulting from the spatial sampling differences, a collocation algorithm-"multiple grid matching algorithm"-was developed in this paper to facilitate comparisons of the MODIS and AGRI cloud retrievals. This multiple grid matching algorithm was designed to be computationally efficient and accurate, allowing for rapid identification of the coincident MODIS and AGRI observations. The grid can be implemented by hash table keywords in computer science. The procedure is described in detail as follows.
First, pixels of AGRI and MODIS are subsampled and placed onto an equal-angle grid over the TP. The grid cell size is chosen so that the distance between an AGRI pixel and a MODIS pixel in the same grid cell is within 2 km as far as possible. Then, the AGRI and MODIS pixels are scanned separately and each pixel is assigned to its grid cell. Pixels that fall within the same grid cell are taken as matches. However, it is possible that the nearest pixel lies in an adjacent grid cell. The next step is therefore to adjust the size of the grid or shift the coordinates of the entire grid appropriately, and to scan the AGRI and MODIS samples again in another iteration, so that the target pixels (distance within 2 km) and the nearest pixel can fall into the same new grid cell. Figure 1 outlines this process. In the example shown there, a MODIS pixel (the red dot) that is closer to the AGRI pixel than the green one is missed by the large red grid; with the smaller grid (black grid), this previously missed MODIS pixel and the AGRI pixel fall into the same cell (the black box) and are collocated with each other. Matching on multiple grids significantly improves the matching accuracy. Its side effect is the possibility of duplicate matches, so a step to remove duplicates is also required. Finally, it must be verified that the distance between the matched pixels is indeed within 2 km. This multiple grid matching algorithm reduces the computation time complexity to O(N), making the approach affordable with limited computational resources.
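The procedure above can be sketched with a hash table keyed by grid cell. This is a minimal illustration, not the implementation used in the study: it assumes that the additional grids are obtained by shifting the grid by half a cell in latitude and/or longitude (the text also allows changing the cell size), that a cell size of 0.06° is adequate for the 25° N~45° N domain, and that distances are great-circle distances on a sphere. All function and parameter names are hypothetical.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def cell_key(lat, lon, cell_deg, lat_off, lon_off):
    """Hash key of the equal-angle grid cell containing (lat, lon)."""
    return (math.floor((lat - lat_off) / cell_deg),
            math.floor((lon - lon_off) / cell_deg))


def multi_grid_match(agri, modis, max_km=2.0, cell_deg=0.06):
    """Return sorted (agri_index, modis_index) pairs within max_km.

    agri and modis are lists of (lat, lon) tuples in degrees. Each pass
    hashes the MODIS pixels into grid cells and looks up the cell of every
    AGRI pixel, so the cost is linear in the number of pixels as long as
    the number of pixels per cell is bounded.
    """
    half = cell_deg / 2.0
    offsets = [(0.0, 0.0), (half, 0.0), (0.0, half), (half, half)]
    candidates = set()  # a set removes duplicate matches across passes
    for lat_off, lon_off in offsets:
        buckets = defaultdict(list)
        for j, (lat, lon) in enumerate(modis):
            buckets[cell_key(lat, lon, cell_deg, lat_off, lon_off)].append(j)
        for i, (lat, lon) in enumerate(agri):
            key = cell_key(lat, lon, cell_deg, lat_off, lon_off)
            for j in buckets.get(key, ()):
                candidates.add((i, j))
    # Final verification of the actual distance for every candidate pair.
    return sorted((i, j) for i, j in candidates
                  if haversine_km(*agri[i], *modis[j]) <= max_km)


agri_pixels = [(30.0, 90.0), (30.0, 89.999)]
modis_pixels = [(30.005, 90.005), (30.0, 90.029), (30.5, 90.5), (30.0, 90.001)]
pairs = multi_grid_match(agri_pixels, modis_pixels)
```

In this sketch, two pixels whose latitude and longitude differences are both smaller than half a cell share a cell in at least one of the four shifted grids, which is why pairs that straddle a cell boundary in the first pass are still found in a later one.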
In addition, the MODIS scanner may view a particular MODIS FOV at an angle that departs significantly from the nadir, which may lead to increased uncertainty in the comparison. Therefore, the MODIS samples with a sensor viewing zenith angle (VZA) of more than 45° were filtered out. Only MODIS samples with VZA less than 45° were used for comparison with the AGRI samples.

Evaluation Methodology
To quantitatively compare the cloud mask results, three validation scores (i.e., Probability of Detection (POD), False-Alarm Ratio (FAR), and Hit Rate (HR)) were employed to analyze the statistics of the agreement between the two systems. The definitions are given by Wang et al. [31]:

$$\mathrm{POD}_{\mathrm{cld}} = \frac{a}{a+c}, \qquad \mathrm{POD}_{\mathrm{clr}} = \frac{d}{b+d},$$

$$\mathrm{FAR}_{\mathrm{cld}} = \frac{b}{a+b}, \qquad \mathrm{FAR}_{\mathrm{clr}} = \frac{c}{c+d},$$

$$\mathrm{HR} = \frac{a+d}{a+b+c+d},$$

where a and d represent the numbers of collocated samples that are identified as cloudy and clear, respectively, by both AGRI and MODIS; b represents the number of matched samples determined as cloudy by AGRI but clear by MODIS; c represents the number of collocated samples classified as clear by AGRI but cloudy by MODIS. Note that the categories of "probably cloudy" and "probably clear" scenes are not included in the statistics here (i.e., only the pixels that are classified as "cloudy" or "clear" by both AGRI and MODIS are considered for comparison). PODcld and PODclr represent the efficiency of the AGRI cloud mask algorithm in detecting cloudy and clear conditions, respectively. The higher the POD value, the higher the efficiency. FARcld and FARclr indicate the fraction of collocated samples that the AGRI cloud mask algorithm misjudged as "cloudy" and "clear", respectively. The HR, or accuracy, represents the overall efficiency of the AGRI cloud mask algorithm. Specifically, HR represents the identification accuracy of AGRI's cloud mask when only the detection results of cloudy-sky and clear-sky conditions are compared, without considering the detection results of probably cloudy and probably clear conditions. At the extremes, there are two cases: (1) if the AGRI cloud mask agrees perfectly with MODIS, the POD and the HR will be one, and the FAR will be zero, while (2) no agreement results in the POD and the HR being zero, and the FAR being one.
To quantitatively compare CTH, similar to Min et al. [37], four parameters (i.e., Standard Deviation (STD), Mean Absolute Error (MAE), Mean Bias Error (MBE), and Correlation Coefficient (CC)) were used to quantify the CTH differences between AGRI and MODIS. Here, $x_{Ai}$ and $x_{Mi}$ represent the AGRI CTH and MODIS CTH of the i-th collocated sample, respectively, and N represents the number of collocated samples.
Either one or several 1-km pixel-level MODIS CTH retrievals can fall within a single 4-km AGRI CTH retrieval. Therefore, similar to Lai et al. [36], the distance-weighted average of the MODIS CTH retrievals within the AGRI 4-km FOV was compared to the AGRI product.
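The evaluation scores described in this section can be sketched in a few lines. The cloud mask scores follow directly from the definitions of a, b, c, and d with MODIS as the benchmark. For the CTH statistics, the sketch assumes the conventional forms: MBE is the mean of the AGRI minus MODIS differences, MAE the mean of their absolute values, STD the population standard deviation of the differences (dividing by N), and CC the Pearson correlation coefficient. The exact normalization used in the study is not stated here, so these choices and all function names are assumptions.

```python
import math


def cloud_mask_scores(a, b, c, d):
    """POD, FAR and HR from the 2x2 contingency counts.

    a: cloudy in both; b: AGRI cloudy, MODIS clear;
    c: AGRI clear, MODIS cloudy; d: clear in both.
    """
    return {
        "POD_cld": a / (a + c),
        "POD_clr": d / (b + d),
        "FAR_cld": b / (a + b),
        "FAR_clr": c / (c + d),
        "HR": (a + d) / (a + b + c + d),
    }


def cth_difference_stats(agri_cth, modis_cth):
    """MAE, MBE, STD and CC of AGRI minus MODIS cloud top heights (km)."""
    n = len(agri_cth)
    if n == 0 or n != len(modis_cth):
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [x - y for x, y in zip(agri_cth, modis_cth)]
    mbe = sum(diffs) / n
    mae = sum(abs(v) for v in diffs) / n
    # Assumed: population standard deviation of the differences.
    std = math.sqrt(sum((v - mbe) ** 2 for v in diffs) / n)
    mean_a = sum(agri_cth) / n
    mean_m = sum(modis_cth) / n
    cov = sum((x - mean_a) * (y - mean_m) for x, y in zip(agri_cth, modis_cth))
    var_a = sum((x - mean_a) ** 2 for x in agri_cth)
    var_m = sum((y - mean_m) ** 2 for y in modis_cth)
    cc = cov / math.sqrt(var_a * var_m)
    return {"MAE": mae, "MBE": mbe, "STD": std, "CC": cc}


scores = cloud_mask_scores(a=90, b=10, c=5, d=95)
stats = cth_difference_stats([5.0, 7.0, 9.0], [6.0, 9.0, 10.0])
```

A negative MBE in this convention means that AGRI places the cloud top lower than MODIS on average.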

Results
Three months (June, July, and August 2020) of collocated AGRI and MODIS cloud detection and CTH retrievals over the TP area were compared. The three months of collocated AGRI and MODIS cloud retrievals resulted in approximately 3.2 and 1.5 million cases for cloud detection and cloud top height comparison, respectively. The results were separated by month and included statistics of the agreement with MODIS. For the comparison of the cloud mask results, only AGRI pixels where all the collocated MODIS pixels were identical (i.e., either all clear, all probably clear, all probably cloudy, or all cloudy) were included in the statistics. As a result of this requirement, approximately 37.5% of the collocated scenes were not included in the statistics, and approximately 2.0 million cases were included in the statistics in Table 2.
Table 2. Fractional agreement, probability of detection (POD), false-alarm ratio (FAR), and hit rate (HR) scores for clear-sky and cloudy-sky conditions of AGRI cloud mask products over the TP during the periods June, July and August 2020. In the statistics, MODIS cloud mask results were used as a benchmark.
As earlier discussed (Section 3.1), either one or several MODIS cloud mask retrievals can fall within a single 4-km AGRI cloud mask retrieval. In order to mitigate the uncertainty of comparison due to sub-pixel inhomogeneity [36], only AGRI pixels where all the collocated MODIS cloud mask retrievals were identical (i.e., either all clear, all probably clear, all probably cloudy, or all cloudy) were included in the statistics. Figure 2 shows the fraction of the various identification results of the AGRI cloud mask for collocated samples when the MODIS cloud detection results were "cloudy", "probably cloudy", "probably clear", and "clear" respectively. The four colors represent these four different scenes. The evaluation results are presented as a fractional agreement. Fractional agreement is similar to POD score. The difference between POD score and fractional agreement is that the POD score is calculated from the matched sample dataset that only includes the pixels classified as "cloudy" or "clear" by both AGRI and MODIS, while the fractional agreement is calculated based on all matched samples that are flagged either "cloudy", "probably cloudy", "probably clear", or "clear". As shown in Figure 2, compared with the spatiotemporally collocated cloud detection results of MODIS, AGRI could identify clear and cloudy pixels with high agreement, showing good performance. Over the TP area, it was found that compared to the MODIS 1-km cloud mask, the fractional agreement of AGRI 4-km cloud mask was 0.932 for cloudy conditions. For clear-sky conditions, the fractional agreement was 0.734. However, there were distinct differences for the "probably clear" and "probably cloudy" scenes. This phenomenon is consistent with the result of the full-disk comparison [31]. These differences can mainly be attributed to the two totally different theoretical methodologies used by AGRI and MODIS for identifying "probably clear" and "probably cloudy" pixels. 
The MODIS Collection 6 cloud mask algorithm uses the final cloud mask confidence value based on three thresholds, as earlier mentioned (Section 2.1). In contrast, the current cloud detection algorithm for GEO satellite imagers uses spatial uniformity and two cloudy/clear restoral tests to identify "probably clear" and "probably cloudy" samples, which are closely linked to the results of neighbor pixels [31].
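The homogeneity requirement and the fractional agreement described above can be illustrated with a short sketch. It assumes that each AGRI pixel carries one label and a list of labels from its collocated MODIS pixels, and that the fractional agreement for a given MODIS class is the share of AGRI labels among all matched samples of that class; the function names and the label strings are hypothetical.

```python
from collections import Counter


def homogeneous_label(modis_labels):
    """Return the common label if all collocated MODIS labels are identical.

    Returns None for mixed (or empty) label lists, which are excluded from
    the cloud mask statistics.
    """
    unique = set(modis_labels)
    return next(iter(unique)) if len(unique) == 1 else None


def fractional_agreement(pairs):
    """Fraction of each AGRI label for every MODIS label.

    pairs is a list of (agri_label, modis_label) tuples covering all four
    classes. The result maps modis_label -> {agri_label: fraction}.
    """
    counts = Counter(pairs)
    totals = Counter(modis for _, modis in pairs)
    result = {}
    for (agri, modis), n in counts.items():
        result.setdefault(modis, {})[agri] = n / totals[modis]
    return result


samples = ([("cloudy", "cloudy")] * 9 + [("probably cloudy", "cloudy")]
           + [("clear", "clear")] * 3 + [("cloudy", "clear")])
agreement = fractional_agreement(samples)
```

Unlike the POD score, the denominator here includes the "probably clear" and "probably cloudy" AGRI labels, so the fractional agreement for a class cannot exceed the POD computed from the same matched samples.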
The statistical results of the cloud mask comparison over the TP region for the three months of collocated samples are presented in Table 2. The fractional agreement, POD, FAR, and HR scores of AGRI cloud mask performance were examined, and the results are separated by clear and cloudy FOVs as determined by MODIS in columns 3-9 of Table 2. Here, collocated MODIS cloud detection results were used as a benchmark. As defined earlier (Section 3.2), for POD, FAR, and HR, the categories of "probably cloudy" and "probably clear" scenes were not included in the statistics, while for the fractional agreement in both the "cloudy" and the "clear" condition, all four scenes were considered. Therefore, the POD values, 0.973 for cloudy-sky conditions and 0.868 for clear-sky conditions, were higher than the corresponding fractional agreement values (0.932 and 0.734). It should be noted that although the POD score for cloudy conditions was higher than that for clear conditions, the FAR score was 0.093 for the cloudy condition, which was higher than that for the clear-sky scene (0.04). In other words, there were more collocated samples that were misjudged as cloudy than those misjudged as clear. This finding is contrary to the result of the full-disk comparison [31], in which FAR scores were higher for clear-sky scenes than for cloudy-sky scenes.
The results of the three months were similar. The fractional agreement and POD values were both above 0.9 under cloudy-sky conditions, while they were above 0.7 and 0.8, respectively, under clear-sky conditions. The HR scores of each month were higher than 0.9, indicating a high consistency of cloud mask between AGRI and MODIS.
In general, the AGRI cloud mask compared more favorably with MODIS cloudy scenes than for clear scenes. This result is consistent with the result of the full-disk comparison [31]. It can be inferred that AGRI cloud mask may be designed to be clear-sky conservative; that is, if there is uncertainty in the spectral tests, the AGRI cloud mask may tend to label the scene as cloudy. In addition, the AGRI cloud mask retrieval requires clear contrast between clear-sky and cloudy-sky conditions, which is dependent on both surface and atmospheric properties. The TP is associated with cold surfaces, causing low contrast between clear-sky and cloudy-sky scenes. Therefore, it is difficult to assign the pixels "clear" or "cloudy" in the algorithm confidence.

Cloud Top Height
Collocated 4-km AGRI cloud top height products were compared with MODIS 1-km CTH retrievals. As discussed earlier (Section 3.2), the distance-weighted average of the pixel-level MODIS CTH retrievals within each AGRI 4-km FOV was compared with the AGRI product.
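The distance-weighted averaging step can be sketched as follows. The exact weighting of Equation (10) is not reproduced in this section, so the inverse-distance weights used here are an assumption made purely for illustration, as are the function names `haversine_km` and `weighted_cth` and the `eps_km` floor.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def weighted_cth(pixels, fov_lat, fov_lon, eps_km=0.1):
    """Distance-weighted mean CTH (km) of MODIS pixels inside one AGRI FOV.

    pixels: iterable of (lat, lon, cth_km) for the 1-km MODIS pixels.
    Weights are 1/distance to the FOV centre (an assumed form, not the
    paper's Equation (10)); eps_km avoids division by zero.
    Returns NaN if no pixels are supplied.
    """
    num = den = 0.0
    for lat, lon, cth in pixels:
        w = 1.0 / max(haversine_km(lat, lon, fov_lat, fov_lon), eps_km)
        num += w * cth
        den += w
    return num / den if den > 0 else math.nan
```

With this form, MODIS pixels near the centre of the AGRI FOV dominate the collocated value, so a single distant pixel with a very different CTH has limited influence.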
Based on this approach, CTH retrievals were collocated and compared for June, July, and August 2020. The statistical results of the comparison, including mean absolute error, mean bias error, standard deviation, and correlation coefficient, are presented in Table 3. The monthly comparison results were similar, with the CTH differences between AGRI and MODIS the smallest in June. For each month, the mean CTH difference (AGRI-MODIS) was negative, and the average for the three months was −1.366 ± 2.235 km; that is, on average, AGRI underestimated the CTH relative to MODIS. The correlation coefficient for each month was more than 0.74, indicating good correlation between the two datasets.

Table 3. Cloud top height statistics over the Tibetan Plateau (TP): mean absolute error (MAE), mean bias error (MBE), standard deviation (STD), and correlation coefficient (CC) between AGRI and MODIS during June, July, and August 2020. MODIS cloud top height retrievals are used as a benchmark.
Month Collocated Samples MAE (km) MBE (km) STD (km) CC

Figure 3 shows the validation of the AGRI CTH product for the complete set of collocated samples. The color bar represents the number of matched samples in each bin at an interval of 0.25 km. The collocated samples were concentrated mainly along the 1:1 line. A large number of collocated samples fell in the height range of 7~10 km, where the AGRI CTH was clearly lower than that of MODIS. The mean absolute error, mean bias error, standard deviation, and correlation coefficient for the whole period are also marked, together with the number of collocated samples; the four statistics were 1.776 km, −1.366 km, 2.235 km, and 0.759, respectively.
To further investigate the CTH differences between AGRI and MODIS at different heights, the layered MAE, MBE, and STD at an interval of 1 km for the collocated samples are presented in Figure 4. The differences between AGRI and MODIS showed a clear increase with height, with the MAE and STD reaching their maxima (and the MBE its minimum) at approximately 12 km. For clouds with a top height above 12 km, the MAE, MBE, and STD fluctuated slightly with height, and for clouds with a top height above about 14 km, the differences in CTH between AGRI and MODIS decreased rapidly. The large negative differences were associated with high clouds with CTH in the range of approximately 12 to 14 km. The positive differences were associated with low-level clouds (<4.5 km).
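The overall and layered statistics described above can be sketched as follows. This is a minimal illustration rather than the processing code used for Table 3 and Figure 4: it assumes that the STD is the population standard deviation of the AGRI-MODIS differences, that the CC is the Pearson coefficient, and that samples are assigned to 1-km layers by their MODIS (benchmark) CTH. The function names are hypothetical.

```python
import math
from collections import defaultdict


def _mean(values):
    return sum(values) / len(values)


def cth_stats(agri, modis):
    """MAE, MBE, STD of (AGRI - MODIS) and Pearson CC, all CTH in km.

    Assumes non-empty, equal-length sequences. CC is NaN when either
    series has zero variance (e.g. a single sample).
    """
    diffs = [a - m for a, m in zip(agri, modis)]
    mbe = _mean(diffs)
    mae = _mean([abs(d) for d in diffs])
    std = math.sqrt(_mean([(d - mbe) ** 2 for d in diffs]))
    ma, mm = _mean(agri), _mean(modis)
    cov = sum((a - ma) * (m - mm) for a, m in zip(agri, modis))
    va = sum((a - ma) ** 2 for a in agri)
    vm = sum((m - mm) ** 2 for m in modis)
    cc = cov / math.sqrt(va * vm) if va > 0 and vm > 0 else math.nan
    return {"MAE": mae, "MBE": mbe, "STD": std, "CC": cc}


def layered_stats(agri, modis, layer_km=1.0):
    """Statistics per height layer, keyed by the layer's lower bound (km).

    Layers are assigned from the MODIS CTH (an assumption of this sketch).
    """
    layers = defaultdict(lambda: ([], []))
    for a, m in zip(agri, modis):
        key = int(m // layer_km)
        layers[key][0].append(a)
        layers[key][1].append(m)
    return {k * layer_km: cth_stats(a, m) for k, (a, m) in sorted(layers.items())}
```

A negative MBE in a layer indicates that AGRI places the cloud tops lower than MODIS in that layer, which is the sign convention (AGRI-MODIS) used in the text.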

Discussion
Interpretation of the cloud mask and cloud top height retrieval comparison results is complex. Biases and uncertainties result from a combination of spectral sensitivity differences and systematic algorithm biases of AGRI and MODIS, as well as uncertainties from the spatial sampling differences. In addition, the performance of the AGRI and MODIS cloud mask and cloud top height retrievals depends on surface, cloud, and atmospheric properties. The complicated cloud regimes over the complex terrain of the TP further complicate the interpretation of the comparison results.
To further investigate the performance of the AGRI cloud mask over the TP, the distribution of AGRI cloud mask agreement, separated for cloudy-sky and clear-sky scenes, is presented in Figure 5. The three months of collocated samples (the same dataset as in Table 2) were divided into 0.5° grid boxes. In Figure 5, a grid cell with perfect agreement between AGRI and MODIS has a fractional agreement of 1 (red), while boxes of poorer agreement are colored blue. Blank space means that there are no matched pixels under the corresponding conditions. The largest differences over the southern part of the TP (25°N~29°N, 79°E~102°E) for the clear-sky condition were partially due to sampling: these grid boxes contained only dozens of samples or fewer, instead of the several hundred to more than a thousand found in other grid boxes. In summer, the warm and humid monsoon frequently reaches the steep southern slope of the TP, making convective cloud systems dominant there [22,65]; therefore, there were few MODIS clear pixels over those areas during the summer monsoon season (Figure 5c). Surprisingly, there was poor agreement for clear-sky conditions over the western part of the TP (36°N~40°N, 75°E~85°E), the area where China's largest desert, the Taklimakan Desert, and largest basin, the Tarim Basin, are located. The desert and basin area would be expected to have warm surfaces, causing high thermal contrast between clouds and the surface and thus better clear-sky detection; in fact, however, the differences over this area under the clear-sky condition were obvious. This is likely because the AGRI thresholds of the infrared tests are not applicable to the desert area, leading to poor agreement. It was found that cloud occurrence frequencies obtained from AGRI peaked at night and were lowest during the day over that area. Radiative cooling at night causes a significant temperature drop in desert areas, and the resulting low nighttime surface temperature leads to poor thermal contrast between the clouds and the surface, which increases the difficulty of cloudy-sky/clear-sky detection. Further study of the causes of these differences is needed in future work.
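The gridded agreement maps described above can be illustrated with the following sketch. It assumes that the fractional agreement in a box is the fraction of samples of a given MODIS class that AGRI assigns to the same class, with all four AGRI categories counted in the denominator; the function names and the sample tuple layout are hypothetical. The sample count per box is returned alongside the agreement because, as noted above, boxes with only dozens of samples are less reliable.

```python
import math
from collections import defaultdict


def grid_key(lat, lon, res=0.5):
    """Lower-left corner (degrees) of the res x res box containing a point."""
    return (math.floor(lat / res) * res, math.floor(lon / res) * res)


def gridded_agreement(samples, modis_class, res=0.5):
    """Fractional agreement per grid box for one MODIS class.

    samples: iterable of (lat, lon, agri_label, modis_label).
    modis_class: "cloudy" or "clear"; only samples with this MODIS label
    are used. AGRI agrees when it reports the same label.
    Returns {box: (fractional_agreement, n_samples)}; boxes without
    matching samples are absent (blank in a map).
    """
    counts = defaultdict(lambda: [0, 0])  # box -> [agreeing, total]
    for lat, lon, agri_label, modis_label in samples:
        if modis_label != modis_class:
            continue
        box = grid_key(lat, lon, res)
        counts[box][1] += 1
        if agri_label == modis_class:
            counts[box][0] += 1
    return {box: (agree / total, total) for box, (agree, total) in counts.items()}
```

In practice, a minimum sample count per box could be applied before interpreting the agreement value; the threshold would be a choice of the analyst and is not specified in the text.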
AGRI and MODIS were clearly in better agreement for cloudy scenes than for clear-sky scenes. The fractional agreement was better than 0.9 over much of the TP for the cloudy-sky condition. Although AGRI and MODIS were generally in good agreement under this condition, some small regions in the southern, southwestern, and northeastern parts of the TP showed poor agreement. To help interpret the similarities and differences, Figure 6 presents the averaged MODIS cloud top height for two separate conditions: (1) cases where both MODIS and AGRI identify cloudy scenes; and (2) cases where the MODIS 1-km cloud mask identifies cloudy scenes but AGRI misses them (i.e., AGRI flags them as clear, probably clear, or probably cloudy). It should be noted that the 1-km pixel-level MODIS cloud top heights were first averaged with distance weighting through Equation (10) to derive the cloud top height of each collocated sample, and the MODIS cloud top height of each grid cell was then derived by averaging the MODIS CTH of all matched samples within the grid box. The weighted-average approach was used to better characterize the cloud top height of the clouds that AGRI missed. Comparing Figure 6a with Figure 6b shows that the clouds detected by both MODIS and AGRI were those with higher CTH. The higher CTH in Figure 6a was most pronounced for the southeastern part of the TP (25°N~32°N, 85°E~105°E). This is because the unique large topography of the TP causes significant thermal-dynamic forcing on the atmosphere in summer over the steep southern slope, making this region prone to frequent convective activity [66,67]. For these convective clouds with high CTH (>12 km), AGRI and MODIS agree well. AGRI tends to miss low CTH clouds (<5 km), which is more pronounced at lower altitudes (<4 km) in areas located away from the central part of the TP, particularly in the southwestern TP.
In addition, for the central region of the TP (29°N~36°N, 79°E~97°E) (i.e., the main part of the TP with an altitude of ≥4 km), the clouds missed by AGRI had a relatively lower CTH than those identified by both approaches over the same areas. This is likely due to the lack of thermal contrast between the clouds and the surface. The cloud top is usually colder than the surface and becomes colder with increasing cloud top height, and good contrast between clouds and the surface benefits cloud detection. The TP is associated with cold surfaces, and the higher the elevation, the colder the surface. These cold background conditions make discrimination between cloudy and clear scenes problematic, especially for low CTH clouds, which are relatively warmer than high CTH clouds.

Figure 7 presents the mean absolute error and mean bias error of the AGRI cloud top height compared with the MODIS CTH, separated into grid cells. The three months (June, July, and August 2020) of collocated samples (the same dataset as in Table 3) were divided into 0.5° grid cells. As shown in Figure 7, each grid box presents the MAE (Figure 7a) and MBE (Figure 7b) for all the matched samples within the box.
The MAE presented a spatial pattern of CTH disagreement similar to that of the MBE. In general, the CTH differences were small, especially over the eastern part of the TP. The largest differences were due to significant underestimation by AGRI over the southern, southwestern, and northern regions of the TP (Figure 7b). The pattern of large disagreement over the southern (25°N~27°N, 86°E~95°E) and southwestern (28°N~32°N, 70°E~75°E) parts of the TP was similar to the distribution of high CTH (>12 km) clouds in Figure 6a. This suggests that clouds with high CTH (>12 km) account for the large underestimations by AGRI over the southern and southwestern parts of the TP. In summer, the TP is usually considered an "air pump" due to its higher surface temperature compared with surrounding regions [13], intensifying the atmospheric instability induced by the South Asian Monsoon. These combined effects generate a large number of high-altitude cirrus and deep convective clouds over the TP, especially over its southern part [10,20,33]. This interpretation is consistent with the statistics shown in Figure 4, which indicate that AGRI underestimates high cloud top heights (CTH > 12 km) relative to MODIS.

In Figure 7a, deep blue indicates that AGRI and MODIS have the same cloud top height, while red indicates large CTH differences between the two systems; in Figure 7b, a negative difference (blue) results when the mean AGRI cloud top height was lower than that of MODIS, while deep red represents AGRI overestimating the cloud top height relative to MODIS (the maximum bias value was 0.564 km). Blank space means that there were no matched pixels under the corresponding conditions.

Conclusions
This paper compared the FY-4A/AGRI and Terra/MODIS Collection 6 pixel-level cloud mask and cloud top height results over the Tibetan Plateau to better understand the performance and characteristics of the cloud products of the new-generation GEO satellite imager FY-4A/AGRI, with the MODIS cloud retrievals used as benchmarks. To facilitate the comparison, an accurate and computationally efficient collocation algorithm was developed. The comparison was conducted for June, July, and August 2020. The results showed that, in general, the AGRI cloud mask over the TP is reliable for cloudy conditions and that, on average, AGRI underestimates the cloud top heights. For cloud detection, the two approaches (AGRI and MODIS) showed a fractional agreement of 0.93 for cloudy conditions and 0.73 for clear-sky scenes when the MODIS cloud mask retrievals were used as a benchmark. However, there were distinct differences for "probably clear" and "probably cloudy" scenes, mainly due to the entirely different theoretical methodologies used by AGRI and MODIS. The agreement was generally better for cloudy conditions than for clear scenes. The significant disagreement found in the southern part of the TP for clear-sky conditions can be partially attributed to the small number of clear pixels over that region during the summer monsoon season. The largest disagreement for cloudy-sky scenes occurred in some small regions located in the southern, southwestern, and northeastern parts of the TP. AGRI and MODIS agree well in detecting clouds with higher CTH, whereas AGRI tends to miss lower CTH clouds due to the lack of thermal contrast between the clouds and the surface of the TP.
For cloud top height products, the comparison shows that the mean cloud top height differences (AGRI-MODIS) were negative (−1.366 ± 2.235 km), suggesting that on average, AGRI underestimates the CTH relative to MODIS. The CTH differences between AGRI and MODIS increased clearly with height and were largest at approximately 12 km. In general, the CTH differences were small, especially over the eastern part of the TP. The largest differences, which were due to significant underestimation by AGRI, occurred over small areas located in the southern, southwestern, and northern parts of the TP. For the southern and southwestern parts of the TP, the obvious CTH differences can be partially attributed to the underestimation of the CTH of high clouds (>12 km).