Development and Evaluation of a New Method for AMSU-A Cloud Detection over Land

Abstract: Satellite data are the main source of information for operational data assimilation systems, and Advanced Microwave Sounding Unit-A (AMSU-A) data are one of the types of satellite data that contribute most to the reduction of numerical forecast errors. However, the assimilation of AMSU-A data over land lags behind that over the ocean. In this respect, the accuracy of cloud detection over land is one of the factors affecting the assimilation of AMSU-A data, especially for the window and low-peaking channel (23–53.59 GHz and 89 GHz) data. Strong surface emissivity and high spatial and temporal variability make it difficult to distinguish between the radiative contributions of clouds and the atmosphere. Based on the differences in the response characteristics of different channels to clouds, five AMSU-A window and low-peaking channels (channels 1–4 and 15) were selected to develop a new index for cloud detection over land. Case studies showed that the AMSU-A cloud index can detect most of the convective clouds; additionally, by further matching the MHS (Microwave Humidity Sounder) cloud detection index, we can effectively distinguish between cloudy and clear-sky observations. Batch test results also verified the accuracy and stability of the new cloud detection method. By referring to the MODIS (Moderate Resolution Imaging Spectroradiometer) cloud product, the POD (probability of detection) of the cloudy fields of view with the new method was nearly 84%. By using the new cloud detection method to remove the cloudy data, the bias and standard deviation of the observation-minus-simulated brightness temperature (O − B) were significantly reduced, with the bias of O − B for channels 2–4 being below 1.0 K and the standard deviation of channels 5 and 6 being nearly 1.0 K.


Introduction
By the end of the 20th century, direct assimilation of satellite radiance data into variational assimilation systems had begun to help with the problem of insufficient observational data, thus greatly improving the accuracy of numerical forecasts. In particular, the accuracy of forecasts in the southern hemisphere, where conventional observations are scarce, rapidly improved to a level consistent with that in the northern hemisphere [1]. The contribution of satellite observations in operational data assimilation systems is increasing; for example, the proportion of satellite information in GRAPES (Global/Regional Assimilation and Prediction System), the numerical weather prediction (NWP) system of the China Meteorological Administration, is more than 70%, and that of the European Centre for Medium-Range Weather Forecasts (ECMWF) exceeds 90% [2]. Amongst all the satellite instruments currently in operation, AMSU-A (Advanced Microwave Sounding Unit-A) is notable for its ability to retrieve information on the vertical distribution of atmospheric conditions and contributes significantly towards reducing global NWP forecast errors; it has also been demonstrated to be beneficial in many ways to regional forecasting systems [3,4].
[...] measure accurately, which leads to more difficulties and challenges in assimilating AMSU-A observations over land than over the ocean, even with clear-sky assimilation [25]. To improve the accuracy of model-simulated BT over land, various surface temperature and surface emissivity estimation methods have been proposed, and these methods have yielded significant improvements in the clear-sky assimilation of AMSU-A terrestrial observations [26][27][28].
However, in most of these methods, cloud product retrievals from other space-based instruments (e.g., MODIS (Moderate Resolution Imaging Spectroradiometer) cloud masks) are used to detect clouds, because most cloud detection methods over the ocean, such as liquid water path (LWP) retrieval, are not applicable over land owing to the significant effect of surface emissivity. When using those cloud products, spatiotemporal interpolation is required between AMSU-A and the other instruments, which is neither economical nor convenient for operational applications. Only a few empirical solutions are available for terrestrial cloud detection based on the AMSU-A instrument itself, and the accuracy of these methods is heavily dependent on the accuracy of the ancillary data. For example, in the GSI (Gridpoint Statistical Interpolation) assimilation system, terrestrial cloud detection is based on empirical scattering indices and precipitation indices [22], whereas GRAPES directly excludes the observations of the low-peaking channels 1–4 and 15, and mid-peaking channels 5 and 6, over land. A large number of AMSU-A terrestrial observations are thus excluded or discarded, which wastes information. Therefore, in this work, we attempted to develop a new AMSU-A terrestrial cloud detection method. Based on it, we evaluated the bias characteristics of different channels affected by clouds and by different surface types under clear-sky conditions, in preparation for assimilating the observations of AMSU-A mid- and low-peaking channels over land in GRAPES.
The paper is structured as follows: Following this introduction, Section 2 introduces the datasets. Section 3 describes the new cloud detection method. Section 4 evaluates the effectiveness of the new cloud detection method and assesses the bias and standard deviation characteristics of different channels affected by clouds and different surface types under clear-sky conditions. Section 5 provides a discussion and conclusion. Finally, Section 6 gives a summary.

AMSU-A and MHS Onboard NOAA19
NOAA19 was launched on 6 February 2009, when it took over from NOAA18 as NOAA's primary afternoon satellite. The local equatorial crossing time at the launch of NOAA19 was 13:30; however, due to orbital drift, this had become 16:30 by 2019. Like its predecessor, NOAA19 carries two microwave instruments, AMSU-A and MHS, and the general characteristics of both instruments are listed in Table 1. AMSU-A is designed to detect the vertical temperature profile from the Earth's surface to a pressure height of about 2 hPa (45 km). Vertical profiles are obtained through the measurements of scene radiance in 15 channels, ranging from 23.8 to 89 GHz. The instrument has an instantaneous FOV of 3.3° at the half-power points. The antenna provides a cross-track scan, scanning ±50° from nadir with a total of 30 FOVs per scan line. The swath of AMSU-A is 2226.8 km and the spatial resolution at nadir is nominally 48 km.
MHS observes the Earth with five frequency channels ranging from 89 to 190 GHz. The instrument has two window channels (89 and 157 GHz) and three water vapor channels distributed around 183.31 GHz. It mainly observes water vapor, cloud, and rain information in the troposphere, and is more sensitive to clouds, especially ice particles. MHS is also a cross-track instrument, with 90 contiguous scene resolution cells sampled in a continuous scan, covering 50° on each side of the sub-satellite path, and with an antenna beam width of 1.11° at the half-power point. The swath of MHS is 2348 km and the spatial resolution at nadir is nominally 17 km.
MHS and AMSU-A are carried on the same satellite platform, meaning the temporal difference between the two instruments is negligible and their swaths are similar, making it convenient to merge the two data streams. The number of MHS scan lines and the number of FOVs on a scan line are three times those of AMSU-A, which means 3 × 3 FOVs from the MHS can match one AMSU-A FOV.
Remote Sens. 2021, 13, 3646

MODIS Cloud Classification Product
MODIS is an instrument aboard NASA's Terra and Aqua satellites, which acquires data with high radiometric sensitivity (12 bit) in 36 spectral bands (wavelengths ranging from 0.4 to 14.4 µm) and views the entire surface of the Earth every 1 to 2 days. It has a viewing swath width of 2330 km and spatial resolutions of 250 m, 500 m, and 1 km (https://space.oscar.wmo.int/instruments/view/modis, accessed on 8 October 2020). Terra is a morning satellite, while Aqua is an afternoon satellite with a fixed local equatorial crossing time of 13:30. Therefore, in the present study, we used the MODIS data from Aqua, which has a 3-h time lag with NOAA19.
MODIS cloud products are widely used as benchmarks for verification and evaluation of satellite cloud products. The retrieval algorithms of MODIS cloud properties have evolved over the past two decades, and cloud products have been validated by comparing them with active remote-sensing observations. For example, in 2008, Ackerman [29] assessed the performance of the MODIS cloud mask algorithm using three years of radar and LiDAR data and showed that 85% of the MODIS cloud mask data agreed with the active remote-sensing data. Since then, the inversion algorithm of MODIS has undergone several updates, and the accuracy has been further improved. Based on the cloud classification criteria proposed by the ISCCP (International Satellite Cloud Climatology Project), cloud optical thickness and cloud-top pressure were obtained from the MYD06_L2 product (https://ladsweb.modaps.eosdis.nasa.gov/search/order/1/MYD06_L2--61, accessed on 8 October 2020) and combined to obtain clear skies and nine cloud classes (Figure 1b) [30,31]. The resolution of the obtained MODIS cloud product is 1 km.
The AMSU-A observational data can be used to retrieve the LWP over ocean, wherein an LWP value greater than 0.02 g/kg is conventionally considered to indicate cloud contamination of the FOV. Therefore, we can use observations from both instruments over the ocean to verify the effect of the time difference between the two instruments on the matching results. Figure 1a,b respectively show the spatial distribution of the whole-layer-integrated LWP from NOAA19 during 0300–0900 UTC and the MODIS cloud classification products during 0000–0600 UTC over the western Pacific on 12 August 2019. We focused on the locations of the clouds retrieved by the two instruments. Comparing the locations of the three pairs of cloud clusters that we marked in Figure 1, we can see that, although there is roughly a three-hour gap between the two instruments, the retrieved cloud locations show barely visible differences. Therefore, it is reasonable to assume that this three-hour gap will have a minimal effect when matching MODIS information to the coarse AMSU-A FOV. When matching with AMSU-A observations, we took the MODIS sky category with the largest share of pixels within a 30-km radius of the AMSU-A FOV as the sky category for this FOV.
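The matching rule described above (assigning each AMSU-A FOV the most frequent MODIS sky category within 30 km) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the use of the haversine distance on a spherical Earth, and the integer encoding of sky categories are all assumptions.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical Earth is assumed


def great_circle_km(lat0, lon0, lats, lons):
    """Haversine distance (km) from one point to arrays of points (degrees)."""
    lat0, lon0 = np.radians(lat0), np.radians(lon0)
    lats, lons = np.radians(lats), np.radians(lons)
    a = (np.sin((lats - lat0) / 2.0) ** 2
         + np.cos(lat0) * np.cos(lats) * np.sin((lons - lon0) / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def dominant_sky_category(fov_lat, fov_lon, pix_lat, pix_lon, pix_cat,
                          radius_km=30.0):
    """Most frequent MODIS category (hypothetical integer codes) among the
    pixels within radius_km of the FOV centre; None if no pixel is in range."""
    pix_cat = np.asarray(pix_cat)
    dist = great_circle_km(fov_lat, fov_lon,
                           np.asarray(pix_lat, dtype=float),
                           np.asarray(pix_lon, dtype=float))
    inside = pix_cat[dist <= radius_km]
    if inside.size == 0:
        return None
    categories, counts = np.unique(inside, return_counts=True)
    return int(categories[np.argmax(counts)])
```

For a full orbit, a spatial index (for example a k-d tree) would be preferable to the brute-force distance computation used here.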
Figure 2 shows the spatial distribution of the O−B of AMSU-A channel 3 from NOAA-19 and MODIS cloud classification products in East Asia at 0600 UTC on 26 June 2019. Channel 3 is the first temperature measurement channel of AMSU-A, and its weighting function peaks at the ground. To avoid the influence of topographic height, we chose to restrict the study to East Asia only.
We used the Community Radiative Transfer Model (CRTM) developed by the Joint Center for Satellite Data Assimilation to simulate the AMSU-A BT. CRTM can provide fast, accurate satellite radiance simulations and Jacobian calculations at the top of the atmosphere. The model supports the simulation of sensor measurements covering wavelengths ranging from the visible through the microwave [32]. We used the FNL (final analysis) data [33] as the background field, and the NPOESS (National Polar-orbiting Operational Environmental Satellite System) dataset to determine the land-surface type of each FOV. As we did not input hydrometeor information into CRTM, all scenes were simulated as clear sky.
Comparing Figure 2a,b, it can be seen that the O−B in the thick cloud areas showed significant negative values, such as in the convective cloud system from Lake Baikal to northeast China and the convective cloud system over the Korean Peninsula, as well as in the stratocumulus over the eastern coast of China (black dashed circle). In the clear-sky area, meanwhile, the absolute value of O−B was smaller. The observed radiance of channel 3 in the clear-sky area was primarily the surface-emitted radiance, whereas the radiance observed by satellites in deep cloud areas was basically the cloud-top radiance, which was significantly lower than the surface radiance. Even for clouds penetrated by ground radiation, the scattering and absorption of water and ice particles in the clouds lead to the radiance received by satellites being significantly lower than the simulated clear-sky radiance. Although many factors lead to differences between observed and simulated BTs, most O−B values in cloud areas were negative, which indicates that clouds have an important impact on O−B. If cloudy and clear-sky data cannot be distinguished, cloud-affected observations are assimilated as though they were clear, which produces erroneous assimilation results.

Methods
Clouds have a significant effect on the BT observed by AMSU-A, but the response to clouds varies from channel to channel owing to frequency differences. AMSU-A window channels are sensitive to the presence of cloud and precipitation [34]. A scatterplot of the BTs observed by AMSU-A channels 3 and 15 over East Asia is given in Figure 3, where the circle colors indicate the matched MODIS cloud classification results closest in time. It can be seen that the observed BT of channel 15 is higher than that of channel 3 in the clear-sky area. In the microwave region, Planck's formula can be simplified to the Rayleigh-Jeans radiation law, given the frequency ν and the thermodynamic temperature T of a black body:

$$B(\nu, T) = \frac{2 k \nu^{2} T}{c^{2}} \qquad (1)$$

where c is the speed of light and k is the Boltzmann constant, such that the radiance at a given temperature is proportional to the square of the frequency. This approximation has an accuracy of better than 1% for an object at 300 K viewed at a frequency less than 125 GHz [35]. In the clear-sky area, the AMSU-A observed radiance is mainly dependent on the radiance emitted from the surface; this can be simplified as Equation (2), where L_Clr(ν,θ) is the clear-sky upwelling radiance, ε_sfc is the surface emissivity, T_sfc is the surface temperature, τ_s is the transmittance from the surface to the top of the atmosphere, and B(ν,T) is the Planck function for a frequency ν and temperature T.
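As a hedged illustration of the clear-sky argument, and not a reproduction of the original Equations (2) and (3), a surface-dominated radiance consistent with the symbols defined above can be combined with the Rayleigh-Jeans form B(ν,T) ≈ 2kν²T/c² to give the ratio between channels 15 and 3 in one FOV. The channel subscripts 3 and 15 on ε_sfc, τ_s, and ν are notation introduced here, and atmospheric emission and reflected downwelling radiation are neglected by assumption.

```latex
\begin{align}
L_{\mathrm{Clr}}(\nu,\theta) &\approx \varepsilon_{\mathrm{sfc}}\,\tau_{s}\,B(\nu,T_{\mathrm{sfc}})
  \approx \varepsilon_{\mathrm{sfc}}\,\tau_{s}\,\frac{2k\nu^{2}T_{\mathrm{sfc}}}{c^{2}},\\[4pt]
\frac{L_{\mathrm{Clr}}(\nu_{15},\theta)}{L_{\mathrm{Clr}}(\nu_{3},\theta)}
  &\approx \frac{\varepsilon_{\mathrm{sfc},15}\,\tau_{s,15}\,\nu_{15}^{2}}
                {\varepsilon_{\mathrm{sfc},3}\,\tau_{s,3}\,\nu_{3}^{2}} .
\end{align}
```

Under these assumptions the surface temperature and the factor 2k/c² cancel, so the ratio depends only on the emissivities, transmittances, and frequencies of the two channels, which is why a fixed surface and atmospheric state within one FOV would give a nearly constant value.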
Then, combined with Equation (1), the ratio of the radiances observed by the two channels in the same FOV is given by Equation (3). In the same FOV, the surface emissivity and atmospheric state are fixed, but the frequency of channel 15 is larger than that of channel 3, so the observed BT of channel 15 is warmer than that of channel 3, and the ratio of the BT of channel 15 to that of channel 3 is close to a constant value.
In cloudy sky, the relationship between the BTs of the two channels is more complicated. The cloud attenuates the BT of both channels, and the thicker and higher the clouds, the more pronounced their attenuation effect and the lower the observed BT of channels 3 and 15, with the lowest BT observed in deep convective clouds and cirrostratus. However, microwaves can penetrate some thin clouds, so the observed BT under cirrus and some cirrostratus is not distinguishable from the observed BT under clear sky. Many cloud-related factors lead to a decrease in BT, for instance the size and distribution characteristics of water and ice particles, as well as the shape of ice particles. In addition, there is a significant difference in the attenuation of BT by clouds between the two channels, with channel 15 being more sensitive to clouds than channel 3, meaning the BT of channel 15 is more strongly reduced by clouds than that of channel 3. In Figure 3, the BT of channel 15 is markedly lower than that of channel 3 in the deep convective cloud area.
Therefore, we can try to define a cloud index based on the different responses of these two channels to clouds. Qin and Zou [36], noting that MHS channel 2 is more sensitive to clouds than channel 1, used the standardized BT of channel 1 as the numerator and the BT of channel 2, adjusted to the same magnitude as the numerator, as the denominator to define a terrestrial cloud detection index. The index can detect most cloudy FOVs. Zhu et al. [37] introduced this method to the Microwave Humidity Sounder II instrument onboard China's FY-3C satellite, also achieving satisfactory cloud detection results. In this work, five low-peaking channels (channels 1–4 and 15) of AMSU-A were selected to define the cloud index A_index (Equation (4)), where T_b,i is the observed BT of the ith channel of the five channels 1–4 and 15 of AMSU-A. The normalized BT of channel 3 is used as the numerator, and the exponentiation-adjusted BT of channel 15 is used as the denominator. The spatial distribution of the numerator, denominator, and the AMSU-A cloud index at the same time as in Figure 2a is given in Figure 4. Compared with Figure 2b, because channel 3 is less sensitive to clouds, the normalized BT of channel 3 showed a larger positive value in the cloudy areas but a smaller value in clear sky.
To further amplify the difference between cloudy and clear sky, we added the BT of the cloud-sensitive channel 15 and used the exponentiated BT of channel 15 as the denominator. The difference between the clear-sky and cloud-contaminated BT of channel 15 is amplified by the exponentiation, and the result is multiplied by a coefficient so that its magnitude is comparable to that of the numerator. As the cloud attenuates the BT of channel 15 more significantly, the value of the denominator will be smaller in the cloud area, which ultimately gives the cloud index a large positive value in cloudy sky and a smaller value in clear sky, as shown in Figure 4c.
[...] Figure 5a,b, and the slopes of the two parts of the data after fitting are significantly different. Based on this feature, the threshold for distinguishing between cloudy and cloud-free observations can be determined. Of course, some cloudy and clear-sky observations were incorrectly distinguished, which we improved upon below.
In order to ensure the stability of the results, one month of AMSU-A observations was used to determine the threshold of A_index. Figure 6 shows the fitted slope of the denominator and numerator of the A_index for different thresholds, in which the A_index was calculated from AMSU-A observations of NOAA-19 over land areas of East Asia from 0000 UTC 15 June to 1800 UTC 15 July 2019. As can be seen from the figure, the slope for data with the A_index larger than the threshold (black curve) increased slowly as the threshold value increased to 0.14, and then basically stayed the same. However, for data with the A_index less than the threshold (red curve), the slope increased rapidly before the threshold value reached 0.02, and then held steady from 0.02 to 0.1, after which it kept increasing and eventually got close to the black dotted curve. When the threshold value was less than 0.02, there were few clear-sky observations with the A_index less than the threshold, so the absolute value of the fitted slope was large and increased rapidly. When the threshold value was in the range from 0.02 to 0.1, the data with the A_index less than the threshold were mostly the same clear-sky observations, so the fitted slope stayed nearly the same. However, after the threshold value exceeded 0.1, the slope increased steadily and eventually got close to the slope of the observations in the cloudy area. This means that cloudy observations had been included. Therefore, A_index = 0.1 can be used as the threshold to distinguish between cloudy and clear-sky FOVs.
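The threshold scan described above can be sketched as follows, as an illustration only. It assumes that the index is the ratio of the numerator to the denominator, that the slope is obtained from a least-squares straight-line fit of the denominator against the numerator (the choice of fit axes is an assumption), and that subsets with too few points are skipped; the function name and the `min_count` parameter are hypothetical.

```python
import numpy as np


def slopes_by_threshold(numerator, denominator, thresholds, min_count=10):
    """For each candidate threshold, fit denominator = a * numerator + b
    separately for samples with index < threshold and index >= threshold,
    and return the two arrays of fitted slopes a (NaN where not fittable)."""
    numerator = np.asarray(numerator, dtype=float)
    denominator = np.asarray(denominator, dtype=float)
    index = numerator / denominator  # assumed form of the cloud index
    below, above = [], []
    for threshold in thresholds:
        for mask, out in ((index < threshold, below),
                          (index >= threshold, above)):
            # A line fit needs enough points and some spread in x.
            if mask.sum() >= min_count and np.ptp(numerator[mask]) > 0:
                slope = np.polyfit(numerator[mask], denominator[mask], 1)[0]
            else:
                slope = np.nan
            out.append(slope)
    return np.array(below), np.array(above)
```

Plotting the two returned arrays against the thresholds would give curves analogous to the red and black curves discussed above; the plateau of the below-threshold slope marks the range in which the subset contains essentially the same clear-sky samples.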

Accuracy of the New Cloud Detection Method
Two examples of AMSU-A cloud index results are given in Figure 7, where the black circles are the cloudy FOVs detected by the AMSU-A cloud index. It can be seen from the results that the cloud index can detect most of the cloudy areas, such as the banded cloud system from Lake Baikal to northeast China in Figure 7a, the high clouds over the Korean Peninsula, and the low clouds along the eastern coast of China (i.e., the area encircled by the black dotted line in Figure 2b). The large convective clouds in northeast China (within 120–130° E and 35–48° N) in Figure 7b are also mostly detected. However, the cloud index also has some shortcomings, such as its relative failure in detecting the cirrus and cirrostratus clouds around the convective cloud system in Figure 7; in addition, the performance is unsatisfactory at high latitudes, where low clouds are missed in Figure 7a. Meanwhile, a small amount of over-detection was found over the area south of Lake Baikal in Figure 7a.
The detection accuracy of the AMSU-A cloud index alone cannot meet the requirements for operational application, probably because the AMSU-A observations are more sensitive to water clouds and relatively insensitive to ice clouds, which makes identification by the index difficult. To address this problem, referring to the study of Zou et al. [14], we considered the addition of the MHS cloud index (M_index), which is defined from T_b,i, the observed BT of the ith channel of the five MHS channels 1–5. The MHS cloud index was matched to the FOV of AMSU-A according to the method of nine MHS FOVs corresponding to one AMSU-A FOV proposed by Qin et al. [34]. FOVs with M_index > 0.35 are identified as cloudy.
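A minimal sketch of how the two indices might be combined on the AMSU-A grid is given below. Both indices are taken as precomputed inputs, since their defining formulas are not reproduced here. The thresholds 0.1 and 0.35 are those stated in the text; reducing each 3 × 3 block of MHS FOVs with its maximum is an assumption made for illustration, not necessarily the matching rule of Qin et al. [34], and the function names are hypothetical.

```python
import numpy as np

A_THRESHOLD = 0.1   # AMSU-A index threshold stated in the text
M_THRESHOLD = 0.35  # MHS index threshold stated in the text


def mhs_to_amsua_grid(m_index, reducer=np.nanmax):
    """Collapse an MHS field of shape (3 * n_lines, 3 * n_fovs) onto the
    AMSU-A grid (n_lines, n_fovs) by reducing each 3 x 3 block of MHS FOVs.
    The default reducer (block maximum) is an illustrative assumption."""
    m_index = np.asarray(m_index, dtype=float)
    rows, cols = m_index.shape
    if rows % 3 or cols % 3:
        raise ValueError("MHS grid dimensions must be multiples of 3")
    blocks = m_index.reshape(rows // 3, 3, cols // 3, 3)
    return reducer(blocks, axis=(1, 3))


def combined_cloud_mask(a_index, m_index_on_amsua):
    """Flag an AMSU-A FOV as cloudy if either index exceeds its threshold."""
    a_cloudy = np.asarray(a_index, dtype=float) > A_THRESHOLD
    m_cloudy = np.asarray(m_index_on_amsua, dtype=float) > M_THRESHOLD
    return a_cloudy | m_cloudy
```

With 90 MHS FOVs and 30 AMSU-A FOVs per scan line, the reshape maps each AMSU-A FOV to exactly nine MHS FOVs, provided the two scan patterns are aligned.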
The blue circles in Figure 7 are the cloudy FOVs supplemented by the MHS cloud index. Compared with the MODIS cloud classification product, the addition of the MHS cloud index is a good remedy for the missed detection problem of the AMSU-A cloud index. The MHS cloud index alone misses some of the low clouds (not shown). However, by combining the cloud indices of the two instruments, the majority of cloudy observations can be eliminated, and the structure and edges of the cloud system are detected accurately, with only a small fraction of cirrus and scattered point clouds being missed. It is worth mentioning that the result was compared to infrared cloud products, which are more sensitive to clouds than microwaves, and microwaves can penetrate some thin clouds, so some cirrus clouds missed by the cloud index may not have any effect on microwave observations.
Table 2 shows the POD, FAR, and HR scores of three methods: the new method (AMSU-A and MHS indexes), the method only using the MHS index, and the empirical (old) method. The collocated MODIS cloud product was employed as a benchmark. The new method performed significantly better than the other two methods. Specifically, the new method achieved an average POD of 83.85% under cloudy sky, which is higher than the MHS-index-only method by about 15%, and the old method by about 22%.
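Scores of this kind are commonly derived from a 2 × 2 contingency table between the detection result and the benchmark. The sketch below uses the standard definitions, which are assumed here rather than taken from the paper: POD as the fraction of reference events that are detected, FAR as the fraction of detections that are wrong, and HR as the overall fraction of correct classifications. The function and key names are hypothetical.

```python
import numpy as np


def _ratio(numerator, denominator):
    """Safe division that returns NaN for an empty category."""
    return float(numerator) / float(denominator) if denominator else float("nan")


def detection_scores(predicted_cloudy, reference_cloudy):
    """POD and FAR for the cloudy and clear classes, plus the overall hit
    rate, with the reference (e.g., collocated MODIS) taken as truth."""
    pred = np.asarray(predicted_cloudy, dtype=bool)
    ref = np.asarray(reference_cloudy, dtype=bool)
    hits = np.sum(pred & ref)                 # cloudy in both
    misses = np.sum(~pred & ref)              # cloudy FOV labelled clear
    false_alarms = np.sum(pred & ~ref)        # clear FOV labelled cloudy
    correct_negatives = np.sum(~pred & ~ref)  # clear in both
    return {
        "pod_cloudy": _ratio(hits, hits + misses),
        "far_cloudy": _ratio(false_alarms, hits + false_alarms),
        "pod_clear": _ratio(correct_negatives, correct_negatives + false_alarms),
        "far_clear": _ratio(misses, correct_negatives + misses),
        "hit_rate": _ratio(hits + correct_negatives, pred.size),
    }
```

Under these definitions, a low clear-sky FAR means that few of the FOVs passed on as clear are actually cloudy, which is the property that matters most for assimilation.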
The MHS-index-only method has the highest clear-sky POD, but its HR was lower than that of the new method. Comparing the FARs of the two methods, the new method had a lower clear-sky (higher cloudy-sky) FAR than the MHS-index-only method, meaning fewer cloudy observations are missed by using the data of both instruments. Compared to the loss of some clear-sky observations, the negative impact on the forecast is more significant if the cloudy FOVs are misidentified as clear sky.
Figure 9 presents a histogram of the O−B data volume distribution before and after the removal of cloudy observations for AMSU-A channels 1–6 and 15 onboard NOAA-19 from June to August 2019. The results show that, before removing cloud-contaminated observations, the data distributions of O−B for the middle- and low-peaking channels are significantly skewed, with more observations having negative O−B values. The window channels (channels 15, 1, and 2) are more sensitive to clouds, and the clouds can make the O−B values of a few observations greater than 50 K. These observations are excluded by the removal of cloud-contaminated observations, after which most of the observations were within the range of |O−B| < 10 K. For the temperature channels (channels 3–6), meanwhile, the influence of clouds decreases as the weighting function peak height becomes higher. After cloud detection, most of the observations fall within the range of |O−B| < 5 K. Figure 10 gives the O−B probability distribution curves for the original observations and after removal of cloudy observations for the three AMSU-A channels over the same period as in Figure 9. The O−B of the middle- and low-peaking channels after removing cloudy observations is more in line with the normal distribution.
Figure 11a shows the O−B deviation characteristics of the AMSU-A lower- and middle-peaking channel observations under cloudy and clear-sky conditions over East Asian land areas from June to August 2019.
It can be seen that the average O−B value of the low- and middle-peaking channels is negative. In other words, the simulated BT is higher than the observed BT on average, regardless of cloudy or clear sky, except for channel 6. However, in cloudy sky, the average O−B value of the middle- and low-peaking channels is significantly more negative than under clear-sky conditions. Among the three window channels, channel 15 has the most significant deviation in simulated BT owing to the influence of clouds, followed by channel 2. The higher the peaking height of the temperature channel's weighting function, the weaker the influence of clouds; by channel 6, there was no significant difference between the average O−B in cloudy and clear sky. Therefore, in the middle- and high-peaking channels, cloud detection should take into account the cloud-top height. If the cloud-top height is lower than the peaking height of the channel's weighting function, the observations of that channel should not be rejected, which is work that we plan to carry out in the future. The average O−B of channel 15 is still the most negative in clear sky, being close to −2.0 K, followed by channel 1 and channels 2–4, which are around −1.0 K, while channels 5 and 6 are close to 0 K. Figure 11b gives the standard deviation of O−B values for all observations and for the observations remaining after cloudy observations are excluded in this period.
Before excluding cloudy observations, the standard deviation of O−B was significantly larger than that of the clear-sky data, and it gradually decreased as the weighting function peak height increased. After excluding the cloudy observations, the standard deviations of the O−B values of the middle- and low-peaking channels were significantly reduced, and the standard deviation of channel 2 was still the largest. However, the standard deviation of channel 1 was larger than that of channel 15, while the simulated BT bias of channels 5 and 6 was smaller than 1.0 K.
The four vegetation types with the largest number of observation samples in the study area were, in order, broadleaf forest, pine forest, grass, and scrub; other vegetation types were not considered because the number of observation samples was too small. Because the simulation error was smaller in the middle-peaking channels, the amount of data for these channels was larger. As can be seen in Figure 12b, of the two forest types, the simulated BT deviation was smaller for broadleaf forest except in channel 6; for broadleaf forest, the deviation was within 1 K for all channels and within 0.5 K for channels 5 and 6. The simulated BT deviation of channel 5 was slightly larger for pine forest, but also less than 1 K. For scrub, the deviation was less than 0.5 K for all channels except channel 6, for which it was close to 1.0 K. The simulated BT deviation was larger for grassland, being around 1.5 K for the lower channels and within 1.0 K for channels 5 and 6. The standard deviation of O−B was more consistent among the four surface types across the different channels, with a simulated BT error for the four low-level channels of around 2.5 K, a simulated error for channels 5 and 6 of around 1.0 K, and a simulated error of less than 1 K for pine forest and grassland. The simulated BT error for channel 5 was larger for scrub, being close to 1.5 K.
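The bias and standard deviation comparisons in this section reduce to simple statistics of O−B over different subsets of FOVs. The following is a minimal sketch under stated assumptions: the function name `omb_stats`, the population standard deviation (NumPy's default `ddof=0`), and all brightness temperatures are illustrative choices, not the processing chain or data used in this study.

```python
import numpy as np

def omb_stats(obs_bt, sim_bt, cloudy):
    """Bias and standard deviation of O-B for all FOVs and for clear-sky FOVs.

    obs_bt, sim_bt : observed and simulated brightness temperatures (K)
    cloudy         : boolean flag per FOV, True where a cloud was detected
    Returns a dict mapping 'all' and 'clear' to (bias, std) tuples.
    """
    omb = np.asarray(obs_bt, dtype=float) - np.asarray(sim_bt, dtype=float)
    clear = ~np.asarray(cloudy, dtype=bool)
    return {
        "all": (float(omb.mean()), float(omb.std())),
        "clear": (float(omb[clear].mean()), float(omb[clear].std())),
    }

# Made-up values for one channel: the third FOV is cloud contaminated,
# giving a strongly negative O-B.
obs = [250.0, 251.0, 240.0, 252.0]
sim = [251.0, 251.0, 252.0, 253.0]
cloudy = [False, False, True, False]

stats = omb_stats(obs, sim, cloudy)
print(stats)
```

The same function can be applied per surface type by passing only the FOVs of one vegetation class, which is one way to produce per-type statistics of the kind shown in Figure 12.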

Discussion and Conclusions
The three-month analysis of cloud detection results reported in this paper validated the reliability of the new method, and the vast majority of cloud-contaminated FOVs could be detected. The new method only uses the observations, which helps to successfully avoid the influence of the model background field on the detection results, thus making this method promising for operational data assimilation. However, as with the old method, the FAR is slightly higher, and we will focus on solving this problem in the future.
The present work focused only on the summer season, so in subsequent studies we need to use more data from different seasons to analyze the effect of atmospheric temperature on the indexes. Of course, this method is still based on the principle that clouds have a significant impact on the BT of each channel of AMSU-A and MHS. In some cases, the impact of clouds on the BT is very weak. For example, thin cirrus cloud at high altitude has little impact on microwave radiation, and the low surface temperature at high latitudes and on glacial surfaces causes the surface radiance to be similar to that of clouds, which may reduce the effectiveness of the new method. The performance of the method in winter also needs more data to be fully evaluated. In future research, we need to quantitatively evaluate the characteristics of clouds that influence the BT in these cases through idealized experiments based on the RTM, so as to determine a more reasonable detection threshold and further improve the method. At present, we do not recommend applying the method in areas with low surface temperature, for example, at high latitudes north of 60°N in winter or in areas covered by perennial glaciers.
In addition, owing to data availability, we used MODIS cloud products to verify the method. The 3-h time difference between the cloud detection index and the cloud products inevitably has an impact on the verification results. We believe that using the better-matched AVHRR cloud product, from an instrument carried by the same satellite, as the verification data will make the evaluation of this new method more reasonable, which is also a direction for our group's future work. Furthermore, the effectiveness of the new cloud detection method still needs to be tested by assimilation experiments; that is, the effect of the method on assimilation needs to be judged in terms of actual assimilation results. Therefore, as a next step, we intend to use the new method to assimilate the clear-sky data of channels 5 and 6 over land areas in GRAPES, to verify the improvements the new method can deliver in terms of forecast results.

Summary
Because of the strong surface emissivity and its high spatial and temporal variability, AMSU-A cloud detection over land has been a challenge. In this work, based on the characteristics of the AMSU-A and MHS channels, we developed a new terrestrial cloud detection method that relies only on the observations, by merging the AMSU-A and MHS data. Practical testing showed that the AMSU-A cloud index could detect most of the deep convective clouds, but missed cirrus and some cirrostratus clouds. The addition of the matched MHS cloud index made up for the majority of the clouds missed by the AMSU-A index. A comparison with the MODIS cloud classification product showed that the cloud detection method merging the information from both instruments could eliminate most of the cloudy observations.
The effectiveness and stability of the new cloud detection method were verified using three months of AMSU-A and MHS observations. With the MODIS cloud product as the reference, the POD, FAR, and HR of the three cloud detection methods were calculated, revealing that the new method performed the best. On average, the POD of the new method for cloudy FOVs reached 83.85%; additionally, the new method was found to have a lower clear-sky (higher cloudy-sky) FAR than the MHS-index-only method, meaning that fewer cloudy observations are missed when data from both instruments are used.
After removing cloudy observations, the O−B values of the low- and middle-peaking channels were found to be more in line with the normal distribution. Based on the accurate identification of the clear-sky observations, we also analyzed the O−B distribution characteristics of the AMSU-A low- and middle-peaking channel observations over land areas. Among the window channels, channels 1 and 15 had the largest bias and standard deviation in their simulated BT owing to the influence of clouds, and this influence gradually decreased as the weighting function peak height of the channel increased. After removing the cloudy observations, the bias and standard deviation of O−B of the low- and middle-peaking channels of AMSU-A were found to be significantly reduced; additionally, the bias of the O−B of channels 5 and 6 was within 1.0 K under clear-sky conditions, and the standard deviation was around 1.0 K. The bias and standard deviation of the O−B for the middle- and lower-peaking channels also differed among vegetation types under clear sky. The bias for broadleaf forest was smaller than that for pine forest, but its observation error was slightly larger; the bias for grassland was larger, but its error was the smallest; and the observation error for scrub was the largest. Overall, the bias and standard deviation of the O−B of channels 5 and 6 were the smallest among all channels.