Near-Ultraviolet to Near-Infrared Band Thresholds Cloud Detection Algorithm for TANSAT-CAPI

Abstract: The cloud and aerosol polarization imaging detector (CAPI) is one of the important payloads on the China Carbon Dioxide Observation Satellite (TANSAT); it provides multispectral polarization detection and accurate on-orbit calibration. The main function of the instrument is to identify interference from clouds and aerosols in the atmospheric detection path and thereby improve the retrieval accuracy of greenhouse gases. Accurately identifying clouds in remote sensing images is therefore of great significance. However, to meet the requirement of a lightweight design, CAPI is only equipped with channels in the near-ultraviolet to near-infrared bands, so it is difficult to achieve effective cloud recognition with traditional spectral threshold cloud detection algorithms that span the visible to thermal infrared bands. To solve this problem, this paper proposes a cloud detection method based on different threshold tests from near ultraviolet to near infrared (NNDT). The algorithm introduces, for the first time, the 0.38 µm band and the ratio of the 0.38 µm band to the 1.64 µm band to separate cloud pixels from clear-sky pixels. It takes advantage of the obvious difference in radiation characteristics between clouds and ground objects in the near-ultraviolet band and of the band ratio's ability to identify clouds over snow surfaces. The experimental results show that the cloud recognition hit rate (POD_cloud) reaches 0.94 (ocean), 0.98 (vegetation), 0.99 (desert), and 0.86 (polar), which meets the application standard of CAPI data cloud detection. The research shows that the NNDT algorithm removes the need for thermal infrared bands in cloud detection, eliminates the dependence on the minimum surface reflectance database found in traditional cloud recognition algorithms, and lays the foundation for aerosol and CO2 parameter inversion.
This study proposed a cloud detection approach based on near-ultraviolet to near-infrared bands with thresholds for different underlying surfaces (NNDT).


Introduction
CO2 is one of the greenhouse gases on Earth, and the increase in its concentration over the last century seriously affects the environment on which human survival depends [1]. The 2014 IPCC report shows that the radiative forcing effect of aerosols is the largest source of uncertainty in climate change assessment. Monitoring and evaluating the spatial distribution and parameters of both CO2 and aerosols has therefore become a matter of great concern for scientists, and research on improving aerosol and CO2 parameter retrieval schemes is being carried out all over the world. Satellite remote sensing enables large-scale observation of the Earth and has become one of the main ways to detect CO2 and aerosols [2].
The China Carbon Dioxide Observation Satellite (TANSAT) is the first global atmospheric carbon dioxide observation scientific experimental satellite developed entirely by China. It was successfully launched by the CZ-2D carrier rocket at 3:22 on 22 December 2016. TANSAT is a sun-synchronous polar-orbiting satellite with an orbital altitude of 712 km. It carries two key instruments: the carbon dioxide sensor (CDS) and the cloud and aerosol polarization imager (CAPI) [3]. CAPI aims to identify cloud and aerosol interference in the atmospheric path so that CO2 parameters can be retrieved more accurately. CAPI has a spatial resolution of 250 m/1000 m and a calibration accuracy of better than 5%. It has two observation modes, sub-satellite point observation and flare observation, of which the sub-satellite point observation mode is not affected by the solar flare. CAPI is equipped with five near-ultraviolet to near-infrared bands (0.38, 0.67, 0.87, 1.375, and 1.64 µm), as shown in Table 1, which keeps the design small and lightweight. CAPI is similar to the cloud and aerosol imager (TANSO-CAI) on the greenhouse gas observation satellite (GOSAT) [4], but it has an additional 1.375 µm band for cirrus detection and adds polarization channels at 0.67 and 1.64 µm [5]. Its swath width is 400 km, which provides a large enough area for detecting aerosol spatial distribution and cloud cover. However, most remote sensing images contain clouds [6]. Cloud is an interference factor for atmospheric retrieval. Thin cirrus in particular is difficult to detect, and its presence greatly affects the retrieval accuracy of aerosol and carbon dioxide parameters [7]. Accurate cloud detection therefore plays an important role in image preprocessing.
Machine learning, which is increasingly applied in the field of remote sensing, provides another effective means for cloud detection. However, because machine learning methods usually require a large number of training samples for model construction, the workload is large, and the universality of the model is often not guaranteed. The cloud texture and spatial features method performs cloud detection based on the spatial information of the image. Cloud has a unique texture that distinguishes it from the background, which is reflected in the spatial variation of the spectral brightness value. Because this method is complex and computationally expensive, it is difficult to meet the needs of high-efficiency processing. The spectral thresholds method relies on the spectral difference between the cloud and the earth's surface in the remote sensing image. It is a physical method that distinguishes the cloud from the earth's surface by setting radiance (reflectivity) thresholds. Because of its simple model and fast calculation speed, this method can meet the needs of batch image processing. Therefore, this research uses the spectral thresholds method to design a cloud detection algorithm suitable for CAPI data.
Table 2. Summary of three classes of cloud detection methods.

Method | Author | Key Results
Machine learning method | Yu et al. (2006) | Automatic detection of cloud was realized using a clustering method according to texture features.
Machine learning method | Wang et al. (2010) | A K-means clustering algorithm was used to classify the clustering feature of the thresholds for reflectivity data.
Machine learning method | Jin et al. (2016) | A back-propagation neural network with efficient learning ability was established for MODIS data.
Machine learning method | Yu Oishi et al. (2017) | Analyzed the impact of different support vectors on GOSAT-2 CAI-2 L2 cloud discrimination.
Machine learning method | Fu et al. (2019) | Cloud detection on FY meteorological satellite images was carried out using the random forest method.
Texture and spatial features method | Lee et al. (1990) | Texture cues were utilized to recognize clouds by their high spatial heterogeneity.
Texture and spatial features method | Liu (2007) | Cloud detection on MODIS data was carried out using a classification matrix and a shortest-distance dynamic clustering algorithm.

However, few studies have addressed cloud recognition algorithms for CAPI so far. Only one scheme, based on data from the Chinese FengYun-3A polar-orbiting meteorological satellite, was proposed by Wang et al. before the satellite launch [30]. Although the cloud and aerosol unbiased decision intellectual algorithm (CLAUDIA) has been shown to be applicable to data from any remote sensing instrument with the corresponding visible to thermal infrared bands [27], it also has limitations. For CAPI, because of design limitations, the channels to which the CLAUDIA method can be applied comprise only four bands from the visible to the near infrared, which makes it difficult to reach the accuracy of cloud detection algorithms that use thermal infrared bands. In addition, using the 0.67 and 0.87 µm bands for cloud detection requires the support of a minimum surface reflectance database to ensure the inversion accuracy, and the use of such a database has its own limitations, such as causing image interpretation errors.
Remote Sens. 2021, 13, 1906
To solve the above problems, this paper proposes a more effective cloud detection method for CAPI: a threshold detection method that applies different thresholds to different underlying surfaces, from near ultraviolet to near infrared (NNDT). For the first time, the 0.38 µm band and the ratio of the 0.38 to 1.64 µm bands are used in cloud detection. First, unique band test combinations are designed for four surface types: ocean, vegetation, desert, and polar regions. Subsequently, CAPI data collected over the four surface types are used to establish the values of the cloud detection spectral tests, and the corresponding fixed thresholds are determined. Finally, to verify and evaluate the effectiveness of the NNDT algorithm, its cloud recognition results are compared visually with the official cloud recognition results of the Moderate Resolution Imaging Spectroradiometer (MODIS) and quantitatively with those of the Second-generation GLobal Imager (SGLI). The experiments show that the algorithm can overcome the above limitations and can effectively identify cloud. This paper is organized as follows. Section 2 introduces the NNDT algorithm, including the analysis of the characteristics of each wavelength and wavelength combination and the algorithm flow (preprocessing, algorithm design for different underlying surfaces, and the statistics and determination of each threshold). Section 3 gives the verification results of cloud detection. The experimental results are discussed in Section 4, and Section 5 gives the conclusions and an outlook on future work.

Materials and Methods
The spectral thresholds method is based on the difference in reflectance between cloud and the underlying surface in satellite images. The spectral characteristics used by the NNDT algorithm include: (a) reflectance of solar radiance, (b) dependence of reflectance on wavelength, and (c) reflectance of solar radiance at the absorption wavelength. The algorithm uses one near-ultraviolet (0.38 µm), one visible (0.67 µm), and three near-infrared (0.87, 1.375, and 1.64 µm) CAPI channels. The following sections explain these radiation characteristics in detail and how they are applied to cloud/clear-sky discrimination in CAPI images.

Reflectance of Solar Radiance
In the non-absorption visible and near-infrared bands, clouds have a higher reflectance than clear-sky surfaces [30]. Based on this fact, many cloud detection algorithms use the 0.67 µm reflectance for vegetated land and coastal regions and the 0.87 µm reflectance over water scenes [31,32]. However, in the 0.67 and 0.87 µm bands, the reflectivity of different land surface types varies greatly between seasons, so the reflectivity thresholds of these bands also change [33]. To keep cloud recognition effective in these two bands, it is usually necessary to compile the minimum surface reflectance as it varies with the seasons or even months, but CAPI lacks a timely, full-coverage minimum surface reflectance. If an external database is used, such as the MODIS minimum albedo product MCD43A3 or the minimum albedo map proposed by Ishida and Nakajima [26], the matching of observation geometry and spectral response function must be considered. In addition, if the minimum albedo itself is contaminated by optically thin clouds, actual optically thin clouds will not be recognized.
Based on the above situation, this article uses the 0.38 µm near-UV band of CAPI as an alternative. Before this, no research had used the 0.38 µm band for cloud recognition. Yu Oishi et al. analyzed the annual variation of the reflectivity of each CAI band for several typical land surface types [13] and found that the reflectivity of the 0.38 µm band remains low and basically unchanged throughout the year, except that it increases under the influence of ice and snow. The spectral reflectance of cloud and underlying surfaces in Figure 1 shows that soil, desert, and vegetation have lower reflectance in the 0.38 µm band than in the 0.67 and 0.87 µm bands, while the reflectance of water remains low and the reflectance of cloud remains high. Moreover, the reflectivity of aerosol is very low in the 0.38 µm band [34]. In terms of cloud characteristics, the edge details of low and middle clouds are very obvious in the 0.38 µm satellite image. Figure 2 shows the CAPI grayscale images of the 0.38, 0.67, and 0.87 µm bands over Australian semi-vegetated and desert areas. Therefore, the reflectance of the 0.38 µm band is better suited to distinguishing cloud from the clear-sky surface over soil, desert, and vegetation. In general, the reflectance of water and vegetation in the 0.38 µm band is extremely low, less than 0.05, while the reflectance of dry soil (i.e., bare soil) and desert is higher, between 0.1 and 0.3, so these surfaces may be misidentified as cloud. Because ice and snow have the same high reflectivity as cloud at 0.38 µm, the 0.38 µm band alone cannot distinguish cloud from the snow and ice surface.
Figure 1. Reflectance curve of cloud and underlying surface. The underlying surface spectrum comes from the ENVI spectral library [35], and the cloud spectrum comes from the Airborne Visible Infrared Imaging Spectrometer.

Dependence of Reflectance on Wavelength
From the visible to the near infrared in the solar radiation region, leaving aside the absorption valleys, cloud reflectivity does not change much with wavelength. In contrast, the reflectance of several kinds of ground surface does change with wavelength. Therefore, the difference (or ratio) of the reflectance at different wavelengths can indicate well whether a pixel is cloud or not [26]. In deserts or areas with sparse vegetation, the reflectivity changes very little in the visible region. In the near-infrared region, the reflectivity of deserts tends to increase with increasing wavelength [36]. Therefore, the reflectance ratio of 0.87 and 1.64 µm can better solve the problem of cloud identification over desert areas, and the ratio is expressed as:
$$\frac{R_f(0.87\,\mu\mathrm{m})}{R_f(1.64\,\mu\mathrm{m})} \tag{1}$$

But for bright surfaces, such as snow, this ratio is not that effective. The normalized difference snow index (NDSI) is used to distinguish snow-covered areas [37][38][39]. The NDSI, which has a larger value over snow than over cloud, is determined by the reflectance in the 0.67 and 1.64 µm bands of CAPI [30]:
$$\mathrm{NDSI} = \frac{R_f(0.67\,\mu\mathrm{m}) - R_f(1.64\,\mu\mathrm{m})}{R_f(0.67\,\mu\mathrm{m}) + R_f(1.64\,\mu\mathrm{m})} \tag{2}$$
In the visible and infrared regions, cloud shadow has very low reflectivity, while in the visible region the reflective bright edge of a cloud has very high reflectivity. As a result, the NDSI values in the cloud shadow area are very close to those in the cloud area, and the NDSI values at the cloud reflective bright edge are very close to those over the snow surface. It is therefore easy to mistake cloud shadow for cloud and the cloud reflective bright edge for clear sky. For this reason, this paper proposes using the reflectance ratio of the 0.38 µm band to the 1.64 µm band to make up for this fault, as shown in Equation (3):
$$\frac{R_f(0.38\,\mu\mathrm{m})}{R_f(1.64\,\mu\mathrm{m})} \tag{3}$$
In the 0.38 µm band, the reflectance of cloud shadow is not too low and that of the cloud reflective bright edge is not too high. Therefore, the ratio of the 0.38 µm band to the 1.64 µm band in cloud shadow is higher than that of the snow surface, and at the cloud reflective bright edge it is equivalent to that of a cloud pixel. The ratio is lower for a cloud pixel than for a snow pixel, so cloud over a snow surface can be effectively identified by setting a threshold.
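The three band combinations discussed in this section can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names and the sample reflectance values are hypothetical, and the inputs are assumed to be already corrected reflectances on a common grid.

```python
import numpy as np

def band_ratio(numerator, denominator):
    """Element-wise reflectance ratio; pixels with a zero denominator become NaN."""
    num = np.asarray(numerator, dtype=float)
    den = np.asarray(denominator, dtype=float)
    out = np.full(np.broadcast(num, den).shape, np.nan)
    np.divide(num, den, out=out, where=den != 0)
    return out

def ndsi(r067, r164):
    """Normalized difference snow index from the 0.67 and 1.64 um reflectances."""
    r067 = np.asarray(r067, dtype=float)
    r164 = np.asarray(r164, dtype=float)
    return band_ratio(r067 - r164, r067 + r164)

# Hypothetical reflectances for one snow pixel and one cloud pixel.
r038 = np.array([0.9, 0.6])
r067 = np.array([0.9, 0.6])
r164 = np.array([0.1, 0.4])

snow_index = ndsi(r067, r164)          # larger over snow than over cloud
uv_swir_ratio = band_ratio(r038, r164)  # lower over cloud than over snow
```

With these illustrative numbers the snow pixel has both the larger NDSI and the larger 0.38/1.64 µm ratio, which is the ordering the text relies on.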

Reflectance of Solar Radiance at the Absorption Wavelength
High clouds, such as cirrus, can be detected by solar radiation at absorption wavelengths [40]. The 1.375 µm channel lies in a water vapor absorption zone, so the radiation reflected by the surface or by low-altitude clouds is almost completely absorbed by the abundant water vapor beneath the cloud. When high-altitude clouds and cirrus clouds, which are mainly composed of ice crystals, are present, the reflected radiation intensity in the 1.375 µm band increases. However, 1.375 µm band detection is not applicable in high-altitude areas and polar regions, where the thin atmosphere absorbs less and lets more radiation from the underlying surface through. This band is therefore only used to detect clouds at surface altitudes of less than 2000 m.
Figure 3 shows the flow chart of the NNDT algorithm using CAPI data. There are four steps: solar altitude angle and sun-earth distance correction, snow surface pre-detection, determination of land cover type, and cloud detection for the four underlying surfaces. The first three are pre-processing work. The fourth step is the key part of the NNDT algorithm.

Preprocessing
(1) Solar altitude and sun-earth distance correction
The data used in this paper are obtained from the CAPI L1B (1000 m) V2.0 sub-satellite point observation mode (tracking the main plane) scientific observation products and geometric positioning products. The L1B data of CAPI are reflectance data generated from the radiometrically calibrated DN (digital number) values. Before the data can be used for cloud identification, the sun-earth distance correction and the solar altitude angle correction are also needed.

Here R* is the L1B data, R is the reflectivity, L is the radiance after radiometric calibration, Esun is the solar constant, d is the distance between the sun and the earth, and cosA is the cosine of the solar zenith angle A (unit: radians). The sun-earth distance d and the solar zenith angle A are stored in the geometric data of L1B.
(2) Snow surface pre-detection
Scattered snow cover may be misidentified as cloud, so snow cover must be identified before cloud detection. Identifying the snow surface in advance also simplifies the subsequent cloud detection procedure. A pixel is marked as snow when its NDSI is higher than the detection threshold T_NDSI and its reflectivities at 0.87 and 0.67 µm are higher than 0.11 and 0.10, respectively. Based on [41,42], T_NDSI is 0.48 during the Northern Hemisphere warm season (April to September) and 0.6 during the cold season (October to March); the warm and cold months are switched for the Southern Hemisphere (SH).
(3) Determination of land cover type
Another important step is to determine the land cover type. The land classification data are derived from the MODIS global land cover data (MCD12Q1) with a resolution of 500 m, which are matched to the CAPI geometric data by longitude and latitude. In this paper, all underlying surfaces are divided into four types (ocean, vegetation, desert, and polar) so that the subsequent cloud detection can be targeted at each type. Unlike the traditional classification, inland water is classified as ocean, desert and bare surface are separated from the land category as an independent category, and the remaining land types are collectively referred to as vegetation.
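The first two preprocessing steps can be sketched as follows. The correction formula is an assumption: the sketch takes R* to be the uncorrected L1B reflectance and applies the standard top-of-atmosphere form R = R* · d² / cos A, with d expressed in astronomical units. The snow thresholds (0.48/0.6, 0.11, 0.10) and the season definitions are the ones quoted in the text; all function names are hypothetical.

```python
import math

def correct_reflectance(r_star, d_au, sza_rad):
    """Assumed correction: R = R* * d^2 / cos(A), d in AU, A in radians."""
    return r_star * d_au ** 2 / math.cos(sza_rad)

def ndsi_threshold(month, latitude_deg):
    """T_NDSI: 0.48 in the local warm season, 0.6 in the local cold season."""
    warm_in_north = 4 <= month <= 9          # April to September
    warm = warm_in_north if latitude_deg >= 0 else not warm_in_north
    return 0.48 if warm else 0.6

def is_snow(r067, r087, r164, month, latitude_deg):
    """Snow pre-detection: NDSI test plus the two reflectivity tests."""
    total = r067 + r164
    if total <= 0:
        return False                          # NDSI undefined, treat as not snow
    ndsi = (r067 - r164) / total
    return (ndsi > ndsi_threshold(month, latitude_deg)
            and r087 > 0.11
            and r067 > 0.10)

# Hypothetical pixel: bright in the visible, dark at 1.64 um, January, 60 N.
snow_flag = is_snow(0.8, 0.7, 0.1, month=1, latitude_deg=60.0)
```

Pixels flagged here would be excluded from, or routed separately in, the later cloud tests; how the snow flag interacts with the land cover type is not specified in this sketch.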

Cloud Recognition Algorithms for Different Surface Types
Cloud/clear discrimination is done by comparing the reflectance from CAPI with thresholds that define the boundary between cloud and clear pixels. The NNDT algorithm consists of different threshold tests to ensure detection accuracy, because a threshold test that is effective for one surface type may not be appropriate for another. Cloud detection schemes 1-4 are designed for the different underlying surface types, as shown in Formulas (9)-(16), where T is the threshold of the corresponding test. For cloud recognition over ocean and vegetation, we use the union of the R_f(0.38 µm) and R_f(1.375 µm) threshold detection results. Over desert, we use the union of the R_f(1.375 µm) threshold detection result with the intersection of the R_f(0.38 µm) and R_f(0.87 µm)/R_f(1.64 µm) threshold detection results. In polar regions, we only use the R_f(0.38 µm)/R_f(1.64 µm) threshold detection for cloud recognition. As shown in Table 3, different thresholds are used for 0.38 and 1.375 µm band detection on different underlying surfaces.
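1. Ocean
$$R_f(0.38\,\mu\mathrm{m}) > T_{\text{ocean-0.38}} \tag{9}$$
or
$$R_f(1.375\,\mu\mathrm{m}) > T_{\text{ocean-1.375}} \tag{10}$$
2. Vegetation
$$R_f(0.38\,\mu\mathrm{m}) > T_{\text{veg-0.38}} \tag{11}$$
or
$$R_f(1.375\,\mu\mathrm{m}) > T_{\text{veg-1.375}} \tag{12}$$
3. Desert
$$R_f(1.375\,\mu\mathrm{m}) > T_{\text{desert-1.375}} \tag{13}$$
or
$$R_f(0.38\,\mu\mathrm{m}) > T_{\text{desert-0.38}} \tag{14}$$
and
$$R_f(0.87\,\mu\mathrm{m}) / R_f(1.64\,\mu\mathrm{m}) > T_{\text{desert-0.87/1.64}} \tag{15}$$
4. Polar
$$R_f(0.38\,\mu\mathrm{m}) / R_f(1.64\,\mu\mathrm{m}) < T_{\text{polar-0.38/1.64}} \tag{16}$$
When the above conditions are met, the pixel is identified as a cloud.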
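The per-surface logic described in this section can be sketched as a single decision function. Several points are assumptions and are marked as such in the code: the threshold numbers are placeholders (the actual values belong to Table 3), the direction of the desert ratio test is inferred from the statement that desert reflectivity rises with wavelength while cloud reflectivity stays flat, and the 2000 m altitude gate on the 1.375 µm test follows the restriction stated for that band. The names `Thresholds`, `THRESHOLDS`, and `is_cloud` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Thresholds:
    r038: float = 0.0    # threshold on the 0.38 um reflectance
    r1375: float = 0.0   # threshold on the 1.375 um reflectance
    ratio: float = 0.0   # threshold on the band ratio used for this surface

# Placeholder values for illustration only; they are NOT the Table 3 thresholds.
THRESHOLDS = {
    "ocean": Thresholds(r038=0.20, r1375=0.03),
    "vegetation": Thresholds(r038=0.22, r1375=0.03),
    "desert": Thresholds(r038=0.30, r1375=0.04, ratio=0.9),
    "polar": Thresholds(ratio=5.0),
}

def is_cloud(surface, r038, r087, r1375, r164, altitude_m=0.0,
             thresholds=THRESHOLDS):
    """Return True when the pixel passes the cloud tests for its surface type."""
    t = thresholds[surface]
    if surface == "polar":
        # Only the 0.38/1.64 um ratio is used; cloud has the lower ratio.
        return r164 > 0 and r038 / r164 < t.ratio
    # The 1.375 um test is restricted to surfaces below 2000 m.
    cirrus = altitude_m < 2000 and r1375 > t.r1375
    if surface in ("ocean", "vegetation"):
        return r038 > t.r038 or cirrus
    if surface == "desert":
        # Assumed direction: cloud keeps the 0.87/1.64 um ratio high.
        bright = r038 > t.r038 and r164 > 0 and r087 / r164 > t.ratio
        return cirrus or bright
    raise ValueError(f"unknown surface type: {surface}")
```

Keeping the thresholds in a table separate from the logic mirrors the way the text separates the test design from the threshold statistics, so new values can be substituted without touching the decision rules.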
The thresholds used in all cloud tests in this study are obtained from CAPI data statistics rather than taken from the thresholds that other cloud recognition algorithms set for the band characteristics of their own sensors [43]. Reflectance data from 2017 over the Indian Ocean (35°-40° S, 120°-130° E), southern Africa (7°-10° S, 23°-26° E), the Sahara Desert (15°-18° N, 12°-17° E), and Antarctica (68°-72° S, 148°-158° E), chosen as representative areas of the four surface types, were collected for threshold statistics. All the data are corrected for solar altitude angle and sun-earth distance. To check whether the thresholds change with the season, we select all the images in the same area and at the same time with an 18-day revisit cycle, starting from 3 March, and carry out visual cloud recognition by setting the detection thresholds. Because clouds with different heights, types, and optical thicknesses have different apparent reflectance [44,45], and the aim is to identify all possible clouds, each threshold is selected between the clear-sky surface reflectance and the reflectance of the cloud with the lowest optical thickness, which is close to the clear-sky surface reflectance.
All threshold results are shown in Table 4. The thresholds listed in the table show no significant change over the year for the same land surface type, but they differ considerably between land surface types. Therefore, after the obvious deviation values (irregular maxima or minima) among the thresholds of the same land surface type are excluded, the smallest (or largest) value in the remaining data is used as the final cloud detection threshold of the corresponding land surface type, as shown in Table 3.
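The selection rule just described can be sketched as below. The text does not say how an "obvious deviation value" is identified, so the outlier rule here (more than k median absolute deviations from the median) is purely an assumption, as are the function name and the sample numbers.

```python
from statistics import median

def final_threshold(per_scene, use_min=True, k=3.0):
    """Drop outlying per-scene thresholds, then take the smallest or largest.

    The k * MAD outlier rule is an assumed stand-in for the visual exclusion
    of irregular maxima or minima described in the text.
    """
    med = median(per_scene)
    mad = median(abs(v - med) for v in per_scene)
    kept = [v for v in per_scene if abs(v - med) <= k * mad] or list(per_scene)
    return min(kept) if use_min else max(kept)

# Hypothetical per-scene thresholds for one surface type; 0.45 is irregular.
scene_thresholds = [0.20, 0.21, 0.19, 0.20, 0.45, 0.22]
lower = final_threshold(scene_thresholds)                  # smallest kept value
upper = final_threshold(scene_thresholds, use_min=False)   # largest kept value
```

Whether the smallest or the largest remaining value is appropriate depends on the direction of the test: a "greater than" test is most inclusive with the smallest threshold, a "less than" test with the largest.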

Results
Verifying the effectiveness of cloud recognition algorithms is difficult. Two important steps in validation are visual comparison and quantitative analysis [46][47][48]. In visual comparison, an analyst conducts a validation through visual inspection of the spectral, spatial, and temporal features in a set of images. Visual inspection is an important first step in validating any cloud mask algorithm. The analyst uses knowledge and experience of cloud and land surface spectral properties to identify obvious problems. However, visual comparison provides poor quantitative evaluation [25]. More quantitative validation can be attained through directly comparing the two results pixel by pixel. This section provides two schemes for verifying NNDT algorithm, which are visual comparison with MODIS cloud identification products and quantitative comparison with SGLI cloud identification products. Some validation examples will be given at the same time.

Visual Comparison with MODIS Cloud Detection Product
The NNDT algorithm was evaluated by visually comparing its cloud/clear discrimination results (referred to as the CAPI cloud flag) with the CAPI composited RGB image and the MODIS cloud-mask product (MYD35). MYD35 is a cloud-mask product for sun-synchronous orbit data from MODIS (onboard Aqua) at 13:30 local solar time, close to the TANSAT local solar time, which makes it possible to find CAPI and MODIS images of the same place on certain dates with small transit intervals. However, their swaths and revisit cycles are different: CAPI has a swath of 400 km and a revisit cycle of 18 days, while Aqua/MODIS has a swath of 2330 km and a revisit cycle of 16 days. The orbit of Aqua overlaps with that of CAPI only intermittently.
MYD35 uses clear confidence level (CCL) to evaluate cloud detection results, including four confidence levels, which are: cloudy, probably cloudy, probably clear, and confidently clear [49,50]. In order to correspond with the output cloud flag "either cloud or clear" of the NNDT algorithm, the "confidently clear" and "probably clear" of MYD35 are classified as clear, and the other confidence levels are classified as cloud. CAPI and MODIS data, as well as cloud flag, are projected onto grids with a spatial resolution of 1 km × 1 km.
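The merging of the four MYD35 confidence levels into a binary flag can be sketched as follows. The bit layout used here (bit 0 of the first cloud-mask byte marks whether the mask was determined, bits 1-2 hold the confidence with 0 = cloudy, 1 = probably cloudy, 2 = probably clear, 3 = confidently clear) is an assumption that should be checked against the MODIS cloud mask user guide; the function name is hypothetical.

```python
import numpy as np

def myd35_to_binary(byte0):
    """Map the first MYD35 cloud-mask byte to (cloud, valid) boolean arrays.

    Assumed layout: bit 0 = mask determined, bits 1-2 = confidence level
    (0 cloudy, 1 probably cloudy, 2 probably clear, 3 confidently clear).
    """
    b = np.asarray(byte0, dtype=np.uint8)
    valid = (b & 0b1).astype(bool)
    confidence = (b >> 1) & 0b11
    # "Probably clear" and "confidently clear" are merged into clear.
    cloud = valid & (confidence <= 1)
    return cloud, valid

sample = np.array([0b001, 0b011, 0b101, 0b111, 0b000], dtype=np.uint8)
cloud, valid = myd35_to_binary(sample)
```

Returning the validity mask alongside the cloud flag keeps undetermined pixels out of any later cloud-cover statistic instead of silently counting them as clear.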
To clarify the characteristics of the CAPI cloud flag, five scenes were selected (Figures 4-8). Other scenarios used data from 26 April 2017, when the tracks of the two instruments crossed within 5 min of each other. To facilitate comparison, ENVI software is used to clip the MODIS and CAPI images after geometric correction to obtain the overlapping area. In polar regions, the geometrically corrected satellite images have a very small width, which is not conducive to observation. Therefore, images without geometric correction are used, and the approximate geographic range is marked. Each scene has four panels: (a) the CAPI false color image synthesized from the reflectance data of the 0.87 µm (red), 0.67 µm (green), and 0.38 µm (blue) channels; (b) the CAPI cloud flag image derived by applying the NNDT algorithm to the CAPI reflectance data; (c) the MODIS true-color image combining bands 1, 3, and 4 of multiple MODIS level-1B products overlapping with the CAPI scene; and (d) the cloud flag of MYD35 with only "cloud" and "clear" results.
As can be seen from the cloud identification over the northwest coast of Australia and the Indian Ocean in Figure 4, the cloud area contours of the CAPI cloud flag and MYD35 are very similar. Small clouds around the cloud cluster in the CAPI RGB image are also well recognized. Compared with MYD35, the cloud regions identified by the NNDT algorithm are more coherent at the cloud edges and at the hole in the center of the cloud cluster, regions that MYD35 identifies as clear.
Figure 5 shows images taken over Indonesian islands, including clouds at different heights and optical thicknesses over both ocean and vegetation. The NNDT algorithm recognizes all kinds of cloud as cloud pixels, such as small scattered clouds, thick clouds, and optically thin clouds. The cloud areas of the CAPI cloud flag and MYD35 are basically the same.
Figure 6 illustrates the results of cloud detection over bare surface. Like MYD35, the NNDT algorithm does not flag the locally bright sandbanks, which are easily mistaken for clouds. MYD35 sometimes recognizes intermittent inland waters as clouds, and the NNDT algorithm avoids this situation well. Comparing the CAPI cloud flag with the CAPI pseudo color image shows that the NNDT algorithm can locate small scattered clouds. For small scattered clouds, the cloud shape in the CAPI cloud flag is more discrete, while that in MYD35 is more continuous.
Figure 7 shows a typical desert scene over the Sahara Desert. Its clear-sky surface pixels have high reflectance and can easily be mistaken for clouds. Nevertheless, the CAPI cloud flag is reasonable compared with the CAPI RGB image and MYD35.
As can be seen from the cloud identification of the northwest coast of Australia and Indian ocean area in Figure 4, the cloud area contours of CAPI cloud flag and MYD35 are very similar. Small clouds around the cloud cluster in the CAPI RGB image are also well recognized. Compared with MYD35, the cloud regions identified by NNDT algorithm are more coherent at the edge of the clouds and the hole in the center of the cloud cluster, and these regions are identified as clear in MYD35. The cloud detection results for a scene over the Antarctic continent are shown in Figure 8. For pixels covered with snow and/or sea ice, it is generally difficult to distinguish clear sky and clouds because they have a visual feature and spectrum similar to clouds, namely, white with high reflectance. Comparison with the CAPI RGB image suggests that our algorithm can accurately distinguish between clear sky and cloudy areas over the bright snow and ice surface. The NNDT algorithm identifies the cloud anti bright edge as cloud and the cloud shadow as clear sky. It shows the sensitivity and accuracy of NNDT algorithm in distinguishing cloud anti bright edge from surface and cloud shadow from cloud. In addition, for pixels with high and thin cloud distribution, MYD35 is identified as clear sky, while NNDT algorithm can accurately identify them as clouds. Remote Sens. 2021, 13, x FOR PEER REVIEW 13 of 26  Figure 5 shows images taken on Indonesian islands, including clouds at different heights and optical thicknesses over both surfaces of the ocean and vegetation. NNDT algorithm can recognize all kinds of cloud as cloud pixels, such as small scattered clouds, thick clouds, and optically thin clouds. The cloud area of CAPI cloud flag and MYD35 is basically the same.   Figure 6 illustrates the results of cloud detection over bare surface. Like MYD35, the NNDT algorithm does not determine the locally highlighted sandbanks, which are easily identified as clouds. 
Table 5 lists the cloud cover (the percentage of cloud pixels among all pixels) identified from the MODIS and CAPI images in these five scenes. The two cloud cover values are close in every scene, differing by only 1-5%. This agreement supports the rationality of the NNDT algorithm in this paper.
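The cloud cover defined above (cloud pixels as a percentage of all pixels) can be sketched in a few lines of Python. This is a minimal illustration rather than the processing used for Table 5: the function name `cloud_cover` and the tiny example masks are hypothetical, and real CAPI or MYD35 flags would first have to be read and reduced to boolean cloud/clear values.

```python
def cloud_cover(mask):
    """Return the percentage of cloudy pixels in a 2-D cloud mask.

    `mask` is a list of rows; each pixel is True (cloud) or False (clear).
    """
    total = sum(len(row) for row in mask)
    if total == 0:
        raise ValueError("mask contains no pixels")
    cloudy = sum(1 for row in mask for px in row if px)
    return 100.0 * cloudy / total


# Hypothetical 2x4 masks standing in for a CAPI flag and a MYD35 flag.
capi_mask = [[True, True, False, False],
             [True, False, False, False]]
modis_mask = [[True, True, False, False],
              [False, False, False, False]]

capi_cc = cloud_cover(capi_mask)      # 3 of 8 pixels are cloudy
modis_cc = cloud_cover(modis_mask)    # 2 of 8 pixels are cloudy
difference = abs(capi_cc - modis_cc)  # difference in percentage points
```

The difference is expressed in percentage points, which is how the 1-5% gap between the two products is read here.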

Quantitative Comparison with SGLI Cloud Detection Product
The verification experiment in Section 3.1 does not provide a quantitative comparison, so this paper designs a pixel-by-pixel verification scheme for the NNDT algorithm based on SGLI top-of-atmosphere radiance data (LTOA) and the official SGLI cloud recognition product (SGLI-CLFG). SGLI (the Second-generation GLobal Imager aboard the GCOM-C satellite, launched in December 2017) has 19 channels covering the ultraviolet to thermal infrared spectrum and two polarization and bidirectional channels, and it contains all the channels used in the NNDT algorithm, which are listed in Table 6. Because the corresponding spectral bands of CAPI and SGLI are similar, the test thresholds are taken directly from the NNDT algorithm. SGLI-CLFG is similar to MYD35; it uses the cloudy/clear discrimination algorithm (CLAUDIA) [31], which includes abundant cloud tests from the visible to the thermal infrared band. CCL is represented by a 3-bit binary number and is divided into 8 levels. To facilitate the comparison, this article simplifies the CLFG levels and merges them into two categories, cloud and clear. The specific mapping is shown in Table 7. Please refer to the user's manual for the specific application process [51,52].
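Merging an 8-level, 3-bit flag into two categories can be sketched as follows. This is only an illustration of the idea: the bit offset, the set of levels treated as cloudy, and the names `ccl_level` and `to_binary_flag` are placeholders. They are not taken from Table 7 or from the SGLI user's manual, which remain the authority for the real mapping.

```python
def ccl_level(flag_word, bit_offset=0):
    """Extract a 3-bit level (0-7) from an integer flag word.

    `bit_offset` is a hypothetical position of the 3-bit field.
    """
    return (flag_word >> bit_offset) & 0b111


def to_binary_flag(flag_word, cloudy_levels, bit_offset=0):
    """Merge the 8 levels into 'cloud' or 'clear'.

    `cloudy_levels` is the set of levels counted as cloud; which levels
    belong in it must come from the product documentation (Table 7).
    """
    level = ccl_level(flag_word, bit_offset)
    return "cloud" if level in cloudy_levels else "clear"


# Placeholder mapping for illustration only, not the mapping of Table 7.
PLACEHOLDER_CLOUDY_LEVELS = {4, 5, 6, 7}

level = ccl_level(0b10110101, bit_offset=2)               # bits 2..4 of the word
label_a = to_binary_flag(0b101, PLACEHOLDER_CLOUDY_LEVELS)
label_b = to_binary_flag(0b010, PLACEHOLDER_CLOUDY_LEVELS)
```

Passing the set of cloudy levels as a parameter keeps the sketch neutral about the direction of the confidence scale, which is not stated in this section.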
In this paper, verification experiments on four scenes with different cloud attributes and surface features are selected as examples. The scene locations are shown in Table 8. The cloud screening cases over the four scenes are shown in Figures 9 and 10. Each scene has two panels: (a) the SGLI true-color image of the scene, synthesized from the VN8 (red), VN5 (green), and VN3 (blue) channel reflectance data, and (b) the pixel-by-pixel comparison of the NNDT cloud recognition results based on SGLI observation data (NNDT-CLFG) with the official SGLI cloud recognition product (SGLI-CLFG).

Table 8. The sites and locations of the four scenes for this verification experiment.
Scenes Site Location

As shown in Figures 9 and 10, the comparison between NNDT-CLFG and SGLI-CLFG sorts pixels into four cases: A. NNDT-CLFG and SGLI-CLFG are both cloudy; B. both are clear; C. NNDT-CLFG is cloudy and SGLI-CLFG is clear; and D. NNDT-CLFG is clear and SGLI-CLFG is cloudy. NNDT-CLFG agrees relatively well with both SGLI-CLFG and the composite true-color images over the vegetation, desert, and ocean scenes. The cloud areas marked by the two results are basically the same; only a very few pixels at the cloud edges are classified differently, which is within a reasonable range for threshold-based cloud recognition. However, both algorithms miss some optically thin clouds that are barely visible in the desert scene. The larger deviation between NNDT-CLFG and SGLI-CLFG appears in the polar scene, where a large number of pixels at the edge of the cloud cluster that NNDT-CLFG marks as cloud are marked as clear sky by SGLI-CLFG. A visual comparison of NNDT-CLFG with the true-color images, however, shows that the two are highly consistent. These results support the effectiveness of the NNDT algorithm.
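The four-case pixel comparison (A to D) described above can be sketched as a simple lookup over two boolean masks. The function name `compare_flags` and the 2x2 example masks are hypothetical; the sketch assumes both flags are already co-registered on the same grid and reduced to True (cloudy) or False (clear).

```python
# Case labels as described in the text: (NNDT cloudy?, SGLI cloudy?) -> label.
CASE_LABELS = {
    (True, True): "A",    # both cloudy
    (False, False): "B",  # both clear
    (True, False): "C",   # NNDT cloudy, SGLI clear
    (False, True): "D",   # NNDT clear, SGLI cloudy
}


def compare_flags(nndt, sgli):
    """Label each pixel A-D from two equally shaped boolean cloud masks."""
    if len(nndt) != len(sgli):
        raise ValueError("masks must have the same number of rows")
    result = []
    for nrow, srow in zip(nndt, sgli):
        if len(nrow) != len(srow):
            raise ValueError("mask rows must have the same length")
        result.append([CASE_LABELS[(bool(n), bool(s))] for n, s in zip(nrow, srow)])
    return result


# Hypothetical 2x2 masks that exercise all four cases.
nndt_mask = [[True, False],
             [True, False]]
sgli_mask = [[True, False],
             [False, True]]

case_map = compare_flags(nndt_mask, sgli_mask)
```

Counting the labels in such a map gives the pixel totals for each case, which is one plausible way to obtain the counts used by the validation scores.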
Four validation scores are used here for the quantitative analysis: the probability of detection (POD), the false-alarm ratio (FAR), the hit rate (HR), and Kuiper's skill score (KSS) [53]. They are defined in terms of four pixel counts: a and d are the numbers of pixels where NNDT-CLFG and SGLI-CLFG are both "cloud" and both "clear sky", respectively; b is the number of pixels where NNDT-CLFG is "clear" but SGLI-CLFG is "cloud"; and c is the number of pixels where NNDT-CLFG is "cloud" but SGLI-CLFG is "clear". Because POD and FAR are each computed for cloud and for clear pixels, there are six scores in total, and all of them take SGLI-CLFG as the standard, that is, SGLI-CLFG is assumed to be correct. The POD and FAR scores measure the efficiency of the cloud identification algorithm in determining either cloud or clear events [54]. POD values should be as close to 1 as possible, and FAR values as close to 0 as possible. The HR estimates the overall efficacy of the cloud detection algorithm. The KSS is a complementary measure that reflects the misclassifications to some extent [54] and evaluates how well the algorithm separates cloud events from clear events: a value of 1.0 represents perfect discrimination, while a value of −1.0 represents complete discrimination failure.
The POD, FAR, HR, and KSS scores of the NNDT algorithm, based on the cloud flag dataset of all seasons, are illustrated in Figure 11. Except for the polar region (where the POD_clear, HR, and KSS values are about 0.82, 0.86, and 0.82, respectively), the POD and HR scores of cloud and clear pixels exceed 0.90 or are close to 1. Likewise, except for the polar region (where FAR_cloud is about 0.44), the FARs remain low or even close to zero. Overall, the results are encouraging.
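As a rough illustration of how such scores follow from the four pixel counts, the sketch below uses the commonly used contingency-table forms of POD, FAR, HR, and KSS. This is an assumption: the paper's own formulas are not reproduced in this section and may differ in detail, so the function `validation_scores` should not be read as the authors' definition. The example counts are made up.

```python
import math


def _ratio(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else math.nan


def validation_scores(a, b, c, d):
    """Compute six scores from contingency counts, with SGLI-CLFG as reference.

    a: both cloud;            b: NNDT clear, SGLI cloud;
    c: NNDT cloud, SGLI clear; d: both clear.
    Formulas are the commonly used contingency-table forms (assumed here).
    """
    return {
        "POD_cloud": _ratio(a, a + b),   # reference cloud pixels that were detected
        "POD_clear": _ratio(d, c + d),   # reference clear pixels that were detected
        "FAR_cloud": _ratio(c, a + c),   # NNDT cloud pixels that are clear in the reference
        "FAR_clear": _ratio(b, b + d),   # NNDT clear pixels that are cloud in the reference
        "HR": _ratio(a + d, a + b + c + d),
        "KSS": _ratio(a * d - b * c, (a + b) * (c + d)),
    }


# Made-up counts for illustration only; these are not results from the paper.
scores = validation_scores(a=40, b=10, c=5, d=45)
```

With these forms, KSS equals POD_cloud + POD_clear − 1, so it ranges from −1 to 1, in line with the interpretation given in the text.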

Discussion
The visual comparison in Section 3.1 shows that the NNDT cloud recognition results and MYD35 have a relatively consistent cloud distribution in the five scenes, although they differ in local details. The cloud cover of the two results also differs by 1% to 5%. Two causes are common to all scenes: cloud displacement during the overpass time difference between CAPI and MODIS, and cloud distortion caused by the difference in observation geometry and by geometric correction errors [55]. Other causes are discussed separately for the different scenes.
In the ocean scene in Figure 4, compared with MYD35, the cloud area identified by the NNDT algorithm is more consistent with the actual distribution of clouds. In most cases, cloud optical thickness varies gradually in space. Optically thin clouds occur at cloud edges, in the hole in the center of a cloud cluster, and between two close cloud clusters, where they are difficult to see in RGB images. Because the 0.38 µm band is sensitive to the edge details of middle and low clouds and the 1.375 µm band is sensitive to optically thin clouds, these thin clouds can be determined objectively with the reflectance thresholds of the two bands. Therefore, once cloud displacement during the overpass time difference and cloud distortion caused by the geometric difference between the two instruments are excluded, the pixels of optically thin or edge clouds that the NNDT algorithm recognizes as cloud but MYD35 does not demonstrate the advantage of the NNDT algorithm in recognizing optically thin clouds and cloud-edge pixels over the ocean surface.
In the bare-ground results in Figure 6, the shapes of small scattered clouds are more discrete in the CAPI cloud flag and more continuous in MYD35. The reason is that the threshold test on the band combination R_f(0.87 µm)/R_f(1.64 µm), which the NNDT algorithm uses to keep bright sandbanks from being recognized as cloud, also causes cloud pixels with a gray value similar to that of the sandbank to be recognized as clear. The edges of some small scattered clouds are therefore missed. The MODIS cloud recognition algorithm includes cloud tests in the thermal infrared band [32], which can compensate for the omission of these small scattered clouds, which are optically thick. For high, thin clouds, however, the 1.375 µm test is not limited by the R_f(0.87 µm)/R_f(1.64 µm) band combination, so no similar missed detection occurs.
The six verification scores of the four scenes in Section 3.2 show that the POD values are all greater than 0.8 and mostly close to 1, indicating that the NNDT cloud recognition results basically agree with the official SGLI product. The FAR_clear values are all below 0.1, or even close to 0, except in the ocean scene, indicating that the NNDT algorithm rarely classifies pixels marked as cloud by SGLI-CLFG as clear. In the ocean scene, FAR_clear is slightly larger, at 0.14. The reason is that the tests use thresholds derived from CAPI data statistics, which are not fully applicable to SGLI data, so the NNDT algorithm misses some cloud pixels; this is most obvious over the ocean surface. FAR_cloud is close to 0 in the other scenes but reaches 0.44 in the polar scene. However, the high consistency of NNDT-CLFG with the true-color images supports the effectiveness of the NNDT algorithm there. The HR values of all scenes are between 0.86 and 0.99, which indicates that the NNDT algorithm has a high cloud recognition hit rate and accuracy.
The verification examples and scores in Sections 3.1 and 3.2 show that, in general, the cloud extent and cloud amount recognized by the NNDT algorithm on the four underlying surfaces are consistent with MYD35 and SGLI-CLFG. The NNDT algorithm performs very well over ocean, vegetation, and snow surfaces, especially in detecting thin clouds and cloud edges, which are difficult to detect. In scenes affected by highly reflective sand, the NNDT algorithm identifies most of the clouds, although its recognition of cloud-edge and thin-cloud pixels is insufficient. Moreover, the algorithm effectively avoids identifying inland waters as clouds. These verifications indicate that the 0.38 µm band test can serve as an effective alternative to the 0.67 and 0.87 µm band tests in the cloud recognition algorithm, without requiring a minimum surface reflectance database for the corresponding band, and that the R_f(0.38 µm)/R_f(1.64 µm) band test can be used effectively to identify cloud pixels over the polar surface. Because it follows the principle of identifying as many cloud pixels in the image as possible, the NNDT algorithm proposed in this paper is more concise than CLAUDIA and other algorithms that need to calculate a confidence level. Overall, the NNDT algorithm is reasonable, efficient, and suitable for cloud recognition with CAPI data.

Conclusions
Based on the cloud detection requirements of aerosol and CO2 inversion for the TANSAT satellite sensors, this paper proposes NNDT (near-ultraviolet to near-infrared band with threshold tests for different underlying surfaces), an innovative cloud recognition algorithm suitable for CAPI data. The algorithm uses a distinct combination of band tests for each of four types of underlying surface: ocean, vegetation, desert, and polar regions. The thresholds used in all cloud recognition tests are obtained from statistics of the CAPI 2017 data over the four underlying surfaces. Two comparison methods are then used to verify and evaluate the effectiveness of the NNDT algorithm: a visual comparison based on CAPI data and the official MODIS cloud recognition results, and a quantitative comparison based on SGLI data and the official SGLI cloud recognition results. Both verification methods randomly selected typical target areas representing the four surface types (ocean, vegetation, desert, polar). The algorithm obtains good cloud recognition results in all scenes, with a hit rate between 0.86 and 0.99. Combined with the analysis of the spectral characteristics, the results preliminarily show that the 0.38 µm band distinguishes water, vegetation, and desert surfaces from clouds well, and that the ratio of the 0.38 µm band to the 1.64 µm band distinguishes clouds from polar snow very well in the CAPI data.
By combining threshold tests in the near-ultraviolet to near-infrared bands, this algorithm avoids the drawbacks of the 0.67 and 0.87 µm minimum reflectance databases used in traditional cloud detection algorithms. Moreover, introducing the near-ultraviolet band enables the algorithm to achieve results comparable to those obtained with thermal infrared support, even without a thermal infrared band. Based on the principle of identifying as many cloud pixels as possible, classifying all image pixels into cloud and clear categories can greatly improve the efficiency and accuracy of data utilization in aerosol and CO2 inversion. This algorithm is the first to apply the 0.38 µm band and the ratio of the 0.38 µm band to the 1.64 µm band to cloud recognition, providing a theoretical basis and new ideas for future research.
In this paper, the underlying surface types are divided into four categories, and the regions selected for algorithm validation differ from those selected for threshold statistics. However, the scenes considered here cannot cover all the more detailed and special surface types in the world. In the future, we will focus on studying how the thresholds differ between regions. According to the spectral characteristics of different regions, the underlying surfaces can be grouped in more detail, with corresponding band detection schemes and thresholds. Furthermore, cloud shadow is also a challenge in cloud detection, because surface data covered by cloud shadows also affect how researchers can use the data [56][57][58]. The method in this paper does not solve this problem: it recognizes all cloud shadow areas as clear, which may affect the application of the data in aerosol inversion. In future work, we will consider bands and band combinations that can detect cloud shadows, together with the corresponding threshold statistics.