A Method for Retrieving Cloud-Top Height Based on a Machine Learning Model Using the Himawari-8 Combined with Near Infrared Data

Dong, Yan; Sun, Xuejin; Li, Qinghui

doi:10.3390/rs14246367


by Yan Dong, Xuejin Sun * and Qinghui Li

College of Meteorology and Oceanography, National University of Defense Technology, Changsha 410073, China

* Author to whom correspondence should be addressed.

Remote Sens. 2022, 14(24), 6367; https://doi.org/10.3390/rs14246367

Submission received: 10 November 2022 / Revised: 14 December 2022 / Accepted: 14 December 2022 / Published: 16 December 2022

(This article belongs to the Special Issue Meteorological Remote Sensing Algorithm and Applications for Clouds and Precipitation)


Abstract:

Clouds at different cloud-top heights (CTHs) heat the atmosphere to different degrees, which makes CTH an important factor for weather forecasting and aviation safety. The Advanced Himawari Imager (AHI) on the Himawari-8 satellite is a new-generation visible and infrared imaging spectrometer characterized by a wide observation range and a high temporal resolution. In this paper, a cloud-top height retrieval algorithm based on XGBoost is proposed. The algorithm comprehensively utilizes AHI L1 multi-channel radiance data and calculates the model input parameters according to the cloud phase, the cloud texture, and the local brightness temperature change of the cloud. In addition, the latitude, longitude, solar zenith angle and satellite zenith angle are input into the model to further constrain the influence of geographical and spatial factors, such as the sea and land location, on CTH. Compared with the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observations (CALIPSO) cloud-top height data (CTH_CAL), the cloud-top height retrieved by the algorithm (CTH_XGB) has a mean error (ME) of 0.3 km, a standard deviation (Std) of 1.72 km, and a root mean square error (RMSE) of 1.74 km. Additionally, it reduces the large systematic deviation in the cloud-top height products released by the Japan Meteorological Agency (CTH_JMA), especially for ice clouds and multi-layer clouds with ice clouds on the top layer. For water clouds below 2 km and multi-layer clouds with water clouds at the top, the algorithm also reduces the systematic bias of CTH_JMA. XGBoost can effectively distinguish between different cloud scenarios within the model, which makes it robust and suitable for CTH retrieval.

Keywords: CALIPSO; CTH; Himawari-8; XGBoost; retrieval

1. Introduction

Clouds play important roles in balancing the radiation of the Earth-atmosphere system and in climate change. They are regulators that affect the radiation budget at the top of the atmosphere and at the Earth’s surface [1,2]. The radiative effect of clouds is one of the biggest uncertainties when evaluating future climate change [3]. Clouds at different heights heat the atmosphere to different degrees [4]. In addition, cloud-top height (CTH) is an important parameter in the fields of weather forecasting, weather modification and aviation safety [5]. Accurate measurement of CTH is therefore very important for climate change predictions [6,7].

At present, the measurement of CTH is mainly based on ground observations and satellite observations [8,9,10]. Ground-based remote sensing can continuously observe CTH, but it can only be observed at fixed points, and the observation space is small [11]. Meteorological satellites have extensive observation coverage, which can generate large-scale CTH observations and provide observation data over the ocean and polar regions without the need for ground stations. Therefore, satellite CTH measurements have become an important method. The remote sensing of CTH using satellites mainly includes active remote sensing and passive remote sensing. Active sensors such as Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP) mounted on CALIPSO and Cloud Profiling Radar (CPR) mounted on CloudSat can obtain the vertical distribution of clouds [12,13]. Both systems have global observation capabilities, but the time resolution is low, and the field of view is small, making it difficult to continuously observe a certain area for a long time. In contrast, CTH remote sensing using geostationary meteorological satellite imagers enables continuous observations over a large, fixed area.

The new generation of geostationary meteorological satellites such as FY-4A, Himawari-8, GOES-R, and MSG has provided CTH products, but the accuracy needs to be improved. The CTH algorithm released by FY-4A uses a combination of brightness temperatures in the 10.8 µm, 12.0 µm and 13.5 µm bands to retrieve the CTH [14]. This method is similar to the CTH retrieval method published by GOES-R, which uses the CO₂ slice method combined with the optimal estimation method to retrieve CTH [15,16,17]. Compared with CloudSat and CALIPSO, the standard deviation (Std) of this method is about 2 km [18]. The CTH retrieval algorithm released by Himawari-8 uses the 11 µm, 12 µm, and 13.5 µm channel data to obtain the CTH using the Integrated Cloud Analysis System (ICAS), where the root mean square error (RMSE) for single-layer clouds was 2.1 km [19]. Multi-channel imagers can retrieve CTH using thermal infrared imaging, CO₂ slices, and one-dimensional variational (1DVAR) methods [20,21,22]. Most CTH retrieval methods for passive sensors involve radiative transfer models (RTMs), which usually suffer from large uncertainties in cloudy skies [23,24]. Such methods have limited CTH accuracy for optically thin or broken clouds [25]. Several previous studies pointed to significant bias in CTH measured by passive sensors [26,27,28]. Some studies have attempted to reduce the deviation of CTH products. Although good results have been achieved, systematic deviations still exist [29,30].

To date, the CTH retrieval algorithms of the above-mentioned multi-channel imagers have not taken full advantage of the multiple channels and high spatial resolution, and have used only a few thermal infrared channels. In addition to the thermal infrared band, the near-infrared band can also be used to retrieve the cloud parameters [31,32]. Palmer analyzed the optical properties of water clouds in the near-infrared band in 1974 [33]. Pilewskie used the reflectivity of the 1.65 µm channel for cloud-phase identification and achieved good results [34]. This shows that the 1.65 µm channel can effectively separate water clouds from ice clouds, and CTH is closely related to the cloud phase. Using near-infrared channels such as 1.65 µm to distinguish the characteristics of water clouds and ice clouds can therefore help improve the accuracy of CTH retrieval. The CO₂ slice method retrieves each pixel observed by the imager independently, without considering the cloud information carried by adjacent pixels. The comprehensive use of neighboring pixels can reflect the heterogeneity and scale of clouds. This paper builds a multi-channel radiance CTH retrieval algorithm based on XGBoost. The algorithm uses seven channels of AHI data in total. The 1.6 µm channel reflectivity is used as one of the input parameters of the model in order to establish the relationship between the cloud phase and the CTH within the model. At the same time, this paper extracts neighboring-pixel information as model input variables, which helps characterize cloud uniformity, scale, and other properties. These features have a positive effect on CTH retrieval.

In recent years, the continuous development of retrieval algorithms based on machine learning has provided a new solution for the retrieval and prediction of meteorological elements [35,36,37]. For example, Min proposed a machine-learning-based CTH retrieval algorithm; four machine learning models were trained and compared with traditional physical algorithms, and all four models retrieved CTH with improved accuracy [28]. Recently, Wang proposed an algorithm to retrieve CTH using a deep neural network (DNN) model, which takes as input not only the brightness temperature data of the 8.6 µm, 10.4 µm, 12.4 µm and 13.3 µm channels of AHI, but also the vertical temperature profile, surface elevation, and other geographic information parameters. Compared with the joint CALIPSO/CloudSat data, the mean error (ME) of the CTH result is −0.13 km, and the RMSE is 3.37 km [38]. The cloud phase is closely linked to CTH, but current neural network and machine learning algorithms for CTH retrieval select only the thermal infrared channels of AHI, even though NIR can distinguish the different cloud phases more effectively.

In this paper, we propose a CTH retrieval method that combines NIR channels with the XGBoost model and can be applied to the observation data of the Himawari-8 satellite. In this method, the multi-channel radiance data of AHI, the spatial inhomogeneity parameters of the cloud, and the corresponding geographic information parameters are used as the input variables of the model. The CTH extracted from the CALIPSO profile data matched in time and space is used as the output variable to train the XGBoost model, which yields the CTH retrieval algorithm. The method uses feature parameters derived from NIR and neighboring pixels for cloud characterization, which allows the model to differentiate cloud types effectively and has a positive effect on CTH retrieval.

2. Method

2.1. Data

This paper uses the AHI L1 full-disk radiance data of the Himawari-8 satellite, L2 CTH product data and CALIOP 5 km cloud layer product data (Level 2 Clayer) of the CALIPSO satellite.

The Himawari-8 satellite, located near 140.7°E above the equator, is a new-generation Japanese geostationary meteorological satellite. The satellite was launched on 7 October 2014, and has been operational since 7 July 2015. The Advanced Himawari Imager (AHI) payload it carries has a total of 16 channels, including 3 channels for visible light (VIS), 3 channels for near-infrared (NIR), and 10 channels for infrared (IR). The AHI can obtain 16 channels of full disk observations every 10 minutes. The Himawari-8 AHI L1 full-disk data have a spatial resolution of 5 km × 5 km and cover the regions from 60°N to 60°S and 80°E to 160°W [39]. The AHI L2 CTH data are included in the CLP (Cloud Parameters) product data file provided by the Japan Meteorological Agency (JMA) with a spatial resolution of 5 km × 5 km. The AHI L2 CTH retrieval algorithm is roughly the same as the Integrated Cloud Analysis System (ICAS) algorithm, using AHI’s 11 µm, 12 µm, and 13.5 µm channel data. The ICAS algorithm was developed by Iwabuchi in 2016, which comprehensively utilizes the CO₂ slicing method, the infrared split window method, and the OE algorithm [40]. In 2018, Iwabuchi applied the ICAS algorithm to AHI and demonstrated its estimation accuracy for CTH [19]. At present, the Himawari-8 satellite only provides CTH data in the daytime, and this paper only verifies the retrieval algorithm results in the daytime.

The Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observations (CALIPSO) satellite was launched in 2006 by the National Aeronautics and Space Administration (NASA) and the Centre National d’Etudes Spatiales (CNES) [41]. It belongs to the A-Train satellite constellation [42]. The main payload of the CALIPSO satellite is the Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP). CALIOP has three channels: a 1064 nm channel and two orthogonally polarized 532 nm channels, one parallel and one perpendicular. CALIOP data products mainly include Level 1B, Level 2 Profile, Level 2 VFM, and Level 2 Clayer/Alayer data, which provide cloud and aerosol type and location information. This paper uses the CTH in the CALIOP 5 km cloud layer product data (Level 2 Clayer) as the ground truth for model training and validation. Considering that the temporal resolution of the AHI full-disk image is 10 min and the spatial resolution of L2 CTH is 5 km, a CALIOP profile and an AHI pixel are considered a space-time match if their observation times differ by no more than 10 min and their spatial distance does not exceed 5 km. The green line in Figure 1 shows the space-time matching points between CALIOP and AHI. The background in the figure is the brightness temperature of the 11th channel (8.6 µm, BT8.6) of AHI at 05:10 on 1 January 2019, and the green line is the ground track of CALIOP during the period of 05:00~05:10 on that day.
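The space-time matching rule above can be sketched as follows. This is only an illustration under stated assumptions: the dictionary keys `time_s`, `lat` and `lon`, the function names, and the use of the haversine great-circle distance on a spherical Earth are hypothetical choices, not taken from the paper.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical Earth is an assumption


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_space_time_match(caliop, ahi, max_minutes=10.0, max_km=5.0):
    """Return True if a CALIOP profile and an AHI pixel satisfy both thresholds.

    `caliop` and `ahi` are dicts with hypothetical keys 'time_s' (seconds),
    'lat' and 'lon' (degrees).
    """
    dt_minutes = abs(caliop["time_s"] - ahi["time_s"]) / 60.0
    if dt_minutes > max_minutes:
        return False
    distance = great_circle_km(caliop["lat"], caliop["lon"], ahi["lat"], ahi["lon"])
    return distance <= max_km
```

The time test is applied first because it is cheaper than the distance computation.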

2.2. Retrieval Algorithm

This paper designs a CTH retrieval algorithm based on the eXtreme Gradient Boosting (XGBoost) model. XGBoost is an ensemble learning algorithm based on gradient boosting. Its principle is to achieve an accurate fit through the iterative construction of weak estimators. The model is suitable for nonlinear fitting [43], and CTH retrieval is essentially a multi-parameter nonlinear fitting problem. XGBoost consists of an ensemble algorithm, weak estimators, and an application module, in which the ensemble algorithm and the weak estimators are its core [44]. The ensemble algorithm builds multiple weak estimators and aggregates their results to obtain better regression or classification performance than a single model. Unlike random forest (RF), which builds multiple independent weak estimators in parallel, XGBoost builds weak estimators one by one and accumulates them over multiple iterations [45]. A certain number of samples are randomly selected from the total sample set to form a training set, which is used to train a tree-based weak estimator. Then, the results of the weak estimator are evaluated, and the samples with large deviations are marked. After that, random sampling with replacement from the total sample set continues, with an increased probability of selecting the marked samples, to form a new training set on which the second weak estimator is trained. After many iterations, an ensemble of multiple weak estimators is obtained, thereby generating the final XGBoost model. The XGBoost model is developed from the Gradient Boosting Decision Tree (GBDT) and is optimized in many ways based on the GBDT model. Firstly, XGBoost’s weak estimator is not only a tree model but also supports linear models. 
Secondly, while GBDT uses only first-order derivative information in its optimization, XGBoost performs a second-order Taylor expansion on the cost function, which not only speeds up the convergence of the model during training but also retains more information about the objective function, which is useful for improving accuracy. Thirdly, XGBoost has better robustness with the addition of a strategy to automatically handle the missing-value features. Finally, XGBoost supports the parallel processing of features, which is more computationally efficient. Therefore, compared to GBDT and the random forest model, XGBoost is more suitable for retrieving CTH using imagers.
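The idea of building weak estimators one by one and accumulating them can be illustrated with a toy example. The sketch below is not XGBoost and not the authors' implementation: it is plain gradient boosting for squared-error loss with depth-1 trees (stumps) on a single feature, where each new stump is fitted to the current residuals rather than to a resampled training set. The function names and the defaults `n_rounds=50` and `learning_rate=0.3` are assumptions made for the illustration. In practice, the `xgboost` Python package would be used instead.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree: the threshold split that minimises squared error."""
    best = None
    # Candidate thresholds exclude the largest x so the right branch is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        sse = sum((r - left_value) ** 2 for r in left) + sum((r - right_value) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    _, threshold, left_value, right_value = best
    return lambda x: left_value if x <= threshold else right_value


def boost(xs, ys, n_rounds=50, learning_rate=0.3):
    """Additive model: mean prediction plus a sum of shrunken stumps fitted to residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # negative gradient of squared loss
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)
```

For squared-error loss the second derivative is constant, so the second-order expansion used by XGBoost gives no advantage in this toy case. It matters for other loss functions and for the regularised leaf weights that XGBoost computes.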

2.3. Model Input Parameters

AHI has a total of 16 channels. In this paper, the 5th, 7th, 10th, 11th, 14th, 15th, and 16th channel radiance data of AHI are selected as the basic parameters. Table 1 shows each channel band and its corresponding physical characteristics. The near-infrared band of 1.6 µm is a weak absorption band of water vapor, and this channel can be used to identify cirrus clouds [34]. The absorption difference between ice clouds and water clouds is obvious in the vicinity of the 3.9 µm band [32]. The 7.3 µm and 8.6 µm bands are the water vapor absorption channels. These channels are useful for distinguishing between ice and water clouds. The brightness temperature at 11 µm or 12 µm (BT11 or BT12) is the basic variable for retrieving the CTH of thick clouds, because it is close to the cloud-top temperature of thick clouds [46]. In the presence of optically thin cloud, the brightness temperature difference between 11 µm and 12 µm reflects the transparency of the cloud [47]. The 13.3 µm band is the CO₂ absorption channel, which helps to improve the CTH retrieval accuracy for high clouds.

The input parameters include the single-channel reflectance/brightness temperature (R1.6, BT7.3, BT11.2, BT13.3), the two-channel brightness temperature differences (BT11.2-BT12.3, BT8.6-BT12.3, BT7.3-BT12.3, BT13.3-BT12.3), the Std of reflectivity/brightness temperature (R1.6, BT3.9, BT11.2) and the Std of brightness temperature difference (BT11.2-BT12.3, BT11.2-BT3.9) within a 5 × 5 pixel window, the difference between the brightness temperature of the 11.2 µm channel and the warmest/coldest brightness temperature within the 5 × 5 pixel range of the channel (BT11.2-BT11.2 W, BT11.2-BT11.2 C), and the warmest/coldest brightness temperature differences in the range of 5 × 5 pixels (BT12.3 W-BT11.2 W, BT12.3 C-BT11.2 C, BT11.2 W-BT3.9 W, BT11.2 C-BT3.9 C), as listed in Table 2. Brightness temperature differences can not only distinguish optically thin clouds but also identify the pixels at the edge of the cloud. The Std of brightness temperature and that of brightness temperature difference reflect the texture information of clouds. The difference between the warmest and the coldest brightness temperature among the neighboring pixels represents the gradient of the brightness temperature change in the cloud area, which in turn characterizes the uniformity and the scale of the cloud. In addition, Hamada believed that constraints such as latitude, satellite zenith angle, and season could reduce the error of the imager in retrieving CTH [48]. Håkansson analyzed the relationship between the CTH bias retrieved by different imagers and the satellite observation zenith angle and found that the CTH bias was proportional to the satellite observation zenith angle [37]. Different seasons affect the height intervals of clouds with different phases in the atmosphere at different latitudes, which can make the correlation between CTH and cloud phase more ambiguous. Different models need to be trained for different seasons to exclude this interference. This is the reason why only winter samples are selected for the experiments in this paper. Latitude can reflect the seasonal changes in the subsolar point and, combined with longitude, it can indirectly reflect the influence of sea and land differences on CTH. In this paper, the solar zenith angle (SZA), satellite observation zenith angle (viewing zenith angle, VZA), and longitude and latitude are also used as the input parameters of the model.
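A minimal sketch of how the neighborhood features described above could be computed for one pixel is given below. The function names and dictionary keys are hypothetical. Two further assumptions are made because the paper does not specify them: the Std is the population standard deviation, and the window is clipped at the image edges.

```python
import statistics


def window_values(grid, row, col, half=2):
    """Values in the (2*half+1) x (2*half+1) window centred on (row, col), clipped at edges."""
    n_rows, n_cols = len(grid), len(grid[0])
    return [
        grid[r][c]
        for r in range(max(0, row - half), min(n_rows, row + half + 1))
        for c in range(max(0, col - half), min(n_cols, col + half + 1))
    ]


def neighborhood_features(bt, row, col):
    """Texture and gradient features of one channel (e.g. BT11.2) in a 5 x 5 window."""
    vals = window_values(bt, row, col)
    centre = bt[row][col]
    return {
        "std": statistics.pstdev(vals),          # texture of the window
        "bt_minus_warmest": centre - max(vals),  # analogous to BT11.2-BT11.2 W
        "bt_minus_coldest": centre - min(vals),  # analogous to BT11.2-BT11.2 C
    }
```

The same window helper can be reused for the other windowed inputs, for example by passing a grid of brightness temperature differences instead of a single channel.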

The XGBoost algorithm can generate an importance score for each input feature and rank the features accordingly. As shown in Figure 2, (BT3.9)text and BT11.2-BT11.2 C are the two most important input variables. (BT3.9)text can effectively distinguish the phase types of clouds and their textural characteristics in the 5 × 5 pixel range. BT11.2-BT11.2 C responds to changes in cloud optical thickness in the 5 × 5 pixel range. First, the importance scores are greater than 5% for every input variable involving NIR, which indicates the value of using NIR to improve the accuracy of CTH retrieval in this paper. Second, the summed contribution of the variables that use neighboring-pixel information is greater than that of the variables that do not. This shows that information from individual pixels alone is not sufficient for CTH retrieval. Finally, the importance scores of the input features do not differ much from each other, and each input feature has a certain level of importance.

2.4. Model Training Method

The method of CTH retrieval based on the XGBoost model in this paper uses AHI L1 radiance data and CALIOP L2 cloud data for training and testing. The specific training scheme is shown in Figure 3. We selected the AHI L1 and CALIOP L2 cloud-layer data from January, February, and December 2019 as the data set for model training. The aim of building a winter-only training set is to constrain the season and obtain more accurate results. The CTHs of the CALIOP cloud data are used as the training target. If the cloud thickness of the top layer is less than 20 m, the matching point is eliminated from the training set. A total of 87,315 matching points were used to train the XGBoost model. The parameters of the training model were tuned using the 5-fold cross-validation method. This method splits all the training set samples equally and randomly into 5 parts, trains a model on 4 of the parts, and holds out the remaining part for testing the model, which gives a set of prediction results. Repeating this for each held-out part yields 5 models and 5 sets of predictions. The average score of these 5 sets of prediction results reflects the quality of the model parameters. The XGBoost model for the AHI retrieval of CTH was obtained by training the model with the tuned parameters.
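The 5-fold scheme can be sketched as below. The helper names, the fixed random seed, and the generic `fit` and `score` callables are assumptions made for illustration. Any regressor and any error metric (for example RMSE against CTH_CAL) could be plugged in.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(xs, ys, fit, score, k=5, seed=0):
    """Train on k-1 folds, score on the held-out fold, and average the k scores."""
    folds = k_fold_indices(len(xs), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        model = fit([xs[j] for j in train_idx], [ys[j] for j in train_idx])
        scores.append(score(model, [xs[j] for j in test_idx], [ys[j] for j in test_idx]))
    return sum(scores) / k
```

A candidate set of model parameters would be evaluated by calling `cross_validate` once per candidate and keeping the candidate with the best average score.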

3. Results

3.1. Case Analysis

Figure 4 shows the Himawari-8 L2 CTH (CTH_JMA) and the CTH retrieved in this paper (CTH_XGB) at 05:50 (UTC) on 1 January 2021. It can be seen from the figure that the spatial distributions of CTH_JMA and CTH_XGB are broadly similar. High clouds are found in areas south of Taiwan and over Xinjiang and Tibet (left in the picture), and low clouds are generally predominant in China’s coastal areas and central and southern areas.

Figure 5 shows the comparison results of the CTH products from 05:00 to 06:00 on 1 January 2021 (UTC). In the figure, the pink dots represent the CTH observed by CALIOP (CTH_CAL), the green dots represent CTH_JMA, and the blue dots represent CTH_XGB. Both CTH_JMA and CTH_XGB had high consistency with CTH_CAL. When the CTH is greater than 10 km, CTH_JMA is generally too low, whereas CTH_XGB does not show this behavior. The reason is that the XGBoost model is trained on CTH_CAL data, and different types of clouds can be effectively distinguished within the model, thus improving the retrieval accuracy of CTH for high-level clouds. CTH_JMA is overestimated in areas above 20° latitude.

3.2. Statistical Analysis

Using the established XGBoost model, the cloud-top heights of 2021.01.08~2021.01.10, 2021.02.05~2021.02.07, and 2021.12.02~2021.12.04 were retrieved and temporally and spatially matched with CALIOP, and a total of 18,259 matching data were obtained. Figure 6 is a scatter plot comparing CTH_JMA/CTH_XGB with CTH_CAL. Figure 8 shows a comparison chart of the ME, Std, RMSE, and correlation coefficients. It can be seen from Figure 6a that the slope of the fitted line is significantly smaller than that of the reference line, which indicates that above about 2.5 km, the systematic underestimation of CTH_JMA becomes more obvious as CTH increases. Below about 2.5 km, CTH_JMA is partly overestimated. The reason is that the brightness temperature observed by the CO₂ channel is the radiation brightness temperature of the entire atmospheric column. When the CTH reaches more than 10 km, there are mainly ice clouds. Most ice clouds have smaller cloud geometric thicknesses (CGTs) and higher transmittance, so they are more susceptible to radiation contamination from within and below the cloud. As the CTH continues to rise, the CGT of the ice cloud also decreases, and the radiation contamination from below the cloud is more serious, resulting in a more obvious underestimation of the CTH. Some samples of CTH_JMA are too high when the cloud is low. Iwabuchi believed that the temperature inversion near the CTH has an impact on the CO₂ channel, resulting in the overestimation of low clouds [19]. It can be seen from Figure 6b that CTH_XGB achieved a good fit, and the fitted line is close to the reference line. The reason is that the model comprehensively utilizes short-wave infrared, which helps exclude the sub-cloud radiation contamination seen by the CO₂ channel. At the same time, the geographical distribution parameters are comprehensively used, which constrains, to a certain extent, the low-cloud overestimation caused by the temperature inversion near the CTH.

Figure 7 shows the probability density of the deviation of CTH_JMA/CTH_XGB from CTH_CAL; the curves use a 0.2 km interval, and the dotted lines represent the medians. ΔCTH_XGB is closer to a normal distribution than ΔCTH_JMA, and the median of ΔCTH_XGB is about 0.25 km. The probability density curve of ΔCTH_XGB is steeper and shows fewer systematic errors. The medians of the two products are close to their MEs, indicating that both deviation distributions are approximately symmetric. CTH_XGB corrects the systematic bias of CTH_JMA and yields a deviation distribution closer to normal.
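A deviation probability density with a 0.2 km interval, together with its median, can be computed along the following lines. This is a sketch only: the function names are hypothetical, and labelling each bin by its left edge is an assumption about the binning convention, which the paper does not state.

```python
import math
import statistics


def deviation_density(deviations, bin_width=0.2):
    """Histogram density: {left bin edge in km: density}, integrating to 1."""
    counts = {}
    for d in deviations:
        k = math.floor(d / bin_width)  # index of the bin containing d
        counts[k] = counts.get(k, 0) + 1
    n = len(deviations)
    return {round(k * bin_width, 10): c / (n * bin_width) for k, c in sorted(counts.items())}


def deviation_median(deviations):
    """Median deviation, the quantity drawn as a dotted line."""
    return statistics.median(deviations)
```

Here `deviations` would hold the values of retrieved CTH minus CTH_CAL in km for one product.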

It can be seen from Figure 8 that the ME of CTH_XGB is 0.3 km and the ME of CTH_JMA is −1.27 km, so the magnitude of the systematic error is reduced by 76.32%, which corrects the overall underestimation of CTH_JMA. The Std and RMSE of CTH_XGB also decreased compared with CTH_JMA, but the correlation coefficient was not improved.
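For reference, the three error statistics can be computed as sketched below. The function name is hypothetical, and the use of the population standard deviation (dividing by n) is an assumption. With that convention, the identity RMSE² = ME² + Std² holds exactly, and the values reported in the abstract (0.3 km, 1.72 km, 1.74 km) are consistent with it to within rounding.

```python
import math


def error_stats(retrieved, reference):
    """Return (ME, Std, RMSE) of retrieved minus reference, all in the input units."""
    diffs = [r - t for r, t in zip(retrieved, reference)]
    n = len(diffs)
    me = sum(diffs) / n                                      # mean error (bias)
    std = math.sqrt(sum((d - me) ** 2 for d in diffs) / n)   # population Std of the error
    rmse = math.sqrt(sum(d * d for d in diffs) / n)          # root mean square error
    return me, std, rmse
```

ME captures the systematic part of the error and Std the random part, which is why a large reduction in ME can coexist with only a modest reduction in Std.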

Figure 9 compares CTH in five cloud scenarios observed by CALIOP: top-layer ice cloud, top-layer water cloud, top-layer mixed cloud, single-layer cloud, and multi-layer cloud. A multi-layer cloud is one for which the number of cloud layers observed by CALIOP is ≥ 2. Figure 10 shows the absolute value of ME, Std, RMSE, and correlation coefficient under these cloud scenarios. From Figure 9(a1,b1), it can be seen that the CTH_JMA of ice clouds is lower than CTH_CAL, and most of the scattered points are below the reference line (CTH_JMA = CTH_CAL), indicating that the CTH products of Himawari-8 are systematically low for ice clouds and for multi-layer clouds with ice clouds on the top layer. However, the scattered points of CTH_XGB against CTH_CAL for ice clouds are distributed around the reference line CTH_XGB = CTH_CAL, and the fitted trend line basically coincides with the reference line. This shows that the constructed retrieval model can better retrieve the CTH of ice clouds and mitigates the systematic underestimation of Himawari-8 ice-cloud CTH. From Figure 9(a2,b2), it can be seen that many scatter points of CTH_JMA against CTH_CAL for water clouds lie below the reference line between 3 km~7 km, indicating that the Himawari-8 water cloud-top height product is systematically low between 4 km~8 km. However, the CTH_XGB and CTH_CAL scatter points of water clouds are distributed around the reference line, and the fitted trend line basically overlaps the reference line, indicating that the constructed retrieval model can better retrieve the CTH of water clouds and mitigates the systematic underestimation of Himawari-8 water-cloud CTH. Secondly, for low clouds with a CTH of less than 2 km, the concentrated area of the CTH_JMA scatter distribution is roughly circular, which indicates a partial overestimation of CTH_JMA below 2 km, whereas this bias is not obvious for CTH_XGB. It can be seen from Figure 9(a3,b3) that the systematic deviation of the fitted line of CTH_JMA basically disappears below 2.5 km, and the fitted line falls further below the reference line as the CTH increases. This is likely due to a systematic difference in the ice-water composition of mixed clouds. When the mixed cloud’s CTH is less than 2.5 km, although it may contain ice crystals, their content is low. When the mixed cloud’s CTH is greater than 15 km, although it may contain liquid particles, their content is also low. CTH_JMA’s retrieval of ice-phase clouds has obvious deviations. Compared with the fit of CTH_JMA to CTH_CAL, the fitted line of CTH_XGB to CTH_CAL for mixed clouds is obviously improved. By combining Figure 9(a4,b4,a5,b5), it can be seen that for both single- and multi-layer clouds, the fitted line of CTH_XGB to CTH_CAL is improved over that of CTH_JMA to CTH_CAL at different heights, especially for high clouds. For CTH_JMA against CTH_CAL, single-layer clouds have no systematic bias in low clouds, while multi-layer clouds do. This may be due to radiation contamination caused by the transmittance of the upper ice cloud or the thin upper cloud, which still needs to be further explored. The scatter-fitting results of CTH_JMA and CTH_CAL for single-layer clouds are better than for multi-layer clouds, which is similar to the conclusion drawn by Tan in 2018 [18]. In summary, the fitted line of CTH_XGB is closer to the reference line for all cloud types, which indicates that the XGBoost model is more robust. The systematic underestimation of CTH_JMA is mainly due to the small cloud optical thickness of ice clouds.

Figure 10 shows the probability density distributions of ΔCTH_JMA and ΔCTH_XGB in different cloud scenarios, where the solid line is a broken line with an interval of 0.2 km, and the dashed line is the median of the corresponding cloud scenario. In Figure 10a, ΔCTH_JMA has obvious normal distribution characteristics in the water cloud and single-layer cloud scenarios. The median ΔCTH_JMA of water clouds is about 0.2 km, but there is a secondary probability density peak at around 1.7 km, which indicates that for water clouds, CTH_JMA is partially overestimated. The median ΔCTH_JMA of single-layer clouds is about −0.6 km, the median ΔCTH_JMA of mixed clouds is about −1.2 km, and the median ΔCTH_JMA of multi-layer clouds and ice clouds is as low as −2.5 km. CTH_JMA has the best retrieval results for water clouds, followed by single-layer clouds, and the other cloud scenarios have large systematic underestimations. ΔCTH_XGB has an obvious normal distribution in each cloud scenario, and the medians are distributed near the reference line, indicating that ΔCTH_XGB has a symmetrical distribution. Compared with Figure 10a, the normal distribution of ΔCTH_XGB in different cloud scenarios is more obvious, which shows that the model is more robust. In addition, the better symmetry of ΔCTH_XGB indicates a reduced systematic bias.

Table 3 shows the statistical comparison results between CTH_XGB/CTH_JMA and CTH_CAL under different cloud scenarios. Here, water clouds, ice clouds, and mixed clouds all refer to the top-layer cloud phase. For CTH_JMA, ice clouds and multi-layer clouds have an obvious systematic underestimation, followed by mixed clouds, and the systematic deviation of water clouds and single-layer clouds is relatively small. The ME of CTH_XGB in different cloud scenarios is closer to 0, which indicates that the model is more robust for CTH retrieval in different cloud scenarios.

The model makes full use of the short-wave infrared channel, which helps the decision tree model identify the sub-layer radiation contamination of ice clouds and multi-layer clouds. The RMSE of CTH_JMA in the water cloud scenario is 1.67 km, while the RMSE of CTH_XGB is 1.44 km. The Std of CTH_JMA in the water cloud scenario is 1.51 km, and that of CTH_XGB is 1.43 km, indicating that both CTH_JMA and CTH_XGB have relatively small errors and dispersion for water clouds. The results of the two algorithms for water clouds are not much different. Likewise, for single-layer clouds, the CTH_JMA and CTH_XGB results are not much different, but CTH_XGB significantly improves ME and RMSE compared to CTH_JMA in the case of multi-layer clouds. For multi-layer clouds, the scheme adopted in this paper shows no obvious systematic deviation and has higher retrieval accuracy.

Figure 11 shows the mean deviation profiles of CTH_XGB and CTH_JMA. Figure 11a shows the ice cloud results. The solid line is a single-layer ice cloud, and the dashed line is a multi-layer cloud with an ice cloud on the top layer. ΔCTH_JMA of ice clouds decreases markedly as CTH increases, and the underestimation for multi-layer ice clouds is nearly 1 km larger than that for single-layer ice clouds at all heights. For ice clouds, the CTH_JMA retrieval bias arises mainly because the brightness temperature observed by the CO₂ channel and the infrared split window channels is contaminated by radiation transmitted from the atmosphere below the ice cloud. The CTH_XGB retrieved by this model has deviations around 0 km at all heights for both single-layer and multi-layer ice clouds, and the deviations of the two differ little. This shows that the accuracy of CTH_XGB for ice clouds is greatly improved.

Figure 11b shows the water cloud results. The solid line is a single-layer water cloud, and the dashed line is a multi-layer cloud with a water cloud on the top layer. CTH_JMA is overestimated for water clouds below 2 km, and this systematic overestimation decreases as CTH increases. Iwabuchi et al. suggested that this may be due to errors caused by the temperature inversion near the cloud top, and that other constraints, such as geographic distribution, diurnal cycles, and seasonal changes, are not utilized [19]. It is also possible that CTH_JMA uses only the split window channels and the CO₂ channel and ignores the absorption of water vapor in these bands: the lower the CTH, the higher the relative humidity, and the more likely the long-wave radiation is to be absorbed by near-surface water vapor. The bias of ΔCTH_XGB below 2 km is closer to 0 km, which suggests that CTH_XGB is more accurate for water clouds below 2 km. The reason is that CTH_XGB uses the latitude and longitude, solar zenith angle, and satellite zenith angle as input parameters of the model. Combined with the 1.6 µm channel, in which water vapor absorption is very small or even negligible, the deviation caused by water vapor absorption in the split window channels and the CO₂ channel can be corrected more effectively. For multi-layer water clouds above 2 km, CTH_JMA is underestimated; the underestimation is about 2 km near 5 km and is reduced near 7.5 km. As for ice clouds, this bias arises because the brightness temperature observed by the CO₂ channel and the infrared split window channels is contaminated by radiation transmitted from the atmosphere below the top water cloud layer. The reduction near 7.5 km occurs because, when the CTH is above 7.5 km, the COT of the top water cloud is high, and the radiation from below the cloud cannot be observed by AHI. The cloud-top temperature is then close to the brightness temperature of the split window channel, so the deviation is alleviated with increasing height. The deviation in ΔCTH_XGB at different heights is basically within 1 km for both single-layer and multi-layer water clouds.
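
A mean deviation profile of the kind discussed for Figure 11 can be sketched as follows. This is an assumption-laden illustration: the samples are binned by the reference (CALIPSO) height, the bin width of 1 km is arbitrary, and the function name `mean_deviation_profile` and the synthetic data are hypothetical. The paper does not specify the binning used for the figure.

```python
import numpy as np

def mean_deviation_profile(cth_retrieved, cth_reference, bin_edges):
    """Mean deviation (retrieved minus reference, km) in each reference-height bin."""
    retrieved = np.asarray(cth_retrieved, dtype=float)
    reference = np.asarray(cth_reference, dtype=float)
    bin_edges = np.asarray(bin_edges, dtype=float)
    delta = retrieved - reference
    # np.digitize returns 1-based bin indices; out-of-range samples fall outside 0..n_bins-1
    idx = np.digitize(reference, bin_edges) - 1
    n_bins = len(bin_edges) - 1
    profile = np.full(n_bins, np.nan)  # NaN marks empty bins
    for k in range(n_bins):
        selected = idx == k
        if selected.any():
            profile[k] = delta[selected].mean()
    centres = 0.5 * (bin_edges[:-1] + bin_edges[1:])
    return centres, profile

# Synthetic example: underestimation that grows with height
rng = np.random.default_rng(2)
reference = rng.uniform(0.5, 15.0, size=3000)
retrieved = 0.8 * reference + rng.normal(0.0, 0.5, size=3000)
centres, profile = mean_deviation_profile(retrieved, reference, np.arange(0.0, 16.0, 1.0))
```

Running the function separately on single-layer and multi-layer samples of one top-layer phase would give the solid and dashed curves of one panel.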

4. Discussion

Currently, the Himawari-8 CTH retrieval method, which was developed by Iwabuchi in 2016, uses the 11 µm, 12 µm, and 13.5 µm channel data of AHI. The algorithm combines the CO₂ slicing method, the split window method, and the OE algorithm, but it suffers from a systematic bias. With active satellite data as the reference, this paper proposes a new CTH retrieval method based on the XGBoost model using Himawari-8/AHI data. Only winter data were chosen as the experimental sample in order to better constrain the bias caused by seasonality. The method not only considers the effect of the thermal infrared channels on CTH retrieval but also takes the radiance values of the 1.6 µm and 3.9 µm channels as model inputs, which helps to distinguish the different cloud phases. The solar zenith angle, satellite zenith angle, longitude, and latitude are also used as input variables, since they indirectly reflect land, sea, and seasonal influences. In addition, the variables derived from adjacent pixel values play a very important role in improving retrieval accuracy. The model inputs calculated from the selected satellite data fall into five categories: channel reflectance/brightness temperature, brightness temperature differences between channels, texture parameters, differences to the warmest/coldest adjacent pixel, and geographic and spatial information. The results show that:

The solution proposed in this paper shows little difference in its statistical results across cloud scenarios. This indicates that XGBoost can distinguish different cloud scenarios and is reasonably robust, making it suitable for CTH retrieval. Although this paper uses winter data as the experimental sample, this simply reflects a retrieval scheme in which each season is trained and modeled separately, and the scheme is equally applicable to other seasons.

CTH_XGB improves significantly on CTH_JMA for ice clouds and multi-layer clouds with ice clouds at the top. In this scenario, the systematic underestimation of CTH_JMA is significantly reduced, and the degree of improvement increases with the CTH value. At the same time, CTH_XGB significantly improves the retrieval below 2 km for water clouds and multi-layer clouds with water clouds on the top layer, where CTH_JMA is prone to systematic overestimation; here the degree of improvement decreases as the CTH value increases.

For multi-layer clouds with water clouds on the top layer, CTH_JMA is underestimated above 2 km; the underestimation is about 2 km near 5 km and decreases near 7.5 km. The reason is similar to that for multi-layer clouds with ice clouds on the top. By contrast, ΔCTH_XGB is small and varies little with height. For multi-layer clouds whose top layer is a water cloud, the CTH between about 2 and 7.5 km is therefore clearly improved.

In general, compared with CTH_JMA, CTH_XGB improves the ME by 76.32%, which shows that the systematic deviation of the CTH retrieval scheme proposed in this paper is significantly reduced; the RMSE is reduced by 24.65%, showing that the scheme has higher accuracy; and the Std is reduced by 11.9%, indicating that the dispersion of the deviation is lower.
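
One plausible way to arrive at such percentages is the relative reduction in the magnitude of each metric. The sketch below assumes this definition (the paper does not state the formula) and uses the rounded "All" row of Table 3, so it will not necessarily reproduce the quoted percentages exactly, which were presumably computed from unrounded statistics. The function name `relative_improvement` is hypothetical.

```python
def relative_improvement(baseline, candidate):
    """Percentage reduction in the magnitude of an error metric."""
    return 100.0 * (abs(baseline) - abs(candidate)) / abs(baseline)

# "All" row of Table 3 as (CTH_JMA, CTH_XGB), in km
table3_all = {"ME": (-1.27, 0.30), "RMSE": (2.31, 1.74), "Std": (1.93, 1.72)}

improvements = {
    name: relative_improvement(jma, xgb)
    for name, (jma, xgb) in table3_all.items()
}

for name, value in improvements.items():
    print(f"{name}: {value:.2f}% reduction relative to CTH_JMA")
```

Taking absolute values matters for the ME, because the bias changes sign between the two products while its magnitude shrinks.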

5. Conclusions

The algorithm in this paper largely resolves the systematic deviation problem of the Himawari L2 CTH products, which is of great significance to the remote sensing retrieval of CTH from the Himawari-8 satellite. However, the improvement in Std across cloud scenarios is small, so the dispersion of the CTH deviation is not markedly reduced. Further improving the Std will be the focus of the next step.

Author Contributions

Y.D.: Data processing, Formal analysis, Investigation, Methodology, Software, Visualization, Validation, Writing—review & editing. X.S.: Conceptualization, Funding acquisition, Supervision, Validation. Q.L.: Data processing, Software, Writing—review & editing. In addition, the above three authors are responsible for the writing and revision of the corresponding parts of the manuscript. Y.D. is responsible for the overall reviewing and editing of the manuscript and other matters not mentioned here. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by a National Natural Science Foundation of China grant funded by the Chinese government (41575020).

Data Availability Statement

The data that support the findings of this study are available in https://search.earthdata.nasa.gov/ (accessed on 10 December 2022) for CALIPSO and http://www.ptree.jaxa.jp (accessed on 10 December 2022) for Himawari-8.

Acknowledgments

We sincerely thank the National Natural Science Foundation of China (41575020), and we acknowledge the Himawari-8 data provided by the Japan Meteorological Agency and the A-Train satellite data downloaded from the NASA Earthdata website.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

Stephens, G.L.; Webster, P.J. Clouds and climate: Sensitivity of simple systems. J. Atmos. Sci. 1981, 38, 235–247.
Sassen, K.; Wang, Z.; Liu, D. Global distribution of cirrus clouds from CloudSat/Cloud-Aerosol lidar and infrared pathfinder satellite observations (CALIPSO) measurements. J. Geophys. Res. Atmos. 2008, 113.
Wang, H.; Su, W. Evaluating and understanding top of the atmosphere cloud radiative effects in Intergovernmental Panel on Climate Change (IPCC) Fifth Assessment Report (AR5) Coupled Model Intercomparison Project Phase 5 (CMIP5) models using satellite observations. J. Geophys. Res. Atmos. 2013, 118, 683–699.
Li, J.; Yi, Y.; Minnis, P.; Huang, J.; Yan, H.; Ma, Y.; Wang, W.; Ayers, J.K. Radiative Effect Differences between Multi-layered and Single-layer Clouds Derived from CERES, CALIPSO, and CloudSat Data. J. Quant. Spectrosc. Radiat. Transfer. 2011, 112, 361–375.
Boucher, O.; Randall, D.; Artaxo, P.; Bretherton, C.; Feingold, G.; Forster, P.; Zhang, X.Y. Clouds and Aerosols. In Climate Change 2013: The Physical Science Basis. Contribution of Working Group I to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change; Cambridge University Press: Cambridge, UK, 2013.
Holz, R.E.; Ackerman, S.A.; Nagle, F.W.; Frey, R.; Dutcher, S.; Kuehn, R.E.; Vaughan, M.; Baum, B.A. Global Moderate resolution Imaging Spectroradiometer (MODIS) cloud detection and height evaluation using CALIOP. J. Geophys. Res. Atmos. 2008, 113, 1–17.
Miller, S.D.; Forsythe, J.M.; Partain, P.T.; Haynes, J.M.; Bankert, R.L.; Sengupta, M.; Mitrescu, C.; Hawkins, J.D.; Vonder, H.; Thomas, H. Estimating Three-dimensional Cloud Structure via Statistically Blended Satellite Observations. J. Appl. Meteorol. Clim. 2014, 53, 437–455.
Hollars, S.; Qiang, F.; Comstock, J.; Ackerman, T. Comparison of cloud-top height retrievals from ground-based 35 GHz MMCR and GMS-5 satellite observations at ARM TWP Manus site. Atmos. Res. 2004, 72, 169–186.
Platnick, S.; King, M.D.; Ackerman, S.A.; Menzel, W.P.; Baum, B.A.; Riédi, J.C.; Frey, R.A. The MODIS cloud products: Algorithms and examples from Terra. IEEE Trans. Geosci. Remote Sens. 2003, 41, 459–473.
Minnis, P.; Sun-Mack, S.; Young, D.F.; Heck, P.W.; Garber, D.P.; Chen, Y.; Spangenberg, D.A.; Arduini, R.F.; Trepte, Q.Z.; Smith, W.L.; et al. CERES Edition-2 Cloud Property Retrievals Using TRMM VIRS and Terra and Aqua MODIS Data—Part I: Algorithms. IEEE Trans. Geosci. Remote Sens. 2011, 49, 4374–4400.
Campbell, J.R.; Dolinar, E.K.; Lolli, S.; Fochesatto, G.J.; Gu, Y.; Lewis, J.R.; Marquis, J.W.; McHardy, T.M.; Ryglicki, D.R.; Welton, E.J. Cirrus Cloud Top-of-the-Atmosphere Net Daytime Forcing in the Alaskan Subarctic from Ground-Based MPLNET Monitoring. J. Appl. Meteorol. Clim. 2020, 60, 51–63.
Da, C. Preliminary assessment of the Advanced Himawari Imager (AHI) measurement onboard Himawari-8 geostationary satellite. Remote Sens. Lett. 2015, 6, 637–646.
Liu, Q.; Li, Y.; Yu, M.; Long, S.C.; Yang, C. Daytime Rainy Cloud Detection and Convective Precipitation Delineation Based on a Deep Neural Network Method Using GOES-16 ABI Images. Remote Sens. 2019, 11, 2555.
Min, M.; Wu, C.; Li, C.; Liu, H.; Xu, N.; Wu, X.; Chen, L.; Wang, F.; Sun, F.; Qin, D.; et al. Developing the Science Product Algorithm Testbed for Chinese Next-Generation Geostationary Meteorological Satellites: Fengyun-4 Series. J. Meteorol. Res. 2017, 31, 708–719.
Heidinger, A.K.; Bearson, N.; Foster, M.J.; Li, Y.; Wanzong, S.; Ackerman, S.; Holz, R.E.; Platnick, S.; Meyer, K. Using Sounder Data to Improve Cirrus Cloud Height Estimation from Satellite Imagers. J. Atmos. Ocean. Technol. 2019, 36, 1331–1342.
Heidinger, A.K.; Pavolonis, M.J. Gazing at Cirrus Clouds for 25 Years through a Split Window. Part I: Methodology. J. Appl. Meteorol. Clim. 2009, 48, 6.
Li, J.; Menzel, W.P.; Schreiner, A.J. Variational Retrieval of Cloud Parameters from GOES Sounder Longwave Cloudy Radiance Measurements. J. Appl. Meteorol. 2001, 40, 312–330.
Tan, Z.; Ma, S.; Zhao, X.; Yan, W.; Lu, W. Evaluation of Cloud Top Height Retrievals from China’s Next-Generation Geostationary Meteorological Satellite FY-4A. J. Meteorol. Res. 2019, 33, 553–562.
Iwabuchi, H.; Putri, N.S.; Saito, M.; Tokoro, Y.; Sekiguchi, M.; Yang, P.; Baum, B.A. Cloud property retrieval from multiband infrared measurements by Himawari-8. J. Meteorol. Soc. Jpn. 2018, 96B, 27–42.
Heidinger, A. ABI cloud height. In NOAA/NESDIS/STAR, GOES-R Algorithm Theoretical Basis Document (ATBD); NOAA NESDIS Center for Satellite Applications and Research: College Park, MD, USA, 2012; pp. 1–77.
Schmit, T.J.; Gunshor, M.M.; Menzel, W.P.; Gurka, J.J.; Li, J.; Bachmeier, A.S. Introducing the next generation Advanced Baseline Imager on GOES-R. Bull. Am. Meteorol. Soc. 2005, 86, 1079–1096.
Menzel, W.P.; Frey, R.A.; Zhang, H.; Wylie, D.P.; Moeller, C.C.; Holz, R.; Maddux, B.; Baum, B.A.; Strabala, K.I.; Gumley, L.E. MODIS global cloud-top pressure and amount estimation: Algorithm description and results. J. Appl. Meteorol. Clim. 2008, 47, 1175–1198.
Li, J.; Yi, Y.H.; Stamnes, K.; Ding, X.D.; Wang, T.H.; Jin, H.C.; Wang, S.S. A new approach to retrieve cloud base height of marine boundary layer clouds. Geophys. Res. Lett. 2013, 40, 4448–4453.
Li, J.; Li, Z.; Wang, P.; Schmit, T.J.; Bai, W.; Atlas, R. An efficient radiative transfer model for hyperspectral IR radiance simulation and applications under cloudy sky conditions. J. Geophys. Res. Atmos. 2017, 122, 7600–7613.
Baum, B.; Menzel, W.P.; Frey, R.; Tobin, D.; Holz, R.; Ackerman, S. MODIS cloud-top property refinements for Collection 6. J. Appl. Meteorol. Clim. 2012, 51, 1145–1163.
Weisz, E.; Li, J.; Menzel, W.P.; Heidinger, A.K.; Kahn, B.H.; Liu, C.Y. Comparison of AIRS, MODIS, CloudSat and CALIPSO cloud top height retrievals. Geophys. Res. Lett. 2007, 34, 1–5.
Sherwood, S.C.; Chae, J.-H.; Minnis, P.; McGill, M. Underestimation of deep convective cloud tops by thermal imagery. Geophys. Res. Lett. 2004, 31, 11.
Min, M.; Li, J.; Wang, F.; Liu, Z.J.; Menzel, W.P. Retrieval of cloud top properties from advanced geostationary satellite imager measurements based on machine learning algorithms. Remote Sens. Environ. 2019, 239, 111616.
Chang, F.L.; Minnis, P.; Ayers, J.K.; McGill, M.J.; Palikonda, R.; Spangenberg, D.A.; Smith, W.L., Jr.; Yost, C.R. Evaluation of satellite-based upper troposphere cloud top height retrievals in multilayer cloud conditions during TC4. J. Geophys. Res. Atmos. 2010, 10, 11–15.
Chang, F.L.; Minnis, P.; Bing, L.; Khaiyer, M.M.; Palikonda, R.; Spangenberg, D.A. A modified method for inferring upper troposphere cloud top height using the GOES 12 imager 10.7 and 13.3 μm data. J. Geophys. Res. Atmos. 2010, 115, 1–13.
Key, J.R.; Intrieri, J.M. Cloud Particle Phase Determination with the AVHRR. J. Appl. Meteorol. 2000, 39, 1797–1804.
Daniel, J.S. Cloud liquid water and ice measurements from spectrally resolved near-infrared observations: A new technique. J. Geophys. Res. Atmos. 2002, 107, 1–16.
Palmer, K.F.; Williams, D. Optical properties of water in the near infrared. J. Opt. Soc. Am. B (1917-1983) 1974, 64, 1107–1110.
Pilewskie, P.; Twomey, S. Cloud Phase Discrimination by Reflectance Measurements near 1.6 and 2.2 µm. J. Atmos. Sci. 1987, 44, 3419–3420.
Min, M.; Bai, C.; Guo, J.; Sun, F.; Liu, C.; Wang, F.; Xu, H.; Tang, S.; Li, B.; Di, D.; et al. Estimating summertime precipitation from Himawari-8 and global forecast system based on machine learning. IEEE Trans. Geosci. Remote Sens. 2019, 57, 2557–2570.
Tan, Z.; Huo, J.; Shuo, M.; Han, D.; Wang, X.; Hu, S.; Yan, W. Estimating cloud base height from Himawari-8 based on a random forest algorithm. Int. J. Remote Sens. 2021, 42, 2485–2501.
Håkansson, N.; Adok, C.; Thoss, A.; Scheirer, R.; Hörnquist, S. Neural network cloud top pressure and height for MODIS. Atmos. Meas. Tech. 2018, 11, 3177–3196.
Wang, X.; Iwabuchi, H.; Takaya, Y. Cloud identification and property retrieval from Himawari-8 infrared measurements via a deep neural network. Remote Sens. Environ. 2022, 275, 113026.
Husi, L.; Nagao, T.M.; Nakajima, T.Y.; Riedi, J.; Ishimoto, H.; Baran, A.J.; Shang, H.; Sekiguchi, M.; Kikuchi, M. Ice cloud properties from Himawari-8/AHI next-generation geostationary satellite: Capability of the AHI to monitor the DC cloud generation process. IEEE Trans. Geosci. Remote Sens. 2019, 57, 3229–3239.
Iwabuchi, H.; Saito, M.; Tokoro, Y.; Putri, N.S.; Sekiguchi, M. Retrieval of radiative and microphysical properties of clouds from multispectral infrared measurements. Prog. Earth Planet. Sci. 2016, 3, 32.
Hostetler, C.A.; Liu, Z.; Reagan, J.; Vaughan, M.; Winker, D.; Osborn, M.; Hunt, W.H.; Powell, K.A.; Trepte, C. CALIOP Algorithm Theoretical Basis Document, Calibration and Level 1 Data Products. Available online: https://www-calipso.larc.nasa.gov/resources/pdfs/PC-SCI-201v1.0.pdf (accessed on 10 January 2022).
Winker, D.M.; Vaughan, M.A.; Omar, A.; Hu, Y.; Powell, K.A.; Liu, Z.; Hunt, W.H.; Young, S.A. Overview of the CALIPSO mission and CALIOP data processing algorithms. J. Atmos. Ocean. Technol. 2009, 26, 2310–2323.
Friedman, J.H. Stochastic gradient boosting. Comput. Stat. Data. An. 2002, 38, 367–378.
Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. ACM 2016, 785–794.
Romeo, L.; Frontoni, E. A Unified Hierarchical XGBoost Model for Classifying Priorities for COVID-19 Vaccination Campaign. Pattern Recogn. 2021, 121, 108197.
Inoue, T. On the Temperature and Effective Emissivity Determination of Semi-Transparent Cirrus Clouds by Bi-Spectral Measurements in the 10 µm Window Region. J. Meteorol. Soc. Jpn. 1985, 63, 88–99.
Derrien, M.; Lavanant, L.; Le Gleau, H. Retrieval of the cloud top temperature of semi-transparent clouds with AVHRR. In Proceedings of the IRS’88, Deepak Publ., Hampton, Lille, France, 8–24 August 1988; pp. 199–202.
Hamada, A.; Nishi, N. Development of a Cloud-Top Height Estimation Method by Geostationary Satellite Split-Window Measurements Trained with CloudSat Data. J. Appl. Meteorol. Clim. 2010, 49, 2035–2049.

Figure 1. AHI and CALIOP data match.

Figure 2. Out-of-bag importance of input variables in the XGBoost training process.

Figure 3. Model training process based on XGBoost.

Figure 4. Distribution comparison of CTH products. (a) Himawari-8 L2 CTH; (b) XGBoost CTH.

Figure 5. Comparison of CTH Products.

Figure 6. Comparison of CTH scatter. (a) Himawari-8 L2 CTH; (b) XGBoost CTH.

Figure 7. CTH Bias Probability Density Distribution vs. CALIOP.

Figure 8. Statistical results compared with CTH_CAL.

Figure 9. Scatter plots of CTH results compared with CALIOP in different cloud scenarios: (a) Himawari-8 L2 CTH; (b) XGBoost CTH; 1 ice cloud, 2 water cloud, 3 mixed cloud, 4 single-layer cloud, 5 multi-layer cloud.

Figure 10. Probability density distribution of CTH products compared with CTH_CAL under different cloud scenarios. (a) Himawari-8 L2 CTH; (b) XGBoost CTH.

Figure 11. The mean error of CTH product at different heights.

Table 1. Selected AHI channels.

Channel Number	Center Wavelength	Channel Characteristics
Channel 05	1.6 µm	Low water vapor absorption channel
Channel 07	3.9 µm	Different cloud phase states have absorption differences
Channel 10	7.3 µm	Water vapor absorption channel
Channel 11	8.6 µm	Water vapor absorption channel
Channel 14	11.2 µm	Split window channel
Channel 15	12.3 µm	Split window channel
Channel 16	13.3 µm	CO₂ absorption channel

Table 2. Model input parameters.

Variable Type	Variable	Note
Reflectivity	R1.6	Sensitive to the cloud phase
Brightness temperature	BT11.2	Close to the temperature of opaque cloud
	BT7.3	Important for identifying high, optically thin cloud
	BT13.3	Important for identifying high, optically thin cloud
BT difference between channels	BT11.2-BT12.3, BT8.6-BT12.3, BT7.3-BT12.3, BT13.3-BT12.3	Holds information about whether the cloud is opaque and how transparent it is
Texture parameters	(BT11.2)text, (BT3.9)text, (R1.6)text, (BT11.2-BT12.3)text, (BT11.2-BT3.9)text	Hold information about the opacity, translucency, or edges of cloud
BT differences to warmest/coldest neighbor	BT11.2-BT11.2 W, BT11.2-BT11.2 C, BT12.3 W-BT11.2 W, BT12.3 C-BT11.2 C, BT11.2 W-BT3.9 W, BT11.2 C-BT3.9 C	Related to cloud optical thickness
Geographic and spatial information	Latitude, Longitude, SZA, VZA	Eliminates some uncertainties caused by geographic location and viewing geometry
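
Several of the input variables in Table 2 are derived from a pixel and its neighbors. The sketch below illustrates how such features could be computed for the 11.2 µm and 12.3 µm channels. It rests on assumptions that are not stated in this section: a 3 × 3 neighborhood, texture taken as the local standard deviation, the warmest/coldest neighbor selected by BT11.2, and edge pixels padded by replication. The function name `neighbourhood_features` and the dictionary keys are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def neighbourhood_features(bt112, bt123, size=3):
    """Per-pixel features from a (size x size) window around each pixel."""
    bt112 = np.asarray(bt112, dtype=float)
    bt123 = np.asarray(bt123, dtype=float)
    pad = size // 2
    # Replicate edge pixels so that every pixel has a full window (assumption)
    win112 = sliding_window_view(np.pad(bt112, pad, mode="edge"), (size, size))
    win123 = sliding_window_view(np.pad(bt123, pad, mode="edge"), (size, size))
    flat112 = win112.reshape(*bt112.shape, -1)
    flat123 = win123.reshape(*bt112.shape, -1)

    # Position of the warmest and coldest neighbor, chosen by BT11.2 (assumption)
    warm = flat112.argmax(axis=-1)[..., None]
    cold = flat112.argmin(axis=-1)[..., None]
    bt112_w = np.take_along_axis(flat112, warm, axis=-1)[..., 0]
    bt112_c = np.take_along_axis(flat112, cold, axis=-1)[..., 0]
    bt123_w = np.take_along_axis(flat123, warm, axis=-1)[..., 0]
    bt123_c = np.take_along_axis(flat123, cold, axis=-1)[..., 0]

    return {
        "btd_112_123": bt112 - bt123,             # BT11.2-BT12.3
        "texture_112": flat112.std(axis=-1),      # (BT11.2)text as local std (assumption)
        "bt112_minus_warmest": bt112 - bt112_w,   # BT11.2-BT11.2 W
        "bt112_minus_coldest": bt112 - bt112_c,   # BT11.2-BT11.2 C
        "btd_warmest": bt123_w - bt112_w,         # BT12.3 W-BT11.2 W
        "btd_coldest": bt123_c - bt112_c,         # BT12.3 C-BT11.2 C
    }

# Tiny synthetic scene: uniform 250 K field with one warmer pixel
bt112 = np.full((4, 5), 250.0)
bt112[1, 2] = 260.0
bt123 = bt112 - 2.0
features = neighbourhood_features(bt112, bt123)
```

Each returned array has the same shape as the input image, so the features can be flattened and stacked column-wise with the single-channel and geographic variables to form the model input matrix.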

Table 3. Statistics of CTH_JMA and CTH_XGB compared with CTH_CAL under different cloud scenarios.

Cloud Scene	ME (km), CTH_JMA	ME (km), CTH_XGB	RMSE (km), CTH_JMA	RMSE (km), CTH_XGB	Std (km), CTH_JMA	Std (km), CTH_XGB
Ice	−2.34	−0.05	2.80	1.75	1.70	1.54
Water	0.82	0.73	1.67	1.44	1.51	1.43
Mix	−1.23	0.34	2.46	2.00	2.13	1.97
Single-layer	−0.73	0.56	1.95	1.72	1.80	1.62
Multi-layer	−2.22	−0.17	2.84	1.78	1.78	1.67
All	−1.27	0.30	2.31	1.74	1.93	1.72

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
