Tropical Overshooting Cloud-Top Height Retrieval from Himawari-8 Imagery Based on Random Forest Model

: Tropical overshooting convection has a strong impact on both heat budget and moisture distribution in the upper troposphere and lower stratosphere, and it can pose a great risk to aviation safety. Cloud-top height is one of the essential concerns of overshooting convection for both the climate system and the aviation weather forecast. The main purpose of our work is to verify the application of the machine learning method, taking the random forest (RF) model as an instance, in overshooting cloud-top height retrieval from Himawari-8 data. By using collocated CloudSat observations as a reference, we utilize several infrared indicators of Himawari-8 that are commonly recognized to relate to cloud-top height, along with some temporal and geographical parameters (latitude, month, satellite zenith angle, etc.), as predictors to construct and validate the model. Analysis of variable importance shows that the brightness temperature of 6.2 um acts as the dominant predictor, followed by satellite zenith angle, brightness temperature of 13.3 um, latitude, and month. In the comparison between the RF model and the traditional single-channel interpolation method, retrievals from the RF model agree well with observation with a high correlation coefﬁcient (0.92), small RMSE (222 m), and small MAE (164 m), while these metrics from traditional single-channel interpolation method shows lower skills (0.70, 1305 m, and 1179 m). This work presents a new sight of overshooting cloud-top height retrieval based on the machine learning method. formal analysis, G.W., H.W., and Y.Z.; investigation, G.W.; resources, G.W. and Y.Z.; data curation, G.W. and S.C.; writing—original draft preparation, G.W.; writing— review and editing, G.W., H.W., and Y.Z.; visualization, G.W.; supervision, H.W., Y.Z., and Q.W.; project administration, H.W.; funding acquisition, H.W. and Q.W. authors


Introduction
Overshooting convection, which is a small subset of clouds with strong convective updrafts that penetrate the upper troposphere-lower stratosphere (UTLS) region, has been widely recognized to be related to the transport of various atmospheric constituents and chemical species from the troposphere into the stratosphere. It plays an important role in moisture distribution and energy budgets globally [1][2][3][4][5], and the height of overshooting clouds indicates the altitude to which tropospheric air parcels are transported. The cloudtop height (CTH) of overshooting clouds is also required by aviation weather forecast since it is an indication of strong vertical development of deep convective cores [6] and convectively induced turbulence caused by gravity wave breaking [7] which occurs at ≈1 km above overshooting tops causing damage to flight [8]. Hence, the importance of accurately obtaining the CTH of overshooting clouds globally and at high frequency is underscored.
Methods of CTH retrieval from both ground-and satellite-based remote sensing data have been developed for several years. As for active sensors, reflectivity echo-top height Radar or Lidar (active sensors) could act as a reference of CTH directly, including Micro-Pulse Lidar (MPL) and millimeter wavelength cloud radar (MMCR) at the

Dataset
Datasets of about 1.5 years (1 March 2016 to 30 September 2017) for the domain of interest (20 • N~20 • S, 80 • E~160 • W) were collected to characterize tropical overshooting cloud, and datasets used in this study are described as follows.

Himawari-8 Satellite (HMW8/AHI)
Himawari-8 is a next-generation geostationary meteorological satellite launched into the geosynchronous orbit around 140.7 • E by the Meteorological Satellite Center (MSC) of the Japan Meteorological Agency (JMA, Tokyo, Japan). The core observatory, the Advanced Himawari Imager (AHI), is a multispectral infrared imager containing 16 observational channels with central wavelengths ranging from 0.47 to 13.3 µm. The temporal resolution is 2.5 min (10 min) for sectored regions (Full Disk), and the spectral resolution is 0.5 km (2 km) [31,32]. The Full-Disk data cover regions over East Asia, South-East Asia, Australia, and the western Pacific region, and have been validated to be good enough for some meteorological research, such as convection initiation, deep convection characters, etc. [33,34]. For convenience, Himawari L1 Gridded data archived on the JAXA P-TREE system was utilized in this study (http://www.eorc.jaxa.jp/ptree/index.html). This product was remapped from Full Disk data to 2 km spatial resolution and 10 min temporal resolution. Data of satellite zenith angle (SZA) and satellite azimuth angle (SAA) were utilized for datasets collocation and brightness temperature data from HMW8/AHI imageries were used to construct predictors for training and testing the machine learning model.

CloudSat Products
The Cloud Profiling Radar (CPR) on-board CloudSat is a 94 GHz nadir-looking active radar that provides the vertical structure of the cloud between the surface and 30 km altitude by measuring backscattered signal as a function of distance from the radar. The effective vertical resolution is 480 m, with oversampling at 240 m and the footprint of CPR covers 1.7 km in the along-track direction and 1.3 km in the across-track direction. There are many products based on CPR and several of them were used in this study and described below. 2B-CLDCLASS is a cloud type product that categorized clouds along tack into eight types with considering basic factors such as cloud phase, hydrometeor density, etc. 2B-GEOPROF products contain cloud mask information and radar reflectivity factor at every 125 vertical bins. The cloud mask value of 2B-GEOPROF corresponds to the presence of the cloud inside a CPR bin, and a larger cloud mask value is related to higher confidence in cloud detection [35]. We obtain CTH directly by searching the highest altitude of cloud mask value ≥ 30, which is suggested by the CloudSat project team that vertically connected CloudSat bins with significant cloud mask values (≥30) could be regarded as a cloud layer with a false alarm rate smaller than 2%. Only pixels classified as monolayer deep convective cloud for simplicity by 2B-CLDCLASS, and with CloudSat CTH higher than GFS tropopause, are collected as overshooting cloud pixels. CTH from CPR has been tested to be proper for the research of overshooting clouds [20]. Therefore, we used CloudSat CTH as a reference in this study, though CTH from CPR signal is often below that of the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation (CALIPSO). Besides, ECMWF-AUX contains state variable data interpolated to each CPR bin by Generic-AUX Interpolate-to-Reference algorithm with AN-ECMWF dataset provided by the European Center for Medium-Range Weather Forecasts (ECMWF, Reading, UK). Temperature profiles from ECMWF-AUX are utilized to derive cloud-top heights in the traditional method for comparison with the machine learning model. Details of the CloudSat mission and data products are given by CloudSat Data Processing Center (http://www.cloudsat.cira.colostate.edu/).

Tropopause Height of the Global Forecast System (GFS)
The Global Forecast System (GFS) final analysis product (FNL) with 0.5 • × 0.5 • horizontal grids produced by the National Centers for Environmental Prediction (NCEP) is used to categorize the overshooting clouds (available at https://www.ncdc.noaa.gov/dataaccess/model-data/model-datasets/global-forcast-system-gfs). The GFS tropopause is calculated on the model's native 64 hybrid-sigma pressure level grid using an algorithm implementing the WMO thermal lapse rate tropopause definition (WMO, Geneva, Switzerland, 1957) and interpolation between levels. Thermal tropopause was designed to locate the critical change in the vertical gradient of the temperature which is the only globally applicable tropopause definition. It has been shown to mark the critical level of temperature gradient and a sharp increase in the static stability [36,37] and it is widely used in studies of clouds above tropopause [7,12,38,39]. Besides, the uncertainty of GFS tropopause has been tested by Homeyer [38] and Pan and Munchak [39], and they suggested that the GFS tropopause showed good agreement compared to radiosondes with low bias and standard deviation of uncertainty. The GFS 3-hourly tropopause dataset was remapped to the resolution of corresponding HMW8/AHI imagery linearly to avoid the grid box artifacts in our study.

Methodology
For the development and validation of the random forest model for tropical overshooting CTH retrieval, several steps were included: (1) data collocation between HMW8, CloudSat products, and GFS; (2) construction of the random forest model including predictor selection and model tuning; (3) model assessment and comparison with traditional single-channel interpolation method. All these procedures are summarized in the workflow shown in Figure 1, and the detail of all methods are described as follows.

Data Collocation
An important step in constructing training and testing data for the machine learning model is the collocation of CloudSat and HMW8/AHI data. The most common approach for collocation is to keep the ground footprints of sensors matching each other by matching each CloudSat profile with the nearest pixel of geo-satellite (NC, nearest collocation) [40], and matching each CloudSat profile with the averaged value of 3 × 3 grids HMW8/AHI pixels centered on the nearest geo-satellite pixel (ANC) [41] to overcome the potential navigation errors. However, considering the large distance between geo-satellite and geo-center as well as the tall height of the overshooting cloud, the parallax problem [42] could cause a distinguish mismatch. We considered parallax and applied a geometrical collocation (GC) in this work, by assuming spherical earth and calculating corrected displacement simply by SZA, CTH, and latitude. A detailed description can be found in Appendix A, and a comparison of the above-mentioned collocation methods is shown in Section 4.1.
Considering the structure of overshooting convection and their short lifetime, only collocated pixels that meet the following criteria were used in this study: (1) classified by 2B-CLDCLASS products as deep convection; (2) CTH of CloudSat over GFS tropopause; (3) occurs within 120s time gap between CloudSat and HMW8/AHI imagery; finally, we obtained a total of 8780 effective tropical deep convective overshooting cloud pixels, and the CTH of these pixels were labeled as CTH obs afterward. All samples were then divided into a training dataset (70% of total) and a test dataset (30% of total) randomly for RF model training and validation.

Random Forest Regression Model
Random forest, an ensemble supervised machine-learning algorithm, has been shown to perform well in a variety of meteorological investigations like the prediction of precipitation, mesoscale convection system, and snowfall, etc. [43][44][45]. We constructed a large number of nonlinear decision trees at training time and output the mean prediction of the individual trees in the model. Each tree was created using a random subset of candidate predictors on top of bagging by sampling with replacement, which could improve the stability and reduce variance to avoid overfitting. Therefore, it is computationally efficient on large data set and relatively robust to outliers and noise. For more details of the RF algorithm, please refer to Liaw [46], Cutler [47], Boulesteix, Janitza [48], Breiman [49], etc.

Feature Selection
There have been several well-developed CTH retrieval algorithms based on geosatellite. The single-channel interpolation is one of the most common methods in thick cloud-top height retrieval in which the infrared brightness temperature (IRBT) of 11.2µm (BT 11.2 , IRW) is used to be a good proxy for cloud-top temperature and CTH in most cases [18,50]. The brightness temperature of some other channels like the carbon dioxide absorption channel (BT 13.3 , CO 2 ), water vapor absorption channel (BT 6.2 , WV), and ozone absorption channel (BT 9.6 , O 3 ) can also be used for CTH assignment in some conditions [50]. Besides, two-channel brightness temperature differences (BTDs), such as BTD between WV and IRW (BTD 6.2-11.2 ), BTD between ozone and IRW (BTD 9.6-11.2 ), and BTD between carbon dioxide and IRW (BTD 13.3-11.2 ) [51][52][53], are also widely used to indicate the strength of deep convection, which is relative to CTH to some extent. In addition, Hamada [54] suggested that estimates of upper-tropospheric cloud-top height were related to the latitude, SZA, season, diurnal cycle, and surface type (land or ocean). Therefore, we took these temporal and geographic factors as candidate predictors except the diurnal cycle, since CloudSat is a daytime satellite. All candidate predictors are listed in Table 1. Although all the candidate predictors have been verified by pre-existing methods, some of them are correlated and make some predictors redundant, which could cost more computation resources and even introduce noise to outputs. We used a simple and powerful method (Wrapper) by evaluating a subset of features according to the performance of the RF model for the dimension reduction purpose. Firstly, feature importance was estimated and sorted by their relative mean decrease accuracy (MDA) on the out-of-bag predictions, which means how much the accuracy decreases after a predictor is randomly permuted or deleted, the higher the MDA, the more important the predictor. Secondly, we adopted the sequential forward selection (SFS) as the searching strategy, which has been testified to be well suited for feature selection procedure considering both performance and computational expenditure [55,56]. Subsets of predictors start to be empty and predictors are progressively incorporated into larger and larger subsets in order of decreasing the trend of their importance. The means square error of out-of-bag prediction (OOB_MSE) is calculated in each loop when rebuilding the RF model to indicate the model performance.

Model Tuning
There are two main parameters to adjust when using the RF model: the number of trees in the forest (n_tree) and the number of predictors randomly sampled at each split (max_features). The n_tree is a free parameter since generalization error always converges as n_tree increases [57] but computation cost will be higher for more trees. The max_features is a parameter that depends on the data at hand. A smaller value of max_features means a greater reduction of variance while a larger one leads to greater bias [48]. To get the optimal combination of n_tree and max_features, a large number of RF models was created using a grid search strategy in which each n_tree value (n_tree = 1 to 1000) pairs up with every max_features value (max_features = 1 to the number of predictors). We also set a default RF model in the feature selection procedure where max_features equal the number of subsets predictors in each loop, which is the empirically proper value for regression problems, and n_tree is set to 1000 which is large enough for most cases. To get a stable analysis, RF models rebuilt with different parameters were trained and validated using standard 10-fold-cross-validation.

Metrics for Model Evaluation
Once the RF model was established, the calculated CTHs (ŷ RF i ) were validated against the collocated CTHs from CloudSat/CPR (y obs i ). Mean absolute error (MAE), root mean square error (RMSE), and correlation coefficient (R) were used as measures of the agreement between calculated and observed CTH. These scores are given as follows: R can be utilized as a measure of the linear relation between the observed and predicted values ranging from −1 to 1, and the closer the R is to 1 indicates the better model performance. Besides, we used RMSE as a measure of the goodness of fit using the quadratic scoring rule, and it gives a relatively high weight to larger errors, so we also utilized MAE to indicate the average magnitude of the absolute difference between predicted value and observation, which could be used to diagnose the variation in the errors along with RMSE. Smaller RMSE and MAE indicate better prediction and zero value for these two quantities indicates a perfect prediction. Additionally, the greater difference between RMSE and MAE means the greater the variance in the prediction and worse model performance, and all errors are at the same magnitude if they are equal.

Comparison of Collocation Method
We present a case to show how the GC method improves the accuracy in matching CloudSat profile location to HMW8 imagery than other collocation methods in Figure 2 of an example of an overshooting cloud with CTH higher than 17 km that occurred at 7:40 UTC on 28 February 2017 along with the collocated CloudSat footprint by NC, ANC, and GC. CloudSat/CPR reflectivity with cloud mask value > 30 and the collocated BT 11.2 by the three methods are shown in Figure 2b,c, respectively. To make the comparison clearer, we also plotted CTH retrieved from the traditional interpolation method over the cloud structure in Figure 2b. It is widely acknowledged that the overshooting top is a small cluster with very cold brightness temperatures region and it is colder than the surrounding anvil cloud temperature [7,58], so the better collocation method is supposed to get better consistency between BT 11.2 and cloud structure from CloudSat/CPR. However, as shown in this case, the overshooting top is matched to a relative warmer BT 11.2 and the lower anvil around it has the coldest BT 11.2 by NC method. ANC is a modified algorithm of NC, but the difference of BT 11.2 with NC and ANC is not distinct (mostly ≈3 K) and the mismatch between the coldest BT 11.2 and highest CTH is still apparent by the ANC method. Since ANC and NC share the same footprint on HMW8 imagery, 3 × 3 pixel-averaged BT 11.2 of ANC cannot differ from that of NC largely. There is a distinct shift of footprint over HMW8 imagery after geometric correction, ranging from 12.1 to 26.8 km, which is much larger than the resolution of HMW8 (2 km). Besides, the BT 11.2 difference between GC and the former two methods is up to 65 K around the overshooting top and the BT 11.2 profile of GC agrees better with the structure of the overshooting cloud than those of the other two methods. Meanwhile, even though there is a significant underestimation of CTH from the traditional method, CTH from BT 11.2 of the GC method still shows better consistency than the NC and ANC. Therefore, the geometric collocation method improves the accuracy in matching CloudSat profiles location on HMW8 imagery.
Standard deviation (STD) and the average value of BT 11.2 for every 500 m vertical bin were used for the statistical analyses and a good collocation method should result in a small dispersion of BT 11.2 at each CTH bin for the thick cloud. We excluded multilayer cloud and cirrus to avoid the influence of surface temperature. Figure 3 shows the average BT 11.2 (line) and their STD (shade) of each height bin for deep convection. Both average value and STD of BT 11.2 of ANC (dotted line, read shadow) are almost equal to those of NC (dashed line, blue shadow) at each altitude bin, which means that ANC cannot improve NC remarkably. On the other hand, the GC method reduces the variation of BT 11.2 distinctly for deep convection. The average BT 11.2 from GC (solid line) is also lower than that of NC and ANC as height increases (more than 1 K at 17.5 km). Theoretically, as shown in the functions in Appendix A, the corrected shift of GC is not only related to CTH, but also related to SZA (ranging from 0 to 84 • ) which indicates the relative distance from the target cloud to the subsatellite point, and SAA which is only related to the transformation from geometric displacement to the geographic coordinate system. The relationship between corrected shift and CTH/SZA is not simply linear but monotonic, and the shift increase with larger CTH/SZA. Only corrected distance of cloud with low CTH and/or small SZA could be neglected. Overall, GC has the advantage over NC and ANC since it considers both the influence of CTH and SZA on the parallax shift of BT 11.2 and the GC-corrected BT 11.2 shows a considerably colder average BT 11.2 as well as smaller STD.  For the GC method, there are still some factors that could introduce errors to collocation, such as spherical earth assumption and cloud overlapping. We tested the impact of earth shape by comparing the difference between corrected displacement by spherical and elliptical earth assumption. The elliptical earth was modeled with the semimajor and semiminor axis defined in the World Geodetic System (WGS84) while the radius of a spher-ical earth is set to a fixed number (6371 km). For simplicity, we only compared these two earth models for a constant 18 km CTH. Although the difference of corrected shift between the sphere earth model and the elliptical earth becomes larger as SZA increases, it is too small to affect the collocation between HMW8/AHI and CloudSat ( Figure S1). As for the overlapping problem, clouds with low CTH and/or large SZA can be overlapped with other clouds on the scan route and it cannot be settled by the GC method. However, error from overlapping could be neglected in this work since we only focus on overshooting clouds with very high CTH (over 14 km at tropics). Figure 4 shows the performance of the RF model indicated by OOB_MSE during the SFS procedure and the predictors are sorted in order of descending importance, where colder color indicates greater importance of the variable. A smaller OOB_MSE indicates better accuracy of prediction. The OOB_MSE typically decreases first and then increases slightly again as the subset gets larger. In our case, the feature subset with five predictors has the best performance with the smallest OOB_MSE and this optimal subset of predictors includes BT 6.2 , SZA, BT 13.3 , latitude, and month (descending importance) and is used for subsequent model tuning. The importance of predictors represented by MDA of OOB_MSE are shown in Figure S1 and these five variables have MDA larger than 0.5, which means that the shuffle of any of these five variables could lead to an OOB_MSE increased by as much as 50% from the default model. Atmospheric window and carbon dioxide absorption channel (11.2 and 13.3 um) have stronger penetration than water vapor absorption channel (6.2 um), and brightness temperature is retrieved from the radiation of different channels from the cloud top, so that BT 6.2 generally indicates brightness temperature of higher altitude, which might be the potential reason for the importance of BT 6.2 . There are three types of penetration convection [6] including 'cold-high', 'cold-low', and 'warm-high' corresponding to the incipient, mature, and dissipating stage of overshooting convection. In this case, BT 13.3 and BT 11.2 are not the best proxy of cloud-top height especially for 'warm-high' types, since they represent the emission that arises from deeper within the cloud as the fall of the convective core, which makes the cloud appear warmer in IR brightness temperature; but the sensitivity to the moisture of 6.2 um channel makes BT 6.2 still represent the height near cloud-top. BT 13.3 is the most commonly used channel to estimate cloud-top height and it just works as a complement of BT 6.2 and could take over BT 6.2 as the dominant predictor if manually excluding BT 6.2 from predictors, but at the price of a little accuracy. Notice that BT 11.2 is less important as shown in Figure S1, its high correlation with BT 13.3 for overshooting clouds (correlation coefficient is 0.98 at 1% significant level) may lead to its importance shared by BT 13.3 . However, BTDs are not important for the estimation of CTH of overshooting clouds in our RF model. Besides, the effect of SZA on the infrared radiance of each pixel causes the impact of SZA on CTH estimation. The spatial and temporal variation of tropopause, latitude, and month are also important for overshooting cloud CTH retrieval.

Feature Selection and Model Tuning
Parameters including n_tree and max_features can affect the performance of the RF model. To assess the impact of n_tree and max_features on OOB_MSE, a large number of RF models with n_tree ranging from 1 to 1000 using randomly selected subsets (max_features = 1 to 5) were created and Figure 5a provides information about how the model error (OOB_MSE) changes with the n_tree for different max_features. The error of model converges from ≈ 500 trees onwards and a larger number of trees neither increase nor decrease the model accuracy. Considering both accuracy and computation time, n_tree = 500 is large enough and sufficient to produce a stable prediction for this work. Based on the foregoing results about n_tree, OOB_MSE at different max_features of RF models with n_tree = 500 are considered, and the influence of max_features on model error is shown in Figure 5b. When max_features = 3 is selected, OOB_MSE turns to be the smallest value (averaged at 219.25 m), which indicates the highest prediction accuracy.
Since our purpose was testing the feasibility of machine learning in overshooting cloud-top height retrieval, we arranged the feature selection procedure followed by model tuning in this study to save computation expense, and the main parameters were set to default in the feature selection procedure. However, when it comes to operational application, if not caring about the computation cost, we recommend the grid search strategy (or exhaust algorithm), which processes feature section and model tuning simultaneously, to get the optimal subset of predictors and model parameters so to obtain the model with the best accuracy.

Evaluation of the RF Model
To evaluate the RF model, we compared CTH retrieved from CloudSat (CTH obs ) and RF model retrieval (CTH RF ) based on test data and the results are shown in Figure 6a. CTH RF shows good consistency to CTH obs with R = 0.92, RMSE = 222 m, and MAE = 164 m.
The frequency distribution of residual between calculated and observed CTH also shows good consistency with 99.3% of the samples having residual smaller than 750 m while 96.9% of total cases have CTH RF within 500 m from CTH obs (not shown), which is better than previous work [20] that had 65% of geostationary overshooting top heights within ±500 m of the coincident CPR-estimated heights. However, previous work using ML methods in CTH retrieval which treats all kinds of cloud equally in training machine learning method has worse performance with the mean MAE over 1.6 km and most of their error comes from the cloud with CTH > 12 km or CTH < 2 km [19]. Another way to evaluate model performance is based on the overshooting depth represented by the difference between CTH obs and GFS tropopause height. We categorized all cases into three classes based on overshooting depths with bins of 500 m (0-500 m, 500-1 km, above 1 km). The performance of RF model of these three classes was tested, and results show that CTH obs agrees well with CTH RF , with R ranges from 0.85 to 0.91, RMSE ranges from 200 m to 281 m, while MAE ranges from 151 to 281 m from low to high penetration cloud relative to GFS tropopause. When it comes to the performance of the RF model for the extremely high overshooting clouds with cloud-top temperature colder than tropopause temperature which cannot be searched on NWP temperature profile, the result still shows a good agreement with observation (R = 0.86, RMSE = 226 m).

Comparison with Interpolation Method
We applied the most common method for ascertaining CTH from space in real-time by matching the satellite IRBT to the appropriate vertical level within a collocated rawinsonde (temperature profile) from NWP, especially for deep convection which is optically opaque. The infrared window (11.2 µm) brightness temperature taken by the HMW8/AHI was utilized along with the temperature profiles obtained from CloudSat ECMWF-AUX product in this method and the CTH of overshooting clouds from HMW8 pixels were linearly interpolated as the lowest altitude where the temperature profile matched BT 11.2 . For extremely high overshooting clouds (overshooting convective clouds), their cloud-top temperature can be cooler than the environment [20], then their CTHs were calculated by adding the difference between the cloud-top temperature and tropopause temperature divided by the adiabatic lapse rate (9.76 K/km) to the tropopause height, with the assumption that undiluted convective updraft overshoots tropopause and continues to rise adiabatically [6].
We obtained the CTH of test data by the interpolation method described above as shown in Figure 6b, and the CTH retrieved from the single-channel interpolation method (CTH IR ) has larger discrepancies than CTH RF , with R of 0.70. CTH IR shows significant underestimates for most samples, with RMSE and MAE more than 1 km (1305 and 1179 m); The result is similar to the work of Min et al. [19] which showed that CTHs products of HMW8 proposed by the Japan Meteorological Agency (JMA, Tokyo, Japan) [18,59], which uses the combination of the interpolation method, radiance ratioing method, and intercept method, have significant underestimates with almost ≈5 km lower than that observed from Cloud Physics Lidar (CPL) [19]. Even though they contribute the underestimation to the undetectably thin and high cloud layers, it has already shown pronounced underestimation when CTH > 12 km. For the peer comparison, we calculated R, RMSE, and MAE of three classes based on overshooting depths. CTHs estimates at each 500 m bins above GFS tropopause show less accuracy with R ranging from 0.35 to 0.63, RMSE from 1274 to 1324 m, and MAE from 1126 to 1208 m in comparison with that of RF model with R ranges from 0.85 to 0.91, RMSE ranges from 200 to 281 m, while MAE ranges from 151 to 281 m.
The traditional method holds an assumption that the cloud temperature is the same as that of the environment at the same altitude. Even though the temperature contrast between cloud-top and environment generally becomes smaller towards higher altitude due to adiabatic cooling and mixing. However, when it comes to specific events, it varies based on different conditions, and, for most cases, the convective thermals still tend to be warmer than the environment, which might be the reason for its underestimate. However, as for the RF model, we took all variables related to overshooting events into consideration, including variables about tropopause such as latitude and month, variables related to cloudtop properties such as cloud-top brightness temperature, as well as, even though excluded in the final model, variables related to the strength of convection such as BTDs, etc., and the RF model learns the relationship between CTH and all predictors, which indicates that the predictability of the RF model comes from the combination of all these relationships without any pre-existing assumption. However, there is an inherent shortcoming of the data-driven ML method that it can only learn rules from data on hand, and the accuracy of the model is somehow affected by the sample size, therefore, it is very important for data collection in our future work.

Conclusions
Errors in the nearest matching method (NC) for the collocation between HMW8/AHI and CloudSat data produce anomalously high/low values of BT 11.2 and significant mismatch especially for tall cloud. The 3 × 3-pixel-average method (ANC) cannot decrease this error. A simplified collocation method, which considers geometric displacement and utilizes spherical earth assumption, was used in the data collocation. As shown in the theoretical analysis (functions in Appendix A), parallax displacement is related to both CTH and SZA, and their relationship is monotonic. Only for the cloud with small CTH and SZA, the geometric shift could be neglected. Results show that the HMW8/AHI cloud-top BT 11.2 agrees better with the structure of CloudSat reflectivity after geometric collocation in both case study and statistical analysis.
After collocation, we used the RF model based on IR indicators from the HMW8/AHI satellite along with some temporal and geographic factors as predictors to test the applicability of the machine learning model in overshooting cloud-top height retrieval. After the SFS procedure and model tuning, the RF model was finally constructed with n_tree = 500 and max_feature = 3 from five predictors (BT 6.2 , SZA, BT 13.3 , latitude, and month). Basically, the prediction provided by the RF model was calculated by combining relationships between overshooting CTH and all predictors. Results show that the RF model is an effective and valid method in calculating CTH of overshooting clouds with low error (small RMSE and MAE around 200 m) and higher correlation (R = 0.92), even as for cases over 1 km above the tropopause, their CTH retrieved from the RF model still highly correlated to observations with R = 0.85. However, the traditional single-channel interpolation method has a much lower correlation of 0.7 and larger RMSE and MAE over 1 km. Results verify the applicability of machine learning for CTH retrieval of overshooting clouds.
Overall, this work provides a preliminary application of CTH estimation with the machine learning method and obtains good accuracy. However, some shortcomings remain unsettled, for example, model error increases as increasing CTH, which is the result of decreasing sample size of overshooting clouds. What we used for the RF model was overshooting cases so that the bias from decreasing sample size is still acceptable, but it is noteworthy to mention that larger sample size is necessary to improve the accuracy. Considering the larger spatial coverage of more advanced radars onboard TRMM/GPM satellite [60] which could provide a near-global view of 3D cloud structure in 2-3 h, as well as other active sensors such as CALIPSO, one of our main ongoing efforts is collecting more samples by combining these advanced radars with HMW8 to improve the accuracy of the model. More algorithms other than RF, such as neural networks, will be implemented and assessed with data from different geostationary satellites and their application to CTH retrieval based on cloud types to produce a global product of CTH retrieval in our future works. Additionally, it is also of good possibility for machine learning methods, along with the overshooting detection method for geo-satellites, to be promoted to geo-satellites other than HMW8, such as GOES-R, for the real-time and global analyses of penetration cloud height for a variety of weather and climate applications in the future work.

Acknowledgments:
The authors are grateful to the Japan Aerospace Exploration Agency (JAXA) Himawari Monitor (p-tree system) for providing the Himawari-8/AHI L1 Gridded data, the Cloud-Sat Data Processing Center for providing the CloudSat data, and the NCEP for providing GFS data. Additionally, we also thank the anonymous reviewers for their suggestions for improving the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
As shown in Figure A1, referring to the parallax correction method given by [42], assuming S CPR as CloudSat CPR and S AHI as Geo-satellite Himawari-8/AHI, A observes cloud-top at point F with a ground footprint of CloudSat at point D, and B observes the same cloud-top with ground footprint E. To make sure that the information from the geo-satellite sensor correctly matches that of CloudSat, a shift in the ground footprint of B is required and the correct shift is related to some factors including satellite zenith angle, the altitude of instrument A, cloud-top height, etc. Besides, we denote satellite zenith angle (SZA, ∠ADS AHI ) as α, satellite viewing angle (SVA, ∠OS AHI D) as β. Assuming that earth is a sphere with constant averaging real earth radius (L OD = = Re,~6371 km), distance from earth's center to the satellite (DES) is constant (L OB ,~km) as well.
Using sine law in ∆DOS AHI , ∆FDS AHI , and ∆EOS AHI , then we get  Figure A1. Schematic representation of viewing geometries from space-to-earth view for two satellites of HMW8/AHI (satellite S AHI ) and CloudSat (satellite S CPR ) for the deep convective cloud. C is the subsatellite point of HMW8/AHI on the equator and D, the footprint of deep convective cloud, could be at any location.