Extreme Gradient Boosting Model for Rain Retrieval using Radar Reflectivity from Various Elevation Angles

The purpose of this study was to develop an optimal estimation model for rainfall rate retrievals using radar reflectivity, thereby gaining an effective grasp of rainfall information for disaster prevention. A process was designed for evaluating the optimal retrieval models using various dataset combinations of radar reflectivity and ground meteorological attributes. The ground meteorological attributes (such as relative humidity, wind speed, and precipitation) were obtained from the land-based weather stations affiliated with Taiwan's Central Weather Bureau (CWB). This study used radar reflectivity at nine elevation angles provided by the Hualien weather surveillance radar station's Volume Cover Pattern 21 system. The models were built using multiple machine learning algorithms, including linear regression (REG), support vector regression (SVR), and extreme gradient boosting (XGBoost), in addition to the Marshall–Palmer formula (MP). The study examined 14 typhoons that occurred from 2008 to 2017 at Chenggong station in southeast Taiwan and Lanyu station in the outlying islands, and the top four major rainfall events were designated as test typhoons: Nanmadol (2011), Tembin (2012), Matmo (2014), and Nepartak (2016). The results indicated that for rainfall retrievals, radar reflectivity at a scanning (elevation) angle of 6.0° combined with ground meteorological attributes was the optimal set of input variables for the Chenggong station, whereas radar reflectivity at an elevation angle of 4.3° combined with ground meteorological attributes was optimal for the Lanyu station. In terms of model performance, XGBoost models had the lowest error index at Chenggong and Lanyu stations compared with MP, REG, and SVR models. XGBoost models at Lanyu station had the highest efficiency coefficient (0.903), and those at Chenggong station had the second highest (0.885). As a result, pairing the combination of optimal radar reflectivity and ground meteorological attributes, as verified by the evaluation process, with a high-efficiency algorithm (XGBoost) can effectively increase the accuracy of rainfall retrieval during typhoons.


Introduction
Typhoons are extreme weather systems that occur more frequently during summer and fall in Taiwan. Typhoons mostly originate in the intertropical convergence zone, with the strongest typhoons created in the western North Pacific and the South China Sea. Approximately 25.7 typhoons are formed on average each year, and the strong winds and rainstorms generated by landfall affect hydrological cycles in East Asia [1,2]. Taiwan is positioned at 120-122°E and 22-25°N in the West Pacific (Figure 1), and its terrain is mostly steep hills. Due to the simultaneous influence of monsoons, ocean climate, and West Pacific typhoons, climate conditions easily form extreme weather systems that are highly destructive; the extreme rainfall and strong winds brought by typhoons are the cause of the most serious disasters in Taiwan [3][4][5]. Remote sensing, by definition, refers to the gathering of information regarding objects or media (e.g., ground, rivers, oceans, and atmosphere) without contact. The information is obtained through interaction of electromagnetic or acoustic waves with the media or objects by using passive or active instruments. Remote sensing instruments and rain gauges are the tools most often used to measure hydrometeorological parameters in research [6][7][8]. Most meteorological radars are set up on the ground, and scan clouds or falling water drops near the radar station (e.g., maximum scan radius = approximately 450 km for S-band radar) in a 360° omnidirectional rotation using multiple elevation angles from the bottom upwards. Because of radars' higher resolution and real-time data acquisition, many radar rainfall prediction systems employ the relationship between radar reflectivity (Z) and rainfall rate (R); for instance, the Marshall-Palmer formula [9] ($Z = 200R^{1.6}$, where $Z$ is in mm$^6$/m$^3$ and $R$ is in mm/h) converts radar reflectivity into rainfall rate.
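The Marshall-Palmer conversion can be illustrated with a short sketch. It assumes the reflectivity is supplied in logarithmic units (dBZ, with dBZ = 10 log10 Z), which is a common radar product convention but is not stated in the text above; the function names are hypothetical.

```python
import math

def dbz_to_z(dbz):
    """Convert reflectivity from dBZ to linear Z in mm^6/m^3 (dBZ = 10*log10(Z))."""
    return 10.0 ** (dbz / 10.0)

def marshall_palmer_rain_rate(dbz, a=200.0, b=1.6):
    """Invert Z = a * R**b to obtain the rainfall rate R in mm/h."""
    return (dbz_to_z(dbz) / a) ** (1.0 / b)

# Print the rain rate implied by a few reflectivity levels.
for dbz in (20.0, 30.0, 40.0, 50.0):
    print(f"{dbz:4.0f} dBZ -> {marshall_palmer_rain_rate(dbz):7.2f} mm/h")
```

Because the exponent 1/1.6 is less than one, a fixed error in dBZ translates into a proportionally larger error in R at high reflectivity, which is one reason a fixed Z-R relation serves here only as a benchmark.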
Numerous studies have analyzed and explored radar reflectivity-based rainfall estimations [10][11][12][13][14][15][16][17]. For example, Borga et al. [18] used high-resolution radar rainfall fields and space-time distributed hydrological models to evaluate the rainfall runoff during storm floods. Gabella et al. [19] used radar reflectivity to improve the accuracy of rainfall estimations in complex terrains. Seo and Breidenbach [20] used rain gauge measurements to correct nonuniform spatial deviations in radar rainfall parameters in real time. Libertino et al. [21] developed a quasi-real-time procedure for an adaptive (in space and time) estimation of the Z-R relationship. Tang and Matyas [22] presented a methodology to forecast a tropical cyclone rainfall distribution up to 8 h into the future using a high-resolution Doppler radar reflectivity mosaic in a large analytical domain. Chen et al. [23] reported the vertical structures of raindrop size distribution features and quantitative precipitation estimation parameters of two main synoptic systems, typhoons and meiyu/baiu fronts, based on summer observations with a ground-based impact disdrometer and a vertically pointing radar.
Radar reflectivity is highly correlated with rainfall estimation. However, few studies have compared errors in rainfall estimates from a single scan with estimates from scans at multiple elevations; for example, the terrain might block the radar reflectivity, making rainfall conditions behind the mountains unobservable. This results in radar reflectivity that might result in the miscalculation of the actual rainfall in a specific location. The aim of this study was to develop an optimal estimation model for rainfall retrievals during typhoons in Taiwan. Usually, the radar system can provide radar reflectivity from various scanning (elevation) angles. Thus, this study used the radar reflectivity from various elevation angles as input variables for retrieval models and evaluated the optimal elevation angles. In the rainfall retrieval models established in the study, additional inputs included ground meteorological attributes as well as radar reflectivity. The meteorological attributes were obtained using the land-based weather stations affiliated with Taiwan's Central Weather Bureau (CWB) and located at Chenggong and Lanyu. The data collected from the weather stations comprised the following: pressure, temperature, humidity, solar radiation, rainfall, and wind at or near the ground. The presented retrieval models used the data combination of radar reflectivity and ground meteorological attributes with a specific gauge instrument. In practice, the meteorological attributes should be obtained in advance. Fortunately, because Taiwan has a tight cluster of automatic rainfall stations, sufficient ground meteorological attributes of an arbitrary location can be obtained. However, when a location lacks an automatic rainfall station, one could obtain the meteorological attributes through a self-built meteorological instrument.
An increased number of input variants necessitates the use of newer algorithms for high-dimensional and nonlinear regression models. With the rapid development of artificial intelligence (AI), regression-type models have become one of the key algorithms in machine learning (ML) and are used to solve high-dimensional and nonlinear problems [24][25][26][27][28][29]. Therefore, an increasing number of scholars have used advanced regression algorithm models and applied them to rainfall estimation problems to gain more precise estimation values. Some renowned ML models are support vector regression (SVR), artificial neural networks (ANNs), Bayesian networks, decision trees, and random forests [30][31][32][33][34][35][36][37][38][39][40]. Moreover, studies have used radar reflectivity in ML models; for example, Chiang et al. [41] used radar reflectivity in dynamic neural networks for rainfall estimation and prediction, in addition to weather radar data in ANNs-which are capable of processing complex nonlinear relationships-to conduct quantitative precipitation estimation. Wei [42] developed a typhoon radar reflectivity-based rainfall nowcasting model to operationally predict hourly rainfall. An adaptive network-based fuzzy inference system was developed to estimate precipitation.
In the current study, an evaluation process was designed for a typhoon-season rainfall retrieval model. During the design of the process, radar reflectivity values from multiple scanning angles in elevation (i.e., the elevation angles of the antenna) were tested to find the optimal elevation angles. Multiple ML algorithms were used to build more precise rainfall estimation models, including linear regression (REG), SVR, and extreme gradient boosting (XGBoost). The XGBoost model was developed by Chen and Guestrin [43], and is a popular state-of-the-art algorithm applied in ML. XGBoost is a scalable ML system for tree boosting, which is a highly effective and widely used ML method; XGBoost addresses the deficiencies of traditional tree learning algorithms in processing sparse data. Furthermore, XGBoost can simplify the learned models and help prevent overfitting; therefore, its computational performance is superior to that of traditional gradient boosted decision trees (GBDTs). Studies applying XGBoost have already been published in the fields of atmospheric composition and atmospheric science, substantiating its usability [44][45][46][47][48]. Currently, there are only a few applications in rainfall estimation, such as [49,50]; therefore, this new algorithm was adopted in the present study to improve the accuracy of rainfall retrieval. Furthermore, the Marshall-Palmer formula (hereafter "MP") proposed in a previous study [9] was used as the benchmark model for the estimation values.
The remainder of this paper is organized as follows. Section 2 introduces the study regions, selected typhoon events, raw radar reflectivity collection, and ground meteorological attributes. Section 3 outlines the rainfall retrieval case design and algorithm theory. Section 4 describes the building of the rainfall retrieval model and its parameter verification, and Section 5 evaluates and discusses the results. Section 6 presents the typhoon simulation results, and Section 7 provides the conclusion.

Study Area and Data
Because typhoons typically move toward Taiwan along an east-to-west path, Chenggong weather station on Taiwan's east coast and Lanyu weather station in Taiwan's outlying islands were selected as the study areas (Figure 1). We preliminarily screened for previous typhoons in Taiwan that had moved through the study areas or that had affected Taiwan's southeastern coastal areas (Figure 2). Table 1 presents typhoons that affected the study areas from 2008 to 2017 and their dates, total rainfall, and intensity. Fourteen typhoon events were collected for this study. According to CWB definitions, severe typhoons have maximum wind speeds of 51.0 m/s or higher near the typhoon eye, whereas moderate and mild typhoons have wind speeds of 32.7-50.9 m/s and 17.2-32.6 m/s, respectively. Statistics showed that moderate typhoons were most common (eight occurrences), followed by severe typhoons (five) and mild typhoons (one).

Radar Reflectivity
This study used radar reflectivity from Taiwan's southeastern coastal area and outlying islands recorded by the Hualien Doppler weather surveillance radar (Figure 1). The reflectivity data from this radar in Rainbow®5 format and CWB hourly observation data were collected for each typhoon that made landfall in Taiwan; the radar reflectivity data over weather stations were collected from scanning results of the CWB's Doppler weather radar at Hualien. The Volume Cover Pattern (VCP) 21 system used by Hualien radar station can provide radar measurements for nine elevation angles: 0.5°, 1.4°, 2.4°, 3.4°, 4.3°, 6.0°, 9.9°, 14.6°, and 19.5°. The VCP21 system can complete scanning at nine elevation angles within 6 minutes and, compared with other radar systems (such as VCP11), it has slower antenna rotation, resulting in more precise radar reflectivity and velocity data [51]. Figure 3 displays a radar reflectivity sweep covering 360° with the radar at the center, where "range" is the scan range for the elevation, and "range_step" is the size of each range bin. Google Earth Pro was used to accurately determine the azimuth and the distance between the Hualien radar station and a ground weather station. The azimuth and the distance were 199° and 102 km, respectively, between the Hualien radar station and Chenggong station, and 183° and 218 km, respectively, between the Hualien radar station and Lanyu station. Because strong winds shift the location at which raindrops observed aloft reach the ground near the station, the average radar reflectivity within a larger station-centered grid space (i.e., 10 by 10 km) was selected to represent the radar reflectivity intensity above the target station.
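The station-centered averaging step can be sketched as follows. This is a minimal illustration, not the authors' processing chain: it assumes azimuth is measured clockwise from north, treats the sweep as a flat plane, averages the reflectivity values exactly as supplied, and represents each range bin as a hypothetical `(azimuth_deg, range_km, value)` tuple.

```python
import math

def polar_to_xy(azimuth_deg, range_km):
    """Convert (azimuth clockwise from north, range) to east/north offsets in km."""
    az = math.radians(azimuth_deg)
    return range_km * math.sin(az), range_km * math.cos(az)

def box_mean_reflectivity(bins, station_az_deg, station_range_km, half_width_km=5.0):
    """Average the bins that fall inside a square box centered on the station.

    bins: iterable of (azimuth_deg, range_km, value) tuples (hypothetical layout).
    A half width of 5 km gives the 10 by 10 km box described in the text.
    Returns None when no bin falls inside the box.
    """
    xs, ys = polar_to_xy(station_az_deg, station_range_km)
    selected = []
    for az, rng, value in bins:
        x, y = polar_to_xy(az, rng)
        if abs(x - xs) <= half_width_km and abs(y - ys) <= half_width_km:
            selected.append(value)
    return sum(selected) / len(selected) if selected else None

# Synthetic bins: two near Chenggong (azimuth 199 deg, range 102 km), one far away.
toy_bins = [(199.0, 102.0, 40.0), (199.0, 104.0, 30.0), (90.0, 50.0, 10.0)]
chenggong_mean = box_mean_reflectivity(toy_bins, 199.0, 102.0)
print(chenggong_mean)
```

Whether the average should be taken over dBZ values or over linear Z is a design choice the text does not specify; the sketch leaves that to the caller.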

Ground Observations
This study used hourly meteorological attributes (2008-2017) from CWB weather stations (Chenggong and Lanyu). Table 2 presents the meteorological attributes and CWB notations (i.e., PS01, PS02, TX01, TX05, RH01, SS02, PP02, RH02, WD01, WD02, WD05, WD06, and PP01). There are 13 attributes at each station, and these determined the number of candidate model input attributes. However, some attributes may not exhibit a high degree of correlation with the target attribute (rainfall). Therefore, suitable attributes had to be chosen before model construction. A correlation analysis was used to select the meteorological attributes with higher correlations with the rainfall attribute. The correlation coefficient (ρ) is defined as follows:

$$\rho = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2 \, \sum_{i}(y_i - \bar{y})^2}} \tag{1}$$

where $x$ is the independent variable and $y$ is the dependent variable; $\bar{x}$ and $\bar{y}$ are the average values of $x$ and $y$. Typically, |ρ| > 0.7 represents a strong correlation, and |ρ| < 0.3 represents a weak correlation [53]. Based on these definitions, attributes with |ρ| values over 0.3 were adopted as the attribute data in this study. Figure 4 displays the correlation analysis results for each weather station. In the figure, for example, the ρ value of PS01 (−0.274) was computed by Equation (1), where x is the ground air pressure (i.e., PS01) and y is the precipitation (i.e., PP01). Table 3
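The attribute screening described above can be sketched in a few lines. The sketch assumes the standard Pearson form of the correlation coefficient and applies the |ρ| > 0.3 threshold; the sample values are synthetic and only the attribute names come from the CWB notation.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def select_attributes(attributes, target, threshold=0.3):
    """Keep the attributes whose |rho| with the target exceeds the threshold."""
    selected = {}
    for name, values in attributes.items():
        rho = pearson(values, target)
        if abs(rho) > threshold:
            selected[name] = rho
    return selected

# Synthetic hourly samples (not CWB data); PP01 is the rainfall target.
pp01 = [0.0, 1.0, 2.0, 3.0, 4.0]
candidates = {
    "RH01": [60.0, 70.0, 80.0, 90.0, 100.0],   # rises with rainfall
    "PS01": [5.0, 4.0, 3.0, 2.0, 1.0],         # falls with rainfall
    "WD01": [1.0, -1.0, 0.0, -1.0, 1.0],       # uncorrelated by construction
}
selected = select_attributes(candidates, pp01)
print(sorted(selected))
```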

Case Design and Algorithms
This section presents the process for evaluating the optimal rainfall retrieval model. As illustrated in Figure 5, the various models, including three ML-based models (REG, SVR, and XGBoost) and the MP formula, in addition to the designed cases, were used to build retrieval models. The case designs for the rainfall retrieval model were based on the datasets established in the previous section. The dataset combinations of three cases are described as follows:
• Case 1 used radar reflectivity {Z} to retrieve rainfall rate. This case used radar reflectivity from every elevation angle as the model input to establish separate models. For example, the radar reflectivity from an elevation angle of 0.5° formulates ML-based rainfall retrieval models (namely subcase 1.1). That is, $R = f_1(Z_1)$, where $f_1()$ can be an ML-based model or the MP formula. Because there were nine elevation angles, nine models were established (i.e., subcases 1.1 to 1.9). An additional model, subcase 1.10, featured a specific model that used radar reflectivity of all elevation angles; that is, $R = f_2(Z_1, \ldots, Z_9)$, where $f_2()$ represents using ML-based models.
• Case 2 used meteorological attributes $\{G_k\}_{k=1,6}$ of weather stations to retrieve rainfall rate; that is, $R = f_3(\{G_k\}_{k=1,6})$, where $f_3()$ represents using ML-based models.
• Case 3 combined reflectivity intensity {Z} and meteorological attributes {G} to retrieve rainfall rate. Nine elevation angles ($Z_1$ to $Z_9$) separately combined with $\{G_k\}_{k=1,6}$ can build nine models (i.e., subcases 3.1 to 3.9). For example, for subcase 3.2, $R = f_4(Z_2, \{G_k\}_{k=1,6})$, where $f_4()$ represents ML-based models. An additional model, subcase 3.10, featured a specific model that combined meteorological attributes with the radar reflectivity of all elevation angles; that is, $R = f_5(Z_1, \ldots, Z_9, \{G_k\}_{k=1,6})$, where $f_5()$ represents using ML-based models.
Then, the corresponding performance criteria for the rainfall observation values and rainfall retrieval values were calculated. Based on the comparison results, the optimal calculation model and corresponding dataset were selected.
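The case design amounts to enumerating input-variable sets. A minimal sketch of that enumeration follows; the column labels `Z1`..`Z9` and `G1`..`G6` and the function name are hypothetical placeholders for the nine reflectivity inputs and the six selected ground attributes.

```python
Z = [f"Z{i}" for i in range(1, 10)]   # reflectivity at the nine elevation angles
G = [f"G{k}" for k in range(1, 7)]    # six selected ground attributes

def build_cases():
    """Map each subcase label to the list of input variables it uses."""
    cases = {}
    for i, z in enumerate(Z, start=1):
        cases[f"1.{i}"] = [z]          # Case 1: one elevation angle only
        cases[f"3.{i}"] = [z] + G      # Case 3: one elevation angle plus ground data
    cases["1.10"] = list(Z)            # all nine elevation angles
    cases["2"] = list(G)               # Case 2: ground data only
    cases["3.10"] = Z + G              # all elevation angles plus ground data
    return cases

cases = build_cases()
for label in ("1.1", "2", "3.2", "3.10"):
    print(label, cases[label])
```

Each entry can then be passed to any of the regressors, so that models and datasets are compared on the same footing.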

Algorithms
This section explains the theoretical bases for the REG, SVR, and XGBoost models.

REG
Linear regression is a crucial and widely used regression technique, and its main strength is that results are easy to interpret. When linear regression is performed for a set of independent variables $x = (x_1, \ldots, x_r)$, where $r$ is the number of variables, the linear relationship between $y$ and $x$ is assumed to be as follows:

$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_r x_r + \varepsilon \tag{2}$$

where $\beta_0, \beta_1, \ldots, \beta_r$ are regression coefficients, and $\varepsilon$ is the random error. In this study, rainfall was estimated with datasets {Z} and {G} using linear regression.

SVR
The basic theory of SVR is to find the most suitable hyperplane within a space. Training data were set as $(x_1, y_1), \ldots, (x_i, y_i)$, with $x$ as the input feature and $y$ representing the feature's corresponding regression value. SVR's mathematical representation is similar to the following regression formula:

$$f(x) = w^{\top} x + b \tag{3}$$

where $w$ is the weight vector and $b$ is the bias term. If the difference between the regression value $f(x_i)$ and the true value $y_i$ is very small, then the predictive value $f(x)$ can be accurately derived after inputting feature $x$, and weight $w$ defines the hyperplane sought in SVR.

XGBoost
XGBoost introduces a novel sparsity-aware algorithm for sparse data and a weighted quantile sketch for approximate tree learning [43]. The basis of XGBoost is gradient boosting (GB). GB iteratively generates weak models (models with limited individual predictive ability) and sums each weak model's predictive results to minimize the loss function. Boosting can also be defined as raising or improving: each new weak model added during training improves on the previous results [54].
GB is simply a framework in which different algorithms can be entered, the most common being decision trees. The classic classification and regression tree was used in this study, and can also be referred to as GBDT [55]. Because each algorithm generates residuals, each subsequent calculation will establish a new algorithm calculation based on the gradient direction of the previous residual reduction, and the residual will decrease with multiple iterations. The derivation process of the GB formula is explained as follows.
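The optimal prediction function $F^{*}(x)$ minimizes the loss $L(y, F(x))$ of the mapping from $x$ onto $y$:

$$F^{*}(x) = \arg\min_{F(x)} L(y, F(x)) \tag{4}$$

where $F(x)$ is a function of the weak-classifier parameter set $P = \{P_1, P_2, \ldots\}$. The weak classifier parameters can be expressed as:

$$P = \{\beta_m, \alpha_m\}_{m=1}^{M} \tag{5}$$

where $\alpha_m$ is the parameter of the $m$th regression tree, $\beta_m$ is the weight of the same tree in the prediction function, and $M$ is the number of weak classifiers.
The $m$th weak classifier is expressed as $\beta_m h(x, \alpha_m)$; therefore, the prediction function $F(x; P)$ can be expressed as:

$$F(x; P) = \sum_{m=1}^{M} \beta_m h(x, \alpha_m) \tag{6}$$

The $m$th weak classifier in Equation (6) should be established on the prediction loss function generated by the first $m-1$ weak classifiers to predict the direction of descent; $-g_m(x_i)$ represents the direction in which the $m$th iteration weak classifier is built, and the formula is:

$$-g_m(x_i) = -\left[\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]_{F(x) = F_{m-1}(x)} \tag{7}$$

$\alpha_m$ and $\beta_m$ can be expressed as follows:

$$\alpha_m = \arg\min_{\alpha, \beta} \sum_{i=1}^{N} \left[-g_m(x_i) - \beta h(x_i, \alpha)\right]^2 \tag{8}$$

$$\beta_m = \arg\min_{\beta} \sum_{i=1}^{N} L\left(y_i, F_{m-1}(x_i) + \beta h(x_i, \alpha_m)\right) \tag{9}$$

To avoid overfitting, each weak classifier is typically multiplied by the learning rate $v$:

$$F_m(x) = F_{m-1}(x) + v \, \beta_m h(x, \alpha_m) \tag{10}$$

XGBoost is built on the basis of this derivation and additionally uses a second-order Taylor expansion of the loss function when fitting each tree; as a result, XGBoost has superior convergence in its loss function prediction.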
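The boosting recursion can be illustrated with a deliberately small sketch. It is plain gradient boosting with squared-error loss and one-split regression stumps on a single input, not XGBoost itself and not the models fitted in this study. Under squared-error loss the negative gradient is simply the residual, and the fitted leaf values play the role of the weighted weak classifier. All names and the toy data are assumptions.

```python
def fit_stump(x, residuals):
    """Least-squares stump on 1-D inputs: returns (threshold, left_value, right_value)."""
    best = None
    levels = sorted(set(x))
    for lo, hi in zip(levels, levels[1:]):
        threshold = (lo + hi) / 2.0
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        sse = (sum((r - left_value) ** 2 for r in left)
               + sum((r - right_value) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    return best[1:]

def predict_stump(stump, xi):
    threshold, left_value, right_value = stump
    return left_value if xi <= threshold else right_value

def fit_gradient_boosting(x, y, n_rounds=100, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    f0 = sum(y) / len(y)
    predictions = [f0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of 0.5 * (y - F)^2 with respect to F is the residual.
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # Shrink each update by the learning rate before adding it.
        predictions = [pi + learning_rate * predict_stump(stump, xi)
                       for xi, pi in zip(x, predictions)]
    return {"f0": f0, "learning_rate": learning_rate, "stumps": stumps}

def predict(model, xi):
    update = sum(predict_stump(s, xi) for s in model["stumps"])
    return model["f0"] + model["learning_rate"] * update

def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

# Synthetic data: reflectivity-like input, rain-rate-like target.
x = [10.0, 20.0, 30.0, 40.0, 50.0]
y = [0.2, 0.6, 2.7, 11.5, 48.6]
model = fit_gradient_boosting(x, y)
baseline = mse(y, [sum(y) / len(y)] * len(y))
boosted = mse(y, [predict(model, xi) for xi in x])
print(baseline, boosted)
```

A smaller learning rate makes each round contribute less, so more rounds are needed to reach the same training error; this is the trade-off explored during parameter calibration.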

Programming Tools
The SVR and XGBoost models were implemented using the open-source scikit-learn and Keras libraries in Python 3.7 (Python Software Foundation, Wilmington, DE, USA [56,57]). Because CWB radar reflectivity data were stored as Rainbow®5 files, Python wradlib modules were then used to analyze the data and obtain the radar reflectivity.

Performance Criteria
The performance criteria used in this study included the root mean square error (RMSE), mean absolute error (MAE), relative RMSE (rRMSE), relative MAE (rMAE), and efficiency coefficient (CE). In their definitions, $N$ is the number of data records; $h_i^{obs}$ is the $i$th observation value; $h_i^{ret}$ is the $i$th retrieval value; $\bar{h}^{obs}$ is the average observation value; and $\bar{h}^{ret}$ is the average retrieval value.
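For orientation, the five criteria can be computed as in the sketch below. The forms used are common conventions and are assumptions here, not necessarily the exact definitions adopted in this study: the relative errors are taken as RMSE and MAE divided by the mean observation, and CE is taken in the Nash-Sutcliffe form.

```python
import math

def performance_criteria(obs, ret):
    """Return RMSE, MAE, rRMSE, rMAE, and CE for observed and retrieved series."""
    n = len(obs)
    mean_obs = sum(obs) / n
    errors = [r - o for o, r in zip(obs, ret)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # Assumed Nash-Sutcliffe form: 1 - SSE / variance of observations about their mean.
    ce = 1.0 - sum(e * e for e in errors) / sum((o - mean_obs) ** 2 for o in obs)
    return {
        "RMSE": rmse,
        "MAE": mae,
        "rRMSE": rmse / mean_obs,   # assumed normalization by the mean observation
        "rMAE": mae / mean_obs,
        "CE": ce,
    }

# Synthetic hourly rainfall (mm/h): every retrieval is 1 mm/h too high.
scores = performance_criteria([2.0, 4.0, 6.0, 8.0], [3.0, 5.0, 7.0, 9.0])
print(scores)
```

Under this form, CE equals 1 for a perfect retrieval and falls to 0 when the retrieval is no better than the mean observation.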

Modeling
Because there were only 14 typhoon events, this study adopted the following approach to use data to effectively improve the quality of model training. During a testing typhoon, the data for the other 13 typhoons were used as the training and validation dataset; the trained model parameters were then used in the simulation of the testing typhoon. During another testing typhoon, the same approach was used for model building and the simulation of the testing typhoon.
The model training and validation process in this study used 10-fold cross-validation, in which 90% of the data were randomly selected as training data each time; the remaining 10% were used as validation data, and the procedure was repeated without reusing validation records until all of the data had been part of a 10% validation set. Lastly, the final accuracy value was obtained by averaging the results of each validation set. The four typhoons with the greatest rainfall were selected as the testing typhoons: Tembin in 2012 (459 mm), Nepartak in 2016 (399 mm), Matmo in 2014 (394 mm), and Nanmadol in 2011 (360 mm).
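The two-level data split can be sketched as follows: one typhoon is held out for testing, and the records of the remaining 13 are divided into 10 folds for training and validation. This is an illustrative sketch; the function names, the fixed random seed, and the placeholder labels for the ten typhoons not named in the text are assumptions.

```python
import random

def leave_one_event_out(events, test_event):
    """Return the events used for training and validation when one event is held out."""
    return [e for e in events if e != test_event]

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, validation_indices) so each record is validated exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation

# Four named testing typhoons plus ten placeholder labels for the other events.
events = ["Nanmadol", "Tembin", "Matmo", "Nepartak"] + [f"event_{i:02d}" for i in range(5, 15)]
training_events = leave_one_event_out(events, "Matmo")
splits = list(k_fold_indices(100, k=10))
print(len(training_events), len(splits))
```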

Parameter Calibration
This section discusses the processes used to verify and optimize the parameters of the SVR and XGBoost models. Candidate parameter values were searched to find the settings that best fit the data. Parameters were calibrated through trial and error; that is, one parameter was held fixed while another was adjusted to identify the parameter combination with the lowest error. Because of the many case models designed in this study, the verification process for model parameters in Cases 1-3 and the optimal results are explained using the examples of Chenggong station with a radar elevation angle of 6.0° and Lanyu station with a radar elevation angle of 4.3°. Figure 6 displays the verification results for the SVR models, for which the main parameter was the penalty coefficient C. C represents the degree of punishment for samples outside the margin, and its value is related to the degree of tolerance of error. C values of 0.001, 0.01, 0.1, 1, 2, and 3 were used for verification in this study, with 10 to 1000 iterations. First, the RMSE for each iteration count was plotted with C fixed at 0.001, and likewise for every other C value, to determine the C value with the smallest RMSE. The results showed that the optimal C values for Chenggong station for Cases 1-3 were 3, 2, and 3, respectively, whereas the optimal C values for Lanyu station in Cases 1-3 were 3, 0.01, and 0.01, respectively. Figures 7 and 8 present the verification results of the XGBoost models; XGBoost parameters include learning rate, min_child_weight, and max_depth. The min_child_weight parameter refers to the minimum sum of instance weight needed in a child, and max_depth limits how deep each tree can grow, which helps prevent overfitting. The learning rates tested were 0.0001, 0.001, 0.01, 0.1, 0.2, 0.3, and 0.4, with 10 to 1000 iterations; min_child_weight was tested at 1, 2, and 3, and max_depth was tested from 3 to 15.
Parameters were verified using trial and error, first by fixing the learning rate and adjusting the number of iterations to find the setting with the smallest RMSE. Once the optimal learning rate was obtained, the optimal values of min_child_weight and max_depth were verified. Table 4 lists the verification results for the optimal XGBoost model parameters in Cases 1-3.
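The staged trial-and-error search can be sketched generically. The sketch takes any `evaluate(params)` callable returning a validation RMSE; here a synthetic stand-in is used so the example runs on its own. The learning-rate, min_child_weight, and max_depth grids follow the text, whereas the three iteration counts, the key name `n_rounds`, and the toy objective are assumptions.

```python
import math
from itertools import product

LEARNING_RATES = [0.0001, 0.001, 0.01, 0.1, 0.2, 0.3, 0.4]
N_ROUNDS = [10, 100, 1000]              # assumption: the text gives only the range 10-1000
MIN_CHILD_WEIGHTS = [1, 2, 3]
MAX_DEPTHS = list(range(3, 16))

def staged_search(evaluate, defaults):
    """Tune learning rate and rounds first, then min_child_weight and max_depth."""
    best = dict(defaults)
    # Stage 1: vary learning rate and iteration count with the other parameters fixed.
    lr, rounds = min(
        product(LEARNING_RATES, N_ROUNDS),
        key=lambda p: evaluate({**best, "learning_rate": p[0], "n_rounds": p[1]}),
    )
    best.update(learning_rate=lr, n_rounds=rounds)
    # Stage 2: keep the stage-1 result and vary the tree-shape parameters.
    mcw, depth = min(
        product(MIN_CHILD_WEIGHTS, MAX_DEPTHS),
        key=lambda p: evaluate({**best, "min_child_weight": p[0], "max_depth": p[1]}),
    )
    best.update(min_child_weight=mcw, max_depth=depth)
    return best

def toy_rmse(params):
    """Synthetic stand-in for a cross-validated RMSE; smallest at 0.1, 100, 2, 6."""
    return (abs(math.log10(params["learning_rate"]) + 1.0)
            + abs(math.log10(params["n_rounds"]) - 2.0)
            + abs(params["min_child_weight"] - 2)
            + abs(params["max_depth"] - 6))

best_params = staged_search(toy_rmse, {"min_child_weight": 1, "max_depth": 3})
print(best_params)
```

A staged search evaluates far fewer combinations than a full grid, at the cost of possibly missing interactions between the learning rate and the tree-shape parameters.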

Model Performance
Based on the approaches in the previous section, the optimal parameter set was found by completing the verification process for all cases. This section compares the error indices of all the cases. Figure 9a presents the RMSE results for the MP, REG, SVR, and XGBoost models in every case and at every elevation angle at Chenggong station; the figure demonstrates that (1) XGBoost models had the lowest error index, followed by SVR, REG, and MP; and (2) Case 3 had the lowest error index among the cases, followed by Cases 2 and 1. For the optimal model, XGBoost, Case 2 used only ground station data and did not consider radar reflectivity, so its error values were the same at every elevation angle; Case 3 used both radar reflectivity and ground station data and had superior performance compared with Case 2. Plotting the error results of these two cases (Figure 10a) showed how the error changed with the elevation angle of the radar reflectivity used. Figure 10a also demonstrates that optimal retrieval results were observed at an elevation angle of 6.0° (error index value = 2.520 mm/h). Figure 9b presents the model verification results for every case of rain retrieval at Lanyu station. This figure demonstrates that, similarly, XGBoost models showed optimal performance and that Cases 2 and 3 had superior performance. Figure 10b illustrates the RMSE values for both cases in XGBoost models for further evaluation, demonstrating that radar reflectivity data at an elevation angle of 4.3° had the best results (error index value = 2.016 mm/h).

Evaluation and Discussion
This section presents an evaluation to determine the optimal model and dataset combinations under different elevation angles. Table 5 presents the XGBoost model with Case 3 datasets as the optimal combination at Chenggong and Lanyu stations under every elevation angle; this study compared the average performance between stations and found that Lanyu station (RMSE = 2.359 mm/h) outperformed Chenggong station (RMSE = 2.706 mm/h). The Chenggong station is situated in the southeast of the main island of Taiwan and is possibly affected by terrain factors, with the Coastal Mountain Range (150 km long, 10 km across from east to west, with an average height of 1000 m and the tallest peak height of 1682 m; Figure 11a) sheltering the station. Model verification results demonstrated that favorable results can be obtained under a specific elevation angle of 6.0°. According to Figure 11a, the range of theoretical elevation angles can be calculated. Waco [58] reported that the upper circulation of strong hurricanes extends into the tropopause of the atmosphere at 15-18 km. Houze et al. [59] and Houze [60] reported that the convective cell of Hurricane Ophelia (2005) reached approximately 17 km in echo top height. In this study, we assumed that the height of the cloud top in a tropical cyclone is 18 km (Figure 11a). The distance from the Hualien radar station to the Chenggong station is 102 km and to the northern tip of the Coastal Mountain Range is 15 km. Therefore, the range of radar elevation angles can be derived as 3.81° to 10.01°. The optimal angle of 6.0° identified by the model falls within this range, which supports the validity of the model result.
The Lanyu station generally had lower RMSE at lower elevation angles (approximately 1.4-4.3°), according to the model results, possibly because of less interference from land elevation factors when the electromagnetic pulses (radar beam) sent and received by the radar station pass across sea level, resulting in superior error results closer to the error index values at lower elevation angles. As illustrated in Figure 11b, the theoretical radar elevation angle can be calculated for the Lanyu station. The distance between the Hualien radar station and the Lanyu station is 218 km, with the maximum elevation angle at 4.72°. As a result, the model results appear reasonable regarding the optimal angle of 4.3°.
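The quoted angle limits are consistent with simple straight-line geometry, as the following sketch shows. It assumes a flat Earth and a straight radar beam (no curvature or refraction correction) and takes the 1000 m average height of the Coastal Mountain Range as the blocking height; these simplifications reproduce the quoted values, although the text does not confirm that they are the exact procedure used.

```python
import math

def elevation_angle_deg(height_km, ground_distance_km):
    """Elevation angle of a target at the given height and horizontal distance."""
    return math.degrees(math.atan2(height_km, ground_distance_km))

CLOUD_TOP_KM = 18.0        # assumed tropical cyclone cloud-top height
MOUNTAIN_HEIGHT_KM = 1.0   # average height of the Coastal Mountain Range

# Chenggong: the beam must clear the range tip (15 km away) and stay below cloud top at 102 km.
chenggong_min = elevation_angle_deg(MOUNTAIN_HEIGHT_KM, 15.0)
chenggong_max = elevation_angle_deg(CLOUD_TOP_KM, 102.0)
# Lanyu: open sea path, so only the cloud-top limit at 218 km applies.
lanyu_max = elevation_angle_deg(CLOUD_TOP_KM, 218.0)

print(round(chenggong_min, 2), round(chenggong_max, 2), round(lanyu_max, 2))
```

Of the nine VCP21 angles, 4.3°, 6.0°, and 9.9° fall inside the Chenggong window, and 4.3° is the highest angle below the Lanyu limit.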

Simulations
The four testing typhoons were simulated to evaluate the retrieval effectiveness. Figure 11a Among the four typhoons, Matmo had the highest rainfall between both stations (observation value = 66 mm/h at Chenggong station and 61 mm/h at Lanyu station). Figure 2f shows that the path of Matmo's center was very close to both stations, resulting in heavy rain. Rainfall observation records for the four typhoons at both stations showed that Chenggong station was more likely to see rainfall than Lanyu station, possibly because of the "wind sweep" rainfall phenomenon caused by the terrain of Central Mountain Range and the impact of the Coastal Mountain Range on the typhoon's peripheral circulation as it drew nearer Taiwan's east coast; when the typhoon airflow encounters the slope and is forced to rise, the water vapor in the air begins to condense from the lowering temperatures in higher altitudes and forms terrain rainfall. Conversely, Lanyu station is situated on the ocean, and therefore sees topographical rainfall; this is similar to the peak rainfall patterns observed for Typhoons Tembin ( Figure 2d) and Nepartak (Figure 2j) at Chenggong and Lanyu stations reflecting the possible effects of topographical rainfall. Furthermore, Nanmadol did not exhibit significant single-peak rainfall patterns, possibly because its trajectory was from south to north-drawing near Taiwan's southern tip and following the southern edge of the Central Mountain Range (Figure 2c), resulting in continuous and sustained rainfall.
In Figure 12, the orange solid line represents the MP model's retrieval value, the green solid line represents that of the REG model, the black dashed line represents that of the SVR model, and the blue solid line represents that of the XGBoost model. For simulations at Chenggong station (Figure 12a), the rain pattern variations retrieved by the four models exhibited greater differences, and peak rainfall was significantly underestimated; this may be because of faster structural variations of typhoon circulation at Chenggong station, resulting in radar data and ground meteorological attributes changing faster, and thus exhibiting high and unsteady variation; therefore, the models exhibited higher biases when estimating rainfall. At Lanyu station (Figure 12b), the overall rain patterns in the MP, REG, SVR, and XGBoost models were approximately similar to observed rain patterns. The peak time points in the MP, REG, SVR, and XGBoost models reflected underestimated rainfall variation and peak values. Both Chenggong and Lanyu stations exhibited significant peak rain patterns for Typhoon Matmo. Retrieval results mostly demonstrated that Lanyu station outperformed Chenggong station in grasping rainfall trends (i.e., fewer instances of underestimation); the reason might be smaller variations in the typhoon structure and circulation, and stable development of the typhoon structure, and the absence of topographical interference allowing radar reflectivity signals to more accurately reflect rainfall. Conversely, the typhoon circulation and structure rapidly changed when the typhoon circulation encountered land and the Central Mountain Range, resulting in greater fluctuations in radar reflectivity signals at Chenggong station; as a result, the ability of the retrieval model to reflect rainfall at Chenggong station was worse compared with at Lanyu station.
Next, the models were further compared in terms of their performance for each performance criterion. In absolute errors, MAE and RMSE indices were used to evaluate overall performance in all four typhoons; as presented in Figure 13a,b, the XGBoost model fared better than the MP, REG, and SVR models in absolute errors, and Lanyu station exhibited smaller absolute errors compared with Chenggong station. In relative errors, rMAE and rRMSE indices were used; Figure 13c,d shows that the XGBoost model outperformed the MP, REG, and SVR models and that Lanyu station exhibited smaller relative values compared with Chenggong station. Lastly, in terms of CE (Figure 13e), the XGBoost model produced the highest CE at both stations and that for Lanyu station was slightly higher, at 0.903, compared to 0.885 for the Chenggong station.

Conclusions
The aim of this study was to develop a typhoon-season rain retrieval model that could estimate possible rainfall in the study area when typhoons struck Taiwan. The studied sites were Chenggong weather station on Taiwan's southeast coast and Lanyu weather station in Taiwan's outlying islands. Rainfall retrieval cases and a process for evaluating the optimal model were designed, and the model case designs employed combined datasets of radar reflectivity factors from multiple elevation angles and ground meteorological attributes. The scope of the study was 14 typhoons from 2008 to 2017, and the radar reflectivity at nine elevation angles was provided by the VCP21 system at the CWB's Hualien weather surveillance radar station. The case models in this study were constructed using the ML algorithms REG, SVR, and XGBoost, with the MP formula as the benchmark.
The results at the experimental stations can be summarized as follows:
• In the process of building the rainfall-retrieval models, combining radar reflectivity with ground meteorological attributes (Case 3) achieved superior rainfall-retrieval results compared with only inputting radar reflectivity (Case 1) or only ground meteorological attributes (Case 2).
• When the experimental station radar elevation angles were evaluated, radar reflectivity at an elevation angle of 6.0° combined with ground meteorological attributes was the optimal set of input variables for rainfall retrieval at Chenggong station; at Lanyu station, the optimal input variables were radar reflectivity at an elevation angle of 4.3° combined with ground meteorological attributes.
• Simulation results of the testing typhoons (Nanmadol in 2011, Tembin in 2012, Matmo in 2014, and Nepartak in 2016) demonstrated that Lanyu station exhibited smaller error index values in model retrieval than Chenggong station. This study speculated that this is because Lanyu station is situated on the ocean, where a typhoon circulation encounters little to no topographical interference to affect its structure when passing; as a result, the radar reflectivity signals are better reflected off the variations (gradients) of water vapor and possibly rain. By contrast, Chenggong station is affected by rapid changes in typhoon circulation and structure when a typhoon circulation encounters land, the Coastal Mountain Range, and the Central Mountain Range, resulting in greater fluctuations in radar reflectivity signals. As a result, the Chenggong station retrieval models were worse at predicting rainfall than those at Lanyu station.
• In terms of model errors, the XGBoost model at both Chenggong and Lanyu stations exhibited smaller error indices than the MP, REG, and SVR models, for both absolute errors (MAE and RMSE) and relative errors (rMAE and rRMSE). In terms of efficiency performance during retrievals, Lanyu station's XGBoost model had the highest efficiency coefficient (0.903), and Chenggong station's XGBoost model had the second highest (0.885).
Finally, based on the radar reflectivity at optimal radar elevation angles and ground meteorological attributes verified in this study's evaluation process, entering the combined dataset into a high-performance algorithm (XGBoost) can effectively improve the accuracy of rainfall retrieval during typhoon season. As a result, the concrete study results also demonstrate the contribution of this study.