2.3. Calculation of Features
Given that the definitions of TC genesis are not always the same across different scientific research and operational agencies [42,54], in this study we define that a TC forms when its 1 min maximum sustained wind speed reaches 21 knots (1 knot equals about 0.5144 m s⁻¹) for the first time, which approaches the lower bound of TD (10.8 m s⁻¹) defined by the China Meteorological Administration (CMA).
In order to determine the temporal and spatial range of features, the occurrence time of LMI and the location of TCs are investigated. As Figure 2a shows, storms with an LMI level of TD or TS (weak TC) do not travel far from their genesis location by the time of maximum intensity, especially TDs (<500 km). However, the medians of minor and major TCs (strong TC) all lie in the range of 1500–2000 km, with quite small differences among them. The mean LMI location of weak TCs is about 5° north and 5° west of the genesis location, while that for strong TCs is about 7.5° of latitude and 14° of longitude away (Table 2 and Figure 3). The mean genesis location of strong TCs (13.470° N) is to the south of weak TCs (17.101° N), but there is little difference between the mean latitudes of their LMI locations (21.078° N and 21.925° N, respectively). These results are understandable, since strong TCs form under more favorable environmental conditions (e.g., warmer sea surface temperature (SST)) and are potentially fueled by more energy to travel after genesis. On the other hand, higher latitude usually comes with worse environmental conditions for intensification, so TCs reach LMI at a similar latitude no matter how strong they are. In order to obtain as much useful information as possible during the early lifetime, and considering the asymmetric structure of TCs, the corresponding variables are averaged within 8 arc-shaped sectors of different radii (600 km for the inner circle, referring to a TC's main circulation, and 600–1500 km for the outer circle, referring to the surrounding environment) and orientations in a storm-centered area (Figure 1). Compared with the calculation method introduced by Ditchek et al. [41], this method better accounts for the round shape of TC circulation, and the resulting features are independent of each other. Moreover, we found that including an axisymmetric average (i.e., a full circle over the TC center) as an additional feature does not materially change which variables are most important for LMI, but degrades performance on testing.
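The 8-sector spatial averaging can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, grid layout, and the simple haversine/azimuth geometry are our assumptions.

```python
# Hypothetical sketch of averaging a gridded field over 8 azimuthal sectors
# in a storm-centered area: an inner circle (<=600 km) and an outer annulus
# (600-1500 km), as described in the text.
import numpy as np

def sector_means(field, lat, lon, clat, clon,
                 r_inner=600.0, r_outer=1500.0, n_sectors=8):
    """Return per-sector means of `field` (shape [len(lat), len(lon)])
    for the inner circle and the outer annulus around (clat, clon)."""
    R = 6371.0  # Earth radius, km
    lat2d, lon2d = np.meshgrid(lat, lon, indexing="ij")
    # great-circle distance from the storm center (haversine formula)
    p1, p2 = np.radians(clat), np.radians(lat2d)
    dphi = np.radians(lat2d - clat)
    dlmb = np.radians(lon2d - clon)
    a = np.sin(dphi / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlmb / 2) ** 2
    dist = 2 * R * np.arcsin(np.sqrt(a))
    # storm-relative azimuth (deg, clockwise from north), binned into sectors
    azim = (np.degrees(np.arctan2(dlmb, dphi)) + 360.0) % 360.0
    sector = (azim // (360.0 / n_sectors)).astype(int)
    inner = np.full(n_sectors, np.nan)
    outer = np.full(n_sectors, np.nan)
    for s in range(n_sectors):
        m_in = (sector == s) & (dist <= r_inner)
        m_out = (sector == s) & (dist > r_inner) & (dist <= r_outer)
        if m_in.any():
            inner[s] = field[m_in].mean()
        if m_out.any():
            outer[s] = field[m_out].mean()
    return inner, outer
```

Each 2D environmental variable then contributes 16 scalar features (8 inner-sector means and 8 outer-sector means) per time step.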
Figure 2b shows the interval between TC genesis and LMI. The distribution is similar to that in Figure 2a, indicating that, in general, the stronger the LMI, the longer the interval. Interestingly, almost every strong TC (with only 3 exceptions) experienced a "developing stage" of at least 2 days between genesis and LMI. For weak TCs, the interval was quite short (e.g., less than 48 h for all TDs). Therefore, information from the first 48 h is available to represent the early-stage conditions of strong TCs.
Similar to the process in the Statistical Hurricane Intensity Prediction Scheme (SHIPS) [6,7,55], features are divided into 2 groups in this study: (1) TC state features, which are scalars that describe the current status or variation trend of a TC, such as size, moving direction, and translation speed (Table 3); and (2) environmental features, which are multidimensional variables that depict the dynamic or thermodynamic conditions around a TC, such as air temperature, relative humidity, and vertical wind shear (Table 4). Some of these parameters are crucial predictors in SHIPS for intensity prediction (e.g., SHRS and SHRD) [55], and some have a large impact on TC genesis (e.g., translation speed) [11,12,56]. All of them are derived from the ERA5 hourly reanalysis and the IBTrACS dataset. The method of using reanalysis and actual best track data to establish a statistical model is known as the "perfect prognostic" methodology [57].
As for the variables listed in Table 3, each one is averaged every 12 h during the first 2 days of a TC's lifetime to form a feature, except for JDAY (absolute value of genesis year-day minus 248). Specifically, the variables related to TC size are computed from 10 m wind data from ERA5 [58], since the corresponding information in IBTrACS is incomplete. The piecewise cubic Hermite interpolating polynomial (PCHIP) method [59] is employed to extract the radius of 3 m s⁻¹ wind speed (R₃) and the radius of maximum wind (RMW) from the storm-relative azimuthal-mean radial profiles. They represent the storm sizes of the outer and inner core, respectively. Similar to the concept of TC fullness [60], defined as the ratio of the TC's outer-core wind skirt to its outer-core size, tightness is calculated by:

Tightness = (R₃ − RMW) / R₃  (1)

By quantitatively measuring the TC's outer-core wind structure, this variable describes the destructiveness of the storm to some extent.
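The PCHIP-based extraction of RMW and R₃ can be sketched as below. This is an illustrative implementation under our own assumptions (function name, grid resolution, and the rule of taking the first sub-threshold radius outside the RMW); the profile values in the usage example are invented.

```python
# Hypothetical sketch: fit the azimuthal-mean radial wind profile with PCHIP,
# then read off the radius of maximum wind (RMW) and the radius where the
# wind first drops to 3 m/s outside the RMW (R3).
import numpy as np
from scipy.interpolate import PchipInterpolator

def rmw_and_r3(radii_km, wind_ms, threshold=3.0):
    """Return (RMW, R3) in km from a storm-relative radial wind profile."""
    f = PchipInterpolator(radii_km, wind_ms)      # shape-preserving cubic
    r_fine = np.linspace(radii_km[0], radii_km[-1], 2001)
    v_fine = f(r_fine)
    rmw = r_fine[np.argmax(v_fine)]               # radius of maximum wind
    below = (r_fine > rmw) & (v_fine <= threshold)
    r3 = r_fine[below][0] if below.any() else np.nan
    return rmw, r3
```

Tightness then follows directly as `(r3 - rmw) / r3`.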
Variables listed in Table 4 are also averaged within the 8 sectors to form features after temporal averaging (Figure 1). The maximum potential intensity (MPI) used in this study is calculated by an empirical function derived from the observed maximum intensity of TCs with respect to SST [55,61], rather than the theoretical form proposed by Emanuel [62]:

MPI = A + B exp[C (SST − T0)]  (2)

The coefficients in this exponential function are given by A = 38.21 kt, B = 170.72 kt, C = 0.1909 °C⁻¹, and T0 = 30.0 °C, and 185 kt is set as the upper bound of MPI.
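With the coefficients above, the capped empirical MPI can be written in a few lines; the function name and interface here are our own, but the constants are taken from the text.

```python
# Empirical SST-based maximum potential intensity (kt), capped at 185 kt,
# using the coefficients quoted in the text.
import math

A, B, C, T0 = 38.21, 170.72, 0.1909, 30.0   # kt, kt, 1/degC, degC
MPI_CAP = 185.0                              # kt, upper bound from the text

def empirical_mpi(sst_c):
    """MPI (kt) as an exponential function of SST (degrees C)."""
    return min(A + B * math.exp(C * (sst_c - T0)), MPI_CAP)
```

Note that the cap matters: at SST = T0 = 30 °C the uncapped value is A + B ≈ 208.9 kt, so warm SSTs saturate at 185 kt.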
2.4. Ensemble Learning Method
The decision tree model mimics how people think about a problem and finally make decisions, based on rules organized in a tree shape [63]. It has a variety of forms, and one of them is the classification and regression tree (CART), which typically uses the Gini index as the rule to choose the best splitting feature at each node in classification [64]:

Gini(D) = 1 − ∑_{k=1}^{n} p_k²  (3)

Gini_index(D, a) = ∑_{v=1}^{V} (|Dᵛ| / |D|) Gini(Dᵛ)  (4)

where D and a refer to the original dataset and the selected feature, respectively, n is the total number of classes, V represents the number of possible values of a, p_k is the probability of a sample belonging to class k, and Dᵛ is the subset split by a = v. The Gini index shows the "impurity" of the subsets by calculating the probability that two randomly chosen samples in a subset have different actual labels. A low Gini index suggests that the subset split by a is quite homogeneous, and hence a is useful for classification [65]. After training is finished, the model is able to classify new samples into certain categories by judging their features step-by-step. CART can handle both classification and regression problems well, with good interpretability, and requires less training data than artificial neural networks [66]. The detailed algorithms for CART are provided in Appendix A.
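The two quantities above can be computed directly. The following is a minimal sketch for a categorical feature (the data layout, lists of values and labels, is our assumption):

```python
# Gini impurity of a label set, and the weighted Gini index of a dataset
# after splitting it by every possible value of one categorical feature.
from collections import Counter

def gini(labels):
    """Probability that two randomly drawn samples have different classes."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_index(values, labels):
    """Subset-size-weighted Gini impurity after splitting by a feature."""
    n = len(labels)
    total = 0.0
    for v in set(values):
        subset = [y for x, y in zip(values, labels) if x == v]
        total += len(subset) / n * gini(subset)
    return total
```

CART would evaluate `gini_index` for each candidate feature (and, for continuous features, each candidate threshold) and split on the one with the lowest value.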
Since a single decision tree is prone to overfitting the training data by generating too many branches [67], we use "pre-pruning" procedures (e.g., restricting the maximum depth of a single tree) to prevent an unnecessarily complicated structure, and use ensembles to resist overfitting. Ensemble learners combine sets of weak learners, and three ensemble learning methods based on CART were applied in this study: Adaptive Boosting (AdaBoost), Extreme Gradient Boosting (XGBoost), and random forest [68,69,70]. AdaBoost and XGBoost are boosting models that train base models in series to reduce bias by changing the weight distribution of samples at each step. Random forest is a typical "bagging" algorithm with a parallel framework that reduces variance by constructing many decision trees. The detailed algorithms for the tree-based ensemble models are provided in Appendix B. Generally speaking, ensemble learning methods are much more accurate and robust than individual decision tree models [71,72].
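In scikit-learn terms, the boosting and bagging ensembles above can be instantiated as follows. This is an illustrative sketch: the feature matrix, labels, and hyperparameter values are synthetic stand-ins, not the tuned configuration of the study, and XGBoost (`xgboost.XGBClassifier`) would be fitted analogously.

```python
# Fitting CART-based boosting (AdaBoost) and bagging (random forest)
# ensembles on a synthetic stand-in for the TC feature matrix.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))                   # 300 storms x 10 features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)    # 1 = "strong TC" (synthetic)

models = {
    # boosting: shallow base trees trained in series to reduce bias
    "adaboost": AdaBoostClassifier(n_estimators=100),
    # bagging: many pre-pruned trees in parallel to reduce variance
    "random_forest": RandomForestClassifier(n_estimators=200, max_depth=6),
}
for name, model in models.items():
    model.fit(X, y)
```

Restricting `max_depth` plays the role of the "pre-pruning" mentioned in the text.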
Similar to the individual decision tree model, a tree-based ensemble model not only performs well on classification and regression tasks, but also makes it possible to trace the contribution of each feature. At each node division, the splitting feature and value are chosen so that the decrease of impurity in the subsets is maximized. Mean decrease impurity (MDI) is employed to judge the importance of feature X_j when the splitting point is set as s_t at node t, whose value equals the mean decrease of the selected impurity metric i over all nodes and all trees [70]:

MDI(X_j) = (1/N_T) ∑_{m=1}^{N_T} w_m ∑_{t ∈ T_m : v(s_t) = X_j} p(t) Δi(s_t, t)  (5)

where N_T is the number of decision trees in the ensemble model, p(t) is the fraction of samples reaching node t in a decision tree, Δi(s_t, t) refers to the decrease of impurity measured by the selected splitting criterion, w_m is the weight of decision tree T_m (w_m = 1 in random forest), and v(s_t) is the feature used in the partition at node t. Since we chose the Gini index as the splitting criterion for all models, we call the normalized MDI the Gini importance index (GII; not the same as the Gini index).
However, critical features assessed by only one criterion may be misleading, as the GII can be abnormally high when applied to high-cardinality features [64]. To ensure the robustness of the selected features, two other criteria, mean minimum tree depth (MMTD) and total split time (TST), are also considered as quantitative indicators of feature importance. In tree-based models, the earlier and more frequently a feature is selected for splitting, the more important it is. Therefore, if a feature has a high GII, a small MMTD, and a large TST, it is significant for LMI estimation.
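All three criteria can be recovered from a fitted forest. The sketch below uses scikit-learn's tree internals on synthetic data; the variable names (`gii`, `mmtd`, `tst`) are ours, and real implementations (e.g., the `randomForestExplainer` approach) differ in details.

```python
# Computing GII (normalized MDI), MMTD (mean minimum split depth), and
# TST (total split count) for each feature from a fitted random forest.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=400, n_features=8, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

gii = rf.feature_importances_        # GII: normalized mean decrease impurity

n_feat = X.shape[1]
depth_sums = np.zeros(n_feat)        # sum over trees of each feature's
depth_counts = np.zeros(n_feat)      # minimum split depth
tst = np.zeros(n_feat)               # TST: how many times a feature splits
for est in rf.estimators_:
    tree = est.tree_
    min_depth = {}
    stack = [(0, 0)]                 # (node_id, depth), depth-first walk
    while stack:
        node, depth = stack.pop()
        f = tree.feature[node]
        if f >= 0:                   # internal node (leaves have feature -2)
            tst[f] += 1
            min_depth[f] = min(min_depth.get(f, depth), depth)
            stack.append((tree.children_left[node], depth + 1))
            stack.append((tree.children_right[node], depth + 1))
    for f, d in min_depth.items():
        depth_sums[f] += d
        depth_counts[f] += 1
mmtd = depth_sums / np.maximum(depth_counts, 1)   # MMTD per feature
```

An important feature should then show a high `gii`, a small `mmtd`, and a large `tst` simultaneously.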
2.5. Workflow of the Model
In order to better capture the detailed factors of LMI for TCs with different intensities, we developed a two-step model to estimate the LMI of a formed TC based on a classifier and a regressor (Figure 4). The first step of the model is to judge whether or not a storm will become a strong TC by learning its genesis features (step 1). Since we are less interested in the specific intensity that a weak TC will finally reach, the next step of the model further explores the exact intensity of strong TCs only (step 2), where features during the first 48 h after genesis are considered.
TC cases are randomly divided into two parts to establish the model: a training set, used to tune the parameters of the model, and a testing set, used to evaluate its performance. The ratio of the two subsets is 5:1 in this study. During the training process, the three ensemble methods mentioned above are applied to the training set to tune their critical parameters in the two steps (Appendix B). Meanwhile, k-fold cross-validation [73] is applied to the training set to verify the capability of the model (k = 10 in this study). The training set is divided equally into k subsets; then, training and validation are performed for k iterations. During each iteration, one subset is selected for validation while the remaining k − 1 subsets are used to tune the parameters, without overlap, so that each sample of the dataset is used for both training and validation. Finally, the well-tuned model is assessed on the testing set. In step 1, we use accuracy and the F1-score as the metrics to evaluate the fitting capability of the classifier:

Accuracy = (TP + TN) / (TP + TN + FP + FN)  (6)

F1 = 2PR / (P + R)  (7)

where P is precision and R is recall, calculated as:

P = TP / (TP + FP)  (8)

R = TP / (TP + FN)  (9)

The meanings of the double-letter variables in Equations (6)–(9) are explained in the confusion matrix (Figure 5). Accuracy indicates the correctness of all decisions, and the F1-score is a comprehensive term that judges the robustness of a classifier. The model gets a high F1-score only when precision and recall are both high, with precision measuring the quality of predicting true positive cases and recall measuring the completeness of the classifier's judgment. In step 2, the coefficient of determination (R²) and the root mean square error (RMSE) are the two main metrics used to evaluate the fitting capability of the regressor.
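The split, cross-validation, and step-1 metrics can be sketched together as below; the data, model choice, and seeds are synthetic stand-ins for the TC feature set, not the study's actual configuration.

```python
# Sketch of the evaluation workflow: a 5:1 train/test split, 10-fold
# cross-validation on the training set, and accuracy/precision/recall/F1
# on the held-out testing set.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import KFold, train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(600, 10))
y = (X[:, 0] - X[:, 2] > 0).astype(int)          # synthetic "strong TC" label

# 5:1 ratio between training and testing sets
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1 / 6,
                                          random_state=0)

# 10-fold cross-validation on the training set only
for tr_idx, va_idx in KFold(n_splits=10, shuffle=True,
                            random_state=0).split(X_tr):
    fold = RandomForestClassifier(n_estimators=50, random_state=0)
    fold.fit(X_tr[tr_idx], y_tr[tr_idx])
    fold.score(X_tr[va_idx], y_tr[va_idx])       # fold validation accuracy

# final assessment on the testing set
final = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
pred = final.predict(X_te)
acc = accuracy_score(y_te, pred)
p, r = precision_score(y_te, pred), recall_score(y_te, pred)
f1 = f1_score(y_te, pred)                        # equals 2*p*r / (p + r)
```

For step 2, `r2_score` and a root-mean-square error would replace the classification metrics.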
After the three ensemble methods are tuned to their optimum parameters, the one showing the best performance on the testing set is selected as the benchmark in steps 1 and 2. To better understand the contributions of different features to the LMI of TCs, the GII, MMTD, and TST of the benchmark are assessed to determine the most important features. After that, the leading features are analyzed through storm-centered composites of different LMI groups. Comparing their horizontal distributions and temporal variations reveals how the differences arise at the early stage of a TC's lifetime.