ADASYN-LOF Algorithm for Imbalanced Tornado Samples

Abstract: In the past few years, early warning and forecasting of tornadoes have begun to combine artificial intelligence (AI) and machine learning (ML) algorithms to improve identification efficiency. Applying machine learning algorithms to tornado detection usually runs into the class imbalance problem because tornadoes are rare events in weather processes. The ADASYN-LOF algorithm (ALA) is proposed to solve the imbalance problem of tornado sample sets based on radar data. The adaptive synthetic (ADASYN) sampling algorithm addresses the imbalance by increasing the number of minority class samples, and the local outlier factor (LOF) algorithm is then used to denoise the synthetic samples. The performance of the ALA is tested with support vector machine (SVM), artificial neural network (ANN), and random forest (RF) models. The results show that the ALA can improve the performance and noise immunity of the models, significantly increase the tornado recognition rate, and has the potential to increase the early tornado warning time. Compared with the ADASYN, Synthetic Minority Oversampling Technique (SMOTE), and SMOTE-LOF algorithms, the ALA is more effective in preprocessing imbalanced data for the SVM and ANN.


Introduction
Tornadoes are small- to medium-scale extreme weather events, usually generated at the base of thunderstorm clouds, with destructive power that can tear up houses and trees and hurl them into the sky. Tornadoes occur less frequently in China than in the United States each year, and the majority occur from noon until evening in the summer months (June, July, and August) [1]. A tornado is rated from EF0 to EF5 according to the degree of damage and the wind speed [2,3]. As radar detection capabilities were upgraded, tornado recognition algorithms progressed through the following stages: tornadic vortex signature (TVS) criteria [4], the mesocyclone detection algorithm (MDA) [5,6], the tornado detection algorithm (TDA) [7], and the tornadic debris signature (TDS) [8]. With advances in computer technology in the past few years, artificial intelligence (AI) algorithms and classification models have gradually been applied to tornado detection. Examples include tornado detection algorithms based on a neuro-fuzzy system and fuzzy logic [9,10], an S-band radar adaptive neuro-fuzzy tornado detection system [11], tornado forecasting with random forests [12], and tornado prediction using a convolutional neural network (CNN) and images [13]. In future tornado detection, artificial intelligence can reduce the false alarm rate, increase the early warning time, and reduce the reliance on the experience of weather forecasters.
When applied to tornado detection, artificial intelligence algorithms usually suffer from the class imbalance problem. Class imbalance means that one class has far more instances than another [14], and classifiers trained on an imbalanced data set tend to be biased toward the majority class [15]. The imbalance can make it difficult to develop effective classifiers [16] in many applications, such as sensing and detection [17,18]. Imbalanced models result in poor detection and high false

Weather Radar
Fast-scanning and high-resolution weather radars, such as S-, X-, and Ka-band radars and phased array radars, are widely used to detect tornadoes and issue warnings [32][33][34][35]. The S-band China new generation Doppler weather radar (CINRAD SA) plays an essential role in monitoring and forecasting tornadoes. The CINRAD SA's maximum distance resolution is 0.25 km, and the maximum detection range is 460 km. The radial resolution is 1 degree. The reflectivity (Z) has a distance resolution of 1 km and ranges from 0 to 460 km. The detection range of the Doppler velocity (V) and velocity spectrum width (W) is 230 km, with a distance resolution of 0.25 km [36]. The CINRAD SA scans in volume coverage pattern (VCP) 21, with elevation angles from 0.5 to 14.5 degrees and data at 8 effective elevations: 0.5, 1.5, 2.5, 3.4, 4.3, 6.0, 10, and 14.5 degrees. The scale of tornadoes usually ranges from tens of meters to two kilometers. High-resolution radar networks can improve the acquisition and retrieval of tornado features [37]. The distance resolution limits the CINRAD SA's ability to resolve the fine structure of tornadoes. However, the CINRAD SA can identify tornadoes and issue warnings by monitoring mesocyclone and tornado velocity signatures, which means that artificial intelligence tornado recognition algorithms based on tornado characteristics observed by the CINRAD SA are feasible.

Tornado Samples
When constructing the tornado sample set, an interpolation algorithm was first used to increase the Z distance resolution to 0.25 km. Z, V, and W from the same moment and elevation angle were then combined, and the combined data were divided into 4 × 4 blocks. The characteristics related to tornadoes were calculated in each block, such as the maximum, minimum, and average of Z, V, and W, the tornado velocity signature, and the range of W, giving 32 features in total, as shown in Table A1. The time and coordinate information of tornadoes were used to label the samples (class: yes-tornado = 1, non-tornado = 0). Because tornadoes are small in scale, the tornado area in Plan Position Indicator (PPI) data is tiny. Because tornadoes form and dissipate quickly, tornado data make up only a tiny proportion of the radar database. Together, these two characteristics yield a small number of yes-tornado samples (positive samples) and a large number of non-tornado samples (negative samples) in the sample set obtained by block segmentation, so the proportions of the two classes differ considerably and the data are imbalanced, as shown in Figure 1. Negative samples form the majority class of the sample set, and positive samples form the minority class. In the tornado classification model, minority samples tend to be more important than majority samples. A prediction model obtained from an imbalanced sample set sacrifices recognition of the minority class in order to obtain high overall classification accuracy [23]. The historical tornado data recorded by the CINRAD SA from 2005 to 2015 yield a total of 3897 samples, of which 97 are tornado samples (minority class) and 3800 are non-tornado samples (majority class). The class imbalance ratio is therefore high (about 39:1), and the results of tornado detection models built on these data are flawed.
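The block segmentation described above can be sketched as follows. This is a minimal illustration only: the function name `block_features` and the feature keys are hypothetical, only the maximum, minimum, mean, and W-range features named in the text are computed (the full 32-feature list of Table A1 is not reproduced here), and the grids are assumed to be already interpolated to a common resolution with no missing gates.

```python
def block_features(z, v, w, size=4):
    """Split co-located Z, V, W grids into size x size blocks and summarise each.

    z, v, w are equally shaped lists of rows. Trailing rows/columns that do
    not fill a complete block are ignored in this sketch.
    """
    rows, cols = len(z), len(z[0])
    features = []
    for r0 in range(0, rows - size + 1, size):
        for c0 in range(0, cols - size + 1, size):
            feat = {"row": r0, "col": c0}  # top-left corner of the block
            for name, grid in (("Z", z), ("V", v), ("W", w)):
                cells = [grid[r][c]
                         for r in range(r0, r0 + size)
                         for c in range(c0, c0 + size)]
                feat[name + "_max"] = max(cells)
                feat[name + "_min"] = min(cells)
                feat[name + "_mean"] = sum(cells) / len(cells)
            # range of the spectrum width inside the block
            feat["W_range"] = feat["W_max"] - feat["W_min"]
            features.append(feat)
    return features
```

Each returned dictionary corresponds to one candidate sample; labelling it 1 or 0 would then use the recorded tornado time and coordinates.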

ADASYN
Consider a training sample set $D_{tr} = \{x_i, y_i\}$, $i = 1, \ldots, m$, where $x_i$ is a sample vector with $n$-dimensional features, $y_i \in Y = \{1, 0\}$, and $m$ is the total number of samples. Firstly, calculate the number $G$ of synthetic minority class samples that need to be generated according to Equation (1), where $m_s$ and $m_l$ are the numbers of minority class samples and majority class samples in $D_{tr}$, respectively, with $m_s \le m_l$ and $m_s + m_l = m$, and $\beta \in [0, 1]$ specifies the balance level of $D_{tr}$ after the synthetic samples are generated.
$$G = (m_l - m_s) \times \beta \tag{1}$$
Secondly, find the $k$-nearest neighbors of each minority sample $x_i$ according to the Euclidean distance in the $n$-dimensional space, and calculate the ratio $r_i$ according to Equation (2), where $\Delta_i$ is the number of majority class samples among the $k$-nearest neighbors of $x_i$. Then, normalize $r_i$ to $\hat{r}_i$ according to Equation (3), where $\hat{r}_i$ is a density distribution and $\sum_{i=1}^{m_s} \hat{r}_i = 1$.
$$r_i = \frac{\Delta_i}{k}, \quad i = 1, \ldots, m_s \tag{2}$$
$$\hat{r}_i = \frac{r_i}{\sum_{i=1}^{m_s} r_i} \tag{3}$$
Thirdly, calculate the number of synthetic samples $g_i$ that need to be generated for each minority class sample $x_i$ according to Equation (4).
$$g_i = \hat{r}_i \times G \tag{4}$$
Finally, generate $g_i$ synthetic samples $s_i$ for each minority class sample $x_i$ according to Equation (5), where $x_{zi}$ is randomly selected from the minority samples among the $k$-nearest neighbors of $x_i$ and $\delta \in [0, 1]$ is a random number, as shown in Figure 2 (left).
$$s_i = x_i + (x_{zi} - x_i) \times \delta \tag{5}$$
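The four ADASYN steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the names `adasyn`, `k_nearest`, and `G` (the total number of samples to generate in the first step) are ours, the per-sample counts are rounded to the nearest integer, minority samples with no minority neighbor are skipped, and an empty list is returned when no minority sample has a majority neighbor.

```python
import math
import random


def k_nearest(idx, points, k):
    """Indices of the k points closest (Euclidean) to points[idx], excluding idx."""
    others = [j for j in range(len(points)) if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]


def adasyn(X, y, k=5, beta=1.0, seed=0):
    """Return synthetic minority (label 1) samples generated ADASYN-style."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    m_s, m_l = len(minority), len(y) - len(minority)
    G = (m_l - m_s) * beta  # step 1: total number of samples to generate

    # step 2: share of majority samples among each minority sample's neighbours
    neighbours = {i: k_nearest(i, X, k) for i in minority}
    r = {i: sum(y[j] == 0 for j in neighbours[i]) / k for i in minority}
    total = sum(r.values())
    if total == 0:  # assumption: nothing to do if no minority sample borders the majority
        return []

    synthetic = []
    for i in minority:
        g_i = round(r[i] / total * G)  # step 3: normalised share of G, rounded
        same_class = [j for j in neighbours[i] if y[j] == 1]
        if not same_class:  # assumption: skip isolated minority samples
            continue
        for _ in range(g_i):
            z = X[rng.choice(same_class)]
            delta = rng.random()  # step 4: interpolate between x_i and a minority neighbour
            synthetic.append([a + (b - a) * delta for a, b in zip(X[i], z)])
    return synthetic
```

Because each synthetic sample lies on the segment between two existing minority samples, minority samples surrounded by many majority neighbors receive the most new samples, which is also where noisy synthetic samples are most likely to appear.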

LOF
After the ADASYN algorithm is applied, the tornado sample set reaches a balanced ratio, with minority samples : majority samples = 1:1. The synthetic minority samples may contain noise, so the local outlier factor (LOF) algorithm is used to identify and eliminate it [31]. A detailed description of the algorithm is given in [38].
For a sample $p$, the local outlier factor is calculated by Equation (6), where $N_k(p)$ is the set of $k$-nearest neighbors of $p$ and $lrd_k$ is the local reachability density. The LOF value is the average ratio of the local reachability density of the $k$-nearest neighbors of $p$ to that of $p$ itself.
$$LOF_k(p) = \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \frac{lrd_k(o)}{lrd_k(p)} \tag{6}$$
The LOF value of a sample that is not noise is approximately 1. When the LOF value of a sample is significantly greater than 1, the sample can be labeled as noise, as shown in Figure 2 (right).
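A minimal sketch of the LOF computation, following the usual definitions (k-distance, reachability distance, local reachability density), is given below. The function names and the cut-off `threshold=1.5` are assumptions for illustration: the text only states that noise has an LOF "significantly greater than 1", and it does not specify against which set of samples the synthetic samples are scored. The sketch also assumes no duplicate points and ignores ties beyond the k-th neighbor.

```python
import math


def lof_scores(points, k):
    """Local outlier factor of every point in `points` (list of coordinate lists)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k nearest neighbours of each point and the distance to the k-th one
    knn = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
           for i in range(n)]
    k_dist = [dist[i][knn[i][-1]] for i in range(n)]

    # local reachability density: inverse of the mean reachability distance,
    # where reach(i, o) = max(k_dist(o), dist(i, o))
    lrd = []
    for i in range(n):
        reach = [max(k_dist[o], dist[i][o]) for o in knn[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: average ratio of the neighbours' density to the point's own density
    return [sum(lrd[o] for o in knn[i]) / (len(knn[i]) * lrd[i]) for i in range(n)]


def drop_noise(points, k, threshold=1.5):
    """Keep only the points whose LOF does not exceed the (assumed) threshold."""
    scores = lof_scores(points, k)
    return [p for p, s in zip(points, scores) if s <= threshold]
```

Points inside a uniformly dense cluster score close to 1, while an isolated point far from the cluster scores well above 1 and is removed by `drop_noise`.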

Machine Learning Models
The support vector machine (SVM) classification algorithm constructs a hyperplane that separates the training samples into two classes, and the SVM is a linear classifier defined in a very high-dimensional feature space [39]. The SVM formulation corresponds to the problem of minimizing ||w||^2/2 under the constraints, where w is the weight vector perpendicular to the separating hyperplane, b is the bias, and l is the number of observations [19]. If the training samples are not linearly separable in the feature space, a kernel function is used to increase the dimension of the sample space, so that the nonlinear problem is converted to a linear problem in a higher-dimensional space, as shown in Figure 3 (SVM). Chang et al. developed a library for SVM that includes C-SVC, ν-SVC, and SVR, among others [40]. The SVM usually outputs classification probabilities by using the Platt scaling method [41].
Artificial neural network (ANN) algorithms have attracted much research in the past few years, and several have been applied to weather radar. One study combined generative adversarial networks (GAN) with super-resolution reconstruction of weather radar echo images [42]. Another study applied a deep convolutional neural network (DCNN) to NEXRAD PPI scans, and the increased resolution and frequency content improved observation capabilities [43]. The structure of an ANN consists of one input layer, several hidden layers, and one output layer, with the hidden layers connecting the input and output (as shown in Figure 3 (ANN)) [44]. The ANN uses functions such as tanh and sigmoid to map and activate neurons, and it requires multiple rounds of iterative training to minimize the loss and achieve good accuracy [45,46]. A binary ANN usually uses 0.5 as the classification probability threshold.
Breiman proposed the random forest (RF) algorithm in 2001. An RF constructs multiple classification trees by randomly sampling the samples and randomly selecting features, uses a voting mechanism for prediction and classification, and outputs probabilities according to the voting results (shown in Figure 3 (RF)) [47][48][49]. The RF usually uses the ID3, C4.5, or Gini methods [50,51]. ID3 cannot handle continuous attributes, whereas C4.5 can. The Gini index reflects the purity of a dataset: the smaller the value, the higher the purity. The RF is a multivariate nonlinear classification model that avoids overfitting and is less sensitive to noise [52]. RF has been widely used in remote sensing [53][54][55] and extreme weather warnings [12,56,57].
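For reference, the minimization problem described above is conventionally written in the hard-margin form below. This is the textbook statement rather than a formula taken from this paper, and it assumes the class labels are recoded as $y_i \in \{-1, +1\}$ instead of the $\{1, 0\}$ coding used for the tornado samples.

```latex
\min_{w,\,b} \; \frac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \left( w^{\top} x_i + b \right) \ge 1, \qquad i = 1, \ldots, l
```

Soft-margin variants such as C-SVC relax these constraints with slack variables and a penalty parameter C, which is the usual choice when the two classes overlap.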
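As a small illustration of the statement that a smaller Gini index means higher purity, the sketch below computes the standard Gini impurity, one minus the sum of squared class proportions, for a list of labels. The function name `gini_impurity` is ours, and the sketch is not tied to any particular RF implementation.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a label list: 1 - sum over classes of (class share)^2."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())
```

A set containing a single class has impurity 0, and an evenly split binary set has impurity 0.5. A set with the class proportions of the tornado sample set (97 positive and 3800 negative samples) is already close to pure, which is one way to see why an unbalanced split carries little incentive to isolate the minority class.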

Experiment 1
To quantify the differences between models with and without the ADASYN-LOF algorithm, their numerical results need to be compared. The experiment steps are shown in Figure 4. A copy of the original training samples was used directly to build the imbalanced models (SVM (IBD), ANN (IBD), RF (IBD)). The original training samples were processed by the ADASYN approach so that the number of positive samples equaled the number of negative samples; then, the LOF algorithm was used to identify noise in the balanced data. After the LOF approach, the balanced models (SVM (BD), ANN (BD), RF (BD)) were obtained. In this experiment, k = 20 for ADASYN and k = 20 for LOF, and LOF eliminated 93 noise samples. The testing samples were used directly to obtain the models' quantitative performance with the binary classification confusion matrix (Table 1). In the confusion matrix, TP is the number of yes-tornado samples correctly predicted by the model, FP is the number of non-tornado samples that the model misclassifies as yes-tornado samples, FN is the number of yes-tornado samples that are misclassified as non-tornado samples, and TN is the number of non-tornado samples correctly classified by the model. From TP, FP, FN, and TN, the accuracy (7), precision (8), F-score (9), and G-mean (10) can be obtained, where Recall = TP/(TP + FN) and the F-score equals the F1-score when β = 1. In addition, the Area Under Curve (AUC) score was used to compare the performance of different models. AUC is defined as the area enclosed by the receiver operating characteristic curve and the coordinate axis. The larger the AUC value, the better the average performance of the model. Weather forecast models are usually assessed with a contingency table. Therefore, by combining the confusion matrix with the 2 × 2 contingency table (Table 2), POD (11), FAR (12), and CSI (13) were obtained. The performance results of the different models are shown in Table 3.
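The scores named above can be computed from the four confusion-matrix counts as sketched below. The formulas are the conventional definitions (the F-score with weight β, the G-mean as the geometric mean of recall and specificity, and the forecast-verification forms of POD, FAR, and CSI); we assume these match Equations (7) to (13). The function name `forecast_scores` is ours, and the sketch does not guard against empty denominators.

```python
import math


def forecast_scores(tp, fp, fn, tn, beta=1.0):
    """Classification and forecast-verification scores from confusion-matrix counts."""
    recall = tp / (tp + fn)          # share of tornado samples that were detected
    precision = tp / (tp + fp)       # share of tornado predictions that were right
    specificity = tn / (tn + fp)     # share of non-tornado samples kept as such
    return {
        "ACC": (tp + tn) / (tp + fp + fn + tn),
        "PRE": precision,
        "F": (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall),
        "G-mean": math.sqrt(recall * specificity),
        "POD": recall,               # probability of detection
        "FAR": fp / (tp + fp),       # false alarm ratio
        "CSI": tp / (tp + fp + fn),  # critical success index
    }
```

With these definitions, accuracy is dominated by TN when non-tornado samples vastly outnumber tornado samples, whereas the G-mean, POD, and CSI drop sharply whenever tornado samples are missed, which is why they are the more informative scores for an imbalanced sample set.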

Experiment 2
To compare the performance of different models in actual tornado detection while making full use of all available samples, all tornado samples were used to train the models. The experiment steps are shown in Figure 5. A copy of the original training samples was used directly to build the imbalanced models (SVM (IBD), ANN (IBD), RF (IBD)). For the original training samples, the ADASYN algorithm was used to balance the ratio of yes-tornado samples to non-tornado samples to 1:1, and the LOF approach then identified the noise among the synthetic samples. The balanced models (SVM (BD), ANN (BD), RF (BD)) were then obtained and used to detect tornadoes from 2016 to 2018; the results are shown in Section 5.2. In this experiment, k = 20 for ADASYN and k = 20 for LOF, and LOF eliminated 246 noise samples.

Results and Discussion
A sample set with the class imbalance problem is referred to as imbalanced data (IBD), and the models built from it are imbalanced models, such as SVM (IBD), ANN (IBD), and RF (IBD). Similarly, a sample set without the class imbalance problem is balanced data (BD), and the models built from it are balanced models, such as SVM (BD), ANN (BD), and RF (BD).

Model Performance
Table 3 compares the results of the different models. The proposed approach of combining ADASYN and LOF to process the training samples is called the ADASYN-LOF approach (ALA), and NONE indicates that the models were built from the original training samples (copy). After the ALA, the SVM's ACC and PRE decreased, whereas the ACC and PRE of the ANN and the RF increased. The balanced models had better F1-score, G-mean, and AUC than the imbalanced models, which indicates that the ALA improves the performance of the models. After the ALA, the SVM had the largest AUC improvement, and the AUC order was SVM > ANN > RF, indicating that, in terms of average performance, the balanced SVM is the best, followed by the ANN and finally the RF. The balanced SVM's POD was greatly improved and its CSI increased, but its FAR also increased. The balanced ANN had better POD, CSI, and FAR than the imbalanced ANN. The balanced RF had better POD and CSI but worse FAR than the imbalanced RF. In terms of POD, FAR, and CSI, the ANN showed the largest improvement after the ALA. The POD order after the ALA was SVM > ANN > RF. Although the POD of the SVM exceeded that of the ANN and RF by more than 0.15, the SVM's FAR was much higher than the FAR of the ANN and RF. The high FAR caused the SVM's CSI to be slightly smaller than the CSI of the ANN and RF.
The yes-tornado and non-tornado samples are unequally distributed in the imbalanced sample set, which leads to a high misclassification rate for yes-tornado samples and relatively low G-mean, F1-score, POD, and CSI. Once the two classes are in a balanced distribution, the models' ability to correctly identify yes-tornado samples improves, thereby increasing the G-mean, F1-score, POD, and CSI.

Tornado Detection Results
When using the models to detect tornado cases, historical tornado events from 2016 to 2018 that were not included in the training and testing samples were used. The cases met two requirements: the distance between the tornado and the radar center was no more than 130 km, and the Meteorological Bureau had official records of the tornado. In this section, the model detection results are represented by black asterisks, the value shown is the classification probability of the model, and the results are displayed on the reflectivity Z.
The second tornado case occurred in Dongtai, Jiangsu Province at around 11:00 (Beijing time, UTC+8) on 2 July 2017. The tornado was 77 km away from the radar center, and the sample variable values calculated by the block segmentation were small, which caused the imbalanced models to fail to recognize this tornado, as shown in the following figure.
[Figure: The tornado identification results of the models at 11:01 (Beijing time, UTC+8) at the 0.5-degree elevation. The black circle is centered at the tornado location with a radius of 1.5 km. SVM, ANN, and RF denote the Support Vector Classifier, Artificial Neural Network Classifier, and Random Forest Classifier; V denotes Doppler Velocity and W denotes Doppler Velocity Spectral Width. BD indicates that the classifier was formed on a balanced tornado dataset, and IBD indicates that the classifier was formed on an imbalanced tornado dataset.]
The third case was the tornado that occurred in the outer circulation of Typhoon Wembia No.1815 in 2018, which touched the ground in Xuzhou, Jiangsu Province at around 18:40 (Beijing time, UTC+8) on 18 August. The tornado was far from the radar center, at a distance of 120.5 km. When the detection range of the CINRAD SA is greater than 100 km, the radar suffers from beam broadening and power attenuation, so only partial information about the tornado can be obtained. In Figure 9 (V), although $\Delta V = |V_- - V_+| = 26.5$ m/s at the 0.5-degree elevation, the radar TVS product did not issue a tornado warning because the TVS thresholds were not met. The imbalanced models did not issue a warning for this tornado either, as shown in Figure 9 (SVM detection results (IBD), ANN detection results (IBD), and RF detection results (IBD)). The balanced SVM and ANN models identified this tornado, as shown in Figure 9 (SVM detection results (BD) and ANN detection results (BD)). However, the balanced RF model did not issue a warning for this tornado, as shown in Figure 9 (RF detection results (BD)).
In the first tornado case, the balanced and imbalanced models were compared in terms of early warning time. The first tornado warning of the imbalanced models was at 14:14 (Beijing time, UTC+8), and the first tornado warning of the balanced SVM and RF models was at 14:08 (Beijing time, UTC+8). The balanced models thus increased the early warning time from 16 min to 22 min, indicating that the ALA optimizes the distribution of samples and can advance the tornado warning. In addition, the balanced models gave higher probabilities than the imbalanced models (SVM BD: 0.99 > SVM IBD: 0.98, ANN BD: 0.99 > ANN IBD: 0.97, RF BD: 0.99 > RF IBD: 0.81), which indicates that the results of the balanced models are more credible than those of the imbalanced models.
In the second tornado case, the scale of the tornado and the sample feature values were small, so the imbalanced models could not identify this tornado. The balanced models recognized the tornado, and they also had better F1-score and G-mean than the imbalanced models in Table 3, which confirms that balanced models have better classification performance and can warn of more tornado cases than the imbalanced models, making up for the shortcomings of the latter. It is also worth mentioning that there were two asterisks in the Figure 8 ANN detection results. This is because, when the models were used to detect tornadoes, the intersection between adjacent blocks was also calculated, as shown in Figure A1.
In the third case, the tornado was far away from the radar. The radar was heavily affected by beam broadening and power attenuation, so the TVS algorithm failed to issue tornado warnings. For similar reasons, in the detection results at 18:45 (Beijing time, UTC+8), the imbalanced models could not identify the tornado, whereas the balanced SVM and ANN models did. Neither the balanced nor the imbalanced RF model issued any tornado warning. It is speculated that the negative velocity value of the tornado was small, which caused the RF models to fail. In this tornado case, the performance order of the balanced models was ANN > SVM > RF, whereas the average performance order obtained in Table 3 was SVM > ANN > RF. The difference arises because the internal classification criteria of the models differ, which indicates that multiple models should be used together in actual tornado warning.
In addition to testing the models on specific tornado cases, this experiment compared the noise immunity of the balanced and imbalanced models (figures omitted). The results show that the balanced models have more robust noise immunity than the imbalanced models. In particular, when the radar data are of poor quality, the balanced models issue fewer false warnings than the imbalanced models, or none at all.
Before the ALA was studied, weight and cost methods were used to address the imbalance. However, because these methods (adding weights for the different classes) do not generate new samples and the number of positive samples is small, they did not reduce the number of missed tornadoes. The study compared the performance of the ADASYN-LOF, ADASYN, SMOTE-LOF, and SMOTE algorithms on the dataset, as shown in Table A2. For the SVM, ADASYN-LOF gave better ACC, PRE, F1-score, G-mean, AUC, FAR, and CSI than ADASYN, SMOTE-LOF, and SMOTE. For the ANN, ADASYN-LOF gave better F1-score, G-mean, AUC, POD, and CSI than the other algorithms. For the RF, the PODs of ADASYN-LOF and SMOTE-LOF were equal. In general, if the SVM or ANN is used as the classifier, it is better to preprocess the imbalanced data with ADASYN-LOF. For the RF, SMOTE-LOF could be better.
The LOF algorithm can also be used for unsupervised classification, and it is hoped that subsequent research will apply this method to the detection of tornadoes (outliers).

Conclusions
The tornado sample set usually suffers from the class imbalance problem, which can cause machine learning models to detect tornadoes poorly. The adaptive synthetic (ADASYN) sampling approach is used to solve the problem, and the local outlier factor (LOF) algorithm is applied to identify noise in the synthetic samples. The combination of ADASYN and LOF is called the ADASYN-LOF approach (ALA). SVM, ANN, and RF models are used, and the main conclusions are as follows.

1. After the ALA, the accuracy and precision may increase or decrease, while the F1-score, G-mean, AUC, POD, and CSI are significantly improved; the average performance is improved, and the models have better noise immunity than the models without the approach.
2. When specific tornado cases are used to test the models, the balanced models have the following advantages after the ALA:
• In early tornado warning, the models have the potential to increase the early warning time before a tornado touches the ground.
• The balanced models can identify some tornadoes that cannot be identified by the imbalanced models.
• The models can identify tornadoes that cannot be detected because of the threshold limitation of the tornado velocity signature (TVS) algorithm.
3. Compared with the ADASYN, SMOTE-LOF, and SMOTE algorithms, the ALA performs better in preprocessing imbalanced data if the SVM or ANN is used as the classifier. If the RF is used, the SMOTE-LOF algorithm could work better.

There are three directions for future research:
• optimize the k value of the ALA and appropriately reduce the dimension of the sample features;
• study how to appropriately decrease the majority samples when applying the ALA;
• use more datasets (such as tornado datasets in the United States) to evaluate the ALA and apply outlier detection algorithms to detect tornadoes.

Acknowledgments:
We thank the reviewers for their constructive comments and editorial suggestions that significantly improved the quality of this paper.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: