Next Article in Journal
The 2017 Mega-Fires in Central Chile: Impacts on Regional Atmospheric Composition and Meteorology Assessed from Satellite Data and Chemistry-Transport Modeling
Next Article in Special Issue
The 3D Neural Network for Improving Radar-Rainfall Estimation in Monsoon Climate
Previous Article in Journal
The Effect of Non-Compliance of Diesel Vehicle Emissions with Euro Limits on Mortality in the City of Milan
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

A Novel Convective Storm Location Prediction Model Based on Machine Learning Methods

Department of Electrical and Electronics Engineering, Pusan National University, Busan 46241, Korea
*
Author to whom correspondence should be addressed.
Atmosphere 2021, 12(3), 343; https://doi.org/10.3390/atmos12030343
Submission received: 28 January 2021 / Revised: 3 March 2021 / Accepted: 4 March 2021 / Published: 6 March 2021
(This article belongs to the Special Issue Weather Radar in Rainfall Estimation)

Abstract

:
A weather radar is a frequently used device in remote sensing to identify meteorological phenomena using electromagnetic waves. It can observe atmospheric conditions in a wide area with a remarkably high spatiotemporal resolution, and its observation results are useful to meteorological research and services. Recent research on data analysis using radar data has concentrated on applying machine learning techniques to solve complicated problems, including quality control, quantitative precipitation estimation, and convective storm prediction. Convective storms, which consist of heavy rains and winds, are closely related to real-life and cause significant loss of life and property. This paper proposes a novel approach utilizing the given convective storms’ temporal properties based on machine learning models to predict future locations. The experimental results showed that the machine learning-based prediction models are capable of nowcasting future locations of convective storms with a slight difference.

1. Introduction

Convective storms are hazardous meteorological events that are accompanied by heavy precipitation, lightning, and strong winds. They influence various fields ranging from stopping human activities to losses of life and property. Consequently, it has been considered one of the primary goals in meteorological fields to make a short-term forecasting (or nowcasting) model. Despite the various approaches that have been introduced and widely used in practice over the years, nowcasting convective storms remains a challenging problem due to the complexity of the atmospheric conditions and relevant dynamical processes [1]. Although many devices and methods, including satellite, Doppler radar, and numerical weather prediction (NWP), are available to obtain useful meteorological information, the Doppler radar is the most preferred selection because it provides three-dimensional structures of the convective storms with a high spatiotemporal resolution by using rapid volumetric scanning with broad coverage [2]. These exceptional properties of the Doppler radar allow monitoring and analyzing properties of convective storms. Traditional radar-based nowcasting approaches consist of two broad categories: cross-correlation and centroid-based methods.
The cross-correlation based nowcasting method uses two-dimensional radar reflectivity data. It partitions the data into features and identifies the vector field that maximizes the correlation between identified features along consecutive time. A representative example of this type of method is TREC (Tracking Radar Echo by Correlation) [3,4]. The advantage of this method is that it can derive more precise speed and direction information. On the other hand, it is incapable of identifying and tracking individual convective storms, which cannot extract each convective storm’s quantitative characteristics. The centroid-based nowcasting method analyzes a series of radar reflectivity data obtained along time to identify convective storms and find their past trajectories. After that, it extrapolates the identified convective storms’ motion using a linear trend model to predict where they will be in the future. The advantages of this method are that it tracks individual convective storms effectively and provide their temporal properties. Indicative examples of this method are TITAN (Thunderstorm identification, tracking, analysis, and nowcasting) [5], SCIT (Storm Cell Identification and Tracking) [6], and TRACE3D [7].
Among those methods, TITAN has significantly influenced its post-researches by providing the following assumptions to predict future locations of convective storms: A storm tends to move along a straight line; A storm growth or decay follows a linear trend; Unexpected departures from the above behavior occur. Although those assumptions make the given problem straightforward, they make the forecasting model vulnerable to predicting the convective storms’ complicated movements. Recent research has been aware of these facts and suggested ways to improve the situation by applying machine learning, which can solve complex and nonlinear problems [8]. For example, Rossi et al. [9] uses the Kalman filter for probabilistic nowcasting to overcome the limited ability of deterministic approaches. Also, Han et al. [10] applies the support vector machine to predict one of the contiguous boxes containing a centroid of a convective storm in the future. Furthermore, Xingjian et al. [11] and Han et al. [12] utilize deep learning methods, which show superior performances in various practical fields, in a different context. Xingjian et al. [11] adopts a Convolutional LSTM to switch the nowcasting problem to a sequence-to-sequence problem, while Han et al. [12] utilizes a convolutional neural network to solve a problem caused by the manual construction of spatiotemporal features.
This paper proposes a novel approach using machine learning-based models to predict a convective storm’s future location. In other words, the proposed approach forecasts future centroid coordinates of the given convective storm using temporal properties from its trajectory. First, we derive distances and contained angles from vectors through centroid coordinates that lie nearby over time and have similar characteristics: they represent the given convective storms’ movement between sampling time. We selected several machine learning-based models as an autoregressive model [13] using the computed distances and contained angles. Furthermore, it is difficult to derive a sufficient number of time-varying characteristics when few convective storms in the given trajectory. Three distances and two contained angles can be derived, for instance, when the given convective storm has three observation results in the past. Therefore, this paper proposes additional novel method for dealing with insufficient time-varying characteristics (less than two observation results in the past) using machine learning-based models and other temporal features consisting of physical properties and descriptive statistics of the radar observation results between contiguous times. In summary, this paper provides four main contributions, as shown below.
  • Machine learning-based approach to predicting future centroid coordinates of the given convective storm using temporal properties derived from its trajectory
  • Flexible adjustments of sampling interval and maximum nowcasting range by increasing or decreasing the number of prediction models
  • Applicable to analyze time-varying properties of the given convective storm along the same lines of the proposed method, including size-related parameters and variance of peak intensity
  • Applicable to much meteorological analysis of which the future movement matters
This paper is organized as follows. Section 2 describes the data used in this paper. Section 3 introduces the methodology, and Section 4 analyzes and discuss the experimental results. Finally, the conclusions are presented in Section 5.

2. Data

The data used in this paper consists of 1872 three-dimensional composite radar data from 13 independent observation days from June to August 2018. The selected dates are concentrated in the summer because Korean summers are scorching and humid that are optimal conditions for forming convective storms. It is also necessary to obtain a more extensive observation range for precise analysis, mostly when the convective storms live long, although it is possible to examine trajectories of convective storms using observation data from a single radar. Therefore, as shown in Figure 1, this paper uses the composite radar data provided by Korea Meteorological Administration (KMA) combined with ten dual-polarization Doppler radars that observe the entire Korean region’s overall weather conditions. The size of composite radar data is 2049 × 2049 × 200, where the spatial resolution in x-axis, y-axis, and z-axis are 500 m, 500 m, and 50 m, respectively. Moreover, the observation interval is ten minutes.
From the three-dimensional composite radar data, 768 reference tracks consisting of 5384 convective cells, are manually extracted and verified by meteorological experts. As shown in Figure 2, many convective cells in the reference tracks survive for 70 min on average. The longest life in the observation is 340 min, and the shortest is 10 min. Considering that the machine learning model needs a sufficient amount of learning data, it is challenging to predict the given convective storm’s future locations further than a specific time due to a lack of observation cases with verification. Therefore, this paper limits prediction bounds up to 60 min at an interval of ten minutes. Table 1 indicates the number of learning data pairs for each prediction model. From those data pairs, it is possible to extract various types of features, including descriptive statistics. Moreover, this paper divides training data and test data for each model because one of the essential issues for designing a machine learning model is to split training and test data. If not, the model delivers unreliable and over-optimistic prediction results due to the overfitting problem [14]. Table 1 shows that each data pair is divided into training and test data by applying an 8:2 ratio. For instance, 2004 data pairs for the ( t + 1 ) prediction model are divided into 1603 and 401 data pairs for training and test.

3. Methods

In this section, the entire process for convective storm location prediction is elucidated. The operating principles of the proposed method follows the centroid-based method. Its process consists of three primary components as shown in Figure 3: identification, tracking, and location prediction.
Four kinds of observations obtained by dual-polarization radars are applied for the proposed convective storm prediction: corrected reflectivity (CZ), differential reflectivity (DR), cross-corrlation (RH), and vertically integrated liquid (VIL). CZ data is selected among the observations because the centroid-based prediction method [5] uses CZ data for the identification process. It groups contiguous points in the given radar data sequentially along the x-axis, y-axis, and z-axis. It is equivalent to hierarchical clustering with a single-linkage method using the three-dimensional kernel in a bottom-up fashion. The single-linkage clustering is adopted in this paper because it is better than the sequential approach in the time and computational complexities perspective.
It is crucial to match CZ’s coordinates to DR, RH, and VIL because there is a possibility not to one-to-one correspondence. In other words, the observed coordinate in CZ may not exist in DR, RH, or VIL due to observation properties. In the spatial feature extraction process, various properties are derived: two-dimensional and three-dimensional centroid coordinates, size-related features, and their descriptive statistics. Because CZ, DR, RH, and VIL have nonnegative values, entropies in image processing with base-2 logarithm are also computed as shown in Equation (1) by contemplating them as gray-scale images.
H ( X ) = x p ( x ) log 2 1 p ( x )
where p ( x ) indicates the normalized histogram counts of each identified convective storm in observation results.
Moreover, there are other newly proposed characteristics, named cluster VIL. The standard VIL in existing method is computed using Equation (2).
VIL = h B h T M d h = a h B h T Z b d h = 3.44 × 10 6 h B h T Z 4 7 d h
where Z , h B , h T indicate CZ values, top and bottom altitudes, respectively. As shown in Equation (2), VIL integrates the reflectivity on the z-axis, which means that the altitude-based information will become indistinguishable. In other words, if several convective storms exist in the overlapping region on xy-plane with different altitudes, their distinctive characteristics can be squashed. Therefore, it is beneficial to utilize another VIL property of individual convective storm. As a result, two kinds of VIL-related features are generated in this paper. The identification method is straightforward: determine as a convective storm if a given object has a larger volume and higher reflectivity than thresholds { θ ν , θ Z } = { 50 km 3 , 35 dBZ } based on a sensitivity analysis presented by [5].
After the identification process, many valuable features can be extracted. Based on those features, it is possible to derive temporal features to understand the development of changes and trends of identified storms’ characteristics. As shown in Table 2, 52-dimensional temporal features are derived, such as Euclidean distances, trends of size-related and fundamental statistical features-related changes and trends. By including distance metrics, it is unnecessary to set a specific decision boundary by the users, unlike the traditional methods mostly refer to the TITAN method that uses a Hungarian algorithm. Instead of finding all possible links of given convective storms, this paper converts the problem as a binary classification. In other words, the tracking method in this paper finds a connection between a given identified convective storm at the time ( t ) and identified convective storms at the time ( t 1 ) based on the extracted temporal features. When all connections between the identified storms at the time ( t ) and ( t 1 ) are considered, the process is moved to the time ( t 1 ) and ( t 2 ) . With the iterative manner, it is possible to track the convective storm in reverse order of time, as shown in Figure 3.
There are several successful prediction methods for the convective storm’s future location. Those methods are mostly based on the box-based method, which selects one of the adjacent boxes that will contain the future centroid coordinate of the given convective storm. This paper proposes a novel approach for convective storm location prediction from the time ( t + 1 ) to ( t + 6 ) by utilizing temporal properties in two-dimensional space. Finding the centroid coordinate of time ( t + 1 ) at time ( t ) needs to apply trigonometrical functions and L 2 -norm. As shown in Figure 4, for instance, assuming that the goal is to find E coordinate using A , B , C , and D, using Equations (3)–(6) can be derived coordinates for each axis.
x E = x D + D E 2 · cos tan 1 y D y C x D x C θ ( E , D , C )
y E = y D + D E 2 · sin tan 1 y D y C x D x C θ ( E , D , C )
θ ( E , D , C ) = θ ^ ( t + 1 ) = f Angle θ ( C , B , A ) , θ ( D , B , C )
D E 2 = d ^ ( t + 1 ) = f Distance C D 2 , B C 2 , A B 2
where θ indicates the contained angle, · 2 implies L 2 -norm, f Angle ( · ) , θ ^ ( t + 1 ) , f Distance ( · ) and d ^ ( t + 1 ) mean the prediction models and their results of the contained angle and distance at time ( t + 1 ) , respectively. Repeatedly applying the same principle to time, it is possible to extend the model’s prediction range. In this paper, the maximum limit of prediction range is ( t + 6 ) , considering the given ground-truth dataset’s condition.
As shown in Equations (5) and (6), the three previous centroid coordinates can provide two contained angles and three L 2 -norms to angle and distance prediction models. It is a prerequisite of the proposed location prediction method: it must have a sufficient number of tracked centroid coordinates, more than three previous coordinates, for deriving temporal properties. However, at the beginning of the convective storm’s development, it is impossible to provide enough coordinates to derive temporal properties because its track has a short length.
This paper resolves the situation by dividing the trajectories into three occurrence cases, as shown in Figure 5: “Case 01” when the track has ( t ) and ( t 1 ) coordinates, “Case 02” when the track has ( t ) , ( t 1 ) , and ( t 2 ) coordinates, and “Case 03” when the track has ( t ) , ( t 1 ) , ( t 2 ) , ( t 3 ) , and more coordinates. The track, which consists of only a coordinate at ( t ) , leaves out of consideration because it can be a noise signal and has insufficient properties to derive its movements through time. Considering that “Case 01” and “Case 02” have not enough previous centroid coordinates, they predict a distance between the current location and future location of given convective storms using 52-dimensional temporal features and machine learning-based models as shown in Table 2. Also, they adopt previous advancing angles by following the TITAN method’s first assumption: A storm tends to move along a straight line.
When the number of coordinates satisfies a specific condition regardless of observed and predicted, angles and distances are derived using nonlinear autoregressive models. In other words, the nonlinear autoregressive model forecasts the third future coordinate ( x ^ t + 3 , y ^ t + 3 ) in “Case 01”, the second future coordinate ( x ^ t + 2 , y ^ t + 2 ) in “Case 02”, and the first future coordinate ( x ^ t + 1 , y ^ t + 1 ) in “Case 03”, as shown in Figure 5.

4. Results and Discussion

This paper selected four representative machine learning methods: artificial neural networks (ANN) [15], linear regression model (LM) [16], random forests (RF) [17], and support vector regression (SVR) [18]. Those methods are well-known machine learning-based models and prove their capabilities to solve classification and regression problems in the real world. Moreover, this paper implemented the linear regression model with double exponential smoothing [19], which is the nowcasting method of TITAN, for comparison with maximum number of time points n t is 6 and weight parameter α is 0.5 . It can be a good criterion for evaluating the proposed machine learning-based methods because TITAN is a standard model for convective storm prediction. Considering that the proposed machine learning-based methods’ goal is to predict the future location of the convective storm, the nowcasting method of TITAN forecasts only the centroid coordinates by using Equation (7).
p t = p 0 + d p d t δ t
where p t and p 0 indicate the predicted and the current value, and d p / d t is the estimated rate of change.
The nowcasting method of TITAN can also predict storm volume and the parameters of the projected-area ellipse. The predicted results are combined and evaluated like dealing with binary classification results. Namely, the prediction result is right when the forecasted storm position and actual radar echoes at the forecast time exist in a specific grid area. On the other hand, the prediction result indicates wrong when either the forecasted (failure case) or the actual echoes (false alarm case) at the forecast time does not exist in a specific grid area. It is not easy to apply the same evaluation process to the proposed methods because they predict only the centroid coordinates, making no way to derive failure and false alarm cases which allow deriving evaluation metrics such as the probability of detection (POD), critical success index (CSI), and false alarm ratio (FAR). Therefore, the root mean squared error (RMSE) is selected for performance verification in this paper, as shown in Equation (8).
RMSE = 1 n i = 1 n x i ^ x i 2 + y i ^ y i 2
Table 3 describes the performance of five models. As shown in Table 3, The hyperparameters of each machine learning-based model were empirically set to produce better results from the simple structure: ANN with a single hidden layer contained ten neurons with hyperbolic tangent sigmoid activation function, and an output layer contained a single neuron with linear activation function; RF with 25 subtrees with ten maximum splits; and SVR with radial basis function kernel. And the nowcasting method of TITAN has only four RMSE-based performances from ( t + 3 ) to ( t + 6 ) because it needs five historical data for predictions as mentioned above. On average, ANN shows better than others in the contained angle prediction, and RF is better than others in the distance prediction. Furthermore, almost all machine learning-based models proposed in this paper have better performance than the nowcasting method of TITAN. Considering that the angle and distance models are mutually independent, it is unnecessary to utilize homogeneous models for prediction. Therefore, this paper also conducts experiments using both ANN and RF for angle and distance prediction, respectively.
This paper selected two representative trajectory examples in the test data to visually compare and analyze the experimental results: the convective storm in the first example, as shown in Figure 6, moves linearly for 90 min along the coastline in the southern region of Korea; the convective storm in the second example, as shown in Figure 7, shows the sudden movement of the centroid coordinates along the inland area in the capital region of Korea. The different shapes of the trajectory and the different geographical characteristics can help analyze the accuracy and the efficiency of the proposed methods. Moreover, the experimental results, as shown in Figure 8, comparing the nowcasting method of TITAN as the standard model can demonstrate the proposed machine learning-based methods’ superiority.
Figure 6a illustrates the experimental results at “Case 01” when (t) and ( t 1 ) are given. As shown on the left side of Figure 6a, the objective is to derive future locations of (2), which are (3) to (8), using information derived from (1) and (2). Due to a lack of temporal movement information, all models draw deviated results from the reference track coordinates, as shown on the right side of the Figure 6a. Figure 6b indicates the experimental results at “Case 02” when (t), ( t 1 ) , and ( t 2 ) are given. As shown on the left side of Figure 6b, the objective is to derive future locations of (3), which are (4) to (9), using information derived from (1) to (3). In this case, the RF-based method derives better results, as shown on the right side of Figure 6b, because the predicted locations exist near the reference track coordinates. The lowest RMSE values of the RF-based method provide numerical evidence for the results in Figure 6b. Other methods, which have greater RMSE values than the RF-based method, draw somewhat deviated (ANN, ANN+RF, and LM) or shrunk results (SVR). On the other hand, Figure 6c describes the successful experimental results at “Case 03” when (t), ( t 1 ) , ( t 2 ) , and ( t 3 ) are given. As shown on the left side of Figure 6c, the objective is to derive future locations of (4), which are (5) to (10), using information derived from (1) to (4). In this case, the combined method of ANN and RF derives better results, as shown on the right side of Figure 6c, because the predicted locations exist near the reference track coordinates. Although the RF-based method and ANN-based method seem to show similar performances, the trajectory’s detailed results prove the combined method slightly better than the RF-based or ANN-based method alone. Moreover, the RMSE values of the combined method substantiate the results, as shown in Figure 6c. Other methods draw somewhat deviated and shrunk trajectory results (LM and SVR). In summary, the RF-based method is useful when there are insufficient temporal properties, whereas the combined method of ANN and RF derives optimistic predictions when sufficient temporal data is guaranteed.
Likewise, Figure 7a describes the experimental results at “Case 01” when (t) and ( t 1 ) are given. As shown on the left side of Figure 7a, the objective is to derive future locations of (2), which are (3) to (8), using information derived from (1) and (2). Although all models draw nowcasting results close to the reference track coordinates from (3) to (5), they keep moving away from the reference after (6) due to insufficient temporal movement information, as shown on the right side of the Figure 7a. Figure 7b indicates the experimental results at “Case 02” when (t), ( t 1 ) , and ( t 2 ) are given. As shown on the left side of Figure 7b, the objective is to derive future locations of (3), which are (4) to (9), using information derived from (1) to (3). In this case, all models draw more deviated results from the reference track coordinates, as shown on the right side of Figure 7b. On the other hand, Figure 7c represents the optimistic experimental results at “Case 03” when (t), ( t 1 ) , ( t 2 ) , and ( t 3 ) are given. As shown on the left side of Figure 7c, the objective is to derive future locations of (4), which are (5) to (10), using information derived from (1) to (4). In this case, the combined method of ANN and RF derives better results, as shown on the right side of Figure 7c, because the predicted locations exist near the reference track coordinates. The lowest RMSE values of the combined method of ANN and RF, as shown in Figure 7c, also verify the results. Other methods draw somewhat deviated results (ANN, RF, LM, and SVR). In summary, it is difficult to forecast when the centroid coordinates show sudden movements and there are insufficient temporal properties, whereas the combined method of ANN and RF derives promising results when sufficient temporal data is guaranteed.
Figure 8 illustrates the experimental results for comparison between the nowcasting method of TITAN and the proposed methods. As mentioned above, each model employs information derived from (1) to (6) because the hyperparameter ( n t ) for the nowcasting method of TITAN is set to 6. Naturally, the common objective is to derive future location of (6), which are (7) to (10). As shown in Figure 8a, the nowcasting method of TITAN shows the worst result due to overpredict the distance, although it shows a similar linear movement. Furthermore, as shown in Figure 8b, the nowcasting method of TITAN not only deviates from the reference track coordinates but draws shrunk trajectory results due to the centroid’s abrupt direction change. Furthermore, the highest RMSE values of the nowcasting method of TITAN corroborate the results that the proposed machine learning-based prediction models are better, as shown in Figure 8a,b.
From the experimental results in Table 3, Figure 6, Figure 7 and Figure 8, the proposed machine learning-based method proves the following advantages than the nowcasting method of TITAN: First, it can relieve restrictions on the maximum number of time points. Second, it can learn how to deal with nonlinear and abrupt movements from data. Third, it can predict the given convective storms’ future locations more accurately under the same condition. Fourth, it can derive prediction results efficiently when the model has completed its learning process, whereas the nowcasting method of TITAN needs to compute the weighted linear regression every time.

5. Conclusions

Convective storms are hazardous meteorological events that are accompanied by torrential rain and strong winds. They influence various fields and have been considered primary goals in meteorological fields to make a nowcasting model. This paper proposes a novel method using machine learning-based models to predict a convective storm’s future location. In other words, the proposed approach forecasts future centroid coordinates of the given convective storm using temporal properties from its trajectory. The experimental results showed that the machine learning-based prediction models could forecast future locations of convective storms with superior performance to the nowcasting method of TITAN. As future work, we will consider exogenous variables as inputs, including satellite images, thermodynamic-related variables, numerical weather prediction results, wind, and buoyancy. The exogenous variables may derive promising results, such as improving prediction performance and dealing with more complicated trajectories.
Moreover, this paper proved that the machine learning-based model in a nonlinear autoregressive fashion utilizing only the dual-polarization radar data could derive the given convective storm’s future locations. However, there is a strong underlying assumption in the reference tracks: the connections between identified convective storms in contiguous time have a one-to-one correspondence. It is critical to deal with the mergers and splits of the convective storm in practical fields. Unfortunately, the splitting and merging cases of convective storms were insufficient, and most of them did not drastically influence the changes of the centroid coordinates’ positions. Considering that the machine learning-based methods cannot achieve expected performances if the learning data is insufficient and indistinguishable, we trained the proposed models under the mentioned assumption. As future work, we will apply classification methods based on machine learning for dealing with the splitting and merging cases of convective storms: classifier will determine the given convective storm will split, merge, or keep as it goes. We expect that the size-related input variables and their temporal trends significantly influence the classifier. With collecting a sufficient number of merging and splitting cases with meteorological experts’ verification, we expect that it is possible to derive promising results to handle the split-merge condition. Furthermore, it is possible to apply the proposed machine learning-based nonlinear autoregressive model to predict essential information, such as peak intensity of each convective cell and the trend of size changes through time.

Author Contributions

All authors contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (number 2018R1A2B6005123).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Restrictions apply to the availability of these data. The data was obtained from Korea Meteorological Administration.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Wilson, J.W.; Crook, N.A.; Mueller, C.K.; Sun, J.; Dixon, M. Nowcasting thunderstorms: A status report. Bull. Am. Meteorol. Soc. 1998, 79, 2079–2100. [Google Scholar] [CrossRef]
  2. Sun, J.; Xue, M.; Wilson, J.W.; Zawadzki, I.; Ballard, S.P.; Onvlee-Hooimeyer, J.; Joe, P.; Barker, D.M.; Li, P.W.; Golding, B.; et al. Use of NWP for nowcasting convective precipitation: Recent progress and challenges. Bull. Am. Meteorol. Soc. 2014, 95, 409–426. [Google Scholar] [CrossRef] [Green Version]
  3. Rinehart, R.; Garvey, E. Three-dimensional storm motion detection by conventional weather radar. Nature 1978, 273, 287–289. [Google Scholar] [CrossRef]
  4. Tuttle, J.D.; Foote, G.B. Determination of the boundary layer airflow from a single Doppler radar. J. Atmos. Ocean. Technol. 1990, 7, 218–232. [Google Scholar] [CrossRef] [Green Version]
  5. Dixon, M.; Wiener, G. TITAN: Thunderstorm identification, tracking, analysis, and nowcasting—A radar-based methodology. J. Atmos. Ocean. Technol. 1993, 10, 785–797. [Google Scholar] [CrossRef]
  6. Johnson, J.; MacKeen, P.L.; Witt, A.; Mitchell, E.D.W.; Stumpf, G.J.; Eilts, M.D.; Thomas, K.W. The storm cell identification and tracking algorithm: An enhanced WSR-88D algorithm. Weather. Forecast. 1998, 13, 263–276. [Google Scholar] [CrossRef] [Green Version]
  7. Handwerker, J. Cell tracking with TRACE3D—A new algorithm. Atmos. Res. 2002, 61, 15–34. [Google Scholar] [CrossRef]
  8. Bishop, C.M. Pattern Recognition and Machine Learning; Springer: Berlin/Heidelberg, Germany, 2006. [Google Scholar]
  9. Rossi, P.J.; Chandrasekar, V.; Hasu, V.; Moisseev, D. Kalman filtering–based probabilistic nowcasting of object-oriented tracked convective storms. J. Atmos. Ocean. Technol. 2015, 32, 461–477. [Google Scholar] [CrossRef]
  10. Han, L.; Sun, J.; Zhang, W.; Xiu, Y.; Feng, H.; Lin, Y. A machine learning nowcasting method based on real-time reanalysis data. J. Geophys. Res. Atmos. 2017, 122, 4038–4051. [Google Scholar] [CrossRef]
  11. Xingjian, S.; Chen, Z.; Wang, H.; Yeung, D.Y.; Wong, W.K.; Woo, W.C. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. Adv. Neural Inf. Process. Syst. 2015, 28, 802–810. [Google Scholar]
  12. Han, L.; Sun, J.; Zhang, W. Convolutional Neural Network for Convective Storm Nowcasting Using 3-D Doppler Weather Radar Data. IEEE Trans. Geosci. Remote. Sens. 2019, 58, 1487–1495. [Google Scholar] [CrossRef]
  13. Box, G.E.; Jenkins, G.M.; Reinsel, G.C.; Ljung, G.M. Time Series Analysis: Forecasting and Control; John Wiley & Sons: Hoboken, NJ, USA, 2015. [Google Scholar]
  14. Hawkins, D.M. The problem of overfitting. J. Chem. Inf. Comput. Sci. 2004, 44, 1–12. [Google Scholar] [CrossRef] [PubMed]
  15. Bishop, C.M. Neural Networks for Pattern Recognition; Oxford University Press: New York, NY, USA, 1995. [Google Scholar]
  16. Seber, G.A.; Lee, A.J. Linear Regression Analysis; John Wiley & Sons: Hoboken, NJ, USA, 2012; Volume 329. [Google Scholar]
  17. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef] [Green Version]
  18. Smola, A.J.; Schölkopf, B. A tutorial on support vector regression. Stat. Comput. 2004, 14, 199–222. [Google Scholar] [CrossRef] [Green Version]
  19. Abraham, B.; Ledolter, J. Statistical Methods for Forecasting; John Wiley & Sons: Hoboken, NJ, USA, 2009; Volume 234. [Google Scholar]
Figure 1. Ten dual-polarization radar locations and their observation ranges in Korea.
Figure 1. Ten dual-polarization radar locations and their observation ranges in Korea.
Atmosphere 12 00343 g001
Figure 2. Histogram of the reference track length.
Figure 2. Histogram of the reference track length.
Atmosphere 12 00343 g002
Figure 3. Overview of proposed convective storm nowcasting method.
Figure 3. Overview of proposed convective storm nowcasting method.
Atmosphere 12 00343 g003
Figure 4. An example of deriving a future location.
Figure 4. An example of deriving a future location.
Atmosphere 12 00343 g004
Figure 5. The proposed convective storm future location prediction method.
Figure 5. The proposed convective storm future location prediction method.
Atmosphere 12 00343 g005
Figure 6. Observed trajectories and prediction results: (a) Case 01, (b) Case 02, (c) Case 03.
Figure 6. Observed trajectories and prediction results: (a) Case 01, (b) Case 02, (c) Case 03.
Atmosphere 12 00343 g006
Figure 7. Observed trajectories and prediction results: (a) Case 01, (b) Case 02, (c) Case 03.
Figure 7. Observed trajectories and prediction results: (a) Case 01, (b) Case 02, (c) Case 03.
Atmosphere 12 00343 g007
Figure 8. Observed trajectories and prediction results for comparing proposed methods and TITAN.
Figure 8. Observed trajectories and prediction results for comparing proposed methods and TITAN.
Atmosphere 12 00343 g008
Table 1. The number of learning data pairs for each model (Training:Test = 8:2).
Table 1. The number of learning data pairs for each model (Training:Test = 8:2).
Model ( t + 1 ) ( t + 2 ) ( t + 3 ) ( t + 4 ) ( t + 5 ) ( t + 6 )
Number of data pairs20041458999685486344
Training data pairs16031166799548389275
Test data pairs4012922001379769
Table 2. The extracted temporal features from consecutive convective storms between (t) and ( t 1 ) .
Table 2. The extracted temporal features from consecutive convective storms between (t) and ( t 1 ) .
NumberFeaturesDescriptions
1dist_3dEuclidean distance (3D)
2dist_w3dWeighted Euclidean distance (3D)
3dist_m3d3D Euclidean distance (maximum reflectivity)
4dist_ed2D Euclidean distance (ellipse)
5d_volumeVolume difference
6d_areaArea difference (ellipse)
7d_eccEccentricity difference (ellipse with k=1)
8d_mean_CZMean CZ difference
9d_max_CZMaximum CZ difference
10d_min_CZMinimum CZ difference
11d_median_CZMedian CZ difference
12d_iqr_CZIQR CZ difference (Inter-Quartile Range)
13d_mode_CZMode CZ difference
14d_entropy_CZEntropy CZ difference
15d_std_CZStandard deviation CZ difference
16d_cv_CZCV CZ difference (Coefficient of Variation)
17d_mean_DRMean DR difference
18d_max_DRMaximum DR difference
19d_min_DRMinimum DR difference
20d_median_DRMedian DR difference
21d_iqr_DRIQR DR difference (Inter-Quartile Range)
22d_mode_DRMode DR difference
23d_entropy_DREntropy DR difference
24d_std_DRStandard deviation DR difference
25d_cv_DRCV DR difference (Coefficient of Variation)
26d_mean_RHMean RH difference
27d_max_RHMaximum RH difference
28d_min_RHMinimum RH difference
29d_median_RHMedian RH difference
30d_iqr_RHIQR RH difference (Inter-Quartile Range)
31d_mode_RHMode RH difference
32d_entropy_RHEntropy RH difference
33d_std_RHStandard deviation RH difference
34d_cv_RHCV RH difference (Coefficient of Variation)
35d_mean_VILMean VIL difference
36d_max_VILMaximum VIL difference
37d_min_VILMinimum VIL difference
38d_median_VILMedian VIL difference
39d_iqr_VILIQR VIL difference (Inter-Quartile Range)
40d_mode_VILMode VIL difference
41d_entropy_VILEntropy VIL difference
42d_std_VILStandard deviation VIL difference
43d_cv_VILCV VIL difference (Coefficient of Variation)
44d_mean_clus_VILMean cluster VIL difference
45d_max_clus_VILMaximum cluster VIL difference
46d_min_clus_VILMinimum cluster VIL difference
47d_median_clus_VILMedian cluster VIL difference
48d_iqr_clus_VILIQR cluster VIL difference (Inter-Quartile Range)
49d_mode_clus_VILMode cluster VIL difference
50d_entropy_clus_VILEntropy cluster VIL difference
51d_std_clus_VILStandard deviation cluster VIL difference
52max_cv_VILCV cluster VIL difference (Coefficient of Variation)
Table 3. Performance evaluations using root mean squared error (RMSE) for location prediction results.
Table 3. Performance evaluations using root mean squared error (RMSE) for location prediction results.
Case 01Case 02Case 03
ModelHyper-
Parameters
Predicted
Time
DistanceDistanceAngleDistance
ANN1 hidden layer
10 neurons
( t + 1 ) 0.6784-1.70120.3818
( t + 2 ) -0.18242.99220.7689
( t + 3 ) --1.90050.3814
( t + 4 ) --0.52283.9906
( t + 5 ) --1.44200.0269
( t + 6 ) --3.62291.7449
Avg.--2.03031.2158
Std.--1.11401.4822
LM- ( t + 1 ) 2.0859-6.07771.3242
( t + 2 ) -0.47302.63890.0714
( t + 3 ) --3.68440.6866
( t + 4 ) --0.87490.6694
( t + 5 ) --6.98760.2317
( t + 6 ) --10.44921.0764
Avg.--5.11880.6766
Std.--3.43640.4782
RF25 subtrees
10 splits each
( t + 1 ) 0.6121-4.84060.3120
( t + 2 ) -0.09130.13980.2722
( t + 3 ) --5.68960.3742
( t + 4 ) --0.57620.2214
( t + 5 ) --5.35830.5860
( t + 6 ) --9.17020.7567
Avg.--4.29580.4204
Std.--3.41280.2078
SVRRBF kernel ( t + 1 ) 2.5191-5.59310.3209
( t + 2 ) -0.18241.95010.1533
( t + 3 ) --3.78061.0203
( t + 4 ) --3.31211.1352
( t + 5 ) --5.38551.5833
( t + 6 ) --10.18402.0986
Avg.--5.03421.0519
Std.--2.86370.7386
TITAN- ( t + 1 ) ----
( t + 2 ) ----
( t + 3 ) ---1.7137
( t + 4 ) ---2.2190
( t + 5 ) ---3.3513
( t + 6 ) ---3.9340
Avg.---2.8045
Std.---1.0178
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Lee, H.; Kim, J.; Kim, E.K.; Kim, S. A Novel Convective Storm Location Prediction Model Based on Machine Learning Methods. Atmosphere 2021, 12, 343. https://doi.org/10.3390/atmos12030343

AMA Style

Lee H, Kim J, Kim EK, Kim S. A Novel Convective Storm Location Prediction Model Based on Machine Learning Methods. Atmosphere. 2021; 12(3):343. https://doi.org/10.3390/atmos12030343

Chicago/Turabian Style

Lee, Hansoo, Jonggeun Kim, Eun Kyeong Kim, and Sungshin Kim. 2021. "A Novel Convective Storm Location Prediction Model Based on Machine Learning Methods" Atmosphere 12, no. 3: 343. https://doi.org/10.3390/atmos12030343

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop