Survey on the Application of Deep Learning in Extreme Weather Prediction

Abstract: Because of the uncertainty of weather and the complexity of atmospheric motion, extreme weather has always been an important and difficult meteorological problem. Extreme weather events are also called high-impact weather; ‘extreme’ here means that the probability of occurrence is very small. Deep learning can automatically learn from a large amount of sample data to obtain good feature representations, which effectively improves the performance of various machine learning tasks, and it is widely used in computer vision, natural language processing, and other fields. After introducing deep learning, this article gives a preliminary summary of existing extreme weather prediction methods. These include recurrent neural networks and convolutional neural networks applied to weather prediction. Such networks can automatically extract image features of extreme weather phenomena and, within a deep learning framework, predict the likelihood of extreme weather at a given location.


Introduction
One minute the sky is clear, and the next there are dark clouds and strong winds; pedestrians have difficulty standing, trees are snapped, and simple single-story buildings are even reduced to rubble. Between 2000 and 2020, the cumulative number of disasters worldwide reached 13,345, and more than 1.5 million people died [1]. Extreme weather brings short-term heavy rainfall, thunderstorms, gales, tornadoes, hail, and similar phenomena, all of which are catastrophic. They are sudden and local; although they do not last long, they are very destructive. At present, there is no effective method for artificially weakening or preventing them. Therefore, preparedness should be given priority and combined with rescue. It is very important to strengthen theoretical research on severe convective weather systems and to improve the forecasting of severe convective weather. With the development of science and technology, radar and geosynchronous satellites have gradually been applied in meteorology to detect and monitor weather changes. Weather stations use weather radar to forecast the occurrence of severe convective weather and monitor its activity. By analyzing cloud images taken continuously by meteorological satellites, we can track the occurrence, development, movement, and dissipation of severe convective weather; combined with the analysis of weather situation maps by experienced forecasters, this can improve the forecasting of severe convective weather.
As the most important branch of machine learning, deep learning has developed rapidly in recent years and has been applied in various fields, attracting widespread attention. This article summarizes previous research and gives a relatively complete overview of extreme weather prediction combined with deep learning. This paper is organized as follows: Section 1 outlines the main knowledge relevant to extreme weather prediction; Section 2 focuses on the related deep learning models and theories and on the traditional methods of weather prediction; Section 3 focuses on the existing applications of deep learning to extreme weather prediction; and finally, Section 4 summarizes the extreme weather prediction methods based on deep learning.

Traditional Methods
In this section, we focus on traditional weather prediction methods and conclusions that are related to deep learning methods. The main ones are weather map forecasting, statistical forecasting, numerical forecasting, and single-station forecasting. The following subsections describe commonly used weather forecasting methods and related forecasting knowledge.

Weather Map Forecast Method
Weather map forecasting methods [6] have existed for more than 100 years, since the appearance of the telegraph made it possible to collect data at the National Meteorological Center in time for weather maps to be analyzed. A weather chart shows, for example, that a low-pressure system is moving. Analyzing the weather systems on the map and predicting their future movement and intensity changes (including formation and dissipation) is the main basis of the weather map forecasting method. Weather maps and other auxiliary maps are analyzed to discover, in a timely manner, the weather systems that cause local weather changes. A correct weather map analysis is therefore the premise of the method. The first step is to forecast the weather situation, that is, to predict the future evolution of the weather systems already on the map, including the appearance of new systems and changes in their intensity. The simplest approach is extrapolation, which assumes that a weather system will keep its recent movement and intensity trend in the future. This method is also called the continuous method. Secondly, in forecasting practice, weather forecasters summarize empirical rules for the movement or intensity changes of weather systems, and these rules of thumb also play an important role. In addition, rules for the evolution of the weather situation can be derived from the theory of dynamic meteorology, and forecasters can predict future weather conditions on that basis.
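The extrapolation idea described above can be illustrated with a minimal sketch. It assumes that the centre of a weather system has been analysed at two times and that the system keeps its most recent velocity; the function name, the coordinates, and the time intervals are hypothetical and are not taken from [6].

```python
def extrapolate_position(p_prev, p_now, dt_obs_h, lead_h):
    """Linearly extrapolate the position of a weather system.

    p_prev, p_now: (lon, lat) of the system centre at two analysis times
    dt_obs_h:      hours between the two analyses
    lead_h:        forecast lead time in hours
    """
    # Velocity estimated from the last two analysed positions (degrees per hour)
    u = (p_now[0] - p_prev[0]) / dt_obs_h
    v = (p_now[1] - p_prev[1]) / dt_obs_h
    # Assumption of the method: the system keeps moving with this velocity
    return (p_now[0] + u * lead_h, p_now[1] + v * lead_h)


# Hypothetical low-pressure centre analysed 6 h apart, extrapolated 12 h ahead
forecast = extrapolate_position((110.0, 30.0), (112.0, 31.0), 6.0, 12.0)
print(forecast)
```

Such a sketch ignores acceleration, deformation, and intensity change, which is why the empirical and dynamical rules mentioned above are needed in practice.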
Lou Maoyuan et al. [7] described in detail a design idea and application method for weather charts used to forecast cloud and snow. Based on many years of forecasting practice, they selected relative humidity together with the GFS and EC numerical forecast data (commonly used in forecasting operations) related to GRAPES cloud forecasting, and filtered out the wet layer reflecting high cloud to form an intuitive profile view. This profile view can compensate for deficiencies in the total cloud amount. Temperature is then superimposed on the graph to form a combined curve of temperature and humidity against height. At the forecast point, the "air column method" is used to simulate precipitation and to judge intuitively whether snowflakes melt as they fall. This overcomes the shortcomings of the stations' current practice of using air temperature indicators to predict snowfall.

Weather Forecast Based on Weather Radar
A radar emits short-wavelength radio waves from its antenna. When the waves encounter distant weather phenomena in the atmosphere, such as typhoons, thunderstorms, and rainstorms, they are reflected back and displayed on the radar screen, so the overall appearance and internal structure of these systems can be seen. On a weather radar chart, the bright area is the area monitored by the radar, and the areas where colored blocks appear are where rainfall systems are active. Generally speaking, blue echoes mean that the area is covered by precipitation clouds but rain has not yet begun; green echoes indicate light rain; yellow to red echoes indicate moderate to heavy rain; and purple echoes mark the highest precipitation intensity, where the area is experiencing heavy or even torrential rain, possibly accompanied by severe weather such as thunder, gales, or even hail. Sokol Z [8] summarized the role of weather radar in rainfall estimation and its application in meteorological and hydrological simulation.
Hu Lijun et al. [9] designed a weather display system based on radar data, in which the radar data collector is connected to a cloud detection radar, a rain detection radar, and a wind detection radar. Li S et al. [10] used FY-4A satellite and Doppler weather radar observations to study the precipitation characteristics of sudden heavy rain events over the complex terrain of Southwest China. Xie B et al. [11] applied machine learning to satellite radar images to estimate losses after natural disasters. They use satellite radar images and geographic data as input to classify the damage status of individual buildings after a major disaster event, and they believe that applying their damage estimation method to real-world natural disasters has great potential to improve social resilience.
In terms of weather radar detection theory and methods, Ge Runsheng and others studied calibration and measurement methods for various weather radar parameters, theoretically analyzed the measurement error and accuracy of echo intensity, studied radar signal processing methods, and analyzed the ability of X-band and C-band weather radars to detect rainfall [12,13].
Quality control and preprocessing of radar data also matter for extreme weather forecasts. Bi Yongheng [14] sought a method suitable for attenuation correction of X-band dual-polarization radar during precipitation. After quality control and preprocessing of the radar data, and on the basis of an analysis of existing correction methods in China and abroad, an adaptive constraint algorithm was selected and improved as the attenuation correction method for radar reflectivity. Goudenhoofdt E et al. [15] estimated the regional frequency of extreme rainfall in Belgium from radar data and assessed the potential of a single weather radar for 12-year quantitative precipitation estimation. For the period 2005-2016, they compared radar estimates of extreme rainfall over 1 h and 24 h.

Numerical Prediction Methods
Weather forecasting has always been based on the principles of meteorology. With the development of computing and observation technology, meteorological radar and satellite data have come to be used in forecasting operations alongside the traditional weather map and mathematical statistics methods, and numerical forecasting methods have been developed. Numerical forecasting predicts the physical evolution of the atmosphere from the principles of conservation of atmospheric mass, energy, and momentum. It has significantly improved the quality of weather forecasts and promoted their objective quantification [16]. As a discrete computational model of the earth's atmosphere, a numerical weather prediction model is a typical nonlinear system with a large computational load. Because all calculations must finish faster than the actual weather evolves, high-speed computers have become a decisive key technology.
Du Jun et al. [17] argue that abnormal weather should be defined relative to the local climate background (Figure 2). Following this view, the degree of anomaly of a weather element is defined as its difference from the actual climatological mean. Because the variability of weather elements differs greatly between places and seasons (for example, it is generally larger at high latitudes than at low latitudes, and larger in winter than in summer), the difference is standardized by the actual climatological standard deviation of the element so that values can be compared on a common scale. The result, given by formulas (1) and (2), is called the "standardized anomaly" (SA) [17]:
$$SA_o(x,t) = \frac{OBS(x,t) - MEAN_{clim}(x,t)}{SD_{clim}(x,t)} \tag{1}$$
$$SA_f(x,t) = \frac{FCST(x,t) - MEAN_{clim}(x,t)}{SD_{clim}(x,t)} \tag{2}$$
The standardized anomaly SA is a function of location x and time t. For the actual situation o, the standardized anomaly $SA_o(x,t)$ is calculated from the observed value OBS(x, t) of the element and the climatological mean MEAN_clim(x, t) and climatological standard deviation SD_clim(x, t) of the actual atmosphere, that is, formula (1); for the forecast f, the standardized anomaly $SA_f(x,t)$ is calculated with the forecast value FCST(x, t) in place of the observed value OBS(x, t), that is, formula (2).
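As a small worked illustration of the standardized anomaly described above (departure from the climatological mean divided by the climatological standard deviation), the following sketch computes SA for an observed and a forecast value. The climatological sample, the two input values, and the use of the population standard deviation are assumptions made for the example, not choices documented in [17].

```python
from statistics import mean, pstdev


def standardized_anomaly(value, clim_samples):
    """Return (value - climatological mean) / climatological standard deviation."""
    clim_mean = mean(clim_samples)
    clim_sd = pstdev(clim_samples)  # population SD: an assumption of this sketch
    if clim_sd == 0:
        raise ValueError("climatological standard deviation is zero")
    return (value - clim_mean) / clim_sd


# Hypothetical climatology: one element at one location and calendar day, 8 years
clim = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
sa_obs = standardized_anomaly(11.0, clim)   # observed value, OBS(x, t)
sa_fcst = standardized_anomaly(9.0, clim)   # forecast value, FCST(x, t)
print(sa_obs, sa_fcst)
```

Because both values are scaled by the same climatology, the forecast anomaly and the observed anomaly can be compared directly in units of standard deviations.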

Figure 2.
A schematic diagram of Standardized Anomaly (SA) approach for comparing a forecast or an observed parameter (thick black line) with its climatological mean (thin black line) and standard deviation (dash lines) to measure this parameter's abnormality, i.e., a departure from the mean by how many standard deviations [17].
The horizontal and vertical resolution of both global and regional models has improved significantly, as has the representation of physical processes in the models. Operational numerical prediction models have entered the stage of massively parallel computing. Major developed countries and China are committed to developing their own new operational numerical forecast models: non-hydrostatic (multi-scale) integrated models or non-hydrostatic mesoscale models. A new generation of comprehensive numerical weather models has begun to operate in some countries, and operational models continue to improve. As model resolution increases, more attention is being paid to cloud physical processes, surface processes and turbulence processes, parameterization schemes for radiation processes that consider aspect factors, and the choice of model vertical coordinates.
Heng BCP et al. [18] developed a data assimilation system for convective-scale numerical weather prediction in Singapore. The background error covariance they obtained contains spurious vertical structures at higher model levels, which may reduce prediction performance. They found that precipitation forecasts are sensitive to horizontal resolution and lateral boundary conditions. Their observing system experiments show that, although assimilating satellite radiances significantly helps precipitation forecasts in the region, it degrades the background temperature and upper-level wind. Compared with the standalone forecast model, the research system adds more value to precipitation forecasts at short range than at longer lead times.

Application of BP (Back Propagation) Neural Network in Extreme Weather
The BP algorithm has also played a role in extreme weather forecasting. The topology of a BP neural network model comprises an input layer, a hidden layer, and an output layer [19]. A neural network consists of three main parts: the network architecture, the activation function, and the parameter learning algorithm that finds the optimal weights. The BP algorithm is one of the most widely used parameter learning algorithms [20,21]. The BP neural network, proposed in 1986 by a group of scientists led by Rumelhart and McClelland, is a multi-layer feedforward neural network trained with the error back propagation algorithm. Because the error of the hidden layer cannot be obtained directly, the hidden-layer weights are adjusted indirectly using the error between the actual output of the output layer and the expected output; the BP algorithm is designed around this idea. The learning process consists of two phases: forward propagation of the signal (computing the loss) and back propagation of the error (updating the weights).
Following this basic idea, the general procedure of the BP algorithm is as follows. In forward propagation (FP), the output value is computed from the input samples and the given initial weights W and biases b, and the loss between the output value and the actual value is calculated. If the loss is not within a given range, back propagation is performed and W and b are updated; otherwise, the updating of W and b stops.
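The forward/backward procedure above can be sketched for a three-layer network as follows. This is a minimal illustration under stated assumptions (sigmoid activations, a squared-error loss, full-batch gradient descent, and a toy data set); the function name `train_bp` and all hyperparameters are hypothetical and do not reproduce any of the cited models.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_bp(X, y, n_hidden=4, lr=0.5, tol=1e-3, max_epochs=5000, seed=0):
    """Train an input-hidden-output network with error back propagation."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 1.0, (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 1.0, (n_hidden, y.shape[1]))
    b2 = np.zeros(y.shape[1])
    losses = []
    for _ in range(max_epochs):
        # Forward propagation: compute the output and the loss
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        loss = 0.5 * np.mean(np.sum((out - y) ** 2, axis=1))
        losses.append(loss)
        if loss < tol:  # loss within the given range: stop updating W and b
            break
        # Back propagation: pass the output error back through the layers
        d_out = (out - y) * out * (1.0 - out) / len(X)
        d_hid = (d_out @ W2.T) * h * (1.0 - h)
        W2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_hid)
        b1 -= lr * d_hid.sum(axis=0)
    return (W1, b1, W2, b2), losses


# Toy data (logical OR), used only to show that the loss decreases
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [1.0]])
params, losses = train_bp(X, y)
print(losses[0], losses[-1])
```

The hidden-layer error `d_hid` is never observed directly; it is derived from the output error `d_out`, which is the indirect adjustment the BP algorithm relies on.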
The three-layer error back propagation model is the most commonly used model in weather forecasting. Figure 3 shows the structure of a BP neural network with N nodes in the input layer, M nodes in the hidden layer, and P nodes in the output layer. Literature [22] proposed a BP neural network prediction model based on a simulated particle swarm algorithm. The experimental results show that the optimization algorithm performs well in optimizing the network parameters. Literature [23] proposed a wind speed prediction model combining wavelet analysis and a BP neural network. The results show that the predictions of this model follow the actual trend of wind speed more closely and that the model generalizes better. Literature [24] proposed an optimized BP neural network prediction model based on error correction. The results show that taking the error into account can improve the accuracy of wind speed forecasting. Wen [25] proposed a neural network wind speed prediction model that considers meteorological factors together with other factors, which can effectively improve the accuracy of wind speed prediction. Wind speed, however, is a complex system affected by many factors. Such methods capture only part of the information, so it remains difficult to describe its variation comprehensively and to predict it accurately.
At present, weather prediction mainly relies on mathematical models built from basic physical processes [26]. Traditional linear models struggle to achieve good accuracy on nonlinear time series. In recent years, artificial neural networks have been widely used in weather forecasting and have achieved good results. Combining a neural network with the least squares method can compensate for the shortcomings of the least squares method, make full use of the advantages of both, and handle nonlinear systems well. Niu Zhijuan et al. [27] used the basic principles of the BP neural network and the least squares method to build a network model for predicting extreme temperatures. They convert the nonlinear optimization into a linear one and compute the network weights by least squares, which avoids the local minimum problem that arises in training a traditional BP network. Their experiments show that, for predicting the monthly average minimum temperature, the least-squares-optimized BP neural network has higher fitting and prediction accuracy than a single BP neural network, generalizes better, gives more stable predictions, and reduces the complexity of the prediction method, so it can better meet the needs of actual forecasting.
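One common way to turn part of the network training into a linear problem, which may differ from the scheme actually used in [27], is to hold the hidden layer fixed and solve only the output-layer weights by least squares. The sketch below assumes a tanh hidden layer with randomly chosen weights and a synthetic monthly series; all names and data are hypothetical.

```python
import numpy as np


def hidden_features(X, W1, b1):
    """Hidden-layer activations plus a constant column for the output bias."""
    H = np.tanh(X @ W1 + b1)
    return np.hstack([H, np.ones((len(X), 1))])


def fit_output_weights(X, y, W1, b1):
    """Solve the output weights in closed form (a linear least squares problem)."""
    beta, *_ = np.linalg.lstsq(hidden_features(X, W1, b1), y, rcond=None)
    return beta


def predict(X, W1, b1, beta):
    return hidden_features(X, W1, b1) @ beta


# Synthetic "monthly minimum temperature" with an annual cycle (illustrative only)
rng = np.random.default_rng(0)
t = np.arange(48)
X = np.column_stack([np.sin(2 * np.pi * t / 12), np.cos(2 * np.pi * t / 12)])
y = 5.0 + 10.0 * X[:, 0] + rng.normal(scale=0.5, size=48)

W1 = rng.normal(size=(2, 8))   # fixed hidden weights (assumption of this sketch)
b1 = rng.normal(size=8)
beta = fit_output_weights(X, y, W1, b1)
mse = np.mean((predict(X, W1, b1, beta) - y) ** 2)
print(mse)
```

Because the output weights are obtained in one linear solve rather than by iterative descent, this step has no local minima, which is the property the paragraph above attributes to the least squares combination.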
K Dou et al. [28] proposed a long-term weather forecasting method based on a GA-BP neural network. It uses a genetic algorithm to optimize the weights and thresholds of the BP neural network, mitigating drawbacks of the BP network such as its sensitivity to the initial weights and thresholds. Li B et al. [29] carried out multi-dimensional research on agro-meteorological disasters based on the gray BP neural network, using gray correlation analysis to examine the correlation between the main agro-meteorological disaster factors and the output of food crops in Henan Province, China.

Application of Recurrent Neural Network in Extreme Weather
A recurrent neural network (RNN) is an artificial neural network in which the connections between units form a directed cycle. It is a natural extension of the feedforward neural network to sequence data. The difference is that its input includes not only the current input example but also the information the network perceived at the previous time step [30]. Thanks to this property, information can circulate in the network for an arbitrarily long time.
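The recurrence described above, in which each state depends on the current input and on the previous state, can be sketched with a vanilla (Elman-style) RNN. The tanh activation, the layer sizes, and the random weights are assumptions made for illustration.

```python
import numpy as np


def rnn_forward(xs, W_xh, W_hh, b_h):
    """Run a vanilla RNN over a sequence and return every hidden state."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x in xs:
        # The new state depends on the current input AND the previous state
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)


rng = np.random.default_rng(0)
n_in, n_hid, T = 3, 5, 6                 # hypothetical sizes
xs = rng.normal(size=(T, n_in))          # e.g. T time steps of 3 variables
W_xh = rng.normal(scale=0.5, size=(n_hid, n_in))
W_hh = rng.normal(scale=0.5, size=(n_hid, n_hid))
b_h = np.zeros(n_hid)
hs = rnn_forward(xs, W_xh, W_hh, b_h)
print(hs.shape)
```

Perturbing only the first input changes the last hidden state as well, which is the sense in which information circulates through the network over time.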
Recurrent neural networks are used to recognize patterns in sequence data such as text, speech, or numerical time series generated by sensors or stock markets. They take the evolving state of the time series into account and improve their adaptive ability through learning, so they are well suited to sequence modeling. Accurate forecasting of storm surge can greatly reduce casualties and economic losses and has important practical value. Leisen et al. [31] proposed a method for predicting storm surge water level rise based on a recurrent neural network. The storm surge time series data are specially preprocessed, and a recurrent neural network with a suitable structure is designed to predict them. The method was applied to predicting the water level rise at the Weifang water station, and the results show that the recurrent neural network handles time series prediction better than the traditional BP neural network and obtains smaller errors.
Zhang Shuai et al. [32] compared a recurrent neural network model against three other models: a feedforward neural network model (fuzzy neural network), a wavelet neural network model, and the autoregressive integrated moving average model. The network uses long short-term memory (LSTM) units, and its weights are updated with the backpropagation through time algorithm. The simulation results show that the recurrent neural network model outperforms the other models: its training result is close to the actual value and its prediction accuracy is high. As shown in Figure 4, the standard RNN consists of three layers: input layer, hidden layer, and output layer. An LSTM memory cell is shown in Figure 5. A memory block is mainly composed of four parts: an input gate, a self-connected neuron, a forget gate, and an output gate. These gates modulate the interaction between the memory cell itself and its environment, so that the state of the memory cell can remain unchanged from one time step to the next.
Shi X et al. [33] proposed the Convolutional LSTM (ConvLSTM). To model spatiotemporal relationships well, they extended the fully connected LSTM (FC-LSTM) so that both the input-to-state and state-to-state transitions have convolutional structures. By stacking multiple ConvLSTM layers into an encoding-forecasting structure, they built an end-to-end trainable model for precipitation nowcasting.
In ConvLSTM, all the inputs $X_1, \ldots, X_t$, cell outputs $C_1, \ldots, C_t$, hidden states $H_1, \ldots, H_t$, and gates $i_t$, $f_t$, $g_t$, $o_t$ are 3D tensors in $\mathbb{R}^{P \times M \times N}$, where the first dimension is either the number of measurements (for inputs) or the number of feature maps (for intermediate representations), and the last two dimensions are spatial dimensions (M rows and N columns). The key equations of ConvLSTM are as follows [33]:
$$
\begin{aligned}
g_t &= \tanh(W_{xg} * X_t + W_{hg} * H_{t-1} + b_g) \\
i_t &= \sigma(W_{xi} * X_t + W_{hi} * H_{t-1} + W_{ci} \odot C_{t-1} + b_i) \\
f_t &= \sigma(W_{xf} * X_t + W_{hf} * H_{t-1} + W_{cf} \odot C_{t-1} + b_f) \\
C_t &= f_t \odot C_{t-1} + i_t \odot g_t \\
o_t &= \sigma(W_{xo} * X_t + W_{ho} * H_{t-1} + W_{co} \odot C_t + b_o) \\
H_t &= o_t \odot \tanh(C_t)
\end{aligned}
$$
where $\sigma$ is the sigmoid activation function, and $*$ and $\odot$ denote the convolution operator and the Hadamard product, respectively. The input gate $i_t$, forget gate $f_t$, output gate $o_t$, and input-modulation gate $g_t$ control the information flow across the memory cell $C_t$. In this way, the gradient is kept in the memory cell and prevented from vanishing quickly.
Han F [34] applied RNNs to nowcasting: by using PredRNN (Predictive RNN) to model historical radar data, the radar echo within the next hour can be predicted. The purpose of spatiotemporal predictive learning is to generate future images by learning from historical frames, in which spatial appearance and temporal variation are the two key structures. The core of this network is a new spatiotemporal LSTM (ST-LSTM) unit, which extracts and memorizes spatial and temporal representations simultaneously. PredRNN achieves state-of-the-art prediction performance on three video prediction data sets. It is a more general framework and can easily be extended to other predictive learning tasks by integrating it with other architectures. Wang Y et al. [35] proposed PredRNN++, a recurrent network for spatiotemporal predictive learning. In pursuit of a strong capability for modeling short-term video dynamics, they made the network deeper by using a new recurrent structure called Causal LSTM with cascaded dual memories. PredRNN with the ST-LSTM is shown in Figure 6 [35].
Here $C_t^k$ is the temporal memory and $M_t^k$ is the spatial memory; the subscript t denotes the time step, and the superscript k denotes the k-th hidden layer in the stacked causal LSTM network. The current temporal memory depends directly on its previous state $C_{t-1}^k$, where $f_t$ is a forget gate, $i_t$ is an input gate, and $g_t$ is an input modulation gate. The current spatial memory $M_t^k$ depends on $M_t^{k-1}$ in the deep transition path. The update equations of the causal LSTM at the k-th layer can be written as follows [35]:
$$
\begin{aligned}
\begin{pmatrix} g_t \\ i_t \\ f_t \end{pmatrix} &= \begin{pmatrix} \tanh \\ \sigma \\ \sigma \end{pmatrix} W_1 * [X_t, H_{t-1}^k, C_{t-1}^k] \\
C_t^k &= f_t \odot C_{t-1}^k + i_t \odot g_t \\
\begin{pmatrix} g_t' \\ i_t' \\ f_t' \end{pmatrix} &= \begin{pmatrix} \tanh \\ \sigma \\ \sigma \end{pmatrix} W_2 * [X_t, C_t^k, M_t^{k-1}] \\
M_t^k &= f_t' \odot \tanh(W_3 * M_t^{k-1}) + i_t' \odot g_t' \\
o_t &= \tanh(W_4 * [X_t, C_t^k, M_t^k]) \\
H_t^k &= o_t \odot \tanh(W_5 * [C_t^k, M_t^k])
\end{aligned}
$$
where $*$ is convolution, $\odot$ is element-wise multiplication, $\sigma$ is the element-wise sigmoid function, square brackets denote the concatenation of tensors, and round brackets denote a system of equations. $W_{1\sim5}$ are convolutional filters, of which $W_3$ and $W_5$ are 1 × 1 convolutional filters. The final output $H_t^k$ is co-determined by the dual memory states $M_t^k$ and $C_t^k$. Compared with shallow neural network models and the other models discussed, the recurrent neural network model performs better: it describes the dynamic and nonlinear changes of rainfall under extreme weather more faithfully and is more suitable for rainfall time series modeling.
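To make the convolutional gating used by ConvLSTM [33] concrete, the following sketch performs ConvLSTM updates on small random frames. It is a simplified illustration under stated assumptions: the peephole terms are omitted, the convolution is implemented as a zero-padded cross-correlation (as is common in deep learning libraries), a single kernel acts on the concatenated input and hidden state, and all sizes and weights are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv2d_same(x, w):
    """Zero-padded 2D cross-correlation. x: (C_in, M, N), w: (C_out, C_in, k, k)."""
    c_out, _, k, _ = w.shape
    pad = k // 2
    _, M, N = x.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((c_out, M, N))
    for i in range(k):
        for j in range(k):
            # Contribution of kernel tap (i, j), summed over input channels
            out += np.einsum("oc,cmn->omn", w[:, :, i, j], xp[:, i:i + M, j:j + N])
    return out


def convlstm_step(x, h_prev, c_prev, p):
    """One ConvLSTM update (peephole terms omitted in this sketch)."""
    z = conv2d_same(np.concatenate([x, h_prev]), p["W"]) + p["b"][:, None, None]
    zi, zf, zg, zo = np.split(z, 4)      # pre-activations of the four gates
    i, f, o = sigmoid(zi), sigmoid(zf), sigmoid(zo)
    g = np.tanh(zg)                      # input-modulation gate
    c = f * c_prev + i * g               # Hadamard products update the memory cell
    h = o * np.tanh(c)
    return h, c


rng = np.random.default_rng(0)
P, F, M, N, k = 1, 2, 8, 8, 3            # hypothetical sizes
params = {"W": rng.normal(scale=0.1, size=(4 * F, P + F, k, k)),
          "b": np.zeros(4 * F)}
h = np.zeros((F, M, N))
c = np.zeros((F, M, N))
for x in rng.random((4, P, M, N)):       # four random frames standing in for radar echoes
    h, c = convlstm_step(x, h, c, params)
print(h.shape, c.shape)
```

Every gate keeps the M × N spatial layout, so the memory cell stores a feature map rather than a flat vector, which is what distinguishes ConvLSTM from FC-LSTM.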

Application of Convolutional Neural Network in Extreme Weather
Convolutional neural networks (CNNs) [36][37][38][39] are widely used in machine learning applications such as image classification [40,41], natural language processing [42], and speech processing [43,44]. A CNN can extract features directly from raw data without complicated preprocessing. As layers are stacked from front to back, the features progress from shallow to deep, removing the limitations of hand-crafted features, and CNNs have become a research hotspot across many disciplines in recent years. They have been applied in many areas of meteorological prediction and are also very helpful for extreme weather forecasting. The accuracy of CNNs comes at the cost of high computational complexity: CNNs achieve state-of-the-art accuracy on most computer vision tasks, but training them on large data sets requires a great deal of computation and may take several days. They are therefore usually accelerated on general-purpose processors or graphics processing units (GPUs), which has led to much research and development on open-source parallel implementations for GPUs, although few studies have evaluated the performance characteristics of those implementations. Amodei D [45] proposed an end-to-end deep learning framework that can be used to predict various extreme weather events. The framework is based on a diffusion graph convolutional recurrent neural network and a dynamic-capturing algorithm for transportation resilience. The model aims to capture the temporal and spatial dependence of transportation resilience simultaneously from a directed graph.
To achieve this goal, spatiotemporal prediction is performed on the basis of the topological information of the urban road network, and real-world big data are used to quantitatively study the spatial and temporal characteristics of urban road resilience. Computationally, convolution operations are also used in place of the pooling layer [46]. As shown in Figure 7, the convolutional neural network model considers data from dozens of extreme weather events rather than case studies of individual disasters. Chen Qiaote et al. [47] devised a time-period telemetry wind speed prediction method based on convolutional neural networks, which can predict extreme wind conditions. The method first constructs the input feature map of the model, then builds a prediction model based on a deep convolutional neural network, and finally uses this model to predict the time-period telemetry wind speed. Its advantage is that a sliding window is used to build two-dimensional feature maps from the historical data and the numerical weather forecast model data. This form of input retains the sequence information of the original data and can take part in convolution operations. A one-dimensional convolutional neural network extracts the shallow, local characteristics of adjacent meteorological variables in the time domain, and a two-dimensional convolutional neural network then mines latent, abstract feature information from these shallow local characteristics and passes it to the regression prediction layer, providing effective deep features that improve the overall performance of the model.
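The sliding-window input construction and the one-dimensional convolution along time described above can be sketched as follows. This only illustrates the general technique and is not the model of [47]; the window width, the difference kernel, and the synthetic two-variable series are assumptions.

```python
import numpy as np


def sliding_windows(series, width):
    """Stack overlapping windows of a (T, V) series into (T - width + 1, width, V)."""
    T = series.shape[0]
    return np.stack([series[t:t + width] for t in range(T - width + 1)])


def conv1d_valid(window, kernel):
    """1D cross-correlation along time for each variable.

    window: (width, V) two-dimensional feature map, kernel: (k,)
    returns: (width - k + 1, V) local temporal features
    """
    k = len(kernel)
    steps = window.shape[0] - k + 1
    return np.stack([kernel @ window[s:s + k] for s in range(steps)])


series = np.arange(24.0).reshape(12, 2)   # 12 time steps, 2 hypothetical variables
windows = sliding_windows(series, width=6)
# A simple difference kernel responds to the local trend within each window
features = conv1d_valid(windows[0], np.array([-1.0, 0.0, 1.0]))
print(windows.shape, features.shape)
```

Each window is a small two-dimensional map (time by variable) that preserves the original ordering of the data, so it can be passed to further convolutional layers.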
The echo images of the new generation of Doppler weather radar are contaminated by non-precipitation echoes, which reduces the accuracy of refined short-term weather forecasts. Shin H C et al. [48] proposed a semantic segmentation method for radar noise images based on a deep convolutional neural network (DCNN). First, a deep convolutional neural network model (DCNNM) is designed and trained on the data, and feature maps are extracted through forward propagation to capture high-dimensional global semantic information and local feature details. Then, the training error is back-propagated to update the network parameters iteratively so that the model converges. Finally, the model is used to segment weather radar image data. Experimental results show that this method is effective for weather radar images [49] and can process extreme weather data more effectively. Compared with optical flow methods, the fully convolutional network (FCN), and other methods, it separates real echoes from noise echoes in weather radar images better, with high noise echo recognition accuracy and high pixel accuracy. Wang S et al. [50] proposed a radar quantitative precipitation estimation algorithm based on a spatiotemporal network model (ST-QPE) and designed the convolutional time-series network QPE-Net8 and the multi-scale feature fusion time-series network QPE-Net22.

Application of Capsule Neural Network in Extreme Weather
Geoffrey Hinton is one of the pioneers of deep learning and a key contributor to classic neural network algorithms such as back propagation. He and his team proposed a brand-new neural network based on a structure called a capsule and also published a dynamic routing algorithm between capsules for training capsule networks.
An artificial neuron outputs a single scalar. A convolutional layer applies the same convolution kernel to every region of a two-dimensional matrix and stacks the results to form its output. Viewpoint invariance is then achieved through max pooling: the pooling window moves across the two-dimensional matrix and keeps the largest value in each region, which gives the activity invariance we want (that is, the output stays the same when the input is changed slightly). In other words, if the object to be detected is shifted slightly in the input image, the model can still detect it. However, the pooling layer discards valuable information and does not encode the relative spatial relationships between features, which is the motivation for using capsules. In a capsule, all the important information about the state of a detected feature is encapsulated in a vector, whereas a neuron outputs a scalar.
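The invariance provided by max pooling, and the positional information it discards, can be seen in a small NumPy example. The 4x4 inputs and the `max_pool2d` helper are illustrative assumptions, not part of any cited model.

```python
import numpy as np

def max_pool2d(x: np.ndarray, size: int = 2) -> np.ndarray:
    """Non-overlapping max pooling over size x size regions of a 2-D array."""
    h2, w2 = x.shape[0] // size, x.shape[1] // size
    trimmed = x[:h2 * size, :w2 * size]
    # Split each axis into (block index, offset inside block), then take
    # the maximum over the two offset axes.
    return trimmed.reshape(h2, size, w2, size).max(axis=(1, 3))

# The same feature activation at two slightly different positions.
a = np.zeros((4, 4))
a[0, 0] = 1.0
b = np.zeros((4, 4))
b[1, 1] = 1.0   # shifted by one pixel, still inside the same pooling region

pooled_a = max_pool2d(a)
pooled_b = max_pool2d(b)

# Both pooled maps are identical: the shift is invisible after pooling,
# so the exact position of the feature can no longer be recovered.
same_output = np.array_equal(pooled_a, pooled_b)   # True
```

The identical outputs show the desired robustness to small shifts, and at the same time why the exact location of the feature is lost, which is the limitation that capsules are meant to address.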
The capsule network is a novel and promising neural network in the field of deep learning. By encoding features into capsules and establishing part-whole relationships, it performs well in image classification. However, the original capsule network has weak feature extraction capability and many training parameters, and it tends to account for everything in the image, so it is not suitable for images with complex backgrounds. To solve these problems, S. Yang et al. [51] proposed an improved capsule network, RS-CapsNet, which uses Res2Net blocks to extract multi-scale features and squeeze-and-excitation (SE) blocks to emphasize useful features and suppress useless ones. At the same time, a linear combination of capsules is adopted to improve the capsules' ability to represent the object to be detected and to reduce the number of capsules. In addition, they proposed a method that first constructs intermediate capsules that can represent most of the objects under test and then uses the intermediate capsules together with the primary capsules to construct the classification capsules. Ashesh et al. [52] note that numerical weather prediction models require ever more computing time and resources, yet extreme weather is still sometimes difficult to predict. They introduced a data-driven framework based on analog forecasting (prediction using similar patterns from the past) that employs a new deep learning pattern recognition technique, the capsule neural network (CapsNet), and an impact-based automatic labeling strategy. Using data from a large ensemble of fully coupled Earth system model simulations, CapsNets are trained on large-scale tropospheric circulation patterns (Z500) labeled 0-4 according to the existence and geographical region of surface temperature extremes over North America in the following days. Using only Z500, the trained networks can predict the occurrence and region of cold waves or heat waves one to five days ahead.
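As a rough illustration of the capsule mechanism underlying the networks discussed above, the following NumPy sketch implements the squash nonlinearity and routing by agreement in the form given in the original dynamic routing paper. It is a simplified sketch, not the code of [51] or [52]: the array shapes, the number of routing iterations, and the random prediction vectors are assumptions.

```python
import numpy as np

def squash(s: np.ndarray, axis: int = -1, eps: float = 1e-9) -> np.ndarray:
    """Shrink vector length into [0, 1) while keeping its direction."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_routing(u_hat: np.ndarray, n_iters: int = 3) -> np.ndarray:
    """Routing by agreement.

    u_hat: prediction vectors of shape (n_in, n_out, dim), one vector from
           every lower-level capsule for every higher-level capsule.
    Returns the higher-level capsule outputs, shape (n_out, dim).
    """
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                       # routing logits
    for _ in range(n_iters):
        c = softmax(b, axis=1)                        # coupling coefficients
        s = np.einsum("ij,ijd->jd", c, u_hat)         # weighted sum per output
        v = squash(s)                                 # output capsule vectors
        b = b + np.einsum("ijd,jd->ij", u_hat, v)     # reward agreement
    return v

# Hypothetical example: 6 lower capsules, 3 higher capsules, 4-D vectors.
rng = np.random.default_rng(0)
u_hat = rng.normal(size=(6, 3, 4))
v = dynamic_routing(u_hat)
lengths = np.linalg.norm(v, axis=-1)   # each length lies in [0, 1)
```

The length of each output vector can be read as the probability that the entity represented by the capsule is present, while its direction encodes the properties of that entity.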
Figure 8 shows the distribution of extreme weather for summer and winter [52]. Two state-of-the-art deep learning techniques are used for pattern recognition: the convolutional neural network and the more advanced capsule neural network. Compared with traditional image processing techniques, the main advantage of both methods is that the feature extraction filters are learned for each data set by the backpropagation algorithm rather than designed by hand and specified in advance. Since 2011, the convolutional neural network has been a breakthrough method that changed image processing, but because CapsNet has the property of equivariance, it is expected to be better suited to spatiotemporal weather data.

Conclusions
As Sections 2 and 3 show, deep learning has been applied widely and effectively in extreme weather forecasting and has great potential to further improve forecast accuracy. By combining deep learning with meteorological science, researchers have drawn new conclusions and opened up more ways to predict extreme weather, which will contribute to better weather forecasting in the future.
Table 1 summarizes the entire article.

Table 1. Summary of deep learning and its application in extreme weather methods.

| Method | Specific Method | Summary | Features |
| --- | --- | --- | --- |
| Traditional methods | Weather Map Forecast Method | The first step of the weather map forecasting method is to forecast the weather situation, that is, to predict the existing weather systems on the weather map and their future trends. | Simple and intuitive. |
| Traditional methods | Weather Forecast Based on Weather Radar | Radar emits short-wavelength radio waves from its antenna. When they encounter weather phenomena in the distant atmosphere such as typhoons, thunderstorms, and rainstorms, the radio waves are reflected back and displayed on the radar screen. | Weather radar has a wide range of functions, providing necessary weather information for weather forecasting and for the launch and flight of rockets, missiles, and spacecraft. |
| Traditional methods | Numerical Prediction Methods | Numerical weather forecasting uses a set of equations describing the fluid mechanics and thermodynamics of weather evolution. A large computer solves them numerically under given initial and boundary conditions, according to the actual state of the atmosphere, to predict atmospheric movement and weather phenomena over a certain period in the future. | A typical nonlinear system that requires a large amount of calculation, all of which must be completed in a shorter time than the actual weather evolution. |
| Deep learning applications | BP Neural Network | The BP neural network is a multi-layer feed-forward neural network. Its main feature is the forward transmission of signals and the backward propagation of errors. | Strong self-learning ability and adaptability; generalization and strong fault tolerance. |
| Deep learning applications | Recurrent Neural Network | The recurrent neural network is a type of model used to recognize patterns in text and other sequence data. | Its input includes not only the current input example but also the information perceived by the network at the previous moment. Using this property, information can circulate in the network for any length of time. |
| Deep learning applications | Convolutional Neural Network | Starting from the raw input, features are learned automatically and combined layer by layer from the front of the network to the back, from shallow to deep. It is a research focus of this subject area. | Researchers do not need to care about specific weather conditions; they only need to understand the results after training, so feature extraction is encapsulated. |
| Deep learning applications | Capsule Neural Network | An artificial neuron outputs a single scalar. A convolutional network applies the same convolution kernel to every region of a two-dimensional matrix and stacks the results to form the output of the convolution layer. | (1) A good representation can be learned from a small amount of data; (2) the way of thinking is closer to the human brain, and the hierarchical relationships of internal knowledge representation are better modeled in the neural network. The intuition behind the capsule is simple and elegant. |

Discussion
Extreme weather prediction has always been a scientific challenge for meteorologists all over the world. At present, the techniques for monitoring and predicting extreme weather events are relatively limited, and the patterns and mechanisms of such events are still unclear. Therefore, future work should first study the statistical characteristics of extreme weather events and compile a relatively complete, usable data set covering the frequency and intensity of events, seasonal changes, and the characteristics of annual weather changes. In addition, the main factors that cause extreme weather events should be studied, and deep learning, artificial intelligence, and other broader fields should be combined to conduct more in-depth research on and prediction of extreme weather.
Author Contributions: Investigation, W.F., Q.X., L.S. and V.S.S.; Supervision, L.S. and V.S.S.; Writing - original draft, W.F. and Q.X.; Writing - review and editing, W.F. and Q.X. All authors have read and agreed to the published version of the manuscript.