1. Introduction
Radar echo nowcasting is a crucial method in the field of atmospheric science. The goal of this task is to predict, in a timely and accurate manner, the weather conditions of local areas over a relatively short future period (such as 0–2 h) [1,2,3]. Currently, this technology has been widely applied to provide flood prevention information for residents' travel, agricultural production, flight safety, and other aspects. It is both convenient for the public and conducive to disaster prevention and mitigation, and it has always been a pivotal task in the weather forecasting field. With climate change and rapid urbanization, atmospheric conditions have become more complex, and various meteorological phenomena occur frequently, such as precipitation, hail, high temperature, and typhoons. Climate change has had many adverse impacts on people's lives and work and has increased many uncertain risks. If effective forecasts and precautions can be made for the aforementioned meteorological phenomena, losses can be reduced dramatically [4]. However, for nowcasting, the formation of future strong convection is affected by the current convection situation in the region and its previous change trend. Under the influence of these climate factors, it is difficult to determine the shape and size of convection, and the distribution of convection presents a complex change trend; this requires prediction models and data with strong spatiotemporal correlation [5,6]. On the other hand, because of the higher requirements for accuracy and timeliness compared with traditional forecast tasks, the work is very challenging and has gradually become a hot research topic in the meteorological community.
At present, the regular methods of nowcasting are mainly based on radar echo extrapolation [7,8]. Radar echo extrapolation can be divided into traditional methods and deep learning methods. Traditional methods generally include the centroid tracking method, the cross-correlation method, and optical-flow-based methods. Initially, a simple tracking algorithm [9] obtained the moving vector by continuously tracking the position of the centroid in an alpine region so as to predict the position of the echo, but it is only suitable for a strong single echo. The TRACE3D algorithm [10] identifies convective cells and tracks them by exclusively using radar reflectivity data as input; this method showed promising preliminary results for centroid tracking. An enhanced centroid tracking method [11] performs least-squares fitting on the positions of the echo centroid at adjacent moments, obtains atmospheric parameters such as the moving vector, the maximum reflectivity factor, and the centroid coordinates of the single echo, and proposes a dynamic constraint-based combinatorial optimization method to track storms. This method is effective for echo blocks with larger intensity, but when the echo splits or merges, the accuracy of tracking and prediction is low. A cross-correlation method based on TREC vectors [12] and an improved cross-correlation method [13] are mainly used to calculate the spatial correlation of two consecutive moments and establish a fitting relation for echoes; however, for strong convective echoes that evolve fast, these methods cannot guarantee the tracking effect, and the data utilization rate and nowcasting accuracy are obviously low. In addition, optical-flow-based methods from computer vision [14,15] have proved effective for radar echo extrapolation with fast evolution, especially the recently proposed real-time optical flow by variational methods for echoes of radar (ROVER) algorithm [16]. ROVER completes echo prediction by using the changes of image-sequence pixels within a time threshold and calculating the optical flow correlation between adjacent frames, combined with climate and other factors. However, the optical flow estimation step and the radar echo extrapolation step are separated, so it is difficult to determine the model parameters that give good predictions, which limits optical-flow-based methods for nowcasting of strong convection. Moreover, an excellent review of these methods was given by Keenan et al. [17]; it presents an overview and comparison of nine nowcasting systems deployed in the forecast demonstration project during the 2000 Olympic Games in Sydney, Australia, which fully reflects the practicability of nowcasting techniques in big events. Traditional radar echo extrapolation methods only assume a simple linear evolution of the echo and make low use of massive radar echo image data, so their nowcasting accuracy is deficient.
Compared with traditional radar echo prediction methods, deep learning can better mine and analyze big data in depth and improve the prediction performance of models [18], and it has been applied practically in many fields. Therefore, applying this technology to weather nowcasting is also a meaningful research task. The dynamic convolutional layer network [19] was first proposed for short-range weather nowcasting; it applies a dynamically varying convolution kernel to extrapolate radar echoes, but it is limited to extrapolating only one echo at a time. Exploiting the characteristics of convolutional neural networks, such as local perception of images and feature extraction, the recurrent dynamic convolutional neural network (RDCNN) [20] was further established to learn the changing features of echoes by adding a dynamic sub-network and a probability prediction layer, improving the accuracy of echo extrapolation. In recent years, the recurrent neural network (RNN) [21] and long short-term memory (LSTM) [22] have brought new solutions for the radar echo nowcasting task [23,24]. RNNs are excellent at dealing with time series problems, and, as an emerging technology driven by big data, deep learning can make full use of the large amount of collected radar echo data; this trains the network model more effectively and predicts the future echo trend more accurately. An unsupervised video representation learning model based on the LSTM structure was proposed [25]; with this encoding-decoding structure, multiple future frames can be predicted, which laid a foundation for spatiotemporal sequence prediction. Subsequently, to capture long-term temporal features more fully, a bidirectional LSTM network with a 1D CNN [26] was constructed to solve the precipitation nowcasting problem. The forecast of radar echoes has a comparatively strong spatiotemporal correlation: the spatiotemporal information at the previous moment determines the prediction at the next moment, but a general LSTM does not consider spatial correlation in the temporal dimension. Considering problems of the LSTM structure such as containing too much redundant data and easy loss of spatial information, the convolutional LSTM network (ConvLSTM) [27] was proposed on this basis; it can learn spatial and temporal features at the same time and is more suitable for radar echo prediction. Tan et al. [28] proposed a hierarchical convolutional LSTM network named FORECAST-CLSTM; the model is designed to fuse multi-scale features in a hierarchical network structure to predict the pixel values and the morphological movement of the cloudage simultaneously. Thereafter, the ST-LSTM method [29], with convolution calculation and a spatiotemporal memory flow, was introduced into a radar nowcasting task, making it possible to extract spatiotemporal features of echoes at different times and scales, although the computational complexity is increased. Furthermore, a 3D convolution method [30] was proposed to capture motion information between consecutive frames, making convolutional neural networks suitable for handling spatiotemporal features. Compared with the method of [30], a 3DCNN video generation model combining generative adversarial networks was proposed [31]; this method uses a 3D convolutional network to extract spatiotemporal features efficiently and generates new dynamic echo sequences.
Therefore, this paper proposes a novel 3DCNN-BCLSTM radar echo nowcasting model with an encoding-forecasting structure to tackle the challenging problems of low forecast accuracy and easy loss of spatiotemporal information. Because the inputs and outputs are both multi-frame radar echo sequences, the prediction of the radar echo evolution trend can be expressed as video sequence prediction with spatiotemporal features [32]. To achieve a more accurate nowcasting result, the model first introduces a 3D convolutional network of the kind usually used for feature extraction from consecutive video frames. This preserves motion feature information in the temporal dimension and extracts local short-term spatiotemporal features of consecutive images more effectively; these features then enter the bi-directional convolutional LSTM networks, whose state-to-state transitions are all convolutional, and whose bi-directional structure can learn the global long-term motion trend of the preceding and subsequent echoes more fully, before the forecasting network completes the prediction of future echoes. Finally, we evaluate the model against traditional extrapolation algorithms and other deep learning algorithms; the experiments fully demonstrate that the comprehensive evaluation of the improved deep learning model proposed in this paper is consistently better than that of the compared models.
2. 3DCNN-BCLSTM Model
In order to further improve nowcasting accuracy and make better use of the spatiotemporal correlation between radar echo images, this paper proposes an encoding-forecasting structure combining 3DCNN and bi-directional convolutional LSTM based on multiple deep learning technologies; it can capture the spatiotemporal feature relations of consecutive radar echoes more effectively and enhance the transmission of spatiotemporal features. The specific model architecture is shown in Figure 1.
First of all, the consecutive radar image sequences are constructed as model input with uniform spatial and temporal dimensions; through this treatment of the data dimensions, tensors with complete spatiotemporal features can be obtained. In terms of the main structure, a generative model with an encoding-forecasting structure is established, consisting mainly of two networks: an encoding network and a forecasting network. Second, this paper extracts the local short-term spatiotemporal features of consecutive multi-frame images through 3DCNN, then learns the dependencies of global long-term bi-directional spatiotemporal features through three layers of bi-directional convolutional LSTM, and compresses the captured echo motion features into hidden state tensors (this former part is the encoding network of the model). After that, the forecasting network, composed of three layers of bi-directional convolutional LSTM connected with the internal states of the encoding network and a last 3DCNN layer, fuses the multi-frame spatiotemporal states; the spatiotemporal feature information learned by the encoding network is transmitted into the forecasting network, and the future echo image sequences are reconstructed according to the current input and the feature information. In addition, the batch normalization (BN) method [33] is introduced, and the rectified linear unit (ReLU) is used as the nonlinear activation function instead of the traditional Sigmoid to improve network convergence speed and alleviate overfitting. This deep learning structure obviously enhances the learning capability of the model and gives it a stronger capability of expressing the spatiotemporal features of multi-frame radar echo images; therefore, the prediction accuracy is improved effectively.
2.1. Construction of 3D Spatiotemporal Data
For the radar echo prediction problem, the original input data dimensions can no longer meet the requirements of the network model; their main shortcoming is that the convective spatiotemporal feature information cannot be encoded completely. To solve this problem, all inputs, unit outputs, and cell states need to be transformed into 3D tensors $X \in \mathbb{R}^{T \times W \times H}$, where $\mathbb{R}$ denotes the domain of atmospheric data features. The first dimension $T$ is the temporal dimension, and the second and third dimensions $W$ and $H$ are the spatial dimensions of row and column, respectively. Note that this 3D spatiotemporal data is different from the volumetric data of the weather radar. As shown in Figure 2, the original single echo image is transformed into vectors of multi-frame temporal dimension on a spatial grid; a 3D spatiotemporal structure is generated by stacking consecutive images in turn, and the neural networks can then predict the future state of each grid unit from local adjacent information and past states.
For the 3DCNN-BCLSTM network structure, the input data dimensions of the echo images need to be restructured, with the temporal dimension and spatial dimensions constructed separately. During spatiotemporal feature extraction and motion information learning, the inputs and outputs are both 3D tensors, and the transitions between states are also convolution calculations over 3D tensors. This gives the radar echo data a unified dimensionality, preserves all spatial and temporal features at the same time, and makes regional radar echo nowcasting more comprehensive and accurate.
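As an illustration of this dimension construction, the NumPy sketch below stacks consecutive single-frame echo images into overlapping $(T, W, H)$ tensors; the toy frame count and spatial size are illustrative assumptions, not the paper's dataset dimensions.

```python
import numpy as np

def build_spatiotemporal_tensors(frames, seq_len):
    """Stack consecutive single-frame echo images (each W x H) into
    overlapping 3D tensors of shape (seq_len, W, H)."""
    frames = np.asarray(frames)                      # (N, W, H)
    n_windows = frames.shape[0] - seq_len + 1
    return np.stack([frames[i:i + seq_len] for i in range(n_windows)])

# toy example: 10 echo frames of 4 x 4 pixels, input windows of length 8
frames = np.random.rand(10, 4, 4)
windows = build_spatiotemporal_tensors(frames, seq_len=8)
print(windows.shape)  # (3, 8, 4, 4)
```

Each window is one unified 3D tensor in which temporal order is preserved alongside the two spatial dimensions, matching the tensor form described above.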
2.2. 3DCNN Module
Convolutional neural networks are very suitable for image data processing because of their local connections, feature mapping, and weight sharing. Even though the traditional 2DCNN possesses a strong capability for extracting features from image data, when it deals with consecutive echo images it fails to consider the impact of the relations between multiple frames on prediction and easily loses the motion trend information of target features; thus, it cannot solve the problem of motional echo prediction effectively. We utilize a constructed 3DCNN instead of a traditional 2DCNN for more accurate results. There are multiple convolution kernels in the convolution layers of the neural networks; each kernel corresponds to one echo feature, and the more kernels there are, the more feature maps are generated. The value at position $(w, h, t)$ on the $j$th feature map in the $i$th layer is given by

$$v_{ij}^{wht} = f\left(b_{ij} + \sum_{m} \sum_{p=0}^{P_i - 1} \sum_{q=0}^{Q_i - 1} \sum_{r=0}^{R_i - 1} w_{ijm}^{pqr}\, v_{(i-1)m}^{(w+p)(h+q)(t+r)}\right),$$

where $P_i$ and $Q_i$ are the spatial sizes of the 3D kernel, $R_i$ is the size of the 3D kernel along the temporal dimension, $w_{ijm}^{pqr}$ is the $(p, q, r)$th value of the kernel connected to the $m$th feature map in the previous layer, $b_{ij}$ is the bias of this feature map, and $f(\cdot)$ is a nonlinear activation function introduced to improve the expression capability of the neural networks. This 3DCNN structure can preserve more information from continuous multi-frame images and can be used effectively for meteorological nowcasting tasks. In the dimension reconstruction of the input radar echo images, several consecutive frames of uniform spatial size are stacked in time order to form 3D data with spatiotemporal features. Then, as shown in
Figure 3, the 3D convolution kernel is used to operate on this continuous 3D data; the kernel in the figure spans three frames in the temporal dimension, that is, the convolution operation covers three consecutive maps. The feature data extracted by the 3DCNN in the last layer of the encoding network is transmitted to the next network as input [30]. In this structure, every feature map in a convolution layer is connected to several consecutive frames in the previous layer, and the specific value at each position of a feature map is obtained through a local receptive field over the same positions of successive frames in the previous layer, thereby capturing the spatiotemporal motion information of the echo images.
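To make the summation in the formula concrete, the NumPy sketch below implements a direct single-feature-map 3D convolution (a "valid" cross-correlation, as is conventional in deep learning); the averaging kernel and toy input are illustrative assumptions, not the trained weights of the model.

```python
import numpy as np

def conv3d_valid(x, kernel, bias=0.0):
    """Direct ('valid') 3D cross-correlation of one feature volume x of
    shape (T, W, H) with a kernel of shape (R, P, Q), following the
    formula's triple sum for a single input/output feature map."""
    T, W, H = x.shape
    R, P, Q = kernel.shape
    out = np.zeros((T - R + 1, W - P + 1, H - Q + 1))
    for t in range(out.shape[0]):
        for w in range(out.shape[1]):
            for h in range(out.shape[2]):
                # local 3D neighbourhood weighted by the shared kernel
                out[t, w, h] = np.sum(x[t:t+R, w:w+P, h:h+Q] * kernel) + bias
    return out

x = np.arange(64, dtype=float).reshape(4, 4, 4)   # 4 frames of 4 x 4 "echo"
k = np.ones((3, 3, 3)) / 27.0                     # 3 x 3 x 3 averaging kernel
y = conv3d_valid(x, k)
print(y.shape)  # (2, 2, 2)
```

Because the kernel extends over three frames in time, each output value mixes information from three consecutive echo maps, which is exactly how motion between frames is captured.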
In the encoding network part of the radar echo extrapolation model, we address the difficulties of processing multi-frame images and the easy loss of spatiotemporal information. The input of the network is composed of consecutive image sequences, which successively enter Conv1 and Conv2 for short-term feature extraction. This part mainly consists of two Conv3D layers, each followed by batch normalization (BN) and a ReLU nonlinear activation layer. The convolution kernels of the two Conv3D layers are of small size 3 × 3 × 3, the numbers of filters are 16 and 32, respectively, and each 3D convolution kernel shares the same weight coefficients. To keep the size of the feature maps constant, a padding operation is carried out before the convolution. To accelerate training of the deep network and effectively avoid the related gradient problems, we add BN after each 3D convolution layer [33] and normalize the data distribution of each batch during the network calculation. The derivative range of the traditional activation function is less than 1, so the gradient is continuously attenuated as it passes through each layer; as the network structure deepens, the gradient may vanish. Thus, the ReLU activation function is selected to replace the traditional Sigmoid:

$$f(x) = \max(0, x).$$

When the input $x$ is less than 0, the output is forced to 0; when $x$ is greater than 0, the input is passed through unchanged. ReLU increases the sparsity of the network and speeds up convergence; the generalization capability of the feature extraction is then stronger, the overfitting phenomenon is alleviated, and the accuracy is improved to a certain extent. The 3DCNN module uses two shallow layers here so as to capture the spatiotemporal features of images more effectively in combination with the subsequent bi-directional convolutional LSTM layers; this reduces feature loss and accelerates the convergence of the neural networks.
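A minimal NumPy sketch of the BN-then-ReLU stage described above (the learnable BN scale and shift, and the convolution itself, are omitted for brevity; the shapes are illustrative):

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize each position over the batch axis, as BN does at
    training time (learnable scale/shift omitted in this sketch)."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def relu(x):
    return np.maximum(0.0, x)  # f(x) = max(0, x)

feats = np.random.randn(16, 8, 4, 4)   # a batch of (T, W, H) feature volumes
out = relu(batch_norm(feats))
print(out.min() >= 0.0)  # True: negative responses are clipped to zero
```

Normalizing each batch keeps activations in a well-scaled range before the ReLU, which is what stabilizes training and counters the attenuating gradients mentioned above.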
A 3DCNN layer is also used in the forecasting network part, followed by a ReLU nonlinear activation layer. The number of filters is set to 1 here so that the model finally generates gray images with the same number of channels as the original input and outputs the visualization results.
2.3. Bi-Directional Convolutional LSTM Module
A recurrent neural network (RNN) can handle the time series problems of meteorological forecasting, and long short-term memory (LSTM) is a special structure based on the RNN that is used to learn changes governed by temporal sequence factors. In recent years, LSTM has been frequently used in fields such as natural language processing (NLP); in this paper, we try to learn the spatiotemporal dependencies of consecutive echo images through an improved LSTM structure.
As a special variant of the recurrent neural network, the innovation of LSTM is its memory units, which are in essence the place where information is continuously updated and exchanged. The traditional recurrent update structure can neither update and filter information nor satisfy long-distance information dependency; therefore, a three-gate structure is introduced to fulfill these requirements. LSTM relies on memory units to continuously update the state information of the current moment and uses the forget gate, input gate, and output gate to decide what information to forget, what to input, and what to output. The LSTM network solves the long-term dependency problem of the RNN and extends the extrapolation timeliness, which makes input sequences map effectively to hidden nodes and allows the relations between the front and back of a long time series to be learned through training. The LSTM structure has a strong capability for solving time series problems; however, for processing spatial data it contains too much redundant information, and spatiotemporal information cannot be encoded, so if it were applied directly to radar echo nowcasting, the loss of spatiotemporal information would be inevitable. The convolutional LSTM [27] was therefore proposed; its structure is still an LSTM in essence, but the transitions between states are changed from multiplication to convolution. It establishes a time series relation as an LSTM does and also depicts spatial features as a CNN does, effectively overcoming the problem of spatial information loss in the sequence transmission process. Based on this structure, this paper constructs the bi-directional convolutional LSTM shown in Figure 4.
A bi-directional convolutional LSTM network is composed of one forward pass and one backward pass. The network comprehensively combines the forward and backward information to output the radar echo results, solving the problem that a single-direction transmission cannot exploit information flowing from back to front. In the network, each bi-directional convolutional LSTM memory unit contains the spatial and temporal output from the 3D convolutional network; the calculation process of each part of the structure is as follows:
$$\begin{aligned} f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1} + b_f\right),\\ i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1} + b_i\right),\\ o_t &= \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1} + b_o\right),\\ C_t &= f_t \circ C_{t-1} + i_t \circ \tanh\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right),\\ H_t &= o_t \circ \tanh\left(C_t\right), \end{aligned}$$

where $X_t$ is the input at the current moment; $H_{t-1}$ is the output at moment $t-1$; $f_t$, $i_t$, and $o_t$ denote the forget gate, input gate, and output gate of the CLSTM, respectively; and $W$ and $b$ are the connection weights and biases of the gate structures. The convolution operation $*$ replaces the original multiplication of the LSTM, and $\circ$ denotes the Hadamard product, that is, the element-wise multiplication of matrices. The nonlinear activation function $\sigma$ used here is the Sigmoid, $\sigma(x) = 1/(1 + e^{-x})$, which constrains the value range of the three gates to $[0, 1]$. $C_t$ is the state update unit, which is the core part of the bi-directional convolutional LSTM.
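The gate calculations above can be sketched in NumPy for a single-channel map; the parameter layout (dictionaries keyed per gate) and the 3 × 3 kernels are assumptions of this illustration, not the paper's implementation.

```python
import numpy as np

def conv2d_same(x, k):
    """'Same'-padded 2D cross-correlation of one single-channel map."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.empty_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i+kh, j:j+kw] * k)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def convlstm_step(x_t, h_prev, c_prev, W, b):
    """One ConvLSTM update: gate pre-activations use convolutions (*)
    instead of matrix products; state updates use Hadamard products."""
    f = sigmoid(conv2d_same(x_t, W['xf']) + conv2d_same(h_prev, W['hf']) + b['f'])
    i = sigmoid(conv2d_same(x_t, W['xi']) + conv2d_same(h_prev, W['hi']) + b['i'])
    o = sigmoid(conv2d_same(x_t, W['xo']) + conv2d_same(h_prev, W['ho']) + b['o'])
    g = np.tanh(conv2d_same(x_t, W['xc']) + conv2d_same(h_prev, W['hc']) + b['c'])
    c_t = f * c_prev + i * g           # C_t = f_t o C_{t-1} + i_t o tanh(...)
    h_t = o * np.tanh(c_t)             # H_t = o_t o tanh(C_t)
    return h_t, c_t

rng = np.random.default_rng(0)
W = {k: rng.normal(scale=0.1, size=(3, 3))
     for k in ('xf', 'hf', 'xi', 'hi', 'xo', 'ho', 'xc', 'hc')}
b = {k: 0.0 for k in ('f', 'i', 'o', 'c')}
h, c = np.zeros((5, 5)), np.zeros((5, 5))
h, c = convlstm_step(rng.normal(size=(5, 5)), h, c, W, b)
print(h.shape)  # (5, 5): spatial structure is preserved across the update
```

Because every transition is a convolution over the 2D map, the hidden state keeps its spatial layout at each time step, which is the key property distinguishing ConvLSTM from a plain LSTM.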
In the 3DCNN-BCLSTM model proposed in this paper, three layers of bi-directional convolutional LSTM are placed in the encoding network and three layers in the forecasting network; the numbers of filters in both parts are 32, 48, and 64, with convolution kernels of size 3 × 3. In the bi-directional convolutional LSTM, the padding operation is also performed to keep the size of the spatiotemporal features unified, and each layer is followed by a batch normalization layer. The spatiotemporal information of continuous multi-frame image sequences transmitted through the bi-directional convolutional LSTM is thereby fused effectively over the global long-term range. Compared with a single-direction convolutional LSTM, the bi-directional convolutional LSTM can learn the global long-term feature dependencies in both the forward and reverse directions and completes the nowcasting task more efficiently.
2.4. Encoding-Forecasting Network Structure
For radar spatiotemporal sequence nowcasting, given a set of 3D tensor sequence data $X_{t-L+1}, \dots, X_t$ with a fixed observation length $L$, the radar echo image sequences of future length $K$ can be generated through the encoding-forecasting network structure, as shown in Equation (8):

$$\tilde{X}_{t+1}, \dots, \tilde{X}_{t+K} = \arg\max_{X_{t+1}, \dots, X_{t+K}} p\left(X_{t+1}, \dots, X_{t+K} \mid X_{t-L+1}, \dots, X_t\right),$$

where $t$ denotes the current moment and $\tilde{X}_{t+1}, \dots, \tilde{X}_{t+K}$ represents the prediction output. Conditioning on the past observed echoes, the $\arg\max$ selects the future sequence of maximum probability, making the prediction of the future moments as close to reality as possible.
The generative model with the encoding-forecasting network structure shown in Figure 5 is mainly used in this paper; it is composed of an encoding network and a forecasting network [25]. The network combines an encoding network of stacked two-layer 3DCNN and three-layer BCLSTM with a forecasting network of three-layer BCLSTM and one-layer 3DCNN, which receives the internal states of the encoding network. This structure compresses the captured feature information of the moving echoes into a hidden tensor format via the encoding network; the forecasting network then unfolds the hidden state tensors and generates new radar echo predictions based on the feature information of the last moment. The workflow of the network is as follows.
Step 1: Constructing input data with spatiotemporal features
When the data are input, the radar echo images in the dataset need to be reduced to single-channel gray images with a 100 × 100 pixel spatial dimension; the images are then transformed into array format and saved as NumPy arrays awaiting extraction and use. In pre-processing, the temporal dimension of the data also needs to be constructed, forming an input of eight consecutive frames from which eight future frames are predicted. Thus, the radar data are transformed into 3D tensors with spatiotemporal features to facilitate model input and training.
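A sketch of this pre-processing step in NumPy; block-mean pooling stands in for the resizing method (which is not specified here), and the source image shape is an illustrative assumption.

```python
import numpy as np

def to_gray_100(img):
    """Reduce an echo image to a 100 x 100 single-channel array in [0, 1]:
    channel averaging plus block-mean pooling stands in for the actual
    resizing used in the paper."""
    gray = img.mean(axis=-1) if img.ndim == 3 else img.astype(float)
    bh, bw = gray.shape[0] // 100, gray.shape[1] // 100
    pooled = gray[:bh * 100, :bw * 100].reshape(100, bh, 100, bw).mean(axis=(1, 3))
    return pooled.astype(np.float32) / 255.0

# eight consecutive input frames, stored as one NumPy array for training
frames = np.stack([to_gray_100(np.random.randint(0, 256, (200, 200, 3)))
                   for _ in range(8)])
print(frames.shape)  # (8, 100, 100)
```

The resulting array already has the temporal dimension stacked first, so it can be fed to the encoding network as one 3D spatiotemporal tensor.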
Step 2: Extracting local short-term spatiotemporal feature information
Consecutive radar echo images with uniform spatial and temporal dimensions after dimension construction are taken as a whole as the network input. Thirty-two echo feature maps are extracted through the two 3DCNN layers, and the ReLU activation function is used in place of the original Sigmoid to alleviate overfitting and effectively increase prediction accuracy.
Step 3: Learning global long-term spatiotemporal feature information
Then, the global long-term spatiotemporal correlations in the delivered feature information are learned through the three-layer BCLSTM, and the learned spatiotemporal features are compressed into hidden state tensors. Up to this point, this constitutes the encoding network, the first half of the whole generative network; the forward and backward structures in this part can fully learn the bi-directional spatiotemporal feature dependencies.
Step 4: Reconstructing and generating predicted radar images
Finally, a forecasting network composed of the three-layer BCLSTM and the last 3DCNN layer is constructed; the atmospheric spatiotemporal feature information learned by the encoding network is transmitted into the forecasting network, and the future prediction images are generated according to the current input and hidden states.
4. Conclusions
The utilization and mining of massive radar echo data are low in traditional radar echo nowcasting, and the meteorological process of future echo formation is affected by the current echo situation in the region and its previous change trend, which possesses strong spatiotemporal correlation. In this paper, a novel deep learning model with a 3DCNN-BCLSTM encoding-forecasting structure is proposed and applied to the radar echo nowcasting task. This model captures and learns the spatiotemporal feature dependencies of consecutive radar echo images more effectively by utilizing the constructed 3D spatiotemporal data and an encoding-forecasting network combining two spatiotemporal convolutional structures. The 3D spatiotemporal data contain the spatial and temporal dimensions of atmospheric change, which makes them more suitable for radar echo nowcasting tasks with strong spatiotemporal correlation. The constructed 3DCNN is first used to extract local short-term spatiotemporal features, avoiding the confusion of spatial features caused by directly using a convolutional LSTM network for learning, and the bi-directional convolutional LSTM structure can learn the global long-term motion trend of the forward and backward radar echoes more fully. This model improves the situation of vague prediction images and solves the problems of easy spatiotemporal information loss and low forecast accuracy. The evaluation results show that the performance of this model is obviously better than that of the other models under various rainfall threshold conditions, and the predicted future echo images are more accurate, which fully proves the effectiveness of this method.
In future work, we will try to integrate a generative adversarial network (GAN) into the meteorological deep learning network proposed in this paper. The current encoding-forecasting network would be regarded as a generator, and we plan to add a discriminator network and reconstruct the loss function to force the generation of more accurate echo images through adversarial training. On the other hand, although our model has some advantages under various rainfall thresholds thanks to the weighting of different rainfall levels, there is still room for improvement in heavy rainfall or severe weather; therefore, we argue that the model needs to be trained on more heavy rainfall data or to introduce parameters of rainfall intensity change, such as humidity and topography, for correction.