1. Introduction
Every year, extreme heavy precipitation causes serious disasters in urban areas, seriously threatening people's lives and property. Such intense precipitation is highly heterogeneous in space and time. Therefore, meteorological departments have an important responsibility to study the characteristics of intense rain and issue forecasts for disaster prevention.
The study of precipitation involves many fields such as hydrology, physics, and atmospheric circulation. High-resolution, accurate, real-time quantitative precipitation forecasting (QPF) is especially useful for preventing flood disasters and reducing socioeconomic impacts [1]. However, the characteristics of convective precipitation, such as rapid development, a short life cycle, and highly nonlinear dynamics, make it challenging to predict. According to the forecast period, precipitation forecasts can be divided into nowcasting (0–2 h) [2], short-term forecasts (0–6 h) [3], short-range forecasts (0–72 h) [4], medium-range forecasts (3–15 days) [5], and long-range forecasts (10–15 days) [6]. In general, numerical weather prediction (NWP) models provide superior short-range and medium-range forecasts, but they perform poorly in nowcasting [7]. For precipitation nowcasting, meteorological radars provide precipitation observations with much higher resolution than rain gauge networks, and there is a correlation between the distribution and intensity of radar echoes and the precipitation rate [8]. Therefore, radar-based quantitative precipitation forecasting [9] can capture more detailed spatial structure and temporal evolution of precipitation, and has become a research hotspot.
Precipitation nowcasting needs to extract highly nonstationary features and predict the intensity, distribution, movement, and evolution of precipitation in the coming hours. At present, radar echo extrapolation is a popular technique for precipitation nowcasting. Traditional optical flow methods [10] calculate the optical flow between consecutive radar maps under the assumption that consecutive frames do not change rapidly. However, this assumption may not hold when the radar echo has a complex evolution [11]. Moreover, to obtain precipitation, rainfall rates must be retrieved from the extrapolated radar echoes according to the Z-R relationship [12]. In this scheme, the first step is radar echo extrapolation and the second step converts radar reflectivity into rainfall rates through the Z-R relationship; splitting prediction into two steps easily causes errors to accumulate and reduces the accuracy of precipitation nowcasting. Over the past few years, deep learning techniques have been increasingly applied in quantitative precipitation forecasting. Wang et al. [13] proposed Eidetic 3D long short-term memory (E3DLSTM), which replaces the forget gate with a recall gate structure. Specifically, the forget gate determines whether past information can be "forgotten", as in standard LSTMs, whereas the recall gate uses an attentive module to compute the relationship between the encoded local patterns and the whole memory space. Wang et al. [14] proposed a spatiotemporal prediction model called PredRNN, which adds spatiotemporal memory units and connects them through a zigzag structure to integrate temporal and spatial features. By applying differencing operations to the nonstationary and approximately stationary components of spatiotemporal dynamics, Wang et al. [15] proposed memory in memory (MIM) networks to capture complex nonstationary features in radar echo extrapolation. To alleviate the blurring and unrealistic appearance of extrapolated radar echoes, Geng et al. [16] applied the idea of the generative adversarial network and developed a generative adversarial network-residual convolution LSTM (GAN-rcLSTM) method. For short-term QPF, radar echo extrapolation remains a powerful method because of the high temporal and spatial resolutions of radar echo maps. However, these radar extrapolation-based QPF techniques suffer from uncertainty in converting radar reflectivity to rainfall amount, and are thus still limited in improving QPF accuracy.
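As a concrete illustration of the Z-R conversion step discussed above, the sketch below applies the power-law relationship Z = a·R^b. The Marshall-Palmer coefficients a = 200, b = 1.6 are a common textbook choice, used here purely for illustration; they are not necessarily the coefficients used by any specific method cited in this section.

```python
import numpy as np

def reflectivity_to_rain_rate(dbz, a=200.0, b=1.6):
    """Convert radar reflectivity (dBZ) to rain rate R (mm/h) via the
    power law Z = a * R**b; a=200, b=1.6 are the classic
    Marshall-Palmer coefficients (illustrative only)."""
    z = 10.0 ** (np.asarray(dbz, dtype=float) / 10.0)  # dBZ -> linear Z
    return (z / a) ** (1.0 / b)
```

Note that the exponent and prefactor vary with region and precipitation type, which is exactly the source of uncertainty mentioned above.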
Directly using precipitation data as input to predict rainfall within two hours is another way to achieve precipitation nowcasting. Trebing et al. [17] proposed the small attention UNet (SmaAt-UNet) model, which uses attention modules and depthwise-separable convolutions (DSC) to extract spatial features of the developing precipitation. Song et al. [18] presented a self-attention residual UNet (SE-ResUNet) model, which uses UNet as the backbone network and adds a residual structure to extract spatiotemporal information. Cong et al. [19] introduced a new framework for precipitation nowcasting named Rainformer, which utilizes a global feature extraction unit and a gate fusion unit (GFU) to extract features. These methods use precipitation data as the only predictor. Directly using precipitation maps avoids the uncertainty of converting radar reflectivity to rainfall amounts through a Z-R relationship. However, due to the sparse distribution of ground observation stations, it is difficult to achieve precise precipitation nowcasting with precipitation data alone.
In addition, ground-based radars are efficient tools for observing precipitation and its microphysical structure, so some researchers take multi-source meteorological data as input to forecast precipitation. Zhang et al. [20] proposed a dual-encoder recurrent neural network called RN-Net, which takes the rainfall data of automatic weather stations and radar echo data as input to predict rainfall for the next 2 h. Wu et al. [21] used echo-top height and hourly rainfall datasets to establish a new dynamical Z-R relationship and achieve radar-based quantitative precipitation estimation (RQPE) [22]. In fact, utilizing multiple variables such as radar reflectivity and precipitation rate can capture richer physical information for QPF. However, many methods cannot effectively fuse multiple input variables.
This paper proposes an attention fusion spatiotemporal residual network (AF-SRNet). Inspired by the multimodal fusion and spatiotemporal prediction network (MFSP-Net) [23] and squeeze-and-excitation (SE) blocks [24], the model comprises a spatiotemporal residual network and an attention fusion block. The spatiotemporal residual network extracts spatiotemporal information from radar data and precipitation data independently, and the attention fusion block then combines this information at the highest semantic level.
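To make the SE-inspired fusion idea concrete, the module below is a minimal, hypothetical sketch in PyTorch, not the paper's exact attention fusion block: it concatenates radar and precipitation feature maps along the channel dimension and reweights the channels with a squeeze-and-excitation gate (global pooling, bottleneck, sigmoid), so that informative channels from either modality are emphasized before projection.

```python
import torch
import torch.nn as nn

class SEFusion(nn.Module):
    """Hypothetical SE-style fusion of radar and precipitation features.
    This is an illustrative sketch, not the AF block from the paper."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                           # squeeze: B x 2C x 1 x 1
            nn.Conv2d(2 * channels, 2 * channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(2 * channels // reduction, 2 * channels, 1),
            nn.Sigmoid(),                                      # channel weights in (0, 1)
        )
        self.proj = nn.Conv2d(2 * channels, channels, 1)       # project back to C channels

    def forward(self, radar_feat, precip_feat):
        x = torch.cat([radar_feat, precip_feat], dim=1)        # B x 2C x H x W
        return self.proj(x * self.gate(x))                     # reweight, then fuse
```

Compared with plain channel concatenation, the learned gate lets the network adaptively decide how much each modality's channels contribute at each semantic level.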
2. Related Work
As discussed in the first section, radar-based quantitative precipitation forecasting (RQPF) has been widely used in precipitation nowcasting in recent years because of the spatiotemporal discontinuity of precipitation station data. The process of RQPF is shown in Figure 1. First, station rainfall grid data are obtained after interpolation and radar mosaic grid data are obtained after quality control; these two datasets are then input into a spatiotemporal sequence forecast model to predict future precipitation. Spatiotemporal feature extraction and feature fusion are key components of such a model, but both have weaknesses in existing work.
Shi et al. [25] modeled precipitation nowcasting as a spatiotemporal sequence forecasting problem and introduced the encoding-forecasting structure of ConvLSTM to achieve quantitative precipitation forecasting. Luo et al. [26] introduced a sequence-to-sequence architecture called PFST-LSTM for RQPF. However, these works can only predict radar echoes. After radar echo extrapolation, the predicted echo intensity must be converted into rainfall rates via the Z-R relationship, but different regions and different scales of precipitation systems have different Z-R relationships, which introduces errors into precipitation nowcasting.
Subsequently, Bouget et al. [27] fused radar echo images and wind velocity to predict precipitation, and directly used rainfall as the target to enhance QPF. Zhou et al. [28] proposed a model called LightningNet to achieve lightning nowcasting by combining multisource observation data in different channels. However, simple summation or channel concatenation cannot effectively fuse the spatiotemporal information of multisource data.
In addition, more and more spatiotemporal sequence forecasting models are being applied to prediction tasks. Wang et al. [29] proposed PredRNN++, with causal LSTM and highway units to capture spatiotemporal features. Chai et al. proposed CMS-LSTM to capture multi-scale spatiotemporal flows. However, precipitation systems involve more complex spatiotemporal motion, and in these methods the spatial and temporal information interfere with each other.
Overall, most previous work on RQPF has some deficiencies. First, when extracting spatiotemporal information from the precipitation system, some features can be lost because the temporal and spatial information interfere with each other, making precise nowcasting very difficult. Second, radar and precipitation data have not been effectively fused, so microphysical features are difficult to extract, resulting in underestimation of high-intensity precipitation areas.
To address the above problems and improve QPF quality, this research draws on two important deep learning techniques, namely the encoder-decoder structure [30] and the attention mechanism [31]. Encoder-decoder methods were developed for natural language processing but are widely used in spatiotemporal sequence forecasting. Attention mechanisms can adaptively learn to reassign the importance of variable features and have been proven to perform well in feature fusion. We use the encoder-decoder structure to improve spatiotemporal feature extraction and the attention mechanism to improve feature fusion.
4. Experiments
4.1. Dataset
This study uses the radar echo data and gridded precipitation observations from April to September in 2019–2021 in Jiangsu Province, China.
Radar reflectivity dataset: This dataset is a time series of radar echo data whose physical meaning is the radar reflectivity at a height of 3 km. The higher the concentration of water droplets in the atmosphere, the higher the radar reflectivity. The dataset is obtained after quality control and mosaicking of several S-band weather radars in Jiangsu, covering the entire province. The value range is 0–70 dBZ, the horizontal resolution is approximately 1 km, the temporal resolution is 6 min, and the grid size of a single time step is 480 × 560 pixels.
Precipitation dataset: This dataset is obtained by interpolating the precipitation observations of automatic meteorological stations in Jiangsu onto a uniform grid with the Cressman interpolation method [36]. The precipitation value is the accumulated precipitation of the automatic station over 6 min, that is, the accumulated observation in the 6 min up to the current time. The value range is 0–10 mm. The horizontal resolution, temporal resolution, and grid size are the same as those of the radar dataset.
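A minimal sketch of Cressman interpolation, assuming a single influence radius R and the standard weight w = (R² − d²)/(R² + d²) for stations within distance d < R of a grid point; the operational product may instead use successive correction scans with decreasing radii.

```python
import numpy as np

def cressman(obs_xy, obs_val, grid_x, grid_y, radius):
    """Cressman interpolation of scattered station values onto a grid.
    Each station within `radius` of a grid point gets weight
    w = (R^2 - d^2) / (R^2 + d^2); the grid value is the weighted mean.
    Grid points with no station in range remain NaN."""
    out = np.full((grid_y.size, grid_x.size), np.nan)
    for j, gy in enumerate(grid_y):
        for i, gx in enumerate(grid_x):
            d2 = (obs_xy[:, 0] - gx) ** 2 + (obs_xy[:, 1] - gy) ** 2
            w = (radius**2 - d2) / (radius**2 + d2)
            w = np.where(d2 < radius**2, w, 0.0)  # zero weight outside radius
            if w.sum() > 0:
                out[j, i] = np.dot(w, obs_val) / w.sum()
    return out
```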
In terms of data preprocessing, considering the limitations of computing power and training costs, we first downsampled the original data to a size of 120 × 140 pixels through max pooling; the horizontal resolution after downsampling is approximately 4 km, as shown in Figure 6. It can be seen from the figure that there is a good correspondence between the high radar echo area and the heavy precipitation area. Second, to predict the precipitation in the next two hours, we use the past 20 time steps (2 h) of data to predict the precipitation of the next 20 time steps (2 h). The data were divided into 5143 groups, each containing 40 consecutive frames: the first 20 frames serve as the model's input, and the last 20 frames serve as the ground truth. We then divided these sequences into training, validation, and test sets at a ratio of 8:1:1. Finally, we normalized both the precipitation data and the radar data to the range 0–1.
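The preprocessing steps above can be sketched as follows. The function and parameter names are illustrative; the max-pool factor of 4 follows the stated 480 × 560 → 120 × 140 downsampling, and `max_val` would be 70 (dBZ) for radar or 10 (mm) for 6-min precipitation.

```python
import numpy as np

def preprocess(frames, pool=4, max_val=70.0):
    """4x max-pool downsampling of each (T, H, W) frame stack,
    then min-max normalization to [0, 1]."""
    t, h, w = frames.shape
    pooled = frames.reshape(t, h // pool, pool, w // pool, pool).max(axis=(2, 4))
    return np.clip(pooled / max_val, 0.0, 1.0)

def make_sequences(series, in_len=20, out_len=20):
    """Slice a long frame series into non-overlapping (input, target)
    pairs of 20 past frames and 20 future frames (2 h each at 6-min steps)."""
    pairs = []
    step = in_len + out_len
    for s in range(0, series.shape[0] - step + 1, step):
        pairs.append((series[s:s + in_len], series[s + in_len:s + step]))
    return pairs
```

Whether consecutive groups overlap in the actual dataset is not specified; the sketch uses non-overlapping 40-frame windows.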
4.2. Loss Function
The statistical distribution of precipitation at different rainfall intensities is shown in
Figure 7. There is a clear imbalance in the frequencies of different rainfall levels: 6-min rainfall above 2 mm accounts for only 2.2% of samples, the lowest proportion among these categories, whereas rainfall between 0 mm and 0.2 mm occurs far more frequently.
Therefore, following [37], we adopt a weighted mean absolute error (WMAE) loss. The loss function is defined as follows:

WMAE = (1/N) Σ_t w_t |Y_t − Ŷ_t|

where Y_t represents the actual t-th six-minute accumulated rainfall, Ŷ_t represents the predicted version, w_t is a weight determined by the observed rainfall level, and N is the number of predicted frames.
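A minimal implementation of such a weighted MAE is sketched below. The thresholds and weights are illustrative placeholders; the actual weighting scheme follows [37].

```python
import numpy as np

def wmae(y_true, y_pred, thresholds=(0.2, 0.5, 1.0, 2.0), weights=(1, 2, 5, 10, 30)):
    """Weighted MAE: each pixel's absolute error is scaled by a weight
    chosen from the observed 6-min rainfall level, so rare heavy-rain
    pixels contribute more to the loss. Threshold/weight values here
    are illustrative, not the paper's exact settings."""
    w = np.asarray(weights)[np.digitize(y_true, thresholds)]  # per-pixel weight
    return float(np.mean(w * np.abs(y_true - y_pred)))
```

Scaling the error on rare heavy-rain pixels counteracts the class imbalance described above, at the cost of a higher false alarm rate if the weights are too aggressive.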
4.3. Implementation Details
All models in this paper were implemented in PyTorch [38] and trained on an NVIDIA RTX A100 GPU using the Adam optimizer [39] with the same starting learning rate. In addition, to ensure that the experimental results are comparable, all models shared the same hyperparameters. All models were trained for a maximum of 30 epochs with a batch size of 8, and we applied an early stopping strategy when the validation loss no longer decreased.
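The early stopping strategy can be sketched as a small helper; the patience value below is an assumption, since the paper does not state it.

```python
class EarlyStopping:
    """Minimal early-stopping helper matching the strategy above:
    stop when validation loss has not improved for `patience` epochs,
    remembering the best loss seen so far."""
    def __init__(self, patience=3):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True to stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In the training loop, the checkpoint with the smallest validation loss would be kept for prediction, consistent with the model selection rule described in Section 4.5.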
4.4. Performance Metric
To evaluate the performance of our model quantitatively, we use multiple metrics from the meteorological field. Because meteorologists are more concerned with model performance at different rainfall levels, we binarize the prediction and the ground truth at different thresholds: if a value is larger than the given threshold it is set to 1, and otherwise to 0. We then count the true positives TP (truth = 1, prediction = 1), false positives FP (truth = 0, prediction = 1), true negatives TN (truth = 0, prediction = 0), and false negatives FN (truth = 1, prediction = 0). Finally, the critical success index (CSI), probability of detection (POD), false alarm rate (FAR), and Heidke skill score (HSS) [40] can be computed as follows:

CSI = TP / (TP + FN + FP)
POD = TP / (TP + FN)
FAR = FP / (TP + FP)
HSS = 2(TP × TN − FN × FP) / [(TP + FN)(FN + TN) + (TP + FP)(FP + TN)]

Note that for CSI, POD, and HSS, larger is better, whereas for FAR, smaller is better.
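The four scores can be computed from binarized fields as follows; this is a straightforward sketch of the standard contingency-table definitions.

```python
import numpy as np

def skill_scores(truth, pred, threshold):
    """Binarize at `threshold`, build the 2x2 contingency table,
    and compute CSI, POD, FAR, and HSS."""
    t = truth > threshold
    p = pred > threshold
    tp = np.sum(t & p)     # hit
    fp = np.sum(~t & p)    # false alarm
    tn = np.sum(~t & ~p)   # correct rejection
    fn = np.sum(t & ~p)    # miss
    csi = tp / (tp + fn + fp)
    pod = tp / (tp + fn)
    far = fp / (tp + fp)
    hss = 2 * (tp * tn - fn * fp) / ((tp + fn) * (fn + tn) + (tp + fp) * (fp + tn))
    return csi, pod, far, hss
```

In practice the counts would be accumulated over all test samples at each threshold before computing the ratios, to avoid division by zero on rain-free frames.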
Generally speaking, precipitation is divided into five categories: light rain, moderate rain, heavy rain, rainstorm, and downpour. As shown in
Table 1, we classified 6-min rainfall into five grades according to the study in [41]. For 1-h rainfall, following [23], precipitation is divided into four categories, as shown in
Table 2, so we choose 0.5 mm/h, 2 mm/h, and 5 mm/h as the thresholds for 1-h precipitation evaluation.
4.5. Experimental Results and Comparisons with SOTAs
We use several spatiotemporal prediction models as benchmarks for precipitation nowcasting, including ConvLSTM, PredRNN, Memory In Memory, and SE-ResUNet. To ensure the comparability of the experiments, all models utilize both radar and precipitation data, concatenated along the channel dimension as input. Each model uses the past 20 time steps as input and predicts the following 20 time steps, that is, the rainfall in the next 0–2 h. When the validation loss no longer decreases during the training phase, the model with the smallest validation loss is selected for prediction. Because 6-min rainfall amounts are low, we calculated the cumulative hourly rainfall to evaluate the models. The average evaluation results for the one-frame (first hour) precipitation amount nowcasting are shown in
Table 3, and the average results for the two-frame (first two hours) nowcasting are shown in
Table 4.
As shown in the above tables, AF-SRNet performs best on almost all metrics. This means that the SRNet proposed in this paper can effectively extract the spatiotemporal information in the precipitation system. Moreover, by using the attention fusion block, the model can fully exploit the correlation between high radar echo areas and high-intensity precipitation areas, which improves the accuracy of short-term heavy rainfall prediction to a certain extent. Second, MIM performs better than ConvLSTM, PredRNN, and SE-ResUNet, as it can capture short-term dynamic features; convective precipitation develops rapidly, and MIM can model nonstationarity and extract complex features. Last but not least, it can be seen from
Table 4 that as the prediction time increases, the performance of SE-ResUNet steadily improves, and under the 5 mm/h threshold its POD even exceeds that of our model, but its FAR increases at the same time. We suspect that SE-ResUNet extracts spatial features well but has difficulty extracting temporal evolution information and capturing the decay process in areas of high precipitation intensity.
To further illustrate the AF-SRNet prediction ability for high-intensity rainfall, a randomly chosen visualization example is shown in
Figure 8.
The first two columns of each row are the 6-min precipitation results, and the last two are the 1-h precipitation results. First,
Figure 8 shows that our model predicts the area and intensity of high-intensity rainfall much better than the other SOTA models. However, our model undeniably overestimates in some areas, which is related to the deep fusion of radar and precipitation features. Second, as the prediction time increases, ConvLSTM and MIM perform poorly in predicting high-intensity precipitation. SE-ResUNet seems to produce better prediction details, but its rainfall-area forecasts are worse than those of the other models. This is not surprising, because these models cannot fully exploit the advantages of multi-source data through simple fusion. In summary, as rainfall intensity increases, the superiority of AF-SRNet becomes more obvious.
We draw
Figure 9 to describe the MSE curves of all models at all nowcasting lead times on the whole test set. Our model has the lowest prediction error over the first seven time steps. It is worth pointing out that at the remaining lead times, our model has a lower prediction error than every model except SE-ResUNet, demonstrating the effectiveness of our approach. Similar to the analysis above, we suspect that the UNet-based model extracts spatial features well but cannot extract temporal evolution information, resulting in a high false alarm rate.
4.6. Ablation Experiments and Analyses
In this section, we explore the impact of the spatiotemporal residual network and the attention fusion block proposed in this paper through ablation experiments. Because spatiotemporal LSTM (STLSTM) is a common spatiotemporal feature extraction network, we use STLSTM as the baseline model.
We compare AF-SRNet without the attention fusion block (denoted SRNet) against STLSTM to assess the impact of the spatiotemporal residual features extracted by SRNet. As shown in
Table 5 and
Table 6, SRNet outperforms STLSTM under all thresholds. The results indicate that SRNet extracts temporal features and spatial features separately, which effectively avoids the mutual interference of temporal and spatial information.
To demonstrate the validity of the attention fusion (AF) block, we compare SRNet (AF-SRNet without the AF block) with the complete AF-SRNet. AF-SRNet outperforms SRNet on all metrics. In addition, at the 5 mm/h threshold, STLSTM with the AF block (denoted AF-STLSTM) performs better than SRNet. Consequently, the AF block proposed in this paper is superior in fusing radar echo and precipitation features.
To compare the four methods visually, we depict the prediction results for an example in
Figure 10.
We can draw the following conclusions from the above figure. First, the methods with the AF block perform well in predicting high-intensity precipitation, indicating that our way of fusing radar echo and precipitation features is effective. Second, SRNet performs significantly better than STLSTM, indicating that the separate modeling of temporal and spatial features proposed in this paper can extract more complex motion information. Finally, combining SRNet with the AF block yields more accurate precipitation forecasts.
5. Discussion
Currently, some precipitation nowcasting methods based on spatiotemporal sequence prediction suffer from the mutual interference of spatial and temporal information. Moreover, most radar-based quantitative precipitation forecasting methods fuse radar and precipitation data only in a simple way, making it difficult to effectively establish microphysical constraints on developing precipitation systems. To address these problems, we explore the combination of independent spatiotemporal modeling and multimodal fusion for precipitation nowcasting. The AF-SRNet proposed in this paper uses both radar echo data and precipitation grid data as input to predict the rainfall in the next 0–2 h. From the experimental results and visualization cases, we can draw the following conclusions.
First, the precipitation grid data are obtained by interpolating station data and therefore have weak spatial continuity. Because high radar echo areas correspond well to heavy precipitation areas, fusing radar and precipitation features effectively improves quantitative precipitation forecasting, especially for heavy precipitation.
Second, extracting temporal and spatial evolution information plays an important role in precipitation nowcasting, affecting the predicted area and intensity of precipitation. To this end, the proposed AF-SRNet utilizes multiple residual spatiotemporal encoders to obtain a wider spatiotemporal receptive field and establish long-term and short-term dependencies.
Third, the experimental results show that all models tend to blur as the number of forecasting steps increases. We therefore hope to improve the detail of precipitation nowcasting in future work.