Enhancing the Encoding-Forecasting Model for Precipitation Nowcasting by Putting High Emphasis on the Latest Data of the Time Step

Abstract: Nowcasting is an important technique for weather forecasting because sudden weather changes significantly affect human life. The encoding-forecasting model, which is a state-of-the-art architecture in the field of data-driven radar extrapolation, does not particularly focus on the latest data when forecasting natural phenomena. This paper proposes a weighted broadcasting method that emphasizes the latest data of the time step to improve the nowcasting performance. This weighted broadcasting method allows the most recent rainfall patterns to have a greater impact on the forecasting network by extending the architecture of the existing encoding-forecasting model. Experimental results show that the proposed model is 1.74% and 2.20% better than the existing encoding-forecasting model in terms of mean absolute error and critical success index, respectively. In the case of heavy rainfall with an intensity of 30 mm/h or higher, the proposed model was more than 30% superior to the existing encoding-forecasting model. Therefore, applying the weighted broadcasting method, which explicitly places a high emphasis on the latest information, to the encoding-forecasting model is considered as an improvement that is applicable to the state-of-the-art implementation of data-driven radar-based precipitation nowcasting.


Introduction
With the increase in the number of successful cases of application of deep learning in real life, such as in autonomous driving, healthcare, and smart cities [1][2][3][4][5][6][7][8][9], various attempts have been made to apply deep learning to weather-related fields using numerical models [10] to improve the performance of weather forecasting [11][12][13][14][15]. In the field of meteorology, nowcasting is a popular research topic in which deep learning techniques are being actively applied to the analysis of spatiotemporal data, such as radar and satellite data [16][17][18][19].
Precipitation nowcasting is the prediction of the spatiotemporal distribution of rainfall that will occur within a relatively short period of time. It is a very important weather forecasting technique for securing the critical response time (the "golden time") in natural disasters, such as flooding caused by sudden torrential rain. Existing nowcasting methods can be classified into traditional methods based on numerical models or optical flows and machine learning methods based on statistical methods [19]. In the past, optical flow methods were mainly employed because numerical models required a long initial driving time. In recent years, however, deep learning using statistical methods has been actively studied because it surpasses traditional methods in performance [19][20][21].
conformal conic projection was applied to the CAPPI, and a dataset was constructed using 256 × 256 grids with a horizontal resolution of 2 km. Because the CAPPI provides reflectivity (unit: dBZ), the Z-R relationship [28], which is used to convert reflectivity to the intensity of rainfall as shown below, was applied to obtain the rainfall rate for every 10 min.
$$R = \left(\frac{10^{Z/10}}{200}\right)^{5/8} \qquad (1)$$
where R is rainfall rate (mm h^−1) and Z is reflectivity (dBZ). Furthermore, the input values were rescaled to [0, 1] through min-max normalization using the range of 0 mm/h (minimum) to 110 mm/h (maximum) during data preprocessing. Of the entire dataset, odd-numbered days from 2012 to 2017 were used as training data, and even-numbered days were used as the validation and test sets. From the even-numbered days, even-numbered months of even-numbered years and odd-numbered months of odd-numbered years were used as the validation set, whereas the odd-numbered months of even-numbered years and even-numbered months of odd-numbered years were used as the test set. By configuring the data as such, the training, validation, and test datasets showed a similar distribution.
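The preprocessing steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, the coefficients a = 200 and b = 1.6 assume the Marshall-Palmer form Z = 200 R^1.6 (so that 1/b = 5/8), and clipping values to the 0-110 mm/h range before rescaling is an assumption.

```python
from datetime import date

RAIN_MIN, RAIN_MAX = 0.0, 110.0  # mm/h, normalization range stated in the text


def dbz_to_rain_rate(dbz, a=200.0, b=1.6):
    """Convert reflectivity (dBZ) to rainfall rate (mm/h) via Z = a * R**b.

    a=200 and b=1.6 are assumed (Marshall-Palmer form); 1/b equals 5/8.
    """
    z_linear = 10.0 ** (dbz / 10.0)
    return (z_linear / a) ** (1.0 / b)


def normalize(rate):
    """Min-max rescale a rainfall rate to [0, 1]; clipping is an assumption."""
    clipped = min(max(rate, RAIN_MIN), RAIN_MAX)
    return (clipped - RAIN_MIN) / (RAIN_MAX - RAIN_MIN)


def assign_split(day):
    """Return 'train', 'validation', or 'test' for a calendar date."""
    if day.day % 2 == 1:
        return "train"  # odd-numbered days
    # Even days: month and year of equal parity go to validation, otherwise test.
    return "validation" if (day.month + day.year) % 2 == 0 else "test"
```

For example, `assign_split(date(2012, 6, 2))` falls in the validation set (even month of an even year), whereas `assign_split(date(2012, 7, 2))` falls in the test set.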
The final configuration of the datasets is as listed in Table 1. For the experiment of the encoding-forecasting model, we prepared an experimental environment for nowcasting that forecasts rainfall 3 h in the future after observing 3 h of past data. With a temporal resolution of 10 min, the time-step lengths of both the input and output sequences of the model were 18 each. Therefore, the total time-step length of a single data instance comprising input and output sequence was 36 (18+18).
Because of the nature of rainfall (i.e., it does not often rain all day long), a vast amount of training data was composed of sequences without rainfall. Figure 1 shows the distribution of the training data by rainfall intensity. As shown, the data distribution is concentrated at a rainfall intensity of 0.8 or below. When the model is trained in this state, the model may be trained with the bias toward the periods without rainfall. To address this issue, this study trained the model using only the data for which the average rainfall intensity of a data instance comprising 36 × 256 × 256 grids was 0.8 or higher. In this way, the problem of uneven distribution of rainfall data was mitigated for the training process.
Finally, after filtering the data of each category with a rainfall intensity threshold of 0.8, 3335 training sequences, 2321 validation sequences, and 1894 testing sequences were used as sequence samples.
Figure 1. Distribution of the average rainfall intensity for training data.
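The construction of 36-step data instances and the filtering by average rainfall intensity can be sketched as follows. This is only an illustration under stated assumptions: frames are represented as nested lists, the helper names are hypothetical, and the stride of one frame between consecutive windows is an assumption, since the text does not state how windows were drawn.

```python
INPUT_STEPS = 18   # 3 h of observations at 10-min resolution
OUTPUT_STEPS = 18  # 3 h of forecasts at 10-min resolution
THRESHOLD = 0.8    # minimum mean rainfall intensity of a kept instance


def mean_intensity(frames):
    """Mean over every grid value of every frame in a sequence."""
    total = sum(sum(sum(row) for row in frame) for frame in frames)
    count = sum(len(row) for frame in frames for row in frame)
    return total / count


def make_instances(frames, threshold=THRESHOLD):
    """Slide a 36-frame window over `frames` and keep the rainy instances.

    Each kept instance is a pair (inputs, targets) of 18 frames each.
    The stride of one frame between windows is an assumption.
    """
    length = INPUT_STEPS + OUTPUT_STEPS
    instances = []
    for start in range(len(frames) - length + 1):
        window = frames[start:start + length]
        if mean_intensity(window) >= threshold:
            instances.append((window[:INPUT_STEPS], window[INPUT_STEPS:]))
    return instances
```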

Proposed Model
The weighted broadcasting (WB) based encoding-forecasting model proposed in this study is a model optimized for nowcasting a relatively short term of approximately 3-6 h in the near future. To improve the performance of the existing encoding-forecasting model, the concept of weighted broadcasting block (WB-Block), which emphasizes the latest data in the time step from past observations, is introduced. In addition, we combine the WB-Block with the convolutional layer of the encoding-forecasting model to improve nowcasting performance.

• ConvLSTM Cell
The convolutional long short-term memory (ConvLSTM) cell is the most commonly used method for spatiotemporal sequence modeling because it can perform simultaneous spatiotemporal analysis [20]. The ConvLSTM cell achieves stable and powerful spatiotemporal modeling performance by combining the long-range temporal dependence analysis of the existing long short-term memory (LSTM) [29] and the spatial characteristic analysis capability of the CNN.
The key formulas of each module constituting the ConvLSTM cell are shown in Equation (2). H_t, C_t, and X_t represent the short-term state, long-term state, and input value, respectively. f_t, i_t, and o_t are used to control the state and input value. More specifically, f_t controls the part of the long-term state that should be erased, i_t controls the part of the input value that should be added to the long-term state, and o_t controls the part of the long-term state that should be read and output as the result of the current time step. W and b denote the weight matrix and bias, respectively. Finally, σ and tanh represent the sigmoid and hyperbolic tangent functions, respectively.
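For orientation, a commonly used form of the ConvLSTM update is written out below with the symbols defined above. It is an assumption that this peephole-free variant corresponds to Equation (2); the original ConvLSTM formulation additionally includes peephole terms on the cell state. Here * denotes convolution and ∘ the element-wise (Hadamard) product, and the weight subscripts (W_{xi}, W_{hi}, and so on) are notation introduced for this sketch.

```latex
\begin{aligned}
i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1} + b_f\right) \\
o_t &= \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1} + b_o\right) \\
C_t &= f_t \circ C_{t-1} + i_t \circ \tanh\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right) \\
H_t &= o_t \circ \tanh\left(C_t\right)
\end{aligned}
```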
• Encoding-Forecasting Model
The encoding-forecasting model employs a sequence-to-sequence (seq2seq)-based network structure [30], which combines an encoding network and a forecasting network to address the spatiotemporal sequence forecasting problem. The encoding network analyzes spatiotemporal patterns of past data to generate latent vectors, and the forecasting network uses latent vectors of the encoding network to forecast future rainfall. In general, the encoding network is constructed by stacking a convolutional layer, which performs downsampling and spatial abstraction, and a recurrent layer, which compresses temporal patterns. The forecasting network is constructed by stacking a deconvolutional layer, which performs upsampling and spatial concretization, and a recurrent layer, which predicts future patterns from compressed temporal patterns.

Model Description
This study proposes a model that combines the weighted broadcasting and the encoding-forecasting model to improve the performance of nowcasting. The architecture of the proposed model is shown in Figure 2, where (A × A, B) denotes the output of the layer constituting the network, e.g., (128 × 128, 16), (64 × 64, 32), (32 × 32, 64). The symbols A and B denote the size and number of channels, respectively, of the resulting feature map, whereas k and s in the convolutional layer represent the size of the kernel and stride, respectively.
Before examining the proposed model, it must be noted that the convolutional and recurrent layers are stacked repeatedly in the encoding and forecasting part of the existing encoding-forecasting model. While the shallow convolutional layers capture the regional and detailed features of the spatial distribution of precipitation, the deeper convolutional layers capture the global and abstracted features; this is the most significant characteristic of the convolutional layer. In addition, the recurrent layers between the convolutional layers analyze the past rainfall patterns over time at each level abstracted by the convolutional layers; this is the most significant characteristic of the recurrent layer. The rainfall pattern information analyzed in each recurrent layer at each abstraction level of the encoding network is transferred to the recurrent layer corresponding to the concretization level of the forecasting network for prediction of the future rainfall pattern. At this stage, the output information of the deconvolutional layer at each concretization level constituting the forecasting network is combined to supplement the information generated from the previous concretization level.
In this study, the performance of the existing encoding-forecasting model is improved by applying a WB-Block, which puts more weight on the most recent feature map in the convolutional layer of the encoding network. While predicting future rainfall, patterns of past observations are analyzed, and the most recent rainfall patterns have the greatest influence on future patterns. Therefore, when predicting the output radar frames R'_{t+1} to R'_{t+n} using the input radar data R_{t-m+1} to R_t, the module in Figure 3 is added to reflect the feature map of the t-th time step of the encoding network with greater importance. Essentially, in the encoding-forecasting model, the recurrent layers of the encoding network and the forecasting network are combined using a state copy to generate the structure of the seq2seq model. Hence, m and n, which are the sequence lengths of the encoding network and the forecasting network, can be different from each other.
Figure 3 shows that the WB-Block broadcasts feature maps of the last time step of the convolutional layer of the encoding network to all the time steps of the corresponding deconvolutional layer of the forecasting network. During the process, as the time step of the forecasting network incrementally increases from t + 1 to t + n, the influence of the feature map at the t-th time step of the encoding network decreases. This decrease is desirable, because the effect of a current event diminishes over time. In general, the influence decreases as the time step increases; however, the reduction is not constant across the time steps. Therefore, the model should learn these patterns through the data while adjusting the weights to reflect the natural phenomena represented in the training data. In addition, because the weight variables, represented as W = (w_{t+1}, w_{t+2}, w_{t+3}, ..., w_{t+n-2}, w_{t+n-1}, w_{t+n}) in Figure 3, are shared during training through all convolutional layers, there is little overhead for computation and memory usage. When the time step increases and the effect of the weight decreases, the influence of the latest feature map of the encoding network is reduced, and the feature map transmitted from the previous recurrent layer of the forecasting network is used with relatively higher importance to perform prediction.
Unlike the existing encoding-forecasting model, the number of input channels is doubled in the deconvolutional layer of the forecasting network. Through weighted broadcasting, the feature map is transferred to the forecasting network via a skip connection, represented as a feature map concatenation in Figure 2. The channel count is temporarily doubled at the input of each deconvolutional layer and then reduced to half at the output of the deconvolutional layer. In other words, in the first deconvolutional layer of the forecasting network the input channel is 128 and the output channel is 64, and the input and output channels in the second deconvolutional layer become 64 and 32, respectively. In the third deconvolutional layer, the input and output channels are 32 and 16, respectively.
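One possible reading of the weighted broadcasting step is sketched below in plain Python. It is not the authors' implementation: the function name is hypothetical, each channel is a flat list of values rather than a 2D map, the weights are plain numbers rather than trainable variables, and both the scalar-per-time-step scaling and the order of concatenation are assumptions based on the description above.

```python
def weighted_broadcast(last_encoder_map, forecast_maps, weights):
    """Concatenate a weighted copy of the last encoder feature map to each
    forecasting time step along the channel axis.

    last_encoder_map: list of channels for time step t (each a flat list of floats)
    forecast_maps:    one list of channels per future step t+1 .. t+n
    weights:          n scalars (w_{t+1}, ..., w_{t+n}); trainable in the model,
                      plain numbers in this sketch
    """
    if len(forecast_maps) != len(weights):
        raise ValueError("need exactly one weight per future time step")
    outputs = []
    for step_map, w in zip(forecast_maps, weights):
        # Scale every value of the encoder map by this time step's weight.
        scaled = [[w * v for v in channel] for channel in last_encoder_map]
        # Channel-wise concatenation: the channel count doubles when both
        # inputs have the same number of channels.
        outputs.append(step_map + scaled)
    return outputs
```

With decreasing weights such as (1.0, 0.5, 0.25), the broadcast copy of the t-th feature map contributes less at later forecasting steps, which mirrors the behavior described for the learned weights.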

Experimental Setup
Based on the constructed dataset, experiments using the proposed model and the comparison model were repeated 10 times each. The numerical values presented in this section are the averages of the results of the 10 experiments. We used two NVIDIA Tesla P100 GPUs and TensorFlow 2.1 for the experiments; for multi-GPU training, the "NcclAllReduce" option was used with TensorFlow's MirroredStrategy. Each experiment took approximately 8 h. Adam [31] was used as the optimization algorithm of the neural network, and Adam's learning_rate, beta_1, beta_2, and epsilon were set to 0.0001, 0.9, 0.999, and 1 × 10^−7, respectively. In addition, "glorot_uniform," "orthogonal," and "zeros" were used for the kernel_initializer, recurrent_initializer, and bias_initializer of the cells, respectively. Considering the high-resolution radar data, the batch size was set to 4, and batch normalization was used as a regularization technique. MSE was used as the loss function, and early stopping [32] was adopted to terminate training when no improvement in validation loss was observed within 10 epochs.
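The early stopping rule described above can be illustrated with a small, framework-independent sketch. The class and the dictionary name are hypothetical; the hyperparameter values are those stated in the text, and treating only a strict decrease in validation loss as an improvement is an assumption.

```python
# Optimizer settings as stated in the text, collected for reference.
ADAM_SETTINGS = {"learning_rate": 1e-4, "beta_1": 0.9, "beta_2": 0.999, "epsilon": 1e-7}


class EarlyStopping:
    """Stop training when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def should_stop(self, val_loss):
        """Record one epoch's validation loss; return True when training should end."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience
```

In a training loop, `should_stop` would be called once per epoch with the validation MSE, and the loop would exit the first time it returns True.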

Evaluation Metrics
To evaluate the performance of the proposed model, we present a distance-based metric, which is a basic measure of reconstruction evaluation, and a confusion matrix-based metric, for evaluation based on rainfall intensity.
First, the distance-based metrics include the mean absolute error (MAE), mean squared error (MSE), and balanced MSE (B-MSE) [21]. B-MSE, adopted from Shi et al. [21], applies a greater loss penalty as the rainfall level increases, reflecting the intensity of damage to humans. The distance-based metrics do not classify rainfall by level; instead, they calculate the mean error between the predicted value and ground truth for all cases from light to heavy rainfall. Therefore, these metrics are used to compare the overall performance of the model for all categories of rainfall rather than to compare the performance of the model according to the rainfall levels. The equations used to calculate the metric values are as follows:
Atmosphere 2021, 12, 261
In the equations, N, T, I, and J represent the number of test samples, the length of the future time step, the height of the 2D prediction space, and the width of the 2D prediction space, respectively. Table 2, which is adopted from Shi et al. [21], summarizes the weight of the loss penalty according to the level of the rainfall rate when computing the B-MSE.
Next, the false alarm rate (FAR), probability of detection (POD), critical success index (CSI), and Heidke skill score (HSS) [33] were obtained using the following formulas based on the confusion matrix presented in Table 3. Measuring prediction performance according to the level of rainfall rate based on the confusion matrix, which is normally used in classification problems, is common in the meteorological field [16,18,20,21,24,25,27]. After predicting rainfall through the model, we generate five confusion matrices based on five thresholds (i.e., rainfall rates of 0.5, 2.0, 5.0, 10.0, and 30.0 mm/h). We create each confusion matrix by converting the predicted value and ground truth to 1 if they are above the threshold and 0 otherwise. The confusion matrix-based metrics are useful for subdividing and evaluating rainfall by level. In other words, it is possible to determine in detail whether the model's predictive performance is excellent in heavy rainfall or light rainfall.
Table 4 summarizes the results of the distance-based performance metrics obtained from the experiment. The values in Table 4 were calculated using the values between 0 and 1, rescaled through the min-max normalization of 0 mm/h to 110 mm/h. In the distance-based performance metrics, a lower value indicates better performance. Although the MSE of the WB-based encoding-forecasting model is slightly worse (−0.25%), the MAE (+1.74%) and B-MSE (+1.08%) indicate that the proposed model is superior to the existing encoding-forecasting model.
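The distance-based metrics described in this section can be sketched as follows. This is an illustration under stated assumptions rather than the evaluation code used in the study: predictions and ground truth are assumed to be flattened over all N × T × I × J values, every metric is averaged over that flattened length, and the B-MSE weights (1, 2, 5, 10, and 30 for rainfall rates from 0, 2, 5, 10, and 30 mm/h upward) follow the weighting of Shi et al. as recalled here; they should be checked against Table 2.

```python
def mae(pred, truth):
    """Mean absolute error over paired values."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)


def mse(pred, truth):
    """Mean squared error over paired values."""
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)


# (lower bound in mm/h, weight) pairs, checked from the heaviest level down.
# Assumed values; verify against Table 2 before use.
BALANCE_WEIGHTS = [(30.0, 30.0), (10.0, 10.0), (5.0, 5.0), (2.0, 2.0), (0.0, 1.0)]


def balance_weight(rate):
    """Loss weight for a ground-truth rainfall rate given in mm/h."""
    for lower_bound, weight in BALANCE_WEIGHTS:
        if rate >= lower_bound:
            return weight
    return 1.0


def b_mse(pred, truth):
    """MSE with each squared error weighted by the ground-truth rainfall level."""
    weighted = sum(balance_weight(t) * (p - t) ** 2 for p, t in zip(pred, truth))
    return weighted / len(truth)
```

Because the weights grow with the observed rainfall rate, an error of the same size costs far more in B-MSE when it occurs on a heavy-rain pixel than on a light-rain pixel.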

Results
Next, the results of the performance metrics based on the confusion matrix are presented in Tables 5 and 6. Of the metrics in Table 5, FAR is the proportion of incorrectly predicted events among all predicted events; hence, the lower the value, the better the performance. POD, on the other hand, is the proportion of events that are correctly detected among all events to be detected; thus, the higher the value, the better the performance. These two indicators have a trade-off relationship: as one improves, the other tends to deteriorate. As shown in Table 5, the proposed model has excellent detection capability in heavy rainfall.
In the CSI and HSS performance metrics in Table 6, a higher value indicates better performance. Here, CSI is an index that represents comprehensive performance evaluation by expressing the performance of FAR and POD as one metric. For all five categories of rainfall intensity, the proposed model using weighted broadcasting exhibits superior performance. In particular, the proposed model using weighted broadcasting shows more than 30% improved performance in CSI metric than the existing model when the rainfall intensity is 30 mm/h or higher, which is the intensity value when the influence of rainfall on human life becomes substantial, implying that the proposed model can be useful for applications such as flood prediction [34][35][36]. Figure 4 shows the CSI and HSS scores of the two compared models while varying the future time steps. As can be observed in the figure, the proposed model is superior to the existing model especially in the cases of rainfall with an intensity of 5, 10, and 30 mm/h. The comparisons for the other metrics corresponding to the future time steps are presented in Figures A1 and A2 (Appendix A).
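The confusion matrix-based scores discussed above can be sketched with the standard definitions of FAR, POD, CSI, and HSS. The function names are hypothetical, treating a value equal to the threshold as an event (>=) is an assumption, and returning 0.0 when a denominator is zero is a convenience choice of this sketch.

```python
def confusion_counts(pred, truth, threshold):
    """Count hits (tp), false alarms (fp), misses (fn), and correct negatives (tn)."""
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        predicted, observed = p >= threshold, t >= threshold
        if predicted and observed:
            tp += 1
        elif predicted:
            fp += 1
        elif observed:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def _ratio(numerator, denominator):
    """Safe division; 0.0 for an empty denominator is an assumption."""
    return numerator / denominator if denominator else 0.0


def scores(pred, truth, threshold):
    """Return FAR, POD, CSI, and HSS at one rainfall-rate threshold."""
    tp, fp, fn, tn = confusion_counts(pred, truth, threshold)
    hss_denominator = (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)
    return {
        "FAR": _ratio(fp, tp + fp),
        "POD": _ratio(tp, tp + fn),
        "CSI": _ratio(tp, tp + fn + fp),
        "HSS": _ratio(2 * (tp * tn - fn * fp), hss_denominator),
    }
```

Calling `scores` once per threshold (0.5, 2.0, 5.0, 10.0, and 30.0 mm/h) yields the five sets of scores that correspond to the five confusion matrices.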
As a final step in the quantitative performance evaluation, a Monte Carlo permutation test [37] was performed to assess whether the quantitative performance metrics had significant differences between the two models (see Table A1, Appendix B for details). The Monte Carlo permutation test is a statistical method for testing whether there is a significant difference between two groups even when the underlying distribution is unknown. The Monte Carlo permutation test showed significant differences for 20 metrics out of the 23 metrics examined in this study, indicating that the differences between the two models were predominantly significant. For only MSE, B-MSE, and HSS-30.0, the p-value was higher than 0.05.
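A Monte Carlo permutation test of this kind can be sketched as follows. The sketch is generic and not the procedure reported in Table A1: the test statistic (absolute difference of group means), the number of permutations, the fixed seed, and the add-one correction are all assumptions made for illustration.

```python
import random


def permutation_test(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided Monte Carlo permutation test on the difference of means."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled values to two groups of the original sizes.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_permutations + 1)
```

Here `group_a` and `group_b` would hold one metric value per repeated experiment for each model (10 values each in this study), and a returned p-value below 0.05 would indicate a significant difference for that metric.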
Apart from the quantitative evaluation metrics, examples of the actual input frames and prediction frames for qualitative evaluation that can be interpreted by human visual perception are presented in Figures 5 and 6. Figure 5 shows an example of 18 radar observation frames used as inputs, and Figure 6 shows the prediction frames of the two comparison models (the encoding-forecasting model and WB-based encoding-forecasting model) and the actual ground truth frames.
Figure 6. Output frames corresponding to the future time steps (first row: encoding-forecasting model; second row: WB-based encoding-forecasting model; third row: ground truth).
As shown in Figure 6, particularly in the predicted frames from time steps t + 3 to t + 8, the existing encoding-forecasting model produces an elongated heavy-rainfall region, whereas the WB-based encoding-forecasting model produces a heavy-rainfall region whose shape is similar to that of the ground truth. In addition, in the predicted frames from time steps t + 13 to t + 18, the heavy-rainfall region of the existing model separates, whereas the proposed model retains a shape similar to that of the ground truth. However, the results of both models are smoothed to the predictable scales overall. In addition, there is not only a loss of variance but also a reduction of precipitation. Nevertheless, we can summarize that the WB-based encoding-forecasting model shows better performance than the existing encoding-forecasting model, following the shape of the ground truth more closely.
Figure 7 shows the learned weight values of the WB-based encoding-forecasting model. As mentioned earlier, the model using the WB-Block is designed to learn the weights on its own according to the provided input data. The weights obtained in each of the 10 experiments are shown as gray lines, and the average weight values are shown as a blue line.

As shown in Figure 7, data at time step t, the most recent rainfall pattern, has a stronger effect at the beginning of the future prediction; however, its effect gradually decreases as it moves to the later time steps. This pattern indicates that when information delivered from the convolutional layers of the encoding network is used in the corresponding deconvolutional layers of the forecasting network, it has different weights and influences according to the future time step. In the later part of the time step where the weight value is decreased, the feature map transferred from the recurrent layer of the forecasting network is used more heavily. The values of the weights were not manually set by humans; rather, they were learned by the model as it found an appropriate value according to the characteristics of the data.

Conclusions
In this study, we proposed a WB-based encoding-forecasting model that improves the performance of nowcasting by applying weighted broadcasting, which emphasizes the influence of the latest feature map of the observed data in the encoding-forecasting model. Through experiments, this study verified that the nowcasting approach based on the proposed model exhibits superior performance in many aspects compared with the existing encoding-forecasting model. The findings indicate that applying the weighted broadcasting method, which explicitly places more emphasis on the latest information in the convolutional layer, to the encoding-forecasting model, in addition to the pattern analysis over time implicitly performed in the recurrent layer, improves nowcasting performance. The reason weighted broadcasting can improve the performance of nowcasting is that it obtains and uses important information that could not be analyzed through the recurrent layers alone. This combined approach suggests that, when predicting a sequence of natural phenomena, the most recent observation, transmitted here through weighted broadcasting, is crucial.
However, the limitations that can occur in this prediction schema are as follows. When making a prediction in a section in which the rainfall radar pattern changes in an instant, the past and future patterns may be completely different. In that case, the effect of weighted broadcasting may be limited. However, even in such a case, the model can still be adequate because the weight variable is adjusted according to the data during training, which automatically reduces the influence of information from the convolutional layers of the encoding network and increases the influence of the information from the recurrent layers of the forecasting network. At least, it does not limit the performance of the existing encoding-forecasting model. Therefore, weighted broadcasting can be used in combination with the encoding-forecasting model at any time. To calculate the weights of weighted broadcasting, only additional trainable variables equal to the length of the future time steps are needed.
In this study, when supplementing information missing from the recurrent layer of the existing encoding-forecasting model through weighted broadcasting of the convolutional layer, only the last feature map on the past time step of the convolutional layer was used. However, because useful information can be found not only in the last feature map but also in the previous feature maps, even if the probability is small, the model may be further improved by applying the self-attention technique [38] to the feature maps to extract and selectively utilize the most important feature map. We plan to explore this idea in our future research.

Data Availability Statement: Publicly available datasets were analyzed in this study. The data can be found here: https://data.kma.go.kr/data/rmt/rmtList.do?code=11&pgmNo=62 (accessed on 15 February 2021).

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A Model Comparison According to the Future Time Step
Figure A1 shows the model comparisons for distance-based metrics, such as MAE, MSE, and B-MSE, across the future time steps. As shown in Figure A1, the WB-based encoding-forecasting model is slightly worse than the existing encoding-forecasting model in terms of MSE. However, the WB-based encoding-forecasting model is better than the existing model in terms of MAE and B-MSE. In addition, when we compare the models in terms of the MAE, which indicates the simple difference of the error, it can be observed that the WB-based encoding-forecasting model gradually performs better as the future time step increases.
Figure A2 shows the model comparisons for the confusion matrix-based metrics, such as FAR and POD, across the future time steps. FAR and POD are performance indicators that have a trade-off relationship with each other. As shown in Figure A2, the existing model has better detection ability in light rainfall intensity such as 0.5 and 2 mm/h; however, the WB-based encoding-forecasting model has better detection ability in heavy rainfall intensity such as 5, 10, and 30 mm/h.