Improving Radar-Based Rainfall Forecasts by Long Short-Term Memory Network in Urban Basins

Radar-based rainfall forecasts produced by extrapolation algorithms are widely used in operational precipitation systems, with lead times of up to six hours. Nevertheless, the reliability of rainfall forecasts for heavy rain events gradually declines with lead time owing to limited predictability. Recently, data-driven approaches have been commonly implemented in hydrological problems. In this research, data-driven models were developed based on data obtained from a radar forecasting system named McGill Algorithm for Precipitation nowcasting by Lagrangian Extrapolation (MAPLE) and from ground rain gauges. The data covered thirteen urban stations in five metropolitan cities in South Korea. The twenty-five MAPLE data points surrounding each rain station were utilized as the model input, and the observed rainfall at the corresponding gauge was used as the model output. The results showed the superior capability of the long short-term memory (LSTM) network in improving 180-min rainfall forecasts at the stations, based on a comparison of five data-driven models: multiple linear regression (MLR), multivariate adaptive regression splines (MARS), multi-layer perceptron (MLP), basic recurrent neural network (RNN), and LSTM. Although the model still underestimated extreme rainfall values at some examined stations, this study showed that the LSTM can provide reliable performance. The model can serve as an optional method for improving station rainfall forecasts in urban basins.


Introduction
Precipitation forecasts are a primary driver of flood forecasting, water management, and hydrologic modeling studies in urban areas. Quantitative precipitation forecasts (QPFs) with high spatial-temporal resolution and accuracy at lead times of several hours are valuable in hydrological predictions for urban flood practices. Therefore, the success of hydrological predictions is highly dependent on the quality of the forecasted rainfall, which is still a challenge for forecasting systems. QPF and its post-processing methodology, however, depend largely on the type of storm, the location, and the atmospheric model setup [1].
The radar-based rainfall forecasts produced by the extrapolation approach have been widely applied in operational systems, including the McGill Algorithm for Precipitation nowcasting by Lagrangian Extrapolation (MAPLE; [2,3]) and the Auto-Nowcast System (ANC; [4]). The useful lead time of the predictions depends on the meteorological circumstances and the approach used to evaluate the forecasts. For large-scale rain field systems, the extrapolation method is relatively useful [5], whereas for heavy rainfall events and small-scale systems, the forecast quality decreases with increasing lead time because of the rapid development and dissipation of the rain field [5]. These limitations on the predictability of radar-based rainfall forecasts are caused by a combination of errors in estimating the advection field and the growth of the field during the predictive period [3]. Therefore, it is necessary to determine how the LSTM and other ANN models work at rain gauges (at the point scale).
This work aims to investigate the performance of the LSTM for improving the rainfall forecasts of a radar-based system at urban rain gauges in short-duration heavy rainfall events. The comparison of the LSTM with other data-driven methods, namely the basic RNN, MLP, MARS, and MLR, is examined in this study. To this end, time-series data of heavy rain events are analyzed between a radar-based forecasting system and the observed rainfall of ground gauges. The radar-based forecasting system used in this study is MAPLE, as applied in Korea. The main goal of this paper is to investigate the performance of different data-driven methods for post-processing; hence, this work does not interfere with the radar-based forecasting system itself. Notably, because the training approach does not consider the movement, growth, and decay of the rain field, the proposed method cannot reproduce the rain fields over the domain. However, improving the rainfall forecast at the point scale is significant for hydrological applications and water resources management. The following section briefly introduces the study area and the procedure of data analysis. Section 3 presents the methods used in this work and the data preparation for training. Sections 4 and 5 provide the results and discussion, and the summary of the research, respectively. Table 1 explains the main acronyms used in this article.

Study Area
South Korea is situated in the middle latitudes of the Northern Hemisphere and in the subtropical zone of the North Pacific. Geographically, its location is on the east coast of the Eurasian Continent. Hence, the characteristics of the Korean climate are highly complex due to the influences of both continental and oceanic aspects. In the summer season, the Korean climate is influenced by the East Asia monsoon with strong precipitation events. In southern Korea, the range of annual precipitation is from 1000 to 1800 mm, while the annual range of the remaining regions is from 1100 to 1400 mm. Most of the annual precipitation comes in the summer season across the Korean Peninsula.
The metropolitan cities in Korea are highly urbanized, with a large density of commercial, industrial, and residential areas. Figure 1 presents the locations of the five urban areas in South Korea, the rain gauges, and the MAPLE data points. The rain gauges used in this study belong to the Automatic Weather System (AWS) operated by the Korea Meteorological Administration (KMA). In Figure 1, the rain gauges are denoted by red triangles and the MAPLE data points by black dots. The cities of Seoul, Busan, Gwangju, Daejeon, and Daegu are selected as the study areas (Figure 1a). In these cities, urban floods develop within a few hours during strong rain events, which usually occur from June to September. The rain gauges located in the urban areas were chosen to examine the proposed models.

Data Collection
Measured rainfall from 13 arbitrarily selected AWS gauges, namely 408, 414, and 422 (in Jeonnong, Seoul); 160, 904, and 938 (in Busan); 722 and 788 (in Gwangju); 642, 643, and 648 (in Daejeon); and 845 and 846 (in Daegu), was obtained from the KMA website for training and testing. All AWS stations record at a temporal resolution of 1 min. Heavy rainfall events in the period from 2016 to 2020 were selected and analyzed; all of them have a short duration (less than 12 h). Thresholds on the maximum 10 min rainfall intensity (≥ 5 mm/10 min) and the 6 h accumulated rainfall (≥ 45 mm) were applied to define a heavy rainfall event [37,38]. Consequently, twenty-seven heavy rainfall events, which normally occur from June to September, were chosen. Of these, twenty-two events were used for calibration and validation in the training session and five events were used for the test stage. Table 2 lists the selected events used in this work. Simultaneously, forecasted data of the MAPLE system were collected and processed to provide sequential inputs for the models. Note: The temporal resolution of the rain events is 10 min. The time unit is denoted by "hour:min" in the "Duration" column.
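As a sketch of this event-screening step, the two thresholds can be checked per candidate event. The rolling 6-h window below is our reading of "6 h accumulated rainfall", and the function name is illustrative:

```python
import numpy as np

def is_heavy_event(rain_10min):
    """Check the two heavy-event thresholds: maximum 10-min intensity
    >= 5 mm/10 min and 6-h accumulated rainfall >= 45 mm."""
    r = np.asarray(rain_10min, dtype=float)
    win = min(36, r.size)                    # 36 ten-min steps = 6 h
    # running 6-h accumulation; shorter events use their full sum
    acc_6h = float(np.convolve(r, np.ones(win), mode="valid").max())
    return bool(r.max() >= 5.0 and acc_6h >= 45.0)

storm = [1.0] * 10 + [6.0] * 8 + [1.0] * 10  # 280 min of 10-min depths (mm)
drizzle = [0.5] * 28
print(is_heavy_event(storm))    # True
print(is_heavy_event(drizzle))  # False
```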
In this research, the radar-based forecast data from the MAPLE system operated by the Han River Flood Control Office (HRFCO) were used. The system runs on a dual-polarization radar network and generates 360 min QPFs every 10 min at a temporal resolution of 10 min. The domain of the system is 1050 × 1050 pixels with a spatial resolution of 0.5 km. Figure 1b displays the data points of the MAPLE system that overlap Seoul. In this work, the 180 min QPFs were used to develop the data-driven models, owing to the high uncertainty of the MAPLE data at the 360 min forecast time and the short period of the examined rain events. The QPFs of the twenty-five data points surrounding each AWS station were utilized as the input of the selected data-driven models. For the training session, the total considered time span of the QPFs was 60 min from the beginning of each rainfall event; thus, six QPF time steps were considered for a selected event. The number of time steps of the MAPLE QPF can vary depending on the period of each rainfall event and the experimental results.
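As an illustration of this input layout, the 5 × 5 block of grid cells centred on the cell containing a gauge can be flattened into the twenty-five input values for one time step. The sketch below assumes a QPF frame already loaded as a 2-D array and known grid indices for the gauge; the function name is ours, not MAPLE's:

```python
import numpy as np

def surrounding_points(qpf_field, row, col, half=2):
    """Extract the 5x5 block of QPF cells centred on (row, col)
    and flatten it into the 25 input values for one time step."""
    block = qpf_field[row - half:row + half + 1, col - half:col + half + 1]
    return block.ravel()

field = np.arange(100).reshape(10, 10)   # toy 10x10 QPF frame
x = surrounding_points(field, 5, 5)
print(x.size)   # 25
```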
While the radar-based forecast data have a temporal resolution of 10 min, the measured rainfall at the gauges has a temporal resolution of 1 min. Therefore, the measured rainfall was accumulated to a 10 min temporal resolution after applying the inverse distance weighting (IDW) method [39] to fill in missing rainfall data. The observed 10 min rainfall data were used as the output of the machine learning methods in the training stage. To ensure correspondence between the input and output of the models, the forecast rainfall of the data points and the observed 10 min rainfall must share the same time order. Figure 2 presents the method of preparing the QPF data as the input and the observed rainfall as the target value for the data-driven models in the training stage. Notably, all the surrounding spatial-temporal data points at each AWS station were considered as an input matrix for the models. The detailed process regarding the training datasets and model implementation is presented in Section 3.5.
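A minimal sketch of this preprocessing, assuming gap-free 1-min records after the IDW correction; `idw_fill` and `accumulate_to_10min` are illustrative names, and the IDW form shown is the standard inverse-distance weighting, not necessarily the exact operational implementation:

```python
import numpy as np

def idw_fill(xy_known, vals, xy_target, power=2.0):
    """Standard inverse distance weighting: estimate a missing value
    at xy_target from surrounding gauges."""
    d = np.linalg.norm(np.asarray(xy_known, float) - np.asarray(xy_target, float), axis=1)
    w = 1.0 / np.maximum(d, 1e-9) ** power
    return float(np.sum(w * np.asarray(vals, float)) / np.sum(w))

def accumulate_to_10min(rain_1min):
    """Sum 1-min rainfall depths (mm) into 10-min totals matching the QPF step."""
    r = np.asarray(rain_1min, dtype=float)
    assert r.size % 10 == 0, "record length must be a multiple of 10 min"
    return r.reshape(-1, 10).sum(axis=1)

# Fill one missing 1-min value from two neighbours, then accumulate 20 min.
missing = idw_fill([(0.0, 0.0), (2.0, 0.0)], [1.0, 3.0], (1.0, 0.0))
series = [0.2] * 9 + [missing / 10] + [0.5] * 10
print(missing)                      # 2.0 (equidistant neighbours -> mean)
print(accumulate_to_10min(series))  # two 10-min totals: 2.0 and 5.0 mm
```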

Figure 2.
Data preparation for the training stage between the MAPLE quantitative precipitation forecasts (QPFs) and observed rainfall at each station in an event. The considered period presents the period in which QPFs can be extracted.

Multivariate Adaptive Regression Splines (MARS)
As a nonlinear and non-parametric approach, MARS was first introduced for predicting a continuous dependent variable [40], and it recognizes nonlinear patterns of interest embedded in dense datasets. MARS uses a set of basis functions and data-driven regression coefficients to construct the relationship between the variables. This is achieved by conducting a forward-backward procedure [41].
In the forward stepwise phase, the model adds a considerable number of basis functions, which eventually leads to overfitting of the data. The basis functions of the MARS algorithm rely on segment (spline) functions that divide the dataset into different linear splines. The spline functions are joined at fixed points known as knots. The MARS model can generally be defined as

ŷ(t) = c_0 + Σ_{n=1}^{N} c_n B_n(X(t))

where X(t) = {X_t1, . . . , X_tn} denotes the twenty-five predictor variables corresponding to the twenty-five spatial grids in a 10 min time step, c_0 is the bias, B_n(·) denotes the nth basis function, c_n represents the coefficient of B_n(·), estimated by the least-squares approach, and N denotes the total number of functions and coefficients of the final model generated by the forward-backward procedure. Unnecessary basis functions are removed in the backward step using the generalized cross-validation method to enhance the quality of the forecast; thus, the final number of functions (N) is defined. Detailed information about the MARS algorithm can be found in [21,22,40,41].
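To make the hinge-basis idea concrete, the sketch below hand-builds a tiny MARS-style model from the mirrored hinge functions max(0, x − knot) and max(0, knot − x) joined at a single knot; the coefficients are arbitrary illustrative values, not ones fitted by the forward-backward procedure:

```python
import numpy as np

def hinge(x, knot, direction=+1):
    """MARS basis function: max(0, x - knot) or max(0, knot - x)."""
    return np.maximum(0.0, direction * (x - knot))

# Tiny hand-built MARS-style model with a bias c0 and one knot at x = 2:
# y_hat = c0 + c1 * max(0, x - 2) + c2 * max(0, 2 - x)
c0, c1, c2 = 1.0, 0.5, -0.25
x = np.array([0.0, 2.0, 4.0])
y_hat = c0 + c1 * hinge(x, 2.0, +1) + c2 * hinge(x, 2.0, -1)
print(y_hat)  # [0.5 1.  2. ]
```

Note how the two hinges make the fit piecewise linear: to the left of the knot only the second basis function is active, and to the right only the first.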

Multi-Layer Perceptron (MLP)
The MLP model is the most popular type of ANN, and it normally consists of one input layer, one hidden layer, and one output layer [32]. The structure of a classical three-layered MLP is shown in Figure 3a. In the MLP network, the layers are linked by weights and biases, and several neurons contribute to each layer. In Figure 3a, the rectangles denote neurons and the lines between rectangles denote weights. First, the MLP calculates the weighted sum of the inputs, expressed by the following equation:

z = Σ_i w_i X_i + b

where z represents the weighted sum fed into the neuron, w_i represents the weights, X_i represents the input that includes the twenty-five predictor variables, and b represents the bias. The MLP requires a nonlinear activation function. Herein, the common sigmoid, for any variable s, is adopted as the transfer function of both the hidden layer and the output layer:

f(s) = 1 / (1 + e^−s)

The backpropagation (BP) technique is popularly utilized for training the MLP model.
The network is trained to optimize the cost function

F = (1/2)(f(z) − f(z)_true)²

where F represents the cost function, f(z) represents the predicted value from a neuron, and f(z)_true represents the observed value used as the desired output. The chain rule of differentiation is used in BP to estimate the partial derivative of the error with respect to the corresponding weights. The rate of change of the cost F with respect to a weight w_i is formulated as

∂F/∂w_i = (∂F/∂f(z)) (∂f(z)/∂z) (∂z/∂w_i)

The partial derivative of the activation function is represented by

∂f(z)/∂z = f(z)(1 − f(z))
The partial derivative of the cost function is represented by Equation (7):

∂F/∂f(z) = f(z) − f(z)_true (7)

Training the MLP network means that the weights are iteratively adjusted by using the BP algorithm to reduce the error between the predicted and true values.
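As a numerical illustration of this BP loop, the sketch below trains a single sigmoid neuron by gradient descent on one sample, multiplying the three chain-rule factors above ((f(z) − f(z)_true) · f(z)(1 − f(z)) · X_i); the data, learning rate, and iteration count are arbitrary:

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

rng = np.random.default_rng(0)
w = rng.normal(size=3)              # weights of a 3-input toy neuron
b = 0.0
x, y_true = np.array([0.2, 0.5, 0.1]), 0.8
lr = 0.5                            # arbitrary learning rate
for _ in range(2000):
    f = sigmoid(w @ x + b)          # forward pass: f(z), z = w.x + b
    # chain rule: dF/dw_i = (f - y_true) * f * (1 - f) * x_i
    delta = (f - y_true) * f * (1.0 - f)
    w -= lr * delta * x             # gradient-descent weight update
    b -= lr * delta
print(abs(sigmoid(w @ x + b) - y_true) < 0.02)  # True: output near 0.8
```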

Basic RNN and Long-Term Dependencies Problem
RNNs have an advantage in handling sequential information in data. RNNs define the present state from the input X_t and the output state of the preceding time step h_t−1 [30,42]. The state formulations for the time steps t and t + 1 are

h_t = σ(w_i X_t + w_h h_t−1 + b) (8)
h_t+1 = σ(w_i X_t+1 + w_h h_t + b) (9)

where σ is the activation function; h_t−1, h_t, and h_t+1 are the states of the hidden neurons for the time steps t − 1, t, and t + 1, respectively; w_h denotes the weights of the hidden neurons; w_i denotes the weights between the input layer and the hidden layer; and b denotes the bias.
Notably, each input X_t = {X_t1, . . . , X_tn} at time step t includes the twenty-five predictor variables, the same as in the above-mentioned models. From Equations (8) and (9), the difference between the RNN and the MLP is that the RNN explicitly considers the state of the previous time step, whilst the MLP has independent output and input at every time step. For training RNNs, the backpropagation through time (BPTT) method is implemented as an extended version of the BP algorithm in which the chain rule of differentiation is used to calculate the partial derivative of the error with respect to the corresponding weights [32]. Figure 3b shows the scheme of the BPTT algorithm and the structure of the RNN, which includes an input layer, one hidden layer, and an output layer. BPTT determines both the partial derivative of the cost with respect to the input weights and the partial derivative of the cost with respect to the hidden weights of the previous state. As seen in Figure 3b, the dash-dot line denotes the gradient computation of the BPTT algorithm. First, BPTT calculates the partial derivative of the output at time step t + 1 with respect to the state of the hidden neuron at time step t + 1 (∂Y_t+1/∂h_t+1 in Figure 3b). Afterward, it calculates the partial derivative of the state of the hidden neuron at time step t + 1 with respect to the state at the preceding step (∂h_t+1/∂h_t in Figure 3b). The partial derivative of the error E with respect to a weight w_i then sums up the contributions at every time step and can be expressed as

∂E/∂w_i = Σ_k (∂E/∂Y_t+1)(∂Y_t+1/∂h_t+1)(∂h_t+1/∂h_k)(∂h_k/∂w_i) (10)

where Y_t+1 is the predicted value from the neuron. The BP process continues to the earlier neurons step by step in the same manner. The chain of partial-derivative factors grows with the number of time steps when conducting gradient estimation in the BPTT method. Consequently, the gradient over many time steps of the network can become very small or very large [30]. This phenomenon is called the vanishing/exploding gradient problem.
Therefore, determining how to learn and tune the hyper-parameters of the hidden layer for capturing long sequential data may be highly complex and can lead to excessively long training times.
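The problem can be seen with a deliberately simplified scalar sketch: each BPTT step contributes one factor of ∂h_{t+1}/∂h_t ≈ σ′ · w_h to Equation (10), so the product either shrinks toward zero or blows up geometrically. The sigmoid-derivative bound of 0.25 and the weight values below are illustrative:

```python
# BPTT multiplies one Jacobian factor per time step; with a sigmoid whose
# derivative is at most 0.25, the product shrinks geometrically (vanishing),
# while a large enough recurrent weight makes it blow up (exploding).
def gradient_norm(w_h, steps, sigma_prime=0.25):
    g = 1.0
    for _ in range(steps):
        g *= sigma_prime * w_h   # one factor of dh_{t+1}/dh_t per step
    return g

print(gradient_norm(w_h=1.0, steps=30))  # 0.25^30 ~ 8.7e-19 (vanishes)
print(gradient_norm(w_h=8.0, steps=30))  # 2^30 ~ 1.07e+09   (explodes)
```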

LSTM Network
The LSTM cell can solve the vanishing/exploding gradient problem of the BPTT algorithm when training on a long-term sequence [30,43]. Unlike the simple RNN cell, each LSTM cell typically has four layers, composed of the main layer, called the input modulation gate (g), and three gate controllers that optionally let information through by an activation function and a pointwise multiplication operation: the input gate (i), forget gate (f), and output gate (o). Therefore, LSTMs are able to carry information across long-term dependencies. LSTMs can drop unimportant memory or add new content, learn the long-term state, and then filter the result. Figure 4 shows the structure of the LSTM unit in the RNN hidden layer. The symbol c denotes the memory cell. The LSTM algorithm can be expressed by the equations below:

i_t = σ_s(w_i X_t + u_i h_t−1 + b_i) (11)
f_t = σ_s(w_f X_t + u_f h_t−1 + b_f) (12)
o_t = σ_s(w_o X_t + u_o h_t−1 + b_o) (13)
g_t = tanh(w_g X_t + u_g h_t−1 + b_g) (14)
Cell state: c_t = f_t ◦ c_t−1 + i_t ◦ g_t (15)
Output vector: h_t = o_t ◦ tanh(c_t) (16)

where X_t = {X_t1, . . . , X_tn} represents the input vector that includes the twenty-five variables corresponding to the data points surrounding the stations at time step t; b denotes the biases, and w and u denote the weights of the input and recurrent connections, respectively; ◦ is the element-wise multiplication of two vectors; tanh(z) = (e^z − e^−z)/(e^z + e^−z) is the hyperbolic tangent function; and σ_s(z) = 1/(1 + e^−z) is the sigmoid function.
In Equations (11)-(16), h_t can be considered the short-term state and c_t the long-term state. In brief, LSTMs can control exploding information largely because the sigmoid activation is used as the gating function for the three gates; its outputs lie between 0 and 1, either blocking the flow of information (with a value of 0) or allowing the complete flow of information (with a value of 1). To avoid the vanishing gradient issue, the tanh function is used to maintain the second derivative over a prolonged extent before approaching zero. Therefore, with these gating mechanisms, a value from the distant past can remain relevant in the LSTM predictions. Mathematically, the LSTM recurrent network first uses the previous hidden state to obtain a fixed-dimensional representation of the sequential input for estimating the conditional probability, and then defines the probability of the corresponding output sequence with the initial state set to the representation of the input sequence [44]. These properties make the LSTM a dominant model for long sequential data.
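Equations (11)-(16) can be exercised directly with a small NumPy sketch of one LSTM cell unrolled over six 10-min input steps (25 values per step, as in this study). The weight shapes, scales, and random inputs are illustrative; production models would use a library implementation such as TensorFlow's LSTM layer, which packs these same gates internally:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM step following Equations (11)-(16). p holds weight
    matrices w_* (input), u_* (recurrent) and biases b_* for the input
    (i), forget (f), output (o) and modulation (g) gates."""
    i = sigmoid(p["w_i"] @ x + p["u_i"] @ h_prev + p["b_i"])
    f = sigmoid(p["w_f"] @ x + p["u_f"] @ h_prev + p["b_f"])
    o = sigmoid(p["w_o"] @ x + p["u_o"] @ h_prev + p["b_o"])
    g = np.tanh(p["w_g"] @ x + p["u_g"] @ h_prev + p["b_g"])
    c = f * c_prev + i * g          # long-term state, Equation (15)
    h = o * np.tanh(c)              # short-term state, Equation (16)
    return h, c

rng = np.random.default_rng(1)
n_in, n_hid = 25, 4                 # 25 MAPLE points in, 4 hidden units
p = {}
for gate in "ifog":
    p[f"w_{gate}"] = rng.normal(scale=0.1, size=(n_hid, n_in))
    p[f"u_{gate}"] = rng.normal(scale=0.1, size=(n_hid, n_hid))
    p[f"b_{gate}"] = np.zeros(n_hid)
h = c = np.zeros(n_hid)
for t in range(6):                  # six 10-min steps = 60 min of QPF input
    x = rng.random(n_in)
    h, c = lstm_step(x, h, c, p)
print(h.shape, c.shape)             # (4,) (4,)
```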

Training Datasets and Model Implementation
For the 27 selected events (Table 2), 22 events were chosen for the training and validation sets and the other 5 events were used as the test set. The training sets were arranged from the 1st event to the 22nd event as continuous time-series data. The considered period of the MAPLE QPFs was approximately 60 min at the beginning of each event (as shown in Figure 2); this period can be shorter than 60 min for short, strong rainfall events. In this study, we separately developed forecast time models corresponding to 60 min, 120 min, and 180 min lead times. Therefore, the QPF data used in the 60 min, 120 min, and 180 min models had 60 min, 120 min, and 180 min lead times, respectively. The input matrices of the different forecast time models include 25 columns for the number of QPF data points overlaid on each AWS station. Simultaneously, the output vector includes one column representing the observed rainfall at the corresponding ground station and was provided in the same time order as the input (Equation (17)). The output vector was also issued separately according to the different forecast time models. Notably, the data used in the five algorithms (MLR, MARS, MLP, basic RNN, and LSTM) are similar in terms of data structure and temporal scale. In Equation (17), X is the forecasted rainfall of MAPLE; Y is the observed rainfall at the corresponding gauge; e is the e-th event; q_e is the number of selected QPFs in an event; lt is the selected lead time (60 min, 120 min, or 180 min); for example, with a 60 min lead time there are six rainfall values corresponding to six 10-min time steps; and n is the number of variables; in this study, n = 25 variables corresponding to the 25 data points surrounding a rain gauge at each time step of each QPF.
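In code, pairing the input matrix and output vector for one event (Equation (17)) reduces to stacking the per-step rows of 25 QPF values against the 10-min gauge series in the same time order; the function and variable names below are ours:

```python
import numpy as np

def stack_event(qpf_steps, gauge_obs):
    """qpf_steps: list of length-25 arrays (one per 10-min step);
    gauge_obs: observed 10-min rainfall in the same time order."""
    X = np.vstack(qpf_steps)            # (time_steps, 25) input matrix
    Y = np.asarray(gauge_obs, float)    # (time_steps,)  output vector
    assert X.shape[0] == Y.shape[0], "input/output must share the time axis"
    return X, Y

steps = [np.full(25, v) for v in (0.5, 1.2, 3.0)]   # three 10-min steps
X, Y = stack_event(steps, [0.4, 1.5, 2.8])
print(X.shape, Y.shape)  # (3, 25) (3,)
```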
For the testing stage, the two time steps of QPF issued from the beginning of each test event were chosen to examine the performance of the AI models and the regression-based models, because of the short period and zero forecasts in the last time steps of the events. As in the training data preparation, these two time steps of QPF were prepared for testing the 60 min, 120 min, and 180 min models. Notably, the time period was fixed at the beginning of the events when applying the forecast time models to each time step of QPF. In other words, the 120 min model was extended by an additional 60 min compared to the 60 min model, and likewise for the 180 min model. The 180 min corrected rainfall results were combined from the test results of the three forecast time models. Figure 5 shows how the three models were combined to obtain the final 180 min corrected rainfall forecasts: the rainfall of the first 60 min lead time, the rainfall from 60 min to 120 min lead time, and the rainfall from 120 min to 180 min were based on the 60 min, 120 min, and 180 min models, respectively. This method of combining different lead-time models was applied to all the proposed models in this study. The ANN models were developed with version 2.3.0 of the open-source TensorFlow library [45], a machine learning framework published by Google. In addition, other Python packages, namely NumPy [46], scikit-learn [47], and Matplotlib [48], were applied. Hyper-parameters such as the number of neurons in each hidden layer and the batch size were used to calibrate the models. The number of iterations can be determined from the training loss and validation loss performance.
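The splicing in Figure 5 can be sketched by indexing the 10-min steps of each lead-time model's corrected series (six steps per hour); the function name and toy values are illustrative:

```python
import numpy as np

def combine_lead_time_models(y60, y120, y180):
    """Splice three lead-time models into one 180-min series (Figure 5):
    steps 1-6 (0-60 min) from the 60 min model, steps 7-12 from the
    120 min model, and steps 13-18 from the 180 min model."""
    y60, y120, y180 = (np.asarray(a, float) for a in (y60, y120, y180))
    return np.concatenate([y60[:6], y120[6:12], y180[12:18]])

y60 = np.full(6, 1.0)       # each model's corrected 10-min series
y120 = np.full(12, 2.0)
y180 = np.full(18, 3.0)
out = combine_lead_time_models(y60, y120, y180)
print(out)  # six 1.0s, then six 2.0s, then six 3.0s
```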

Model Performance Parameters
The performance of the proposed models is evaluated by indicators, including the critical success index (CSI), probability of detection (POD), percent error in maximum rainfall (PEMR), root mean square error (RMSE), correlation coefficient (R) and relative forecast bias (FB) as shown in the following equations.
CSI and POD are used as quantitative evaluation parameters to represent the success of the forecasts in terms of hit rates. They are calculated as

CSI = A / (A + B + C)
POD = A / (A + B)

where A, B, and C denote the numbers of hits, misses, and false alarms for an examined threshold, respectively. If the CSI or POD equals 1, the forecast quality is perfect.
The RMSE is calculated as follows:

RMSE = √[ (1/n) Σ_{i=1}^{n} (Y_i^for − Y_i^obs)² ]

where Y_i^for is the i-th forecasted value, Y_i^obs is the i-th observed value, and n is the number of data points. An RMSE of zero indicates a perfect fit between the forecasted and observed data.
The R value is calculated as follows:

R = Σ_i (Y_i^for − Ȳ^for)(Y_i^obs − Ȳ^obs) / √[ Σ_i (Y_i^for − Ȳ^for)² · Σ_i (Y_i^obs − Ȳ^obs)² ]

The FB is utilized for assessing the quality of the total forecasted rainfall:

FB = Σ_i Y_i^for / Σ_i Y_i^obs

If the FB is around 1, the forecast is considered highly accurate. FB values from 0 to 1 indicate underestimation, and values larger than 1 indicate overestimation.
PEMR is used to examine the capability of the models in maximum rainfall prediction (Equation (23)):

PEMR = (Y_max^for − Y_max^obs) / Y_max^obs × 100% (23)

where Y_max^for and Y_max^obs are the maximum rainfall values of the forecasted data and observed data, respectively.
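With the definitions above, all six indicators reduce to a few lines. The sketch below uses the standard contingency-table counts (A = hits, B = misses, C = false alarms) and toy data for illustration:

```python
import numpy as np

def csi(a, b, c):
    """Critical success index: A / (A + B + C)."""
    return a / (a + b + c)

def pod(a, b):
    """Probability of detection: A / (A + B)."""
    return a / (a + b)

def rmse(y_for, y_obs):
    y_for, y_obs = np.asarray(y_for, float), np.asarray(y_obs, float)
    return float(np.sqrt(np.mean((y_for - y_obs) ** 2)))

def pearson_r(y_for, y_obs):
    return float(np.corrcoef(y_for, y_obs)[0, 1])

def forecast_bias(y_for, y_obs):
    """Relative forecast bias: total forecast over total observed rainfall."""
    return float(np.sum(y_for) / np.sum(y_obs))

def pemr(y_for, y_obs):
    """Percent error in maximum rainfall (Equation (23))."""
    ymax_f, ymax_o = float(np.max(y_for)), float(np.max(y_obs))
    return (ymax_f - ymax_o) / ymax_o * 100.0

obs = [1.0, 3.0, 5.0, 2.0]               # toy 10-min observations (mm)
fc = [1.0, 2.0, 4.0, 2.0]                # toy corrected forecasts (mm)
print(round(csi(8, 2, 2), 3))            # 0.667
print(pod(8, 2))                         # 0.8
print(round(rmse(fc, obs), 3))           # 0.707
print(round(pearson_r(fc, obs), 3))      # 0.969
print(round(forecast_bias(fc, obs), 3))  # 0.818
print(pemr(fc, obs))                     # -20.0
```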

Training of ANN Models
Because the main objective of this study is to improve rainfall forecasts from radar-based forecasting systems for urban areas under heavy rainfall events, different forecast time models of the data-driven methods were developed. The goal of the models is to reproduce rainfall forecasts with a 180 min lead time based on observed rainfall and the 25 MAPLE data points around each rain gauge. After arranging the input and output data, the trial-and-error approach was applied for tuning the hyper-parameters, such as the number of neurons in each hidden layer, the learning rate, the batch size, and the iteration number. The dataset was divided into 80% for training and 20% for validation.
A regularization method is essential for the proper performance of the ANN models to prevent overfitting. In this work, early stopping was utilized as the regularization method, stopping the training process when the validation error reaches a minimum. The optimization algorithm used in the models was the ADAM technique [49], with a learning rate of 0.005. The RMSE was chosen for measuring the model prediction error in the training stage. The optimum structures of the LSTM and basic RNN models had one hidden layer with 10, 16, and 25 hidden neurons for the 60 min, 120 min, and 180 min models, respectively. The best structure of the MLP was one hidden layer with 10, 15, and 20 hidden neurons for the 60 min, 120 min, and 180 min MLP models. The number of basis functions in MARS was 71, 20, and 5 for the 60 min, 120 min, and 180 min MARS models.
Table 3 summarizes the comparative results for the R values of MAPLE and the LSTM model at the stations for the first 120 min of forecast time. The R values were calculated by comparing MAPLE or the LSTM against the observations. In general, the performance of the LSTM model at the stations in terms of the correlation coefficient was much better than that of the MAPLE forecast, particularly within the first 60 min lead time. However, at stations 648, 414 (20190731 event), and 160, the R values of the LSTM were not better than those of MAPLE at the 20 min lead time. It can be seen from Table 3 that the MAPLE system generated rainfall forecasts with low, even negative, correlation values after the 20 min lead time at stations 422, 788, 904, and 938. This is caused by the high uncertainty of MAPLE in modeling the growth, decay, and movement of the rain field when issuing forecasts for heavy rainfall events.
Therefore, some rainfall forecasts can be uncorrelated and nonlinear with respect to the observations during the lead time. For the event shown in Figure 8d, all the models showed substantial overestimations in the first 30 min and after the 60 min forecast time; however, the LSTM model performed better than the other correcting models, with smaller overestimation. This problem was similar to the first 30 min lead time of the 20180826 event (Figure 8b). In general, the performance of the LSTM in terms of FB was more stable and better relative to the other models.

Performance of Data-Driven Models
Table 4 summarizes the comparative results for the percent error in maximum rainfall over the stations in the five test events. The LSTM outperformed the other models in terms of PEMR at most stations in the test events. While the other correcting models mainly provided large underestimations of PEMR in the events, the LSTM reproduced good predictions with a PEMR range of −26.0~16.3% at most stations. This PEMR improvement of the LSTM is considerable for applications of predicting heavy rainfall events and urban flood warning practices. However, some stations had a considerable underestimation of maximum rainfall, namely 414 (−40.5% in the 20170723 event) and 422 (−39.4% and 40.4% in the 20170723 and 20190731 events, respectively). The basic RNN and MLP models generated good PEMR values at some stations in the events, such as 3.8% (station 414 in the 20190731 event) and −8.5% (station 160 in the 20190806 event) for the MLP, and 6.8% and −8.5% (station 160 in the 20190806 event) for the basic RNN. The original MAPLE forecasts showed significant underestimation at many stations and high overestimation at some stations. This drawback of MAPLE is caused by the rapid growth or decay of rain cells in heavy rain events. Hence, MAPLE may issue predictions with much higher or much lower rainfall than the actual level.
To further investigate the performance of the models, the CSI and POD were estimated at two high thresholds: 3 mm/10 min and 5 mm/10 min. Figure 9 compares MAPLE and the models in terms of the CSI at the two thresholds. These comparisons were conducted by aggregating the five test events and eleven test stations over the 180 min forecast time. In detail, the CSI values were estimated from the hits, misses, and false alarms along the 180 min lead time of predictions at each rain gauge; these values were then averaged over all rain gauges and events. Concerning the threshold of 3 mm/10 min (Figure 9a), while the CSI values of MAPLE varied from 0.45 to 0.28 over the 180 min lead time, the LSTM, basic RNN, MLP, and MARS showed notable enhancements. The LSTM outperformed the other models, with CSI values ranging from 0.88 to 0.71, around 2.0~2.5 times higher than MAPLE's values along the lead time. The other correcting models had similar performances, with values varying from around 0.89 to 0.53 over the 180 min lead time. For the threshold of 5 mm/10 min (Figure 9b), the CSI values of the LSTM decreased considerably, to the range (0.74, 0.58), while the values of the other models were around two times lower than those of the LSTM. Noticeably, the CSI values of the basic RNN and MLP, ranging from 0.48 to 0.34, were slightly higher than those of MARS and MLR, in the range (0.32, 0.25). The CSI values of MAPLE were very low, varying around 0.1. Figure 10 presents the comparative results between MAPLE and the other models in terms of the POD at the two thresholds. At both thresholds, the performances of the models in terms of the POD were consistent with those of the CSI, with the LSTM outperforming the other models and showing a substantial increase in POD values at the 5 mm/10 min threshold.
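The hit/miss/false-alarm counting described above can be sketched as follows, using the standard contingency-table definitions of CSI and POD (the exceedance convention, here ≥ threshold, is an assumption):

```python
import numpy as np

def contingency_scores(forecast, observed, threshold):
    """CSI and POD from a 2x2 contingency table at a rainfall threshold.

    hit:         both forecast and observation reach the threshold
    miss:        observation reaches the threshold but the forecast does not
    false alarm: forecast reaches the threshold but the observation does not
    CSI = hits / (hits + misses + false alarms); POD = hits / (hits + misses).
    """
    f = np.asarray(forecast) >= threshold
    o = np.asarray(observed) >= threshold
    hits = np.sum(f & o)
    misses = np.sum(~f & o)
    false_alarms = np.sum(f & ~o)
    csi = hits / (hits + misses + false_alarms)
    pod = hits / (hits + misses)
    return csi, pod

# Hypothetical 10-min series (mm) at one gauge over the lead time
obs = [0.5, 3.2, 6.1, 4.0, 1.0, 5.5]
fcst = [0.8, 3.5, 2.9, 4.2, 3.1, 5.0]
csi, pod = contingency_scores(fcst, obs, threshold=3.0)
print(csi, pod)  # 0.6 0.75
```

In the paper's setup these scores would be computed per gauge at each lead-time step and then averaged over the eleven gauges and five events.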
However, the POD values of the LSTM and the other correcting models at the 3 mm/10 min threshold were similar to each other. Notably, Figures 9 and 10 show that the ANN models are better than the regression-based models in terms of the CSI and POD indicators at the higher threshold. From Figures 9 and 10, the CSI and POD values of MAPLE and the correcting models decreased with the lead time and were substantially reduced at the higher threshold (5 mm/10 min) compared with the lower threshold (3 mm/10 min). While the original MAPLE, MLR, MARS, MLP, and basic RNN showed a rapid decrease from the 3 mm/10 min to the 5 mm/10 min threshold, the LSTM showed only a moderate reduction in CSI and POD over the 180 min lead time.

Figure 11 shows the scatter plots of the LSTM and MLR against the observations in the first 60 min lead time, together with the rainfall patterns of MAPLE and the LSTM-corrected rainfall forecasts in the test events. While MAPLE correlated poorly with the observations, the LSTM-corrected rainfall forecast showed a fairly good correlation with them (Figure 11a). The relationship between the LSTM and the observations is moderately strong and positive, and most LSTM points are fairly close to the best-fit line even at high rainfall values. The MLR did not significantly improve the correlation with the observations; its points at small and high rainfall values were quite far from the best-fit line (Figure 11b). Figure 11c–g present rainfall time series from MAPLE, the LSTM, and the observations at selected stations. In general, the LSTM-corrected rainfall forecasts performed better than MAPLE, especially in the first 1.5 h of the forecast time. The MAPLE rainfall time series often showed large underestimation at high observed rainfall. The LSTM rainfall time series was similar to the observed rainfall pattern, especially at times of high rain in the 20180826 event (Figure 11d) and the 20180904 event (Figure 11e).

The above results prove that the accuracy of the LSTM-corrected rainfall forecasts is more stable and better than that of the basic RNN, MLP, MARS, and MLR over the 180 min lead time. Thanks to the memory cell, which manages both short-term and long-term states, the LSTM could provide acceptable performance in correcting rainfall forecasts. This performance was reflected in lower RMSE values, better correlation coefficients at high thresholds, higher CSI and POD values, and more reasonable PEMR and rainfall patterns than the other models in the test events. In addition, this model takes advantage of training on long data sequences, which can lead to good predictions. Notably, the model performances depended on the MAPLE forecast data. Specifically, the slight overestimation in the 20180826 (Figure 8b) and 20190731 (Figure 8d) events occurred when the MAPLE rainfall forecasts were near the no-bias zone or overestimated. Additionally, the behavior of the RMSE values of the models was quite similar to the RMSE trends of MAPLE (Figure 6). The quick increases in RMSE within the 60 min lead time and the decreases in CSI and POD values at the 5 mm/10 min threshold for MAPLE and the correcting models can be explained by the limitations of the MAPLE system: radar-based extrapolation algorithms do not consider the life cycle of rain cells, especially in convective rain fields. Therefore, the algorithm often fails to forecast rain fields after 60 min, or even after a few tens of minutes, of lead time.
As a result of the uncertainties of radar-based systems, the LSTM did not always perform better than the other models; examples include degraded RMSE values at stations 414 and 422 in the 20170723 event (Figure 7a) and degraded R values at stations 648, 414 (20180904 event), and 160 (Table 3). Additionally, for the 20190731 event at stations 414 and 422, the PEMR values of the LSTM indicated lower maximum rainfall predictions than the other models (Table 4). Regarding the POD results, although the LSTM's values were slightly higher than those of the other models at the 5 mm/10 min threshold (Figure 10b), its performance was not superior to the other correcting models at the smaller threshold (Figure 10a). Moreover, several stations, such as 414 (in the 20170723 event) and 422 (in the 20170723 and 20190731 events), had notable underestimations in forecasting maximum rainfall (Table 4). In this study, the models were trained with the twenty-five data points surrounding each rain gauge; this limits the ability to reproduce the full rain field. However, this might be overcome by utilizing convolutional neural networks with full images of the overall domain and radar data.
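As described, each model input is the twenty-five MAPLE grid points surrounding a gauge at each 10-min step. A minimal sketch of how such sequences might be assembled into the (timesteps, features) shape an LSTM layer expects follows; the array names, the 5×5 neighbourhood layout, and the use of random placeholder values are illustrative assumptions, not the paper's actual pipeline:

```python
import numpy as np

# Illustrative dimensions (assumptions, not stated in the paper):
n_steps = 18    # 180 min lead time at 10-min intervals
n_points = 25   # 5x5 MAPLE grid points around one gauge

# Placeholder MAPLE forecast patch around one gauge: (timesteps, 5, 5)
maple_patch = np.random.rand(n_steps, 5, 5)

# Flatten the 5x5 neighbourhood into 25 features per time step,
# giving the (timesteps, features) sequence used as model input.
x = maple_patch.reshape(n_steps, n_points)

# Target: observed gauge rainfall at each 10-min step (placeholder values)
y = np.random.rand(n_steps)

print(x.shape, y.shape)  # (18, 25) (18,)
```

Because each sample is tied to a single gauge's neighbourhood, the trained model corrects point forecasts rather than whole rain fields, which is exactly the limitation noted above.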

Conclusions
To improve hydrological predictions for urban catchments, it is essential to correct short-term radar-based rainfall forecasts in advance. Such corrections could enable better forecasts and higher efficiency in urban hydrology. Examining AWS rain stations in urban areas of Korea, the current work compared the performance of five data-driven models, namely LSTM, basic RNN, MLP, MARS, and MLR, for a 180 min correction of MAPLE rainfall forecasts with 10 min time steps during heavy rainfall events. Data from twenty-seven rainfall events, including observed rainfall and QPFs from MAPLE around eleven ground rain gauges in the urban areas, were used to train and evaluate the selected models. The results and discussion led to the following conclusions: (1) The models were compared with MAPLE using the RMSE, FB, R, PEMR, CSI, and POD criteria; all of the models were able to improve the rainfall forecasts to a certain extent. (2) The four models basic RNN, MLP, MARS, and MLR showed similar corrections for the test events in terms of RMSE and FB. However, the basic RNN and MLP provided better performance in terms of CSI and POD values, showing substantially higher accuracy for high rainfall predictions. (3) Owing to the gating structures of its neurons, the LSTM outperformed the basic RNN, MLP, MARS, and MLR, especially in predicting high rainfall values, reducing RMSE, and improving the forecast bias. The LSTM could reproduce rainfall forecasts with sufficient accuracy within the first 60 min of forecast time at the stations. This advanced AI technique therefore has high practicability for improving rainfall forecasts from radar-based systems, and the LSTM model can be considered an optional approach in real practice.
In addition to the practicability of the LSTM, some limitations of the model were noted: (i) the model accuracy in terms of CSI and POD decreased as the threshold increased, although it remained better than that of the other models, and the RMSE, R, and PEMR performances were not better than those of MAPLE at several stations; (ii) the model performance depended on the MAPLE data, so the proposed LSTM approach is applicable to heavy rainfall events of short duration (less than 12 h) in urban areas and might not be appropriate for other types of rainfall events; (iii) the LSTM is trained only with sequential point data, so it does not consider rain field movement and growth, which limits rain field reproduction.
It is necessary to investigate the forecasting capability of, and the uncertainties in, the present LSTM method. In addition, an assessment of the proposed method's suitability for different rainfall types is needed, since it does not work equally well for all events. Extending the dataset, together with considering the rain field, should be the next step of this study for these investigations and assessments. Deep learning shows considerable potential for hydrometeorology and water resources applications. The LSTM can be implemented in a hybrid method that combines one-hour high-resolution numerical weather prediction models with a high-resolution radar-based system. Moreover, using convolutional LSTM and other types of convolutional neural networks for 2D radar data post-processing should be considered in future studies.