Runoff Prediction Based on the Discharge of Pump Stations in an Urban Stream Using a Modified Multi-Layer Perceptron Combined with Meta-Heuristic Optimization

Abstract: Runoff in urban streams is the most important factor influencing urban inundation. It also affects inundation in other areas, as various urban streams and rivers are connected. Current runoff predictions obtained using a multi-layer perceptron (MLP) exhibit limited accuracy. In this study, the runoff of urban streams was predicted by applying an MLP using a harmony search (MLPHS) to overcome the shortcomings of MLPs using existing optimizers, and the predictions were compared with the observed runoff and with the runoff predicted by an MLP using a real-coded genetic algorithm (RCGA). Furthermore, the results of the MLPHS were compared with the results of the MLP with existing optimizers such as the stochastic gradient descent, adaptive gradient, and root mean squared propagation. The runoff of urban streams was predicted based on the discharge of each pump station and rainfall information. The results obtained with the MLPHS differed from the peak value of the observed runoff by 39.804 m³/s, a smaller error than that of the MLPs using existing optimizers. The MLPHS gave more accurate runoff prediction results than the MLP using the RCGA and that using existing optimizers. The accurate prediction of the runoff in an urban stream using an MLPHS based on the discharge of each pump station is possible.


Introduction
In Korea, most of the annual precipitation is concentrated during the rainy season. In urban areas, the water level of urban streams increases rapidly during the rainy season, creating a flood risk. Such urban areas have constructed pump stations to prevent backwater effects from streams and to quickly drain water during rainy seasons. Runoff from pump stations affects stream water level and inundation in the watershed. For urban streams, the flow rate is higher in the lower stream, which is close to residential and commercial areas, than in the upper stream, which is located in mountainous areas. A high flow rate is associated with an increased risk of damage due to flooding.
Deep learning, including multi-layer perceptron, has advanced in various ways since its initial development and has been used in various fields. Some important deep-learning techniques are given below.
• Rosenblatt (1958) introduced the concept of a perceptron as a probabilistic model of an artificial neural network (ANN) [1].
• Kohonen (1982) suggested the self-organizing map (SOM) in a neural structure using a topologically ordered transformation of arbitrary dimensional signals to one/two dimensional discrete maps [2].
• Hopfield (1982) applied the information storage algorithm to neural networks based on synapses of a general type proposed by Hepp [3,4].
• Pearl (1985) developed the Bayesian network model with self-activated memory [5].
• LeCun et al. (1998) developed a gradient-based learning technique called a convolutional neural network (CNN) for document recognition [6].
to model multi-step-ahead flood forecasting [42]. Rainfall-runoff modeling in the Clear Creek and Upper Wapsipinicon River of Iowa in the U.S. was conducted using LSTM and a sequence-to-sequence model [43]. A decomposition ensemble model with LSTM and variational mode decomposition was suggested for streamflow forecasting [44]. An LSTM model with a Gaussian process was applied to forecast the streamflow in the upper Yangtze River, China [45]. Several different deep learning models have been proposed. Streamflow and rainfall forecasting were conducted using convolutional LSTM and wavelet LSTM [46]. A feature-enhanced regression model with a combined stack autoencoder and LSTM was proposed to forecast the short-term streamflow of the Yangtze River in China [47]. An RNN trained with backpropagation through time and an LSTM were applied to forecast the water level in the Trinity River of Texas, U.S. [48]. Monthly rainfall forecasting with an RNN and LSTM was conducted using monthly average precipitation data [49]. Heavy rain damage was predicted by applying a DNN, a CNN, and an RNN [50].
Studies using methods other than those mentioned above have also been conducted. Probabilistic flood forecasting for a real-time framework using an Elman neural network with different lead times was applied to the Huai River in eastern China [51]. A wavelet-based ANN, support vector regression, and deep belief networks were used to forecast multi-step-ahead streamflow in the U.K. [52]. The urban flood depth under various rainfall conditions was predicted using the gradient boosting decision tree and data warehouse [53]. The SVM, KNN, and naïve Bayes classifiers (trained by weather parameters) were used to predict floods [12].
Studies combining deep learning techniques with meta-heuristic optimization have been proposed to forecast rainfall, rainfall-runoff, and water depth. An ANN with a multi-layer perceptron (MLP) structure using a real-coded genetic algorithm (RCGA) was developed to overcome the limitations of the rainfall-runoff model using an existing ANN [54]. An evolved neural network, a kind of MLP using the RCGA, was suggested to forecast daily rainfall-runoff [55]. An optimized scenario generated by a genetic algorithm (GA) combined with an ANN (MLP) was used to forecast rainfall [56]. A hybrid model combining an ANN (MLP) and a genetic algorithm using precipitation and stage data was developed [57].
Various deep learning techniques have been applied to rainfall, rainfall-runoff, and water level prediction, but the effects of hydraulic structures in urban areas, such as pump stations, were not considered. Additionally, the harmony search (HS) has been applied to improve the existing optimizers of deep learning techniques, including the MLP. The MLP was suggested to solve the XOR problem, which could not be solved with a single-layer perceptron (Minsky, M. L. and Papert, S. A. 1969. Perceptrons. Cambridge, MA: MIT Press). The HS is a meta-heuristic optimization technique motivated by musical harmony [58]. The HS has been applied in various fields, including computer science, mathematics, and civil engineering [59]. In hydrology, hydraulics, and water resources problems, including the optimal design of water supply networks, the HS showed good results [60]. Additionally, guidance was provided on producing useful and relevant results by reusing data and models [61]. The applications of the HS were classified as data mining, medical systems, agriculture, scheduling, power engineering, image processing, communication systems, water resource management, astronomy, health care, and manufacturing/design [62]. The HS was compared with various meta-heuristic optimization algorithms such as the evolutionary algorithm (EA), simulated annealing (SA), ant colony optimization (ACO), particle swarm optimization (PSO), and the firefly algorithm (FA) [63].
The data available for predicting floods and water levels in urban areas, such as the water level and runoff, are limited. To overcome this limitation, in this study, the runoff of urban streams was predicted based on the discharge of pump stations in urban streams. The peak value of the data differs from year to year, which adversely affects learning. To overcome this shortcoming, normalized data were used, and the HS, which has been applied in various studies and has shown good performance, was used as the MLP optimizer. A multi-layer perceptron combined with the HS (MLPHS) was used to predict the runoff of urban streams. The results predicted using the MLPHS were compared with the results obtained using a multi-layer perceptron coupled with existing optimizers and with the RCGA.

Overview
The study was divided into three steps:
1. The observed rainfall data, the discharge of each pump station, and the runoff of the urban stream were collected.
2. The structures of MLPHS were generated to predict the runoff of urban streams.
3. The MLPHS results were compared with the results of the MLP with existing optimizers and the RCGA.
Figure 1 illustrates the steps followed in this study to predict the runoff of the urban stream.

Study Area
The Han River, which flows through Seoul (the capital of South Korea), has various tributaries such as the Anyang and Hongje streams, and the Anyang Stream in turn has tributaries such as the Mokgam and Dorim streams. Within the Dorim Stream watershed, this study focused on two tributaries (the Daebang and Bongchun streams) and 11 pump stations (Mullae, Dorim2, Daerim2, Daerim3, Guro1, Guro2, Guro3, Guro4, Sinlim1, Sinlim2, and Sinlim5). All 11 pump stations discharge into the Dorim Stream. The Daebang and Bongchun streams have no pump stations. The Dorim Stream starts upstream at Seoul National University, located on Gwanak Mountain, and continues downstream to its junction with the Anyang Stream. The monitoring point for observing the runoff of the Dorim Stream is located downstream. In addition, an observatory for rainfall observations is located in the middle of the watershed. The drainage area (A), length (L), and shape coefficient (A/L²) of the Dorim Stream are 41.93 km², 14.2 km, and 0.21, respectively. The watershed slope of the Dorim Stream ranges from 0.007 to 0.8598. Figure 2 shows information about the study area.
The drainage area of the 11 pump stations in the Dorim Stream is 8.84 km², which is approximately 21% of the total drainage area of the Dorim Stream. The length of the Daebang stream is 7.16 km, and the length of the Bongchun stream is 5.00 km. The Gwanak detention reservoir is located upstream of the Dorim Stream and has a capacity of 65,000 m³. The drainage areas of the individual pump stations vary from 0.19 km² to 2.49 km². The basin of the Daerim3 pump station, which has the largest drainage area, has a Daerim detention reservoir for flood reduction. The capacity of the Daerim detention reservoir is 2447 m³. Table 1 lists the drainage areas of each pump station. In Table 1, the Daerim3 pump station had the largest drainage area of 2.49 km², whereas the Daerim2 pump station had the smallest drainage area of 0.19 km².
Pump stations are drained naturally through a sluice gate or drained through drainage pumps. A detention reservoir was constructed in the drainage area of the Daerim3 pump station, which has the largest drainage area.

Preparation of Data for the MLPHS
Prior to being applied to the MLPHS, data from 2010, 2011, 2012, 2013, 2014, 2016, 2018, and 2019, when rainfall occurred most recently in the study area, were prepared. The interval of the observed runoff in the Dorim Stream was 10 min. During the rest of the period, it is difficult to construct appropriate learning data as the pumps at each pump station did not operate. If data from periods when the pumps were not operated are used as learning data, the learning may not work properly. Since the data for each year are different, it is difficult to guarantee accurate prediction results when the raw data are used as they are. To compensate for this shortcoming of the raw data, the prediction accuracy was improved in this study by using normalization. Figure 3 shows the observed rainfall and runoff data from 2010 to 2019.
Data from 2010 to 2018 were used as the learning dataset for the MLP, and data from 2019 were used for making predictions. In addition to these data, records of rainfall received in the study area were used; however, data from pump stations that were not operated due to the low water level of the stream were excluded.
For efficient learning of the MLP, normalization was conducted to represent each data set as a value between 0 and 1. The normalization is shown in Equation (1).
$$y_i = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}} \qquad (1)$$
where $y_i$ is the normalized value, $x_i$ is the real value, $x_{\max}$ is the maximum real value, and $x_{\min}$ is the minimum real value. To compare the results of each MLP, the root mean square error (RMSE), which is based on squared errors, the mean absolute error (MAE), which is based on absolute errors, and R-squared (R²) were applied. The RMSE, MAE, and R² equations are given by Equations (A1), (A2), and (A3), respectively, in Appendix A.
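The normalization and the three error measures described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the sample values are invented rather than taken from the study, and R² is assumed here to be the coefficient of determination, since Appendix A is not reproduced in this section.

```python
import numpy as np


def min_max_normalize(x, x_min=None, x_max=None):
    """Scale values to [0, 1] as y_i = (x_i - x_min) / (x_max - x_min)."""
    x = np.asarray(x, dtype=float)
    x_min = x.min() if x_min is None else x_min
    x_max = x.max() if x_max is None else x_max
    return (x - x_min) / (x_max - x_min)


def rmse(observed, predicted):
    """Root mean square error (based on squared errors)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))


def mae(observed, predicted):
    """Mean absolute error (based on absolute errors)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(observed - predicted)))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Illustrative values only, not data from the study.
runoff = np.array([5.0, 20.0, 80.0, 35.0, 10.0])
normalized = min_max_normalize(runoff)
```

Passing `x_min` and `x_max` explicitly makes it possible to reuse the extremes of the learning data when scaling the prediction year, which is one common choice; the source does not state which extremes were used.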

Multi-Layer Perceptron Combined with the Harmony Search
Machine learning techniques, including multi-layer perceptrons, are used to predict or classify data. For example, a computer cannot distinguish a dog from a cat using only a picture; however, humans can easily distinguish between dogs and cats. For this purpose, a machine learning method was developed. With machine learning, a large amount of data is inputted into a computer and then used to classify similar data. When a picture similar to a stored picture of a dog is input, the computer classifies the picture as a dog. Many machine learning algorithms have been developed to classify data. Representative examples include decision trees, Bayesian networks, SVMs, and ANNs. Among them, deep learning methods are the descendants of ANNs.
A perceptron was proposed as a type of ANN [1]. A single-layer perceptron can classify linearly separable data; however, it cannot compute XOR. The multi-layer perceptron was proposed to overcome this limitation of the single-layer perceptron. As it can separate data that are not linearly separable, it can compute XOR. Linear separation is possible with a single neuron in a single-layer perceptron. In a multi-layer perceptron, the number of neurons is higher than that in a single-layer perceptron, and layers are added to create a complex decision boundary structure.
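The XOR case can be made concrete with a small hand-built network. The sketch below is not part of the study: the weights are picked by hand for illustration, and the step activation is the classic perceptron choice rather than the ReLU used later. Two hidden neurons act as OR and NAND, and the output neuron combines them with AND, which a single neuron cannot do on its own.

```python
import numpy as np


def step(z):
    """Heaviside step activation used by the classic perceptron."""
    return (z > 0).astype(int)


def xor_mlp(x):
    """Two-layer perceptron computing XOR with hand-picked weights."""
    # Hidden layer: column 0 acts as OR, column 1 acts as NAND.
    w_hidden = np.array([[1.0, -1.0],
                         [1.0, -1.0]])
    b_hidden = np.array([-0.5, 1.5])
    hidden = step(x @ w_hidden + b_hidden)
    # Output neuron: AND of the two hidden neurons.
    w_out = np.array([1.0, 1.0])
    b_out = -1.5
    return step(hidden @ w_out + b_out)


inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
outputs = xor_mlp(inputs)
print(outputs.tolist())  # [0, 1, 1, 0]
```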
In the MLP, the optimizer is the tool that updates the weights and biases connecting the neurons, typically by gradient descent using differentiation. As the use of ANNs, including multi-layer perceptrons, has become more widespread, the importance of optimizers that can efficiently find optimal weights and biases has increased. The optimizers commonly used in ANNs are the gradient descent (GD), stochastic gradient descent (SGD), adaptive gradient (Adagrad), root mean squared propagation (RMSprop), adaptive delta (AdaDelta), adaptive moment (Adam), and Nesterov-accelerated adaptive moment (Nadam). Existing optimizers, including the GD, SGD, Adagrad, RMSprop, AdaDelta, Adam, and Nadam, depend entirely on the shape of the error surface and on the weights and biases that are initially created. There is a possibility of convergence to a local optimum as new weights and biases are generated using the numerical derivative. To overcome the shortcomings of the existing optimizers, the MLP was improved by applying a meta-heuristic optimization algorithm that can consider both global and local searches.
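The dependence on the initial values can be illustrated on a one-dimensional toy error surface. The function below is an assumption chosen only because it has one local and one global minimum; it is not the error surface of the MLP in this study. Plain gradient descent started on the right ends in the local minimum, while the same procedure started on the left reaches the global one.

```python
def f(x):
    """Toy error surface with a local minimum near 1.13 and a global one near -1.30."""
    return x**4 - 3 * x**2 + x


def grad_f(x):
    """Analytical derivative of f."""
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x0, learning_rate=0.01, steps=2000):
    """Plain gradient descent from the starting point x0."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad_f(x)
    return x


x_from_right = gradient_descent(2.0)   # ends in the local minimum
x_from_left = gradient_descent(-2.0)   # ends in the global minimum
print(x_from_right, f(x_from_right))
print(x_from_left, f(x_from_left))
```

A search that also samples the whole range, as meta-heuristic algorithms do, is not tied to the starting point in this way.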
The HS is a meta-heuristic optimization algorithm motivated by the improvisation of musicians. The parameters of the HS are the harmony memory size (HMS), harmony memory considering rate (HMCR), pitch adjustment rate (PAR), and bandwidth (BW). The HMS is the size of the harmony memory (HM). The number of initially randomly generated solutions is determined by the HMS. The HMCR is the probability of choosing a decision variable within the HM. The process used to create a new decision variable by randomly combining the decision variables in the current HM is similar to the operation process of the GA. However, the GA always creates a new decision variable from two decision variables. The HS, on the other hand, can create a new decision variable from two or more decision variables, up to the HMS. Additionally, the GA cannot consider decision variables independently as it should maintain the genetic structure; however, the HS can select each decision variable independently when generating a new solution. The PAR is the probability of selecting a new decision variable adjacent to the decision variable previously selected through the HMCR, while considering the BW. To compare the results of the MLP using the existing optimizers, the MLP using the RCGA, and the MLPHS, the structure of the MLP was set to five hidden layers with ten nodes in each hidden layer. Figure 4 shows the calculation flow of the MLPHS used in this study.
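The roles of the HMS, HMCR, PAR, and BW can be sketched as a basic harmony search loop. This is a minimal sketch of the general algorithm, not the implementation used in the study: the function name, the uniform pitch adjustment, the replace-the-worst update rule, and the sphere test function are all assumptions made for illustration. The default parameter values mirror those reported in the Comparison of Results section.

```python
import numpy as np


def harmony_search(objective, n_vars, lower, upper, hms=50, hmcr=0.85,
                   par=0.4, bw=0.0003, iterations=2000, seed=0):
    """Minimize `objective` with a basic harmony search (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Harmony memory (HM): HMS random solutions inside the search range.
    memory = rng.uniform(lower, upper, size=(hms, n_vars))
    fitness = np.array([objective(h) for h in memory])
    initial_best = float(fitness.min())

    for _ in range(iterations):
        new = np.empty(n_vars)
        for j in range(n_vars):
            if rng.random() < hmcr:
                # Memory consideration: take variable j from a random harmony,
                # chosen independently for every decision variable.
                new[j] = memory[rng.integers(hms), j]
                if rng.random() < par:
                    # Pitch adjustment: move to a neighbour within +/- BW.
                    new[j] += bw * rng.uniform(-1.0, 1.0)
            else:
                # Random selection anywhere in the search range.
                new[j] = rng.uniform(lower, upper)
        new = np.clip(new, lower, upper)
        new_fitness = objective(new)
        # Replace the worst harmony if the new one is better.
        worst = int(np.argmax(fitness))
        if new_fitness < fitness[worst]:
            memory[worst] = new
            fitness[worst] = new_fitness

    best = int(np.argmin(fitness))
    return memory[best], float(fitness[best]), initial_best


def sphere(x):
    """Toy objective standing in for the MLP error (assumption)."""
    return float(np.sum(x ** 2))


best_x, best_f, initial_best_f = harmony_search(
    sphere, n_vars=5, lower=-15.0, upper=15.0)
```

In an MLPHS-style setting, the decision variables would be the weights and biases of the network and the objective would be the prediction error on the learning data.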

Comparison of Results
Data from 2010 to 2018 were used to train the MLP model. The runoff for 2019 was then predicted from the 2019 pump station discharge and rainfall information. A rectified linear unit (ReLU) was used as the activation function in the MLP using existing optimizers, the MLP using the RCGA, and the MLPHS. Each MLP was trained for 100,000 epochs on the data from 2010 to 2018 and then predicted the results of 2019. The results obtained using each MLP were then compared. Among the parameters of the RCGA, the population was set to 50, and the mutation rate was set to 0.15. The HMS, HMCR, PAR, and BW in the HS were set to 50, 0.85, 0.4, and 0.0003, respectively. In the GA and HS, the search range of the weights and biases was set from -15 to 15. Table 2 shows the results obtained with the MLP using existing optimizers, the MLP using the RCGA, and the MLPHS. Among the existing optimizers, AdaDelta was the best for the RMSE, RMSprop was the best for the MAE, and Nadam showed the highest R². However, across all the results, including the MLP using the RCGA and the MLPHS, the RMSE and MAE of the MLPHS were the lowest, and the MLPHS showed the highest R². The results of AdaDelta and RMSprop, which gave good results among the applied optimizers, were compared for each time period with those obtained with the MLPHS. Figure 5 shows a comparison of the results obtained with AdaDelta, RMSprop, and the HS (MLPHS).
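To show how a meta-heuristic can act on such a network, the weights and biases can be packed into one flat vector that is searched within -15 to 15. The sketch below assumes 12 inputs (11 pump-station discharges plus rainfall), one output, a linear output layer, and a particular parameter layout; none of these details are stated in this section, and the data are synthetic.

```python
import numpy as np

# Assumption: 12 inputs, five hidden layers of ten nodes, one output.
LAYER_SIZES = [12] + [10] * 5 + [1]


def forward(flat_params, x, layer_sizes=LAYER_SIZES):
    """Run the MLP on x (n_samples, n_inputs) using a flat parameter vector."""
    offset = 0
    activation = np.asarray(x, dtype=float)
    n_layers = len(layer_sizes) - 1
    pairs = zip(layer_sizes[:-1], layer_sizes[1:])
    for k, (n_in, n_out) in enumerate(pairs):
        # Unpack the weights and biases of layer k from the flat vector.
        weights = flat_params[offset:offset + n_in * n_out].reshape(n_in, n_out)
        offset += n_in * n_out
        bias = flat_params[offset:offset + n_out]
        offset += n_out
        activation = activation @ weights + bias
        if k < n_layers - 1:
            activation = np.maximum(activation, 0.0)  # ReLU on hidden layers
    return activation.ravel()  # linear output layer (assumption)


def rmse_loss(flat_params, x, y):
    """Objective a meta-heuristic would minimize over flat_params."""
    return float(np.sqrt(np.mean((forward(flat_params, x) - y) ** 2)))


n_params = sum(a * b + b for a, b in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:]))

rng = np.random.default_rng(0)
candidate = rng.uniform(-15.0, 15.0, size=n_params)  # one candidate solution
x_demo = rng.random((4, 12))  # synthetic normalized inputs
y_demo = rng.random(4)        # synthetic normalized runoff
loss = rmse_loss(candidate, x_demo, y_demo)
```

Each row of a harmony memory or each individual of an RCGA population would then be one such `candidate` vector, scored by `rmse_loss`.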
Additionally, notable results occurred at the peak value of the observed runoff. RMSprop showed a difference of 104.139 m³/s from the peak value of the observed runoff, and AdaDelta showed a difference of 140.856 m³/s from the peak value of the observed runoff. On the other hand, the MLPHS showed a difference of 39.804 m³/s from the peak value of the observed runoff. The MLPHS showed better results than the existing optimizers in terms of both overall and peak results. The results of the MLP using the RCGA were compared with those of the MLPHS. Figure 6 shows a comparison of the results obtained with the MLP using the RCGA and the HS (MLPHS). The MLP using the RCGA showed a difference of 26.398 m³/s from the peak value of the observed runoff, and the MLPHS showed a difference of 39.804 m³/s from the peak value of the observed runoff. At the peak value of the observed runoff, the MLP using the RCGA showed better results than the MLPHS. However, in the observed runoff near the peak value, the MLP using the RCGA showed a larger error than the MLPHS. Considering the overall results, the MLPHS provided the most accurate prediction among all the applied optimizers. Figure 7 shows the R² for the observed and predicted data of the MLP using the existing optimizers, the MLP using the RCGA, and the MLPHS.
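The distinction between the error at the peak and the error near the peak can be sketched as follows. The section does not define how the peak difference was computed, so this sketch assumes it is the absolute difference at the time step of the observed peak; the function names, the window width, and the sample values are hypothetical.

```python
import numpy as np


def peak_error(observed, predicted):
    """Absolute difference at the time step of the observed peak (assumed definition)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    peak_index = int(np.argmax(observed))
    return float(abs(predicted[peak_index] - observed[peak_index]))


def near_peak_mae(observed, predicted, half_width=1):
    """Mean absolute error in a window of time steps around the observed peak."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    peak_index = int(np.argmax(observed))
    start = max(0, peak_index - half_width)
    stop = peak_index + half_width + 1
    return float(np.mean(np.abs(predicted[start:stop] - observed[start:stop])))


# Illustrative values only, not data from the study (m³/s).
observed = np.array([10.0, 50.0, 200.0, 120.0, 30.0])
predicted = np.array([12.0, 60.0, 170.0, 130.0, 28.0])
error_at_peak = peak_error(observed, predicted)
error_near_peak = near_peak_mae(observed, predicted)
```

A model can score well on the first measure and poorly on the second, which is the pattern reported above for the MLP using the RCGA.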

Discussion
In most studies, the results of a new study are compared with those of previous studies. However, since no previous studies provide comparable results, the learning times required by the MLP using the existing optimizers, the MLP using the RCGA, and the MLPHS were compared instead. To compare the required time, only data from 2010 were used, and the number of epochs was set to 1000. In addition, the learning of each MLP was repeated 50 times to calculate the average value.
The time required according to the number of nodes was simulated by setting the number of hidden layers to one and the number of nodes from one to five. Additionally, the time required according to the number of hidden layers was simulated with three nodes and one to five hidden layers. Table 3 shows the average time required for each optimizer according to the number of nodes.
Figure 8 shows the comparison of the average time required for each optimizer according to the number of nodes.
According to the results of Table 3 and Figure 8, the MLP using the RCGA showed the shortest average time required for the number of nodes from one to five, and the MLPHS showed the second shortest average time. Table 4 shows the average time required for each optimizer according to the number of hidden layers. Figure 9 shows the comparison of the average time required for each optimizer according to the number of hidden layers.
Water 2022, 14, x FOR PEER REVIEW 12 of 16
Figure 9. Comparison of average time required for each optimizer according to the number of hidden layers.
According to the results of Table 4 and Figure 9, the MLP using the RCGA showed the shortest average time required for one to five hidden layers, and the MLPHS showed the second shortest average time. These results were similar to the average time required for each optimizer according to the number of nodes. The existing optimizers find a new solution by updating the weights and biases from the position of the existing solution using differentiation. However, meta-heuristic optimization algorithms such as the RCGA and HS find a new solution through a combination of existing solutions, fine-tuning, or random selection in the search range. The time required for the MLP using the meta-heuristic optimization algorithms was shorter than that of the MLP using the existing optimizers because of this difference in the search method for a new solution. Additionally, the average time required was longest when the number of hidden layers was five and the number of nodes was three. Therefore, the average time required is long when the calculation process is long due to the large number of weights and biases.
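The link between structure and the number of weights and biases can be checked with a short count. The sketch assumes 12 inputs (11 pump-station discharges plus rainfall) and one output, which this section does not state explicitly; the function name is hypothetical.

```python
def count_parameters(n_inputs, hidden_layers, nodes_per_layer, n_outputs=1):
    """Number of weights and biases in a fully connected MLP."""
    sizes = [n_inputs] + [nodes_per_layer] * hidden_layers + [n_outputs]
    # Each layer contributes (inputs x outputs) weights plus one bias per output.
    return sum(a * b + b for a, b in zip(sizes[:-1], sizes[1:]))


N_INPUTS = 12  # assumption: 11 pump stations plus rainfall

# One hidden layer with one to five nodes (first timing experiment).
by_nodes = {n: count_parameters(N_INPUTS, 1, n) for n in range(1, 6)}
# One to five hidden layers with three nodes each (second timing experiment).
by_layers = {k: count_parameters(N_INPUTS, k, 3) for k in range(1, 6)}

print(by_nodes)   # {1: 15, 2: 29, 3: 43, 4: 57, 5: 71}
print(by_layers)  # {1: 43, 2: 55, 3: 67, 4: 79, 5: 91}
```

Under these assumptions, the configuration with five hidden layers and three nodes has the most parameters of all the timed structures, which is consistent with it requiring the longest average time.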

Conclusions
The MLP using existing optimizers, the MLP using the RCGA, and the MLPHS were applied to predict urban stream runoff, and the discharge from each pump station was used as the learning data. The data from 2010 to 2018 were used as the learning data, and the predicted results for 2019 obtained with the MLPHS were more accurate than the results obtained with the MLP using existing optimizers and the MLP using the RCGA. Among the results of the MLP using existing optimizers, those obtained with AdaDelta exhibited the smallest RMSE, and those obtained with RMSprop exhibited the smallest MAE. In addition, among the MLPs using the existing optimizers, the MLP using Nadam showed the highest R².
The results of the study revealed that the discharge of pump stations in urban areas can be used to predict the runoff of urban streams. Pump stations have been constructed in urban areas to prevent urban flooding. The results revealed that the discharge of pump stations directly affects the amount of runoff from urban streams as pump stations discharge water from the inland to urban streams.
The results of the study revealed that the MLPHS can be applied to reduce the error between the observed and simulated runoff. The optimizer of the MLP is a very important component of the learning process. As the optimizer greatly influences the weights and biases in the MLP, studying how the optimizer can be improved is very important. In this study, learning was conducted by replacing the optimizer used in the MLP with the HS, a well-known metaheuristic optimization algorithm.
A limitation of this study is that the observation data consisted only of the runoff from each pump station and rainfall. In future studies, more accurate runoff predictions would be possible if the water level or runoff observation data at the upstream point were added. Additionally, usability could be increased by applying a self-adaptive meta-heuristic optimization algorithm. Learning using the MLP can be conducted if an automatic optimization of the MLP structure can be performed.