Physical Hybrid Neural Network Model to Forecast Typhoon Floods

This study proposed a hybrid neural network model that combines a self-organizing map (SOM) and back-propagation neural networks (BPNNs) to model the rainfall-runoff process in a physically interpretable manner and to accurately forecast typhoon floods. The SOM and a two-stage clustering scheme were applied to group hydrologic data into four clusters, each of which represented a meaningful hydrologic component of the rainfall-runoff process. BPNNs were constructed for each cluster to achieve high forecasting capability. The physical hybrid neural network model was used to forecast typhoon flood discharges in Wu River in Taiwan by using two types of rainfall data. The clustering results demonstrated that the rainfall-runoff process was favorably described by the sequence of derived clusters. The flood forecasting results indicated that the proposed hybrid neural network model has good forecasting capability, and the performance of the models using the two types of rainfall data is similar. In addition, the derived lagged inputs are hydrologically meaningful, and the number and activation function of the hidden nodes can be rationally interpreted. This study also developed a traditional, single BPNN model trained using the whole calibration data for comparison with the hybrid neural network model. The proposed physical hybrid neural network model outperformed the traditional neural network model in forecasting the peak discharges and low flows.

Although ANN-based models exhibit high forecasting performance, they are regarded as a black box and their physical interpretation is unattainable. For instance, Zhang et al. [26] and Lange [27] noted that the ANN has no explicit form for analyzing the relationship between inputs and outputs and that explaining the results obtained by the networks is difficult. Some studies have attempted to interpret the physical meaning of the derived network structure and results of ANN models. Jain et al. [28] demonstrated that the hidden neurons in the ANN rainfall-runoff model approximate various components of the hydrologic process, such as infiltration, base flow, and surface flow. Chen and Yu [12] and Chen [29] demonstrated that the input data mined to construct the hidden nodes of an SVM network are informative hydrologic data that characterize a flood hydrograph, particularly the data around the peak flood and in the rising limb.
Separate ANN models trained using different input-output data sets have been proposed to improve forecasting performance. For example, Furundzic [30] used the self-organizing map (SOM) to decompose input-output data into three sets and develop separate ANNs for each data set. Abrahart and See [31], Hsu et al. [32], and Jain and Srinivasulu [33] also applied the SOM to partition data into different clusters corresponding to the different segments of the hydrograph and developed separate ANN models for each cluster. They concluded that the performance of the separate ANNs is better than that of a single ANN trained using the whole dataset. The partitioning of data is based on the fact that different magnitudes of hydrologic data are produced by different physical processes. A separate ANN can more closely model a specific dataset corresponding to a hydrologic component. However, most studies that applied data partitioning focused on the decomposition method and performance improvement. Efforts in physically partitioning data and interpreting the hydrologic process have been limited.
The present study, which also partitioned data into clusters and constructed separate ANNs, focused on the derivation of physically interpretable clusters. The SOM was applied to group hydrologic data into four clusters. The sequence of these clusters physically represents the rainfall-runoff process of a storm event according to the quantity of the rainfall and discharge data. A two-stage clustering scheme was used to obtain the expected meaningful clusters. A back-propagation neural network (BPNN) was employed to construct the forecasting model for each cluster to forecast flood discharge. The proposed hybrid neural network model that combines the SOM and BPNN characterizes the rainfall-runoff process in a physically interpretable manner. The physical hybrid neural network model was used to forecast typhoon flood discharges in Wu River in Taiwan. Two types of forecasting models were constructed with respect to two sets of rainfall data (the basin average rainfall and rainfall from different rain gauges). The clustering results demonstrate that the proposed clustering scheme captures the behavior of the rainfall-runoff process and properly divides the hydrologic process into different components. Flood forecasting results reveal that both types of forecasting models have favorable forecasting capability with high coefficient of efficiency values and low mean absolute errors. In addition, the proposed hybrid neural network model was compared with a single traditional neural network model that was constructed using the whole dataset.
The following section introduces the proposed physical hybrid neural network model, including the methodologies of the SOM and BPNN. Section 3 provides information on the study area and typhoon flood data. Section 4 presents the model development process and the flood forecasting performed by the hybrid neural network model. A comparison of the proposed model with the traditional BPNN model is presented as well. The last section outlines the conclusions of this study.

Rainfall-Runoff Clusters Based on the Hydrologic Process
The rainfall-runoff process can be divided into several temporal steps corresponding to respective hydrologic phenomena. This study grouped rainfall-runoff data into clusters to represent different components of the rainfall-runoff process. An example of a typical rainfall event is shown in Figure 1a. Such an event generally begins with low rainfall (R), followed by intense, heavy rainfall, and finally ends with sprinkling rainfall. A storm hydrograph recorded during a storm rainfall event is presented in Figure 1b. At the beginning of the event, the discharge (Q) rises slowly and the discharge increment (∆Q) is small. As the high intensity rainfall continues, the discharge increases rapidly to the peak discharge during the rising limb of the hydrograph. In the major part of the rising limb, the discharge increment is large. After cessation of the intense rainfall, the discharge declines sharply, but the discharge increment remains large. In the lower part of the recession limb, the discharge decreases slowly to the base flow and the discharge increment is small.
On the basis of the typical rainfall-runoff event illustrated in Figure 1, the rainfall-runoff process can be divided into several steps, as shown in Figure 2. At the beginning of a rainfall event, the rainfall is generally low and does not significantly contribute to the runoff. The rainfall-runoff data during this step are low rainfall and small discharge increments, and the data are grouped as Cluster A. Next, the rainfall increases and becomes large; however, the initial losses and high infiltration losses during this period cause only a gradual increase in the discharge. The high rainfall and small discharge increment data during this period are grouped as Cluster B. Subsequently, the infiltration losses decrease and increasingly more surface runoff reaches the basin outlet. The discharge increases rapidly to the crest segment of the hydrograph. The high rainfall and large discharge increment data are grouped as Cluster C. When the rainfall diminishes and the discharge starts to decrease, the data with low rainfall and large discharge increment are grouped as Cluster D. Subsequently, the segment of a hydrograph with low rainfall and small discharge increment is similar to the initial part of the hydrograph, and the corresponding data are thus also grouped as Cluster A. Thus, the rainfall-runoff process can be described by the sequence through Clusters A, B, C, D, and A. For an actual storm hydrograph, the rainfall-runoff phenomenon can be more complex, and the translation and storage effects can be significant in large watersheds. However, the proposed four clusters represent the rainfall-runoff process for a typical storm hydrograph.
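The cluster definitions reduce to a two-by-two table of rainfall level against discharge-increment size. The following minimal sketch only illustrates that labeling; the function name `hydrologic_cluster` and the idealized storm sequence are hypothetical, and in the study the low/high and small/large distinctions are derived by the SOM rather than supplied as flags.

```python
def hydrologic_cluster(high_rainfall: bool, large_increment: bool) -> str:
    """Map the two binary attributes to the cluster labels used in the text."""
    if not high_rainfall:
        # Low rainfall: start/end of the event (A) or steep recession (D).
        return "D" if large_increment else "A"
    # High rainfall: losses still dominate (B) or rapid rise to the crest (C).
    return "C" if large_increment else "B"


# Idealized single-peak storm, one tuple per step:
# (is the rainfall high?, is the discharge increment large?)
storm = [(False, False), (True, False), (True, True), (False, True), (False, False)]
sequence = [hydrologic_cluster(r, dq) for r, dq in storm]
print(sequence)  # ['A', 'B', 'C', 'D', 'A']
```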

Hybrid Neural Network Model
The proposed hybrid neural network model based on physically clustered hydrologic data is illustrated in Figure 3. The input rainfall and discharge increment data are grouped into four clusters using the SOM. Each cluster meaningfully corresponds to a typical step in the rainfall-runoff process. Then, BPNNs are constructed with respect to each cluster to forecast the discharge increment. The discharge forecasts are obtained when the forecasted discharge increment is added to the observed discharge at the present time. The detailed methodology of the SOM and BPNN has been well documented in the literature. Therefore, only a brief description of the two neural network models is provided herein.
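The forecasting step of the hybrid model can be sketched as a dispatch to the model of the assigned cluster, followed by adding the forecasted increment to the observed discharge. This is an illustration only: `forecast_discharge` and the constant-valued `toy_models` are hypothetical stand-ins for the trained BPNNs, and the numbers are invented for the example.

```python
from typing import Callable, Dict, Sequence

# A cluster model maps an input vector to a forecasted discharge increment.
ClusterModel = Callable[[Sequence[float]], float]


def forecast_discharge(
    q_now: float,
    inputs: Sequence[float],
    cluster: str,
    models: Dict[str, ClusterModel],
) -> float:
    """Add the increment forecasted by the cluster's model to the observed discharge."""
    delta_q_hat = models[cluster](inputs)
    return q_now + delta_q_hat


# Placeholder models (assumption): constants instead of trained networks.
toy_models: Dict[str, ClusterModel] = {
    "A": lambda x: 0.0,
    "B": lambda x: 5.0,
    "C": lambda x: 120.0,
    "D": lambda x: -80.0,
}

q_next = forecast_discharge(q_now=1500.0, inputs=[12.0, 30.0], cluster="C", models=toy_models)
print(q_next)  # 1620.0
```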

SOM
The SOM, proposed by Kohonen [34], is an unsupervised-learning neural network that automatically groups input data into several clusters without assigning the target outputs. The SOM uses a competitive learning strategy to map the input data onto a low-dimensional topological map. The process of constructing an SOM neural network is described briefly as follows.
The SOM network comprises one input layer and one output layer (the topological map), and the input neurons are fully connected to the output neurons. Let the input variables $x_i$ ($i = 1, 2, \ldots, m$) form an input vector $X$, where $m$ is the number of input neurons. Each output neuron $u_j$ ($j = 1, 2, \ldots, n$) on the topological map has a weight $w_{ij}$ with respect to each input variable $x_i$, and $n$ is the number of output neurons. The SOM is trained iteratively using randomly assigned initial weights. The SOM algorithm calculates the similarity between the input vector $X$ and weight vector $W_j$ for each output neuron. The similarity is defined as the Euclidean distance $d_j$:
$$d_j = \lVert X - W_j \rVert = \sqrt{\sum_{i=1}^{m} \left(x_i - w_{ij}\right)^2} \tag{1}$$
The output neuron whose weight vector is closest to the input vector has the minimum distance and is declared the winning neuron. The weights of this winning neuron $u_j^*$ and its neighboring neurons $u_j$ are then adjusted to approach the input vector. A typical neighborhood function is the Gaussian function $h_j$:
$$h_j = \exp\left(-\frac{\lVert u_j - u_j^* \rVert^2}{2\sigma^2}\right) \tag{2}$$
where $\lVert u_j - u_j^* \rVert$ is the distance between neuron $u_j$ and the winning neuron on the topological map, and $\sigma$ is the width of the topological neighborhood. The neighborhood function $h_j$ and width $\sigma$ are usually set to decrease monotonically during the iterative process. The adjusted weight at iteration time $r + 1$ is defined as
$$w_{ij}(r+1) = w_{ij}(r) + \eta\, h_j \left[x_i - w_{ij}(r)\right] \tag{3}$$
where $\eta$ is the learning rate ($0 < \eta < 1$) and is also set to decrease during the iterative process. Iterations are performed until the weight vector converges. Thereafter, similar input vectors are mapped to a specific region (cluster) on the topological map, and several clusters are automatically grouped.
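The training loop described above (find the winning neuron by Euclidean distance, then pull the winner and its neighbors toward the input with a decaying learning rate and neighborhood width) can be sketched in NumPy as follows. This is a minimal illustration, not the configuration used in the study: the one-dimensional map, the linear decay schedules, the iteration count, and the names `train_som` and `assign_clusters` are all assumptions.

```python
import numpy as np


def train_som(data, n_neurons=2, n_iter=400, eta0=0.5, sigma0=1.0, seed=0):
    """Train a 1-D SOM and return the weight matrix of shape (n_neurons, m)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    weights = rng.random((n_neurons, data.shape[1]))  # random initial weights
    positions = np.arange(n_neurons)                  # neuron coordinates on the map
    for r in range(n_iter):
        decay = 1.0 - r / n_iter
        eta = eta0 * decay                  # learning rate decreases with r
        sigma = max(sigma0 * decay, 1e-3)   # neighborhood width decreases with r
        x = data[rng.integers(len(data))]   # pick one input vector at random
        d = np.linalg.norm(x - weights, axis=1)       # Euclidean distances d_j
        winner = int(np.argmin(d))                    # winning neuron
        h = np.exp(-((positions - winner) ** 2) / (2.0 * sigma ** 2))
        weights += eta * h[:, None] * (x - weights)   # move weights toward x
    return weights


def assign_clusters(data, weights):
    """Label each input vector with the index of its nearest neuron."""
    data = np.asarray(data, dtype=float)
    d = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return np.argmin(d, axis=1)


# Two well-separated groups of normalized (R, dQ)-like points (made-up values).
samples = np.array([[0.00, 0.05], [0.05, 0.00], [0.10, 0.05],
                    [0.90, 0.95], [0.95, 1.00], [1.00, 0.90]])
som_weights = train_som(samples)
labels = assign_clusters(samples, som_weights)
print(labels)
```

Which neuron index ends up representing which group depends on the random initial weights, so only the grouping itself, not the label values, is meaningful.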

BPNN
The BPNN, developed by Rumelhart et al. [35], is one of the most representative and widely used neural networks. A supervised multilayer feed-forward neural network, the BPNN uses the back-propagation algorithm for network training. The BPNN typically comprises three layers: the input, hidden, and output layers. Let the input variables $x_i$ ($i = 1, 2, \ldots, m$) be the neurons in the input layer, and $\hat{y}_k$ ($k = 1, 2, \ldots, p$) be the output variable of the $k$-th neuron in the output layer. The BPNN output $\hat{y}_k$ is expected to fit the target (actual) output $y_k$. The BPNN (with $n$ neurons in the hidden layer) can be expressed in the following form:
$$\hat{y}_k = \sum_{j=1}^{n} w_{jk}\, F\!\left(\sum_{i=1}^{m} w_{ij} x_i + b_j\right) + c_k \tag{4}$$
where $w_{ij}$ is the weight connecting the $i$-th neuron in the input layer to the $j$-th neuron in the hidden layer; $b_j$ is the bias of the $j$-th hidden neuron; $w_{jk}$ is the weight connecting the $j$-th neuron in the hidden layer to the $k$-th neuron in the output layer; $c_k$ is the bias of the $k$-th output neuron; and $F(\cdot)$ is the activation function of the hidden neuron. Among the various activation functions that exist, the linear, sigmoid, and hyperbolic tangent functions are the most widely used.
In the learning process of the back-propagation algorithm, the weights of the network are adjusted to minimize the objective function $E$:
$$E = \frac{1}{2} \sum_{k=1}^{p} \left(y_k - \hat{y}_k\right)^2 \tag{5}$$
To minimize $E$, the gradient descent method is used to tune the weights along the negative direction of the gradient of $E$. The iteration of weight adjustment is repeated until convergence is reached. The detailed process of determining the weights can be found in the literature [36,37].
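A three-layer network of this form and its gradient-descent training can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the study's implementation: the squared-error objective is summed over all training samples, the hidden layer uses the sigmoid function with a linear output layer, and the learning rate, epoch count, toy data, and the names `forward` and `train_bpnn` are chosen only for the example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, w_in, b, w_out, c):
    """Hidden activations F(x W + b) and linear outputs y_hat."""
    hidden = sigmoid(x @ w_in + b)
    return hidden, hidden @ w_out + c


def train_bpnn(x, y, n_hidden=3, lr=0.01, n_epochs=1000, seed=0):
    """Batch gradient descent on E = 0.5 * sum((y - y_hat)**2)."""
    rng = np.random.default_rng(seed)
    w_in = rng.normal(scale=0.5, size=(x.shape[1], n_hidden))
    b = np.zeros(n_hidden)
    w_out = rng.normal(scale=0.5, size=(n_hidden, y.shape[1]))
    c = np.zeros(y.shape[1])
    losses = []
    for _ in range(n_epochs):
        hidden, y_hat = forward(x, w_in, b, w_out, c)
        err = y_hat - y                                   # dE/dy_hat
        losses.append(0.5 * float(np.sum(err ** 2)))
        # Back-propagate the error from the output layer to the hidden layer.
        delta = (err @ w_out.T) * hidden * (1.0 - hidden)
        grad_w_out = hidden.T @ err
        grad_c = err.sum(axis=0)
        grad_w_in = x.T @ delta
        grad_b = delta.sum(axis=0)
        # Step along the negative gradient of E.
        w_out -= lr * grad_w_out
        c -= lr * grad_c
        w_in -= lr * grad_w_in
        b -= lr * grad_b
    return (w_in, b, w_out, c), losses


# Toy data (assumption): fit y = x^2 on [0, 1].
x_train = np.linspace(0.0, 1.0, 20)[:, None]
y_train = x_train ** 2
params, losses = train_bpnn(x_train, y_train)
print(losses[0], losses[-1])
```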

Study Area and Hydrologic Data
The study area is Wu River, located in central Taiwan (Figure 4). Wu River flows through the metropolitan area of Taichung City and empties into the Taiwan Strait. Wu River encloses a basin area of 2026 km² and has a mainstream length of 119 km. The average annual precipitation in the Wu River basin is approximately 2087 mm, much of which is typhoon rainfall.

The downstream Dadu Bridge discharge station (Figure 4) near the metropolitan area is the forecasting object. This study collected hourly discharge data from Dadu Bridge and hourly rainfall data from eight rainfall gauges (named G1 to G8 and shown in Figure 4). Data for 13 typhoon flood events with complete records were obtained. Among these flood events, 10 events (488 datasets) were used for calibration and three events (206 datasets) that caused flooding disasters in the downstream metropolitan area were used for validation. Table 1 lists the characteristics of the typhoon flood events, including the date and name of the typhoon, total amount of average rainfall (Thiessen polygon method) in the Dadu Bridge Basin, and peak discharge at Dadu Bridge.

Determining the Input Variables
This study analyzed the lags between the discharge at Dadu Bridge and various lagged rainfall and discharge variables. The derived lagged variables were used as inputs of the proposed hybrid neural network model to forecast the discharge at Dadu Bridge. This study applied the linear transfer function (LTF) to determine the lagged variables by applying the least-squares technique to construct a linear function with lagged input variables. The t-test was employed to examine the statistical significance of the input variables. An advantage of using the LTF is that the lagged variables can be objectively determined by the statistical significance test. The process of using the LTF and the statistical significance test to determine the lags of input variables can be found in Chen et al. [38]. The time step for the analysis of lags and the subsequent flood forecasting is one hour in this study.
This study used two types of hourly rainfall data as input variables: multiple rainfall data from eight rain gauges and average rainfall data. The most significant lags between the discharge at Dadu Bridge and the rainfall from each rain gauge were determined by the LTF. The lag for rain gauges G1 and G2 was 1 h, and the lag for G3 was 2 h. Rain gauges G4, G5, and G6 exhibited a lag of 3 h, and G7 and G8 exhibited a lag of 4 h. The determined lags were hydrologically rational. When the distance of a rain gauge to Dadu Bridge was longer, the most statistically significant time lag was also longer. For the average rainfall, the Thiessen polygon method was used to calculate the average rainfall in the Dadu Bridge watershed. The lagged average rainfall variables for 1-4 h were statistically significant at the 5% significance level, with the most significantly lagged variable for 3 h. The lagged discharge variables (discharge increment) were also examined. Only the lagged discharge variable for 1 h was statistically significant at the 5% significance level.
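The idea of selecting lags by least squares and a t-test can be illustrated with a small NumPy sketch. It is not the LTF procedure of Chen et al. [38]: the regression below is an ordinary linear model on lagged rainfall, the data are synthetic, the threshold 1.96 is a large-sample approximation of the two-sided 5% critical value, and the name `lag_t_statistics` is hypothetical.

```python
import numpy as np


def lag_t_statistics(x, y, max_lag):
    """Regress y(t) on x(t-1), ..., x(t-max_lag); return one t-statistic per lag."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y) - max_lag
    # Column k-1 holds x(t-k) for t = max_lag, ..., len(y)-1.
    lagged = [x[max_lag - k: len(x) - k] for k in range(1, max_lag + 1)]
    design = np.column_stack([np.ones(n)] + lagged)
    target = y[max_lag:]
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ beta
    dof = n - design.shape[1]
    s2 = resid @ resid / dof                      # residual variance
    cov = s2 * np.linalg.inv(design.T @ design)   # covariance of the estimates
    t_values = beta / np.sqrt(np.diag(cov))
    return t_values[1:]                           # drop the intercept


# Synthetic series (assumption): flow responds to rainfall three steps earlier.
rng = np.random.default_rng(0)
rain = rng.random(300)
flow = np.zeros(300)
flow[3:] = 2.0 * rain[:-3] + 0.1 * rng.normal(size=297)

t_stats = lag_t_statistics(rain, flow, max_lag=4)
best_lag = int(np.argmax(np.abs(t_stats))) + 1
significant = [k + 1 for k, t in enumerate(t_stats) if abs(t) > 1.96]
print(best_lag, significant)
```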
Let the hourly discharge of Dadu Bridge at the present time $t$ be $Q(t)$. For flood forecasting, the forecasted discharge with the lead-time of 1 h is $\hat{Q}(t+1)$. When the multiple rainfall data from eight rain gauges are used as inputs according to the most significant lags, the input variables are denoted as $R_{G1}(t)$, $R_{G2}(t)$, $R_{G3}(t-1)$, $R_{G4}(t-2)$, $R_{G5}(t-2)$, $R_{G6}(t-2)$, $R_{G7}(t-3)$, and $R_{G8}(t-3)$, where $R_{G1}(t)$ indicates the rainfall variable for rain gauge G1 at time $t$; the same notation applies to the other rain gauges. Let the discharge increment be $\Delta Q(t)$, which is defined as $\Delta Q(t) = Q(t) - Q(t-1)$. The proposed hybrid neural network model using multiple rainfall data (denoted as $f_{I}[\,\cdot\,]$) for forecasting the one-hour-ahead discharge increment $\Delta\hat{Q}(t+1)$ can be formulated as
$$\Delta\hat{Q}(t+1) = f_{I}\left[R_{G1}(t), R_{G2}(t), R_{G3}(t-1), R_{G4}(t-2), R_{G5}(t-2), R_{G6}(t-2), R_{G7}(t-3), R_{G8}(t-3), \Delta Q(t)\right] \tag{6}$$
When the discharge increment $\Delta\hat{Q}(t+1)$ is computed by the model and added to the observed discharge $Q(t)$, the one-hour-ahead discharge $\hat{Q}(t+1) = Q(t) + \Delta\hat{Q}(t+1)$ can be forecasted. The model that uses the basin average rainfall data $R_{A}(t)$ (denoted as $f_{II}[\,\cdot\,]$) is formulated as
$$\Delta\hat{Q}(t+1) = f_{II}\left[R_{A}(t), R_{A}(t-1), R_{A}(t-2), R_{A}(t-3), \Delta Q(t)\right] \tag{7}$$
For convenience, the hybrid neural network model using multiple rainfall data is hereafter termed Model I, and that using average rainfall data is termed Model II.
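Assembling the Model I input vector from the stated lags can be sketched as follows. For a lead time of 1 h, a gauge with a lag of L hours contributes its rainfall at time t + 1 − L, so the 1 h lag of G1 gives R_G1(t) and the 4 h lag of G8 gives R_G8(t − 3). The data layout (a dict of hourly lists), the function name `model_i_inputs`, and the numbers are hypothetical.

```python
# Most significant lag (hours) of each rain gauge, as reported in the text.
LAGS_H = {"G1": 1, "G2": 1, "G3": 2, "G4": 3, "G5": 3, "G6": 3, "G7": 4, "G8": 4}
LEAD_H = 1  # forecast lead time in hours


def model_i_inputs(rain, discharge, t):
    """Return [R_G1(t), ..., R_G8(t-3), dQ(t)] for the hourly index t."""
    inputs = [rain[gauge][t + LEAD_H - lag] for gauge, lag in LAGS_H.items()]
    inputs.append(discharge[t] - discharge[t - 1])  # dQ(t) = Q(t) - Q(t-1)
    return inputs


# Made-up series: every gauge records h mm at hour h, so the lags are easy to see.
rain = {gauge: [0, 1, 2, 3, 4] for gauge in LAGS_H}
discharge = [100, 110, 130, 160, 200]

vector = model_i_inputs(rain, discharge, t=4)
print(vector)  # [4, 4, 3, 2, 2, 2, 1, 1, 40]
```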

Clustering by Using the SOM
According to the proposed hybrid neural network model, input data were grouped into four hydrologically meaningful clusters formed by using the SOM. A two-stage clustering process (Figure 5) was proposed based on the properties of the rainfall and discharge data. In the first stage, input variables were grouped into two clusters (low and high rainfall clusters) by using only the rainfall data. In the second stage, the low rainfall cluster was further separated into two clusters (small and large discharge increment clusters) by using the discharge increment data. The high rainfall cluster was also divided into small and large discharge increment clusters. Consequently, four hydrologically meaningful clusters (with low and high R vs. small and large ∆Q) were obtained. This study applied the two-stage scheme to ensure that the rainfall and discharge increment data could be grouped into the expected four clusters. Because two forecasting models, Model I and Model II, were proposed corresponding to the two types of input rainfall variables, the clustering process was performed with respect to both the multiple rainfall data and average rainfall data.
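The two-stage scheme can be sketched with a deliberately simplified stand-in for the SOM: two competing neurons on one-dimensional data, updated without a neighborhood function. Several choices here are assumptions made for illustration, not details from the study: the size of the discharge increment is judged by its absolute value, the neurons are initialized at the data minimum and maximum, and the data and the names `split_two` and `two_stage_clusters` are hypothetical.

```python
import numpy as np


def split_two(values, n_epochs=50, eta=0.1):
    """Split 1-D values into a low (False) and a high (True) group."""
    values = np.asarray(values, dtype=float)
    w = np.array([values.min(), values.max()])  # deterministic initial weights
    for _ in range(n_epochs):
        for v in values:
            j = int(np.argmin(np.abs(v - w)))   # winning neuron
            w[j] += eta * (v - w[j])            # move the winner toward v
    return np.abs(values - w[1]) < np.abs(values - w[0])


def two_stage_clusters(rain, delta_q):
    """Stage 1 splits on rainfall; stage 2 splits each rainfall group on |dQ|."""
    rain = np.asarray(rain, dtype=float)
    magnitude = np.abs(np.asarray(delta_q, dtype=float))
    high_r = split_two(rain)
    large_dq = np.zeros(len(rain), dtype=bool)
    for mask in (~high_r, high_r):
        if mask.any():
            large_dq[mask] = split_two(magnitude[mask])
    return np.where(high_r,
                    np.where(large_dq, "C", "B"),
                    np.where(large_dq, "D", "A"))


# Made-up hourly rainfall (mm) and discharge increments (m3/s) for one event.
rain_mm = [0.5, 1.0, 12.0, 15.0, 14.0, 1.0, 0.5, 0.2]
dq_cms = [2.0, 5.0, 10.0, 300.0, 350.0, -280.0, -8.0, -3.0]
event_labels = two_stage_clusters(rain_mm, dq_cms)
print(event_labels.tolist())  # ['A', 'A', 'B', 'C', 'C', 'D', 'A', 'A']
```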
Table 2 lists the clustering results for the calibration events (488 datasets in total). The numbers of data in each cluster for the two types of rainfall data are comparable. Cluster A (low R and small ∆Q) has the most data, and Cluster C (high R and large ∆Q) has the fewest data. The numbers of data in the clusters are rational. The initial and final parts of the hydrograph (grouped as Cluster A) normally contain a large portion of the whole dataset. The rapidly rising limb of the hydrograph (grouped as Cluster C) encloses fewer data.
Table 3 lists the ranges (minimum and maximum) of rainfall R and discharge increment ∆Q for different clusters with respect to the average rainfall data. An average rainfall of 4.66 mm was grouped into the low rainfall cluster, and one of 4.67 mm was grouped into the high rainfall cluster at the first clustering stage. During the second-stage clustering process, the discharge increment data were further classified, and four clusters were obtained. The ranges of the clusters are consistent with the physical meaning of the clusters. According to the ranges listed in Table 3, Cluster A (low R and small ∆Q) has small rainfall and discharge increment data, and Cluster C (high R and large ∆Q) encloses the largest range among the clusters.
Figure 6 illustrates the clustering results of the SOM for the calibration events using three large, medium, and small flood events as an example. The clustering results concerning the multiple rainfall data (left panel) and average rainfall data (right panel) are similar. Event 02 (upper panel) is a large flood with a single peak caused by a concentrated and severe storm. The rainfall and discharge (also the discharge increment) data are large. Therefore, very few data (only two for the average rainfall case) are grouped as Cluster A. However, the rainfall-runoff process from Cluster B to Clusters C and D is appropriately identified by the clustered data. Event 03 (middle panel) is a medium flood with multiple peaks caused by a series of intermittent storms. The hydrologic process shown by the clusters is somewhat complicated. Nevertheless, progress from Cluster A to Clusters B, C, and D can be observed, with some data of Cluster C in the peak segment. Event 10 (lower panel) is an event with small peak discharge and low rainfall. The hydrograph is favorably described by the clusters; however, no data are grouped as Cluster C due to the low rainfall and discharge.
The SOM clustering was validated using the validation events, and the results are as follows. Two of the three events that caused flooding disasters in the metropolitan area were relatively large flood events. This is reflected in Table 4, where Clusters C and D (with large discharge increments) contain more data than Clusters A and B (with small discharge increments). Figure 7 presents the clustering results for the validation events. Event 11 is a large flood, with many data around the peak grouped as Cluster C and no data grouped as Cluster A.
Event 12 is an extremely large flood with a peak discharge much higher than is present in the calibration data (cf.Table 1).Many high discharge data were reasonably classified as Cluster C. The clustering results show a clear progression from Cluster A to Clusters B, C, and D for the first hydrograph, and also an obvious sequence of Clusters B, C, D, and A for the second hydrograph.Event 13 is a small flood event that shows a similar result as Event 10 in the calibration set.The hydrograph is well explained by the   For each cluster, BPNNs were constructed with respect to the structures of Model I and Model II.The calibration data used in constructing the BPNNs were linearly normalized to the interval between 0 and 1 according to the minimum and maximum values in the calibration data.The hidden nodes and activation functions of the BPNNs were determined through trial and error.Table 5 lists the calibration results regarding the number of hidden nodes and types of activation functions.The numbers of hidden nodes of the BPNNs using multiple rainfall data are generally greater than those of the BPNNs using average rainfall data.The multiple rainfall data possess a more complex spatial pattern than the average rainfall data.Therefore, more hidden nodes are required to describe the complex relationship between the inputs and output.An interesting result is the derived activation functions.The BPNNs corresponding to Cluster A (small R and small ∆Q) and Cluster C (large R and large ∆Q) use the linear function.The BPNNs for Cluster B (large R and small ∆Q) and Cluster D (small R and large ∆Q) use the sigmoid function.When the rainfall and discharge increment data in a cluster have similar properties (i.e., all small or all large), the linear function is sufficient to model the input-output relationship.When the data in a cluster are different (i.e., small vs. 
large), the nonlinear sigmoid function is used to model the complex input-output relationship.
With the constructed SOMs and BPNNs, the hybrid neural network model was established for flood forecasting with respect to the calibration and validation events. Three performance indices, the coefficient of efficiency (CE), mean absolute error (MAE), and error of time to peak discharge (ETP), were obtained as follows:
$$\mathrm{CE} = 1 - \frac{\sum_{t=1}^{n} \left[ Q(t) - \hat{Q}(t) \right]^{2}}{\sum_{t=1}^{n} \left[ Q(t) - \bar{Q} \right]^{2}} \tag{8}$$
$$\mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} \left| Q(t) - \hat{Q}(t) \right| \tag{9}$$
$$\mathrm{ETP} = \hat{T}_{p} - T_{p} \tag{10}$$
where $Q(t)$ is the observed discharge at time $t$; $\hat{Q}(t)$ is the forecasted discharge; $\bar{Q}$ is the average observed discharge; $n$ is the number of data; $T_{p}$ is the time to peak for observed discharge; and $\hat{T}_{p}$ is the time to peak for forecasted discharge. CE is a dimensionless index with a value of unity indicating perfect fit. MAE is an index that directly describes the average forecast error in the same unit as the data. ETP is positive if the forecasted peak discharge is delayed; such a lag often exists in hydrological forecasting. A model with a smaller absolute value of ETP has better forecasting performance.
Table 6 lists the performance indices of the hybrid neural network model for the calibration and validation events. The CE values for the calibration data are 0.97 and 0.98 for Model I and Model II, respectively, whereas the MAE values are 92.9 and 68.2 m³/s, which are small compared to the discharge magnitude of the flood events. For the validation events, CE is 0.94 and 0.91 and MAE is 188.0 and 248.2 m³/s, respectively. ETPs for calibration events range from −2 to 1 h; ETP is zero for half of the events. The average ETPs are small, as shown in Table 6. The performance indices indicate that the proposed hybrid neural network model favorably forecasts the flood discharge and that the performance of Model I and Model II is similar.
Figures 8 and 9 present the forecasted hydrographs for the calibration and validation events, respectively. In general, the forecasted hydrograph matches the observed hydrograph. However, the forecasted discharge for Cluster C
is not as close to the observed discharge as that for the other clusters. During the model learning process, only 33 and 35 data sets were used to train the BPNNs for Cluster C (Table 2), and BPNNs trained using fewer data have larger errors. Nevertheless, although Event 12 has a peak discharge much higher than any in the calibration data, the forecasted discharges around the crest segment are reasonable, indicating that the hybrid neural network model extrapolates successfully. Overall, the forecasting results demonstrate that the proposed hybrid neural network model accurately forecasts typhoon floods, including small, medium, and large events, and that the two types of model (Model I and Model II) have comparable capability.
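The three performance indices used above (CE, MAE, and ETP) can be illustrated with a minimal Python sketch. The function names, the sample discharge values, and the hourly time step (`dt_hours`) are assumptions made for illustration only; they are not taken from the study's data or software.

```python
import numpy as np

def coefficient_of_efficiency(observed, forecast):
    """CE = 1 - sum((Q - Qhat)^2) / sum((Q - Qbar)^2); unity means perfect fit."""
    observed = np.asarray(observed, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    residual = np.sum((observed - forecast) ** 2)
    spread = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - residual / spread)

def mean_absolute_error(observed, forecast):
    """MAE in the same unit as the discharge data."""
    observed = np.asarray(observed, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs(observed - forecast)))

def error_of_time_to_peak(observed, forecast, dt_hours=1.0):
    """ETP = forecasted time to peak minus observed time to peak.

    A positive value means the forecasted peak is delayed.
    dt_hours is an assumed sampling interval (hypothetical).
    """
    peak_forecast = int(np.argmax(forecast))
    peak_observed = int(np.argmax(observed))
    return (peak_forecast - peak_observed) * dt_hours

# Hypothetical discharges (m^3/s), invented purely to exercise the functions.
observed = np.array([100.0, 300.0, 900.0, 600.0, 200.0])
forecast = np.array([120.0, 280.0, 700.0, 800.0, 220.0])

ce = coefficient_of_efficiency(observed, forecast)
mae = mean_absolute_error(observed, forecast)
etp = error_of_time_to_peak(observed, forecast)
print(f"CE={ce:.3f}, MAE={mae:.1f} m^3/s, ETP={etp:+.0f} h")
```

In this invented example the forecasted peak arrives one time step after the observed peak, so ETP is positive, which matches the sign convention described in the text.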

Comparison with Traditional Neural Network Model
This study also developed a traditional neural network model to compare its performance with that of the hybrid neural network model. The traditional neural network model, which does not group data into clusters, uses all the calibration data to construct a single BPNN with the same calibration scheme as the hybrid neural network model. The single BPNN was also trained using the two types of rainfall variable (Model I and Model II). The constructed traditional BPNNs have three hidden nodes that use the sigmoid activation function. Table 7 lists the performance indices of the traditional neural network model. The CE value for calibration is 0.95, which is slightly lower than the CE values (0.97 and 0.98) of the hybrid neural network model. For validation, the CE value of 0.85 is considerably lower than those (0.94 and 0.91) of the hybrid neural network model. The MAE and ETP values of the traditional BPNN are larger than those of the hybrid neural network model (cf. Tables 6 and 7). Although the traditional BPNN also exhibits good forecasting performance in view of the performance indices, the hybrid neural network model clearly outperforms the traditional neural network model.
Figure 10 displays the forecasted hydrographs obtained using the hybrid and traditional neural network models for the validation events. In general, the two sets of forecasted hydrographs have similar patterns. However, the hydrographs obtained using the traditional BPNN exhibit minor underestimation for large discharges (especially the peak discharge in Event 12) and overestimation for small discharges (especially in Event 13). The traditional BPNN was trained using small and large discharge data simultaneously, so its learning fits the calibration data as a whole. Thus, the single BPNN does not perform very well in some cases of small and large discharges. In contrast, the proposed hybrid neural network model was trained on separate clusters, each with its own small or large data. The hybrid neural network model is therefore more robust and flexible for various rainfall-runoff processes.
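The structural difference between the two models can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the nearest-centroid rule stands in for the SOM clustering, the small functions stand in for trained BPNNs, and the centroid coordinates and weights are hypothetical values in normalized (R, ΔQ) space rather than results from the study. The choice of linear functions for Clusters A and C and sigmoid functions for Clusters B and D mirrors the activation functions reported in the text.

```python
import numpy as np

CLUSTERS = ["A", "B", "C", "D"]

# Hypothetical cluster centroids in normalized (R, dQ) space (assumed values):
# A: small R, small dQ; B: large R, small dQ; C: large R, large dQ; D: small R, large dQ.
CENTROIDS = np.array([
    [0.1, 0.1],
    [0.8, 0.1],
    [0.8, 0.8],
    [0.1, 0.8],
])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Stand-ins for the per-cluster BPNNs; the weights are invented for illustration.
CLUSTER_MODELS = {
    "A": lambda x: 0.5 * x[0] + 0.5 * x[1],      # linear
    "B": lambda x: float(sigmoid(x[0] + x[1])),  # sigmoid
    "C": lambda x: 0.3 * x[0] + 0.7 * x[1],      # linear
    "D": lambda x: float(sigmoid(x[1] - x[0])),  # sigmoid
}

def assign_cluster(sample, centroids=CENTROIDS):
    """Return the label of the nearest centroid (a simple stand-in for the SOM)."""
    distances = np.linalg.norm(centroids - np.asarray(sample, dtype=float), axis=1)
    return CLUSTERS[int(np.argmin(distances))]

def hybrid_forecast(sample):
    """Route the sample to its cluster, then apply that cluster's model."""
    label = assign_cluster(sample)
    return label, CLUSTER_MODELS[label](np.asarray(sample, dtype=float))

label, value = hybrid_forecast([0.2, 0.0])
print(label, value)
```

A single traditional model would apply one function to every sample regardless of where it falls in the (R, ΔQ) space, whereas the routing step lets each cluster keep a model suited to its own range of data.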

Conclusions
ANNs, usually regarded as black boxes, suffer from a lack of physical interpretation of the constructed model architecture. This study proposed a physical hybrid neural network model that combines the SOM and BPNNs and applied the model to real-time flood forecasting. The SOM was used to group the rainfall and discharge data into four clusters with clear physical meanings to characterize the rainfall-runoff process. Then, a BPNN was constructed for each cluster with specific properties of rainfall and discharge data, which gave the BPNNs higher capability and a network structure that could be meaningfully discussed.
Typhoon flood discharges at Dadu Bridge and rainfall from eight rain gauges in the Wu River basin in Taiwan were used as the study data. Two types of rainfall data (multiple rainfall and average rainfall) were used to construct two types of hybrid neural network model (Model I and Model II). The lagged input variables of the models were determined by using the LTF. The derived lags of the rainfall variables are hydrologically rational and reflect the distance from the rain gauge to the basin outlet.
The clustering results of the SOM for the calibration and validation events demonstrate that the hydrologic process is meaningfully described by the clusters. The rainfall-runoff process can be identified by the sequence of Clusters A, B, C, D, and A. The training of the BPNNs reveals that more hidden nodes are required to describe the complex relationship between multiple rainfall and discharge. The simple linear activation function was adopted in the clusters with similar data, whereas the nonlinear sigmoid activation function was used in clusters where the rainfall and discharge data were different.
Flood forecasting using the hybrid neural network model revealed that the proposed model successfully forecasts flood discharge with high efficiency and small errors. Model I and Model II have comparable forecasting performance. This study also developed a traditional neural network for comparison with the hybrid neural network model. The traditional neural network model, which was trained with the whole calibration dataset, did not perform favorably in some cases of small and large discharges. With respect to the performance indices and forecast hydrographs, the proposed physical hybrid neural network model exhibits robust flood forecasting and outperforms the traditional neural network model.

Figure 2. Rainfall-runoff clusters based on the hydrologic process.

Figure 3. Structure of the proposed hybrid neural network model.

Figure 4. Wu River basin and locations of the gauge stations.

Figure 5. Process of the two-stage clustering scheme.

Figure 6. Clustering results of the self-organizing map (SOM) for three calibration events.

Figure 7. Clustering results of the SOM for the validation events.

Figure 8. Flood forecasting results of the proposed model for calibration events.

Figure 10. Comparison of the flood forecasting results for the hybrid and traditional neural network models.

Table 1. Characteristics of collected typhoon flood events.

Table 2. Number of clusters for the calibration data sets.

Table 3. Ranges of rainfall and discharge increment for different clusters.

Table 4. Number of clusters for the validation datasets.

Table 5. Calibrated numbers of hidden nodes and types of activation functions.

Table 6. Performance indices of the hybrid neural network model.