Comparative Studies of Different Imputation Methods for Recovering Streamflow Observation
Water 2015, 7, 6847-6860

Faulty field sensors cause unreliability in the observed data that are needed to calibrate and assess hydrology models. However, it is illogical to ignore abnormal or missing values if there are limited data available. This study addressed this problem by applying data imputation to replace incorrect values and recover missing streamflow information in the dataset of the Samho gauging station at Taehwa River (TR), Korea from 2004 to 2006. The Soil and Water Assessment Tool (SWAT) and two machine learning techniques, Artificial Neural Network (ANN) and Self Organizing Map (SOM), were employed to estimate streamflow using reasonable flow datasets of Samho station from 2004 to 2009. The machine learning models were generally better at capturing high flows, while SWAT was better at simulating low flows.


Introduction
A stream-gaging network in a watershed provides the necessary data for withdrawal uses, hydropower production, flood forecasting and risk assessment, and hydrological and water quality modeling [1,2]. In addition, it is essential for a better understanding of the spatiotemporal variations of water resources and for creating effective water resource management schemes [3]. However, streamflow records suffer from missing observations, mostly resulting from unexpected causes including loss of records, sensor problems, or disruption of the data collection [2]. In the United States, Wallis et al. [4] found that at least 5% of streamflow records were missing from 1009 United States Geological Survey stream gauges for the period from 1948 to 1988. Such gaps would result in an incorrect response of hydrological models, but it is illogical to ignore abnormal or missing values if there are limited data available; substantial uncertainty in hydrologic and water quality modeling can be driven by these missing records.
Various data imputation methods (i.e., statistical or physically based methods) have been suggested to resolve missing observations [5][6][7][8]. Traditional statistical methods range from simple (e.g., listwise or pairwise deletion) to advanced techniques (e.g., moving average and regression) [2]. Adopting an adequate statistical method depends on the number of missing observations, the seasonal characteristics of missing observations, and the available data from neighboring stations [5,9,10,11]. One drawback of statistical methods is the assumption of linearity between predictors and streamflow [10], resulting in a simplification of streamflow variation and underestimation of uncertainty. In addition, Adeloye [12] reported that regression methods are only applicable when all predictors exist.
A physically based model (e.g., a hydrological model) can also recover missing records when calibrated with all available data [11]. Hydrological models, however, are not only difficult to construct, but are also site-specific. Essential data for the calibration of hydrological models may be inaccessible, resulting in relative inaccuracies when calibration parameters are determined without specific data from a target station [13].
Therefore, more complex nonlinear models such as artificial neural networks (ANNs) have been applied for better estimation in recovering streamflow [14][15][16]. Previous studies [9,17,18] have also reported that the self-organizing map (SOM), an unsupervised ANN, showed satisfactory imputation results. These nonlinear models have shown better imputation results than the traditional statistical methods [19].
For the purpose of streamflow imputation, a comparison among the Soil and Water Assessment Tool (SWAT), Artificial Neural Network (ANN), and Self Organizing Map (SOM) has not yet been made. The objectives of this study were (1) to recover missing observations from the Taehwa River (TR), Korea using the SWAT model, ANN, and SOM; (2) to compare their performance in terms of streamflow imputation; and (3) to propose superior imputation methods.

Study Area and Data Acquisition
This study explored the Taehwa River watershed, which is located in the southeastern part of Korea (129°0′ E-129°25′ E, 35°27′ N-35°45′ N). The area of the watershed is 643.96 km² and it includes most of Ulsan city and a small portion of Gyeongju city. The watershed consists of forest (62%), rice paddy (14%), and urban (10%) areas, as illustrated in Figure 1. Most of the urban areas are located downstream, while forest and rice paddies dominate the upstream. It has a moderate climate with average temperatures of 2 and 25.92 °C in January and August, respectively, and intense rainfall events during summer. The mean annual temperature of the Taehwa River watershed is 13.8 °C and the mean annual precipitation is 1274.6 mm based on the climatological normal. The TR watershed has eight flow gauging stations, and Samho station, one of the eight, is located in the middle part of the river (Figure 1). The station is not affected by tidal action. We obtained the Digital Elevation Model (DEM), land use information, and flow rate data of Samho station from the Water Management Information System, and the soil properties from the Korean Soil Information System. Weather data were obtained from the Meteorological Information Portal Service System-Disaster Prevention. Additionally, we considered the discharge and water quality from the Eonyang and Gulhwa Waste Water Treatment Plants (WWTPs) as point sources of Ulsan city.

Data Imputation Methods
The Soil and Water Assessment Tool (SWAT) and two machine learning techniques, Artificial Neural Network (ANN) [20] and Self Organizing Map (SOM), were applied to restore 350 flow rates at the Samho station from 2004 to 2006. The 350 flow rates had constant values caused by a faulty sensor and were regarded as missing data in this study.

Soil and Water Assessment Tool
SWAT is a physically based, distributed hydrological model developed by Jeff Arnold at the United States Department of Agriculture-Agricultural Research Service [21]. It simulates the hydrologic cycle, including surface runoff, evapotranspiration, and infiltration, and also considers water contamination in terms of sediment, pesticides, and nutrients in a watershed. To set up the model with the Geographic Information System interface, it requires watershed characteristics including slope, land use, soil type, stream, and point sources, and meteorological data including precipitation, temperature, solar intensity, relative humidity, and wind speed. SWAT divides a watershed into multiple subbasins, which consist of smaller hydrologic response units (HRUs) defined by the combination of land use, slope, and soil type. The model simulates both hydrologic responses and water quality at the subbasin, HRU, and reach levels using governing equations. For instance, the hydrologic process is simulated based on the water balance equation [21]:

$$SW_t = SW_0 + \sum_{i=1}^{t} \left( R_{day} - Q_{surf} - E_a - w_{seep} - Q_{gw} \right)$$

where $SW_t$ is the final soil water content (mm H2O), $SW_0$ is the initial soil water content (mm H2O), $t$ is the time (days), $R_{day}$ is the amount of precipitation on day $i$ (mm H2O), $Q_{surf}$ is the amount of surface runoff on day $i$ (mm H2O), $E_a$ is the amount of evapotranspiration on day $i$ (mm H2O), $w_{seep}$ is the amount of percolation and bypass flow exiting the soil profile bottom on day $i$ (mm H2O), and $Q_{gw}$ is the amount of return flow on day $i$ (mm H2O).
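The water balance above can be illustrated with a few lines of Python. This is only a minimal sketch of the bookkeeping, not SWAT itself: the function name `soil_water_balance` and all input values are hypothetical and chosen purely for illustration.

```python
def soil_water_balance(sw0, r_day, q_surf, e_a, w_seep, q_gw):
    """Return SW_t from the initial soil water and daily flux series (mm H2O)."""
    sw = sw0
    for r, qs, ea, ws, qg in zip(r_day, q_surf, e_a, w_seep, q_gw):
        # Daily change: precipitation minus runoff, evapotranspiration,
        # percolation/bypass flow, and return flow.
        sw += r - qs - ea - ws - qg
    return sw

# Three hypothetical days (illustrative values, not data from the study).
sw_t = soil_water_balance(
    sw0=100.0,
    r_day=[10.0, 0.0, 5.0],
    q_surf=[2.0, 0.0, 1.0],
    e_a=[3.0, 3.0, 3.0],
    w_seep=[1.0, 1.0, 1.0],
    q_gw=[0.5, 0.5, 0.5],
)
print(sw_t)
```

In SWAT each flux term is itself the output of a process submodel; here the terms are simply supplied as lists to show how they enter the sum.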
In this study, the TR watershed was divided into 85 subbasins including 1413 HRUs. We calibrated the model using the SWAT Calibration and Uncertainty Programs (SWAT-CUP), which allow us to conduct sensitivity analysis, calibration, and parameterization [22]. This study used the SUFI-2 algorithm to calibrate 25 hydrological parameters, as tabulated in Table 1. After model calibration, we validated the model from 2004 to 2006 without the missing data.

Artificial Neural Network
ANN, inspired by the human brain, is a functional method for pattern classification of multi-variable datasets as well as the prediction of complex processes [23][24][25]. Many researchers have applied the ANN model to predict streamflows using input variables including rainfall, temperature, past flows, past rainfall, and water levels [26,27]. For example, Bonafe et al. [28] chose the previous discharge, daily precipitation, daily mean temperature, total rainfall of the previous five days, and mean temperature over the previous ten days as input variables, and achieved good performance in determining the daily mean flow in the upper Tiber River basin, Italy [27,28]. In this regard, ANN is applicable for generalizing a nonlinear relationship between environmental variables and streamflows. Based on a review of previous studies on streamflow data imputation, this study selected four input variables to estimate daily flow: daily precipitation, daily temperature, total precipitation of the previous 5 days, and mean temperature over the previous 10 days. Daily precipitation has a strong positive correlation with flow rate, while total precipitation of the previous 5 days has a moderate positive correlation. In addition, daily mean temperature and mean temperature over the previous 10 days have weak positive correlations with flow rate.
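As a rough illustration of how the four input variables could be assembled from daily series, the sketch below uses NumPy. The function name `build_inputs` is hypothetical, and it assumes that the "previous" 5-day and 10-day windows exclude the current day; the paper does not state how the windows were aligned.

```python
import numpy as np

def build_inputs(precip, temp):
    """Stack the four predictors column-wise, one row per day."""
    precip = np.asarray(precip, dtype=float)
    temp = np.asarray(temp, dtype=float)
    n = len(precip)
    p5 = np.full(n, np.nan)   # total precipitation of the previous 5 days
    t10 = np.full(n, np.nan)  # mean temperature over the previous 10 days
    for k in range(n):
        if k >= 5:
            p5[k] = precip[k - 5:k].sum()
        if k >= 10:
            t10[k] = temp[k - 10:k].mean()
    # Columns: daily precipitation, daily temperature, 5-day total, 10-day mean.
    return np.column_stack([precip, temp, p5, t10])

# Illustrative series only: 12 days of made-up precipitation and temperature.
X = build_inputs(precip=range(12), temp=[10.0] * 12)
```

Rows without a full look-back window are left as NaN here, so they would need to be dropped before training.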
Similar to interconnected neurons in the human brain, ANN has a structure consisting of an input layer, a hidden layer, an output layer, and neurons (nodes) in each layer, which are connected by weights. The input layer accepts an input vector and transfers it to the network, where the hidden layer determines the complexity of training, while the output layer presents the final output of the model [29,30]. Before training, weights and biases in each neuron are randomly initialized; they are then updated by the back-propagation step [31]. Signals from input vectors are transferred to the next neurons in the network, where they are multiplied by weights. Finally, the transfer function in each neuron takes the weighted signal as its input. This study applied the Tansig function as the transfer function because it was empirically the most efficient:

$$y = f\left( \sum_{i=1}^{N} w_i x_i + b \right)$$

where $x_i$ is the input in the network, $y$ is the output in the network, $N$ is the number of inputs, $w_i$ is the connection weight between input and output, $f$ is the transfer function, and $b$ is the bias term.
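The single-neuron computation above can be sketched as follows. It assumes that "Tansig" refers to the hyperbolic tangent sigmoid, f(z) = 2 / (1 + exp(-2z)) - 1, which is mathematically equal to tanh(z); the names `tansig` and `neuron_output` and the numbers are illustrative only.

```python
import numpy as np

def tansig(z):
    # Hyperbolic tangent sigmoid; algebraically identical to np.tanh(z).
    return 2.0 / (1.0 + np.exp(-2.0 * z)) - 1.0

def neuron_output(x, w, b):
    """y = f(sum_i w_i * x_i + b) with f = tansig."""
    return tansig(np.dot(w, x) + b)

# Illustrative inputs and weights (not values from the study).
y_zero = neuron_output([1.0, 2.0], [0.5, -0.25], 0.0)  # weighted sum is 0
y_bias = neuron_output([1.0, 2.0], [0.5, -0.25], 0.3)  # weighted sum is 0.3
```

Because the output is bounded between -1 and 1, inputs and targets are usually rescaled before training with this transfer function.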
To update the weight and bias in each neuron, ANN utilizes the back-propagation algorithm, where the objective function is the error between output and observation [26]. This algorithm updates weights by moving along the negative gradient of the error function, which is the direction of steepest decrease. Its advantages are that the step size can be tuned through the learning rate parameter and that the momentum constant reduces oscillation [32]. Equations (3) and (4) describe the back-propagation step using the gradient descent with momentum algorithm, where j is the iteration number, c is the learning rate, and a is the momentum constant. ANN repeats the above process until the error is less than the desired goal or the number of iterations exceeds the maximum iteration. In addition, the performance of ANN models is significantly affected by parameters including the number of hidden layers, the number of neurons in each layer, the learning rate, and the momentum constant. We built the structure with one hidden layer because using one hidden layer is common in hydrologic studies [33]. For the rest of the parameters, this study employed the pattern search algorithm to find optimal parameters that maximize the model efficiency.
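For readers unfamiliar with gradient descent with momentum, a minimal sketch is given below. It uses one common textbook form of the update (new change = -c × gradient + a × previous change), which is an assumption and may differ in detail from Equations (3) and (4) of the paper. The function name `momentum_step`, the toy error function, and the values of c and a are illustrative.

```python
def momentum_step(w, grad, prev_delta, c, a):
    """One update: new change = -c * gradient + a * previous change."""
    delta = -c * grad + a * prev_delta
    return w + delta, delta

# Toy problem: minimise E(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, delta = 0.0, 0.0
for j in range(200):          # j plays the role of the iteration number
    grad = 2.0 * (w - 3.0)
    w, delta = momentum_step(w, grad, delta, c=0.1, a=0.5)
print(w)
```

The momentum term reuses part of the previous step, which damps the zig-zagging that plain gradient descent can show on elongated error surfaces.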

Self-Organizing Map
Kohonen was the first to propose SOM, an unsupervised machine learning technique that clusters similar samples onto a lower-dimensional map while preserving the topological structure [34]. At the initial step, SOM defines the map size in an output layer by considering the number of input data. The number of map units (hexagonal lattice) is generally determined by $5\sqrt{n}$, where $n$ represents the number of samples [35]. After setting the network size, SOM normalizes the input data and initializes weight vectors in each unit. One sample vector is randomly picked in the training step and then used to estimate the Euclidean distance with weight vectors in all the map units [36]. Then, SOM identifies the Best Matching Unit (BMU) as the map unit that has the shortest distance to the sample vector:

$$c_j = \arg\min_{i} \lVert x_j - w_i \rVert$$

where $c_j$ is the winner unit, $x_j$ is the input vector ($j = 1, 2, \ldots, n$), $w_i$ is the weight vector ($i = 1, 2, \ldots, m$), $m$ is the number of map units, and $\lVert \cdot \rVert$ is the distance measure (Euclidean distance).
SOM iteratively updates the weight vectors of the BMU and its neighboring units by using a neighborhood function to minimize the distance between them and the sample vector. The Gaussian distribution is applied to update the weights, as follows [34]:

$$w_i(t+1) = w_i(t) + h_{c_j,i}(t) \left[ x_j - w_i(t) \right]$$

where $h_{c_j,i}$ is the neighborhood function around the winner $c_j$ and $t$ here denotes the iteration step. Iteration of SOM is repeated until it converges. We selected the same input variables used in the ANN model to compare the SOM performance with ANN.
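The BMU search and one neighborhood update can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the map units sit on a rectangular grid rather than the hexagonal lattice used in the study, the learning rate `lr` and neighborhood width `sigma` are folded into a Gaussian neighborhood function, and all names and numbers are hypothetical.

```python
import math
import numpy as np

def map_units(n_samples):
    # Heuristic from the text: about 5 * sqrt(n) map units.
    return int(round(5 * math.sqrt(n_samples)))

def best_matching_unit(x, weights):
    # Index of the weight vector closest to x in Euclidean distance.
    return int(np.argmin(np.linalg.norm(weights - x, axis=1)))

def som_update(x, weights, grid, lr, sigma):
    """Move every unit toward x, scaled by a Gaussian neighborhood of the BMU."""
    c = best_matching_unit(x, weights)
    d2 = np.sum((grid - grid[c]) ** 2, axis=1)   # squared grid distance to BMU
    h = lr * np.exp(-d2 / (2.0 * sigma ** 2))    # neighborhood function
    return weights + h[:, None] * (x - weights)

# Three units on a line, two-dimensional inputs (illustrative values).
weights = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
grid = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
x = np.array([0.9, 1.2])
updated = som_update(x, weights, grid, lr=0.5, sigma=1.0)
```

In a full training loop, `lr` and `sigma` would typically shrink over the iterations so that the map settles.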

Cross-Validation and Evaluation
Cross-validation was performed for the ANN and SOM models to increase the model training efficiency. For this step, we randomly shuffled the datasets and divided them into six subsets. Five of the six subsets were used for training and the remaining subset was assigned for validation. While storing the trained network of each iteration, we repeated the cross-validation step and selected the network with the best performance in terms of the Nash-Sutcliffe efficiency coefficient (NSE). The datasets used in calibration and validation of the ANN and SOM were different because they were randomly shuffled in this step.
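The six-subset procedure described above can be sketched as follows. The helper names `six_fold_indices` and `best_network`, the random seed, and the `fit` and `score` callables are hypothetical placeholders; in the study the score would be the NSE of a trained ANN or SOM on the held-out subset.

```python
import numpy as np

def six_fold_indices(n_samples, n_folds=6, seed=0):
    """Shuffle sample indices and yield (train, validation) index arrays."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    for k in range(n_folds):
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train, folds[k]

def best_network(X, y, fit, score):
    """Train on five subsets, score on the sixth, and keep the best model."""
    X, y = np.asarray(X), np.asarray(y)
    best, best_score = None, -np.inf
    for train, val in six_fold_indices(len(X)):
        model = fit(X[train], y[train])
        s = score(model, X[val], y[val])
        if s > best_score:
            best, best_score = model, s
    return best, best_score

splits = list(six_fold_indices(20))
```

Any model with a train-then-evaluate interface could be plugged in through `fit` and `score`.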
We evaluated model results based on three statistics: NSE, the coefficient of determination (R²), and the Root Mean Square Error (RMSE). First, NSE is a normalized statistic indicating how well the plot of observed versus simulated data fits the 1:1 line, and it varies from $-\infty$ to 1. It is considered to be acceptable when values are greater than 0.5. Next, R² measures the degree of collinearity between observed and simulated data, and it varies from 0 to 1. A higher R² value means less error variance, and it is considered to be acceptable when values are greater than 0.5. Last, RMSE is an error index, and a lower RMSE indicates a better model. These statistics are calculated by the equations below [37]:

$$NSE = 1 - \frac{\sum_{i=1}^{n} (O_i - S_i)^2}{\sum_{i=1}^{n} (O_i - O_{avg})^2}$$

$$R^2 = \frac{\left[ \sum_{i=1}^{n} (O_i - O_{avg})(S_i - S_{avg}) \right]^2}{\sum_{i=1}^{n} (O_i - O_{avg})^2 \sum_{i=1}^{n} (S_i - S_{avg})^2}$$

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (O_i - S_i)^2}$$

where $O_i$ and $S_i$ are the observed and simulated data, respectively; $O_{avg}$ and $S_{avg}$ are the means of the observed and simulated data, respectively; and $n$ is the number of data points.
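The three statistics can be computed directly from paired observed and simulated series. The sketch below follows the standard definitions given above; the function names and the small test series are illustrative and are not data from the study.

```python
import numpy as np

def nse(obs, sim):
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def r_squared(obs, sim):
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    num = np.sum((obs - obs.mean()) * (sim - sim.mean())) ** 2
    den = np.sum((obs - obs.mean()) ** 2) * np.sum((sim - sim.mean()) ** 2)
    return num / den

def rmse(obs, sim):
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))

# A simulation with a constant bias of +1 relative to the observations.
obs = [1.0, 2.0, 3.0, 4.0]
sim = [2.0, 3.0, 4.0, 5.0]
scores = (nse(obs, sim), r_squared(obs, sim), rmse(obs, sim))
```

In this toy case R² stays at 1 because the two series are perfectly collinear, while NSE and RMSE both penalize the bias, which is one reason the three statistics are reported together.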

Parameter Estimation
This study calibrated 25 hydrologic parameters in SWAT as shown in Table 1, which includes the ranges of parameters, the sensitivity rank, and the final values used in the calibration. CH_N2 (Manning's n value for the main channel) was the most sensitive parameter, followed by SLSUBBSN (average slope length), CN2 (moisture condition II curve number), SOL_K (saturated hydraulic conductivity), and ALPHA_BF (base flow recession constant). Most of the top sensitive parameters were related to channel or overland routing. This result is in agreement with previous calibration works, which showed that CN2, ALPHA_BF, and SLSUBBSN were highly ranked in the sensitivity analysis [38][39][40].
Table 2 shows the ANN-associated parameters optimized by the pattern search method, including the learning rate, momentum constant, and number of neurons, together with the SOM-related errors: quantization and topographic errors. The momentum constant (0.5) is less than the learning rate (0.75), implying that previous weights have more influence in updating the weights in the ANN model compared to new weights. In SOM, the quantization error measures the resolution of the map, while the topographic error measures how well the map preserves topology. The quantization (0.335) and topographic (0.039) errors in this study were within the reasonable ranges of a previous application [41].

Comparison of Model Performance
The performances of the SWAT, ANN, and SOM models were compared after calibration and training. Figure 3 illustrates observed and simulated flow rates in the calibration, training, and validation periods of each model, while Table 3 shows the statistical analysis with NSE and R². SWAT, ANN, and SOM had NSE values of 0.55, 0.71, and 0.79 during the calibration periods, and 0.54, 0.61, and 0.63 for the validation periods, respectively. Based on the NSE value, the three models produced acceptable results for both periods [37]. SOM showed the best performance while SWAT had the worst among them. In the case of R², SWAT, ANN, and SOM had values of 0.55, 0.71, and 0.83 during the calibration periods, and 0.59, 0.63, and 0.65 for the validation periods, respectively. The R² values are similar to or slightly greater than the NSE values, and SOM also showed the best performance in terms of R².
For the SWAT model, NSE values during the calibration and validation were similar. However, for the other two models the NSE values were lower in the validation period compared to the calibration period. These discrepancies were mainly due to the different datasets used in the models. SWAT used continuous time series data, free of incorrect values, for both the calibration (2007-2009) and validation (2004-2006) periods. However, for the ANN and SOM, the data used for calibration and validation were selected by the cross-validation step. In this step, the models tended to select data with larger values for calibration to reduce the error efficiently. In short, larger values could be concentrated in the calibration set, while the remaining data with relatively lower values went to the validation set. Therefore, NSE values were not similar during the calibration and validation for the ANN and SOM. Though NSE values were lower in the validation step for the two models, they are still acceptable values.
Figure 4 shows the results of data imputation by the three models for 350 missing streamflows. The imputed flow rates of the ANN and SOM models showed similar trends; for example, they are both sensitive to precipitation events with comparable times and magnitudes of peaks, and the R² between them is 0.73. In contrast, SWAT generally underestimated the discharges. Based on the statistical indices, SOM was considered the best model at simulating streamflow of the TR watershed. However, NSE and R² are substantially sensitive only to high flows and do not reflect low flows well. Hence, it is implausible to simply adopt the SOM as the best model when most of the missing streamflow was low flow. To make a better comparison of these models, this study plotted the Flow Duration Curves (FDC) of SWAT and SOM from 2007 to 2009. ANN was excluded since it had similar trends and sensitivity to SOM.
Table 4 shows the RMSE of SWAT and SOM in each section from 2007 to 2009. SWAT had lower RMSE than SOM for Sections II-V, which have relatively low discharges. This implies that SWAT could simulate relatively low flows better than SOM despite having smaller NSE and R² values during calibration and validation.
As shown in Figure 6, dry and low-intensity rainfall periods were found in 2009 before the high-intensity rainfall period. Soil moisture was low during the dry period (i.e., the blue box), so water infiltration through the soil layers was enhanced and surface runoff was reduced. Therefore, the magnitude of peak streamflow in 2009 was the lowest compared to 2007 and 2008 (Figure 6). Previous work has shown that SWAT tends to underestimate high peak flows, which is one of the limitations of the model [38,43,44,45]. This is consistent with the results in Table 5, showing that SWAT performed better in 2009 while SOM was better in 2007 and 2008. We found that the model performances in Section I substantially influenced the overall model performance as reflected in NSE or R². With the exception of 2009, wherein SWAT performed better than SOM, the machine learning technique usually shows better performance in high flow; therefore, it is recommended to use an ANN or SOM model for imputing high flow events. Otherwise, applying the SWAT model for low flow events would be more desirable. Here, the Q95 was used as a critical value to determine high flows from the whole observation. The rest of the flows, which belong to Sections II-V, are considered as low flows in this study.

Data Imputation Result
Figure 7 summarizes the proposed data imputation algorithm using SOM and the SWAT model. The first step is to construct the FDC and determine the Q95 value, which is used to separate low and high flow events. Then, the flow is simulated using both SOM and SWAT, and the two simulated streamflows are compared with the Q95. If both simulated streamflows are greater than the Q95 value, the missing streamflow belongs to Section I and is substituted by the SOM output; otherwise it is categorized into Sections II-V and is substituted by the SWAT output. If the two simulated streamflows belong to different sections, it is recommended to follow the SOM result since SOM has higher accuracy than SWAT.
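The decision rule described above can be written as a short function. This is a sketch of one reading of the text: when the two models fall in different sections, the SOM output is used. The function name `impute_flow` is hypothetical and the flow values are made up; the threshold 16.88 is used only as an example magnitude.

```python
def impute_flow(som_flow, swat_flow, q95):
    """Choose the imputed value for one missing day."""
    som_high = som_flow > q95
    swat_high = swat_flow > q95
    if som_high and swat_high:
        return som_flow    # both in Section I: use SOM
    if not som_high and not swat_high:
        return swat_flow   # both in Sections II-V: use SWAT
    # Assumption: on disagreement, follow the SOM output.
    return som_flow

q95 = 16.88
examples = [
    impute_flow(20.0, 18.0, q95),  # both high
    impute_flow(5.0, 4.0, q95),    # both low
    impute_flow(18.0, 10.0, q95),  # models disagree
]
```

Applying the function day by day over the missing period yields a composite series that draws on SOM for high flows and SWAT for low flows.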

Figure 1. Land use and location of Samho station in Taehwa River watershed.

Figure 2 illustrates a brief framework of this study, showing the calibration (2007-2009) and validation (2004-2006) periods of the SWAT model as well as the input data of the ANN and SOM models. For the ANN and SOM models, data from 2004 to 2009 were used for cross-validation.

Figure 2. Flow chart of the data imputation methodology of the three models: Soil and Water Assessment Tool (SWAT), Artificial Neural Network (ANN), and Self Organizing Map (SOM).

Figure 4. Data imputation results of 350 missing flow data for the SWAT, ANN, and SOM models. Four line graphs represent the observation (red), SWAT (blue), ANN (purple), and SOM (green), while the bar graph at the top is the daily precipitation amount of the missing flow data.

Comparison of Flow Duration Curve
Q95, Q185, Q275, and Q355 from the FDC indicate the criteria for averaged-wet flow, normal flow, low flow, and drought flow, respectively [42]. This study separated the FDC into five sections based on these flow indices in order to compare the model performances during low flow and high flow separately. Figure 5 portrays the FDC of Samho station from 2007 to 2009 and shows the streamflow simulated by SWAT and SOM (dotted blue and green lines) with the observation.
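The sectioning of the FDC can be sketched as below. The code assumes that Qd denotes the d-th largest daily flow of a year (for example, Q95 is the flow equalled or exceeded on 95 days), which is our reading of the indices and is not spelled out in the text. The function names, the handling of values exactly on a boundary, and the synthetic flow series are illustrative.

```python
import numpy as np

def flow_indices(daily_flow):
    """Return Q95, Q185, Q275, and Q355 from one year of daily flows."""
    ranked = np.sort(np.asarray(daily_flow, dtype=float))[::-1]  # largest first
    return {d: ranked[d - 1] for d in (95, 185, 275, 355)}

def fdc_section(q, idx):
    """Assign a flow value to Sections I-V using the four indices."""
    if q > idx[95]:
        return "I"     # above the averaged-wet flow standard
    if q > idx[185]:
        return "II"    # between averaged-wet and normal flow
    if q > idx[275]:
        return "III"   # between normal and low flow
    if q > idx[355]:
        return "IV"    # between low and drought flow
    return "V"         # at or below the drought flow standard

# Synthetic year: flows 1, 2, ..., 365 (purely illustrative).
idx = flow_indices(np.arange(1, 366))
sections = [fdc_section(q, idx) for q in (300, 200, 100, 50, 5)]
```

Grouping observed and simulated values by section in this way makes it possible to compute an error measure such as RMSE separately for high and low flows.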

Figure 5. Flow duration curves (FDC) of Samho station in (a) 2007, (b) 2008, and (c) 2009. The plots are divided into five sections: I, flows over averaged-wet flow standard; II, flows between averaged-wet and normal flow standard; III, flows between normal and low flow standard; IV, flows between low and drought flow standard; and V, flows under drought flow standard.

Figure 6. Comparison of daily performances between SWAT and SOM from 2007 to 2009 with regard to precipitation.

Figure 7. The proposed data imputation algorithm using SOM and the SWAT model.

In this study, we adopted the proposed data imputation algorithm to find the best representative model output for the 350 missing streamflows. We determined Q95 values from 2004 to 2006 to separate high-flow (Section I) and low-flow (Sections II-V) events; the Q95 values in 2004, 2005, and 2006 were 16.88, 6.99, and 10.38 cm, respectively. Both simulated streamflows were lower than the Q95 in 2004 and 2006, while only the SOM-simulated streamflows were greater than the Q95 in 2005. Considering the algorithm, we selected SWAT for the missing streamflows in 2004 and 2006.

Table 1. SWAT calibration results for 25 hydrologic parameters.

Table 2. ANN optimized parameters and SOM related errors.

Table 4. RMSE values of SWAT and SOM in each section of FDC from 2007 to 2009.

Section I, however, showed inconsistent results from 2007 to 2009. SOM has lower RMSE in 2007 and 2008, while SWAT has the lower value in 2009. This is due to the different rainfall pattern from 2007 to 2009.

Table 5. R² of SWAT and SOM from 2007 to 2009. Better model in a section is in bold.
