Case Study: Reconstruction of Runoff Series of Hydrological Stations in the Nakdong River, Korea

Abstract: A reliable runoff series is a sine qua non for flood and drought analysis as well as for water resources management and planning. Since observed hydrological measurements such as runoff can sometimes show abnormalities, data quality control is necessary. Generally, the data of adjacent hydrological stations are used for this purpose. However, difficulties are frequently encountered when the runoff series of adjacent stations have different flow characteristics, for instance, when the correlation between the upstream and downstream stations is used as the main criterion for quality control. Therefore, this study aims to suggest a method to reconstruct an abnormal daily runoff series in the Nakdong River, Korea. The variational mode decomposition (VMD) technique is applied to the runoff series of the three target stations: Goryeong County (Goryeong bridge) and Hapcheon County (Yulji bridge and Jeogpo bridge). These runoff series are divided into several intrinsic mode functions (IMFs) that are governed by basin runoff and by disturbed flow caused by hydraulic structures. The decomposition results based on VMD show that the runoff components at a particular station that is influenced by hydraulic structures could be reconstructed using adjacent stations, but the residual mode could not. A runoff reconstruction model using an artificial neural network (ANN), the two "divided" modes, and the residual component is established and applied to the runoff series of the target station (Yulji bridge in Hapcheon County). The reconstructed series from the model shows relatively good results, with R² = 0.92 and RMSE = 99.3 in the validation year (2019). The abnormal runoff series for 2012 to 2013 at the Yulji bridge station in Hapcheon County is also reconstructed. Using the suggested method, a result that matches the observations well for the period from 2014 onwards is produced, and a reconstructed series for the abnormal period is obtained.


Introduction
The observation process plays a vital role in many studies and industries [1,2], and it is particularly important in hydrology and water resources management. Technological advances, such as real-time information acquisition and new measuring equipment, have brought observation to a new level. However, there are still concerns about their use in practice. The microwave Doppler current meter (hereafter referred to as "MDCM") [1,3], a new way to measure velocity or depth, has been found to be questionable in terms of reflectivity under certain meteorological or structural conditions [4]. Such erroneous and abnormal records sometimes mislead management, which often leads to property losses.

Figure 1. The target stations of the study in the main stream of the Nakdong River; red inverted triangles indicate water level stations, and gray blocks indicate multi-purpose weirs, which were completed in 2012 by the Four Major Rivers Restoration Project [43].

Variational Mode Decomposition (VMD)
VMD was developed by Dragomiretskiy and Zosso (2013) [45] to overcome the limitations of empirical mode decomposition (hereafter referred to as "EMD") [46]. In principle, all "mode decomposition" techniques decompose a signal into several fast and slow oscillating components, which are called IMFs or simply "modes". This overcomes a limitation of traditional harmonic analysis, which is based on the Fourier transform [47]. However, mode decomposition can suffer from a mode-mixing problem, in which a single IMF contains oscillations of disparate scales, making the decomposition harder to interpret [48]. VMD aims to decompose a signal into K band-limited modes with specific sparsity properties [45]. The constrained variational formulation for yielding the IMFs can be expressed as follows:

$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{subject to} \quad \sum_{k=1}^{K} u_k(t) = f(t) \tag{1}$$

where $u_k$ and $\omega_k$ are the kth mode and its center frequency, respectively; $\{u_k\} = \{u_1, \ldots, u_K\}$ and $\{\omega_k\} = \{\omega_1, \ldots, \omega_K\}$ are shorthand notations for the set of all modes and their center frequencies; $\|\cdot\|_2$ denotes the Euclidean (L2) norm; $\delta$ is the Dirac distribution; $*$ denotes convolution; $j$ is the imaginary unit ($j^2 = -1$); $K$ is the number of modes; $t$ is the time step; and $f(t)$ is the original signal.
Equation (1) is changed to the following unconstrained problem by introducing an augmented Lagrangian L [45]:

$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle \tag{2}$$

where $\alpha$ denotes the balancing parameter, $\lambda$ is the Lagrange multiplier, and $\langle \cdot, \cdot \rangle$ is the scalar product. Equation (2) can be solved with the alternating direction method of multipliers (ADMM) [49]. The mode $u_k(t)$ can then be obtained using:

$$u_k(t) = \Re\left\{ \mathcal{F}^{-1}\left[ \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}(\omega)/2}{1 + 2\alpha(\omega - \omega_k)^2} \right] \right\} \tag{3}$$

where $\hat{f}(\omega)$, $\hat{u}_i(\omega)$, and $\hat{\lambda}(\omega)$ are the Fourier transforms of $f(t)$, $u_i(t)$, and $\lambda(t)$, $\mathcal{F}^{-1}$ is the inverse Fourier transform, and $\Re$ denotes the real part. Equation (3) has a Wiener filtering structure, so the mode in the time domain can be obtained from the real part of the inverse Fourier transform. More specific details on VMD and its process can be found in Dragomiretskiy and Zosso (2013) and Herrera et al. (2014) [45,50].
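As a rough illustration of how Equations (1)-(3) translate into an iterative procedure, the following is a minimal NumPy sketch of the ADMM loop. It is not the implementation used in the study: the function name `vmd_sketch`, the default values of `alpha`, `tau`, `n_iter`, and `tol`, the initialization of the center frequencies, and the omission of boundary mirroring are all assumptions made for brevity. Frequencies are expressed in cycles per sample rather than as angular frequencies, so `alpha` is not directly comparable with values quoted elsewhere.

```python
import numpy as np


def vmd_sketch(signal, n_modes=2, alpha=2000.0, tau=0.0, n_iter=200, tol=1e-10):
    """Simplified VMD; returns (modes, center_freqs, residual)."""
    f = np.asarray(signal, dtype=float)
    T = f.size
    freqs = np.arange(T) / T  # cycles per sample; bins above T/2 carry no energy

    # One-sided (analytic-signal) spectrum: negative frequencies are zeroed and
    # positive ones doubled, so the real part of the inverse FFT returns f.
    h = np.zeros(T)
    h[0] = 1.0
    h[1:(T + 1) // 2] = 2.0
    if T % 2 == 0:
        h[T // 2] = 1.0
    f_hat = np.fft.fft(f) * h

    u_hat = np.zeros((n_modes, T), dtype=complex)
    omega = 0.5 * np.arange(n_modes) / n_modes  # assumed initial center frequencies
    lam_hat = np.zeros(T, dtype=complex)

    for _ in range(n_iter):
        u_prev = u_hat.copy()
        for k in range(n_modes):
            # Wiener-filter update of mode k, as in Equation (3)
            others = u_hat.sum(axis=0) - u_hat[k]
            u_hat[k] = (f_hat - others + lam_hat / 2.0) / (
                1.0 + 2.0 * alpha * (freqs - omega[k]) ** 2
            )
            # Center frequency = power-weighted mean frequency of the mode
            power = np.abs(u_hat[k]) ** 2
            omega[k] = np.sum(freqs * power) / (np.sum(power) + 1e-12)
        # Dual ascent on the Lagrange multiplier (inactive when tau = 0)
        lam_hat = lam_hat + tau * (f_hat - u_hat.sum(axis=0))
        change = np.sum(np.abs(u_hat - u_prev) ** 2)
        if change <= tol * (np.sum(np.abs(u_prev) ** 2) + 1e-12):
            break

    modes = np.real(np.fft.ifft(u_hat, axis=1))
    residual = f - modes.sum(axis=0)
    return modes, omega, residual


# Synthetic check: a slow and a fast oscillation (not runoff data)
n = np.arange(1000)
slow = np.cos(2 * np.pi * 5 * n / 1000)
fast = 0.5 * np.cos(2 * np.pi * 50 * n / 1000)
modes, center_freqs, residual = vmd_sketch(slow + fast)
```

On this synthetic signal the two recovered center frequencies should settle near 0.005 and 0.05 cycles per sample. The `residual` returned here is simply the part of the signal not captured by the modes, which is one plausible reading of the "residual mode" used later in the text.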

Artificial Neural Network (ANN)
There are many models and techniques to forecast, simulate, and reconstruct hydrological series. The ANN has been treated as a good alternative because it has shown an ability to reproduce and capture nonlinearity in pattern recognition [51,52], signal classification [52-54], trajectory prediction [54,55], and even financial trading [53,56]. Additionally, an ANN can be regarded as a better option than autoregressive integrated moving average (ARIMA) or autoregressive (AR)-related models in forecasting or simulating seasonal time series [57] and in reconstructing nonlinear time series, including runoff [22,23]. Therefore, an ANN was considered a suitable method to reconstruct runoff series using the IMFs decomposed by VMD.
The ANN mimics the structure, connectivity, and function of the nodes through which neurons are connected to each other in biological systems [58]. Generally, an ANN is composed of three layers: the input layer, which represents the observed data; the output layer, which serves as the result of reconstruction; and the hidden layer, which is a network of neurons trained to recognize the observed data [59]. The back-propagation algorithm is used to train the connection strengths by learning from the error and optimizing the neurons; therefore, an appropriate algorithm is essential for ANN modeling [60]. The Levenberg-Marquardt-QNBP algorithm was selected as the back-propagation algorithm in accordance with its well-founded performance for nonlinear hydrological series [60].
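To make the three-layer structure and the Levenberg-Marquardt idea concrete, the sketch below trains a one-hidden-layer network with a basic damped Gauss-Newton loop in NumPy. It is only an illustration under stated assumptions: it is a plain Levenberg-Marquardt loop with a finite-difference Jacobian, not the QNBP variant named in the text, and the names `mlp_forward` and `train_lm`, the tanh activation, the four hidden units, and the synthetic data are all hypothetical choices.

```python
import numpy as np


def mlp_forward(params, x, n_hidden):
    """One hidden layer with tanh activation and a linear output."""
    n_in = x.shape[1]
    i = n_in * n_hidden
    w1 = params[:i].reshape(n_in, n_hidden)
    b1 = params[i:i + n_hidden]
    w2 = params[i + n_hidden:i + 2 * n_hidden]
    b2 = params[-1]
    return np.tanh(x @ w1 + b1) @ w2 + b2


def train_lm(x, y, n_hidden=4, n_iter=50, mu=1e-2, seed=0):
    """Fit the network with a basic Levenberg-Marquardt loop."""
    rng = np.random.default_rng(seed)
    n_params = x.shape[1] * n_hidden + 2 * n_hidden + 1
    p = rng.normal(scale=0.5, size=n_params)

    def residuals(q):
        return mlp_forward(q, x, n_hidden) - y

    r = residuals(p)
    history = [float(np.sqrt(np.mean(r ** 2)))]
    eps = 1e-6
    for _ in range(n_iter):
        # Finite-difference Jacobian of the residuals (adequate for tiny networks)
        J = np.empty((y.size, n_params))
        for j in range(n_params):
            step = np.zeros(n_params)
            step[j] = eps
            J[:, j] = (residuals(p + step) - r) / eps
        # Damped Gauss-Newton step: (J^T J + mu I) delta = -J^T r
        delta = np.linalg.solve(J.T @ J + mu * np.eye(n_params), -J.T @ r)
        r_new = residuals(p + delta)
        if np.sum(r_new ** 2) < np.sum(r ** 2):
            # Accept the step and reduce the damping (with a small floor)
            p, r, mu = p + delta, r_new, max(mu / 10.0, 1e-8)
        else:
            # Reject the step and increase the damping
            mu *= 10.0
        history.append(float(np.sqrt(np.mean(r ** 2))))
    return p, history


# Synthetic nonlinear target (not runoff data)
x = np.linspace(-1.0, 1.0, 60).reshape(-1, 1)
y = np.sin(3.0 * x[:, 0])
params, rmse_history = train_lm(x, y)
```

Because a step is only accepted when it lowers the squared error, `rmse_history` never increases from one iteration to the next; how low it ends up depends on the random initialization and the network size.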
Finally, Figure 2 shows the overall framework and process for the runoff reconstruction model with three stages: (i) decomposing the runoff series into IMFs based on the VMD technique, (ii) training and establishing an ANN model to reconstruct each IMF using adjacent stations, and (iii) runoff reconstruction by summing up the reconstructed IMFs and residuals.

One of the challenges in the target stations is the quality control of abnormal or missing data. The accuracy of the observed data raises concerns about their general use in practice, since abnormal and missing values arise during the observation process. Different flow characteristics tend to make quality control more difficult because the techniques used to fill or adjust hydrological data depend on the correlation between adjacent hydrological stations. For instance, Figure 4a shows a relatively constant correlation for the Hyeongsan River, whereas in Figure 4b there are two groups of correlations, which makes it hard to determine the exact correlation between the two stations. In this case, it is difficult to fix the abnormal data. This is the reason why the data in station B from 2012 to 2013 cannot be corrected by conventional means (see Figure 3) and why VMD was selected and applied in this study.

Runoff Series Characteristics and Its Decomposition
Since VMD can decompose a signal into several oscillating components [45], it can be used to divide a runoff series into "correlated" and "independent" components with their own frequencies, and these components can also serve as basic data to reconstruct or predict runoff series [39]. Therefore, VMD was applied to the runoff series of the target stations (see Figure 5). Each series was decomposed into two IMFs (1st mode, 2nd mode) and a residual mode. The parameters of VMD, such as the moderate bandwidth constraint, were calibrated by trial and error, focusing on the simulation efficiency for the hydrological series, because there are no criteria for selecting exact values for hydrological series. The decomposition of each runoff series shows notable results in Figure 5a-c: (i) the 1st and 2nd modes showed a similar trend; during the calibration process, their correlation coefficients were 0.90 and 0.88 for stations A and B, and 0.91 and 0.78 for stations B and C, respectively. The station B series from 2012 to 2013 contains inaccurate observations due to a facility issue. (ii) The residual shows a less similar trend for the whole period, with a correlation coefficient of 0.75 or less for stations A, B, and C. (iii) Stations A and C show "very" similar trends, with correlation coefficients of 0.96 and 0.95 for the 1st and 2nd modes and 0.76 for the residual over the whole period. Some conclusions can be reached by combining these results: the modes contain components common to all stations, which can be assumed to be the rainfall-runoff processes of the basin. Both modes show a very similar trend, with a correlation coefficient of 0.9 or more, which means that they are governed by the same components or local frequency ranges [39]. Therefore, the 1st and 2nd modes are runoff components and are entirely governed by rainfall-runoff processes.
Furthermore, the residuals are probably a mixture of runoff components and each station's own flow characteristics, such as backwater, disturbed flow, and other uncertainties. The residual shows a nearly constant correlation coefficient of 0.75 or less regardless of the period or station, unlike the modes. This could be due to a mixture of shared and station-specific characteristics within local frequency ranges [38]. Therefore, the 1st and 2nd modes could be used as data to explain the runoff series at other stations, but the residual could not.
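The station-to-station comparison described above amounts to computing a correlation coefficient for each decomposed component. A minimal sketch follows, assuming the Pearson coefficient is meant (the text does not say which coefficient was used); the function name `component_correlations`, the dictionary layout, and the synthetic series are hypothetical.

```python
import numpy as np


def component_correlations(station_x, station_y):
    """Pearson correlation between like-named components of two stations."""
    return {
        name: float(np.corrcoef(station_x[name], station_y[name])[0, 1])
        for name in station_x
    }


# Synthetic components: a shared basin signal plus independent station noise
rng = np.random.default_rng(42)
days = np.arange(365)
basin_signal = np.sin(2 * np.pi * days / 365)
station_a = {"mode1": basin_signal, "residual": rng.normal(size=365)}
station_b = {"mode1": 2.0 * basin_signal + 1.0, "residual": rng.normal(size=365)}
corr = component_correlations(station_a, station_b)
```

In this toy setup the shared mode is perfectly correlated between the two stations while the independent residuals are nearly uncorrelated, mirroring in exaggerated form the contrast between the modes and the residual reported in the text.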

Establishment of Reconstruction Model
The 1st and 2nd modes could be used to explain those of other stations, but the residual could not. Thus, the two modes could be a target of "reconstruction", but the residual should only be used for its own series. Therefore, to fill the abnormal data from 2012 to 2013, the 1st and 2nd modes of station B were reconstructed using stations A and C, and the residual of station B was added to the reconstructed series to take into account its own flow characteristics. An ANN was employed to reconstruct the modes of station B due to its ability to capture the nonlinearity of the runoff series [62,63]. Since the runoff series has nonlinear characteristics [64-67], an ANN is a good alternative for the reconstruction process. Additionally, the number of hidden layers, which significantly affects the whole ANN procedure, was set to 12 by trial and error, and the Levenberg-Marquardt-QNBP algorithm was selected as the back-propagation algorithm in accordance with its well-founded performance for nonlinear hydrological series [60]. Before the training process, all of the IMFs were normalized to improve the efficiency of the ANN model and avoid large fluctuations during training [39]. The normalization formula is defined as follows:

$$\hat{u}_k = \frac{u_k - u_{k\min}}{u_{k\max} - u_{k\min}}$$

where $\hat{u}_k$ is the normalized IMF, $u_k$ is the original IMF, and $u_{k\max}$ and $u_{k\min}$ are the maximum and minimum values of the IMF, respectively. To validate the performance of the reconstruction, the 1st and 2nd modes from 2014 to 2019 were divided into two periods: 2014 to 2018 for training and 2019 for validation. Another ANN model using the raw runoff series was also established for comparison with the suggested method. Both models had the same parameters, input periods, and algorithm; the only difference between them was the data, wherein one used modes and residuals while the other used the raw runoff series.
Additionally, multiple regression, which is widely used to reconstruct missing or abnormal data, was employed as one of the control groups (see Figure 6 and Table 2).
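The normalization and the training/validation split described above can be sketched as follows. This is an illustration only: the function names, the returned `bounds` tuple, and the tiny example arrays are hypothetical, and the sketch takes the minimum and maximum over whatever series is passed in, since the text does not say over which period they were computed.

```python
import numpy as np


def minmax_normalize(u):
    """Scale an IMF to [0, 1]; returns the scaled series and (min, max)."""
    u = np.asarray(u, dtype=float)
    u_min, u_max = u.min(), u.max()
    # A constant series (u_max == u_min) would divide by zero and needs
    # separate handling in practice.
    return (u - u_min) / (u_max - u_min), (u_min, u_max)


def minmax_denormalize(u_norm, bounds):
    """Map a normalized series back to the original units."""
    u_min, u_max = bounds
    return np.asarray(u_norm, dtype=float) * (u_max - u_min) + u_min


# Toy IMF values with the calendar year of each sample
imf = np.array([2.0, 4.0, 6.0, 10.0])
years = np.array([2014, 2018, 2019, 2019])

imf_norm, bounds = minmax_normalize(imf)   # [0.0, 0.25, 0.5, 1.0]
train_mask = years <= 2018                 # 2014 to 2018 for training
valid_mask = years == 2019                 # 2019 for validation
restored = minmax_denormalize(imf_norm, bounds)
```

Keeping `bounds` alongside the normalized series matters because the ANN output is in normalized units and has to be mapped back before the modes and the residual are summed into a runoff value.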
Comparison results are shown in Figure 6a,b and Table 2. In summary, both ANN models perform more convincingly than multiple regression. For the learning period, the ANN with VMD has 0.93 and 80.3 for the coefficient of determination (R²) and the root mean square error (RMSE), which are widely used as measures for evaluating model performance and as selection criteria [68]. In comparison, the ANN with raw runoff and the multiple regression have 0.93 and 0.88 for R² and 59.3 and 116.7 for the RMSE, respectively. Therefore, both ANN models showed similar performance, and both performed better than multiple regression. Station B has an upper basin area of 15,053 km² and a river width of 350 m; therefore, the station measures the runoff from a relatively large basin. It would be reasonable to regard a flow rate of approximately 20 m³/s or less at station B as an abnormal observation [61]. Runoff values that are too small therefore indicate a problem in observation. The ANN with raw runoff and the multiple regression show several relatively small runoff values during both periods, such as 0.1-1.0 m³/s in September 2015, 0.5-8.0 m³/s in August 2016, and 1.0-7.2 m³/s in March 2017.
Station B has a runoff of 51.8 m³/s at the drought water level, which is equal to 0.97 in percentiles [41]. A runoff of approximately 24 m³/s or less, which is equal to 0.995 in percentiles at station B and is the confidence boundary for proper observation percentiles [69], is likely to be an abnormal observation. In the validation period (2019), the ANN with raw runoff and the multiple regression show 0.84 and 0.81 for R² and 127.3 and 219.2 for the RMSE, respectively. The ANN with VMD shows a more stable result, with 0.91 for R² and 99.3 for the RMSE. Above all, it does not show an abnormal trend or unrealistically small runoff during the whole period. Therefore, the suggested method based on VMD shows its applicability.
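For reference, the two evaluation measures and the low-flow check discussed above can be written in a few lines. The sketch assumes that R² is computed as one minus the ratio of residual to total sum of squares; the text does not state the definition, and the squared Pearson correlation is another common choice. The function names and the default threshold argument (taken from the approximately 20 m³/s figure quoted for station B) are illustrative.

```python
import numpy as np


def rmse(observed, simulated):
    """Root mean square error, in the units of the series (here m^3/s)."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    return float(np.sqrt(np.mean((simulated - observed) ** 2)))


def r_squared(observed, simulated):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed definition)."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    ss_res = np.sum((observed - simulated) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def count_low_flows(simulated, threshold=20.0):
    """Number of simulated values at or below a suspicious low-flow threshold."""
    return int(np.sum(np.asarray(simulated, dtype=float) <= threshold))


# Toy numbers, not data from the study
obs = [1.0, 2.0, 3.0, 4.0]
sim = [1.0, 2.0, 3.0, 5.0]
example_rmse = rmse(obs, sim)       # 0.5
example_r2 = r_squared(obs, sim)    # 0.8
```

Counting values below the threshold is a simple way to reproduce the qualitative check made in the text, namely whether a model produces implausibly small runoff for a station draining a large basin.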

Reconstruction Results and Discussion
The runoff series from 2012 to 2019 in station B, including the abnormal period, was reconstructed using the established ANN model with VMD, and the results are shown in Figure 7. For visibility, the reconstructed and observed runoff values are expressed as 5-day moving averages, and the runoff boundary, which is defined by the high and low runoff values of stations A and C, is also plotted for comparison with station B.
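The final assembly step and the smoothing used for plotting can be sketched as below. The function names are hypothetical, the inputs are assumed to be already denormalized, and the moving average shown drops the incomplete windows at both ends, which may differ from how the figure was actually prepared.

```python
import numpy as np


def reconstruct_runoff(mode1_reconstructed, mode2_reconstructed, residual_original):
    """Sum the two ANN-reconstructed modes and the station's own residual."""
    return (
        np.asarray(mode1_reconstructed, dtype=float)
        + np.asarray(mode2_reconstructed, dtype=float)
        + np.asarray(residual_original, dtype=float)
    )


def moving_average(series, window=5):
    """Simple moving average; only complete windows are returned."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(series, dtype=float), kernel, mode="valid")


# Toy numbers, not data from the study
runoff = reconstruct_runoff([1.0, 2.0], [0.5, 0.5], [0.1, -0.1])   # [1.6, 2.4]
smoothed = moving_average([1, 2, 3, 4, 5, 6], window=5)            # [3.0, 4.0]
```

With `mode="valid"` a series of N daily values yields N - 4 smoothed values for a 5-day window, so the time axis has to be trimmed accordingly when the smoothed series is plotted against dates.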

Overall, the reconstructed runoff series in station B has shown a relatively good result, with 0.92 for R² and 55.9 for the RMSE from 2014 to 2019 (Figure 7b). Additionally, the reconstructed runoff series was found to have the same distribution as the observed series according to the two-sample Kolmogorov-Smirnov test at the 5% significance level (95% confidence) [70]. Of course, there is no "true value" of the observations for 2012 to 2013, but the reconstructed series (red line) seems to be well matched with the observed runoff (blue line) after 2013. In fact, the reconstructed series shows a similar trend to the runoff boundary of stations A and C. Additionally, other evidence can be found in 2014 to 2019. In Figure 7b, there are three periods with a wide range of runoff boundaries and abnormal runoff observations in station B: (i) the period (marked as "Missing data") during March 2017, which has a zero value of runoff in station A and was identified as missing data due to the failure of measuring equipment; (ii) period A, from November 2018 to February 2019, with a difference of approximately 65.0 m³/s between stations A and C and abnormal runoff (negative runoff values) in station B; and (iii) period B, during October 2019, with a difference of approximately 81.6 m³/s between stations A and C and abnormal runoff (1.0-10 m³/s) in station B. Technically, station C shows a more constant low runoff than stations A and B; this is not significant in itself, since station C is the downstream station. Station C is located about 7 km downstream from the Hapcheon-Changnyeong weir and about 35 km upstream from the Changnyeong-Haman weir, and it was therefore influenced by the weirs (see Figure 1). The Korean government established and carried forward a plan to open and monitor the multi-purpose weirs for environmental restoration (the Ministry of Environment 2019) [43].
As part of the plan, the Hapcheon-Changnyeong and Changnyeong-Haman weirs were opened twice to evaluate the effect of continuous discharge. The opening periods were from 10 October 2018 to 22 February 2019 and from 17 October 2019 to the end of that year, with the height above sea level (EL.) going from 9.2 to 4.8 m for the Hapcheon-Changnyeong weir and from 4.8 to 2.2 m for the Changnyeong-Haman weir. Therefore, the abnormal observations of stations B and C in these periods (A and B) could be explained by the gate operation of the weirs. This is one of the pieces of evidence supporting the appropriateness of the ANN model with VMD. Consequently, the suggested method could be used to reconstruct abnormal data in river sections that are influenced by hydraulic structures.
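The Kolmogorov-Smirnov comparison of the reconstructed and observed distributions can be illustrated with a NumPy-only sketch. It computes the two-sample statistic directly and compares it with the large-sample critical value; the function name and the toy samples are hypothetical, and the asymptotic critical value is an approximation that a statistics library would refine for small samples.

```python
import numpy as np


def ks_two_sample(sample_a, sample_b, alpha=0.05):
    """Two-sample KS statistic D and its asymptotic critical value."""
    a = np.sort(np.asarray(sample_a, dtype=float))
    b = np.sort(np.asarray(sample_b, dtype=float))
    pooled = np.concatenate([a, b])
    # Empirical CDFs of both samples evaluated at every pooled value
    cdf_a = np.searchsorted(a, pooled, side="right") / a.size
    cdf_b = np.searchsorted(b, pooled, side="right") / b.size
    d_stat = float(np.max(np.abs(cdf_a - cdf_b)))
    # Large-sample critical value: c(alpha) * sqrt((n + m) / (n * m))
    c_alpha = np.sqrt(-0.5 * np.log(alpha / 2.0))
    d_crit = float(c_alpha * np.sqrt((a.size + b.size) / (a.size * b.size)))
    return d_stat, d_crit


# Toy samples, not data from the study
d_same, _ = ks_two_sample([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])       # D = 0
d_disjoint, _ = ks_two_sample([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])   # D = 1
_, crit_100 = ks_two_sample(np.arange(100), np.arange(100))
```

The hypothesis that both samples come from the same distribution is not rejected when `d_stat` is below `d_crit`. Daily runoff values are serially correlated, so the nominal significance level should be read with some caution.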
The limitation of the study is the absolute accuracy of the measurement. The hydrological stations in the middle part of the Nakdong River have a median runoff of 80 to 100 m³/s and a river width of 150 to 550 m, with a depth of approximately 10 m [43]. This means that the flow velocity in the Nakdong River is inevitably low, at 0.05-0.08 m/s, even at the median water level. Since the accuracy of the ADVM is 1% of the measured velocity or 0.005 m/s [42], this suggests an uncertainty of 10% or more below the median water level. This is the motivation for this study and the reason why the daily runoff time series was selected. Thus, with the help of the methodology suggested in this study, a reconstruction of abnormal data would be possible and can later be used as a basis for accurate quality control in the Nakdong River region.
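The uncertainty figure quoted above can be checked with a short calculation. The worked example below assumes that the larger of the two accuracy terms applies, which is a common reading of such specifications but is not stated in the text; the symbols $\Delta v$ and $\varepsilon$ are introduced here only for illustration.

```latex
% Absolute accuracy: at these velocities the fixed term dominates,
% since 1% of 0.05-0.08 m/s is only 0.0005-0.0008 m/s.
\Delta v = \max\left(0.01\,v,\ 0.005\ \text{m/s}\right) = 0.005\ \text{m/s}

% Relative uncertainty at both ends of the quoted velocity range
\varepsilon(v) = \frac{\Delta v}{v}, \qquad
\varepsilon(0.05\ \text{m/s}) = \frac{0.005}{0.05} = 10\%, \qquad
\varepsilon(0.08\ \text{m/s}) = \frac{0.005}{0.08} = 6.25\%
```

Because the relative uncertainty grows as the velocity falls, it exceeds 10% whenever the velocity drops below 0.05 m/s.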

Conclusions
The objective of the study is to suggest a method to reconstruct abnormal daily runoff data, particularly for adjacent hydrological stations where different flow characteristics are present and the influence of hydraulic structures exists. The VMD technique was applied to three runoff series observed along a reach with several hydraulic structures in the main stream of the Nakdong River. Each series was divided into three components: two runoff components (1st and 2nd modes), which are governed by basin runoff characteristics, and a residual, which combines runoff and disturbed flow characteristics. The decomposition using VMD showed that the runoff components at a particular station influenced by hydraulic structures could be reconstructed using adjacent stations, but the residual mode could not be reconstructed and had to be used only for its own series. Therefore, the abnormal data could be reconstructed using the "reconstructed" modes and the "original" residual of each hydrological station. The IMF reconstruction model was established using the ANN technique and the two "divided" modes, and the reconstructed runoff was obtained by summing up the reconstructed IMFs and the residual mode. The runoff series in station B from 2012 to 2019, which shows inaccurate observations from 2012 to 2013 due to a facility issue, was also reconstructed. The reconstructed series for station B showed relatively good results, with 0.92 for R² and 55.9 for the RMSE, which supports the applicability of the method. Considering the importance of the target stations, inaccurate or missing observations could cause problems in managing water resources, especially in times of flood, drought, and even algal blooms. Since the different flow characteristics of the target stations tend to make quality control more difficult, the method suggested in this study could be used as an alternative to reconstruct or control the quality of hydrological data for both missing and abnormal data.
Finally, with further