# Investigation of Hyperparameter Setting of a Long Short-Term Memory Model Applied for Imputation of Missing Discharge Data of the Daihachiga River


## Abstract


## 1. Introduction

## 2. Materials and Methods

#### 2.1. Study Area

km^{2} of catchment area [27,28].

#### 2.2. Data

#### 2.3. Method

#### 2.3.1. LSTM Model

The cell state C_{t} and hidden state h_{t}, which is the output from the LSTM unit, are passed to the LSTM of the next time step t + 1. In an LSTM unit, h_{t} can be obtained from h_{t−1}, C_{t−1} and x_{t}, which is the input value at time step t. In an output unit, the output y_{t} of the output layer is obtained from the output of the LSTM, h_{t}. This procedure is the same as in a standard RNN model. In the case of LSTM, x_{t}, h_{t−1} and C_{t−1} are passed through the input gate, output gate, and forget gate in the LSTM unit, and the vanishing gradient problem can be solved [29,30]. The input layer and output layer of the LSTM used in this study have 24 and 5 units, respectively (Figure 2). The other hyperparameters set for training are shown below:

- $In$: number and type of input variables.
- $Back_{ts}$: number of backtracked time steps of the data used for training.
- $Hid$: number of units in the hidden layer.
- $Drp$: dropout rate.
- $Drp_{r}$: recurrent dropout rate.
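The gate structure described above can be sketched in plain NumPy. This is a minimal illustration only, not the framework implementation used for the experiments; the weight initialization and the hidden size of 20 are arbitrary choices for the sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step: (x_t, h_{t-1}, C_{t-1}) -> (h_t, C_t).

    W, U, b hold the stacked weights of the forget, input, candidate,
    and output transformations, each with hidden size n.
    """
    n = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b      # all four pre-activations at once
    f = sigmoid(z[0 * n:1 * n])       # forget gate
    i = sigmoid(z[1 * n:2 * n])       # input gate
    g = np.tanh(z[2 * n:3 * n])       # candidate cell state
    o = sigmoid(z[3 * n:4 * n])       # output gate
    c_t = f * c_prev + i * g          # new cell state C_t
    h_t = o * np.tanh(c_t)            # new hidden state h_t
    return h_t, c_t

# Toy dimensions: 24 input units (as in this study) and a small hidden layer.
rng = np.random.default_rng(0)
n_in, n_hid = 24, 20
W = rng.standard_normal((4 * n_hid, n_in)) * 0.1
U = rng.standard_normal((4 * n_hid, n_hid)) * 0.1
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_step(rng.standard_normal(n_in), h, c, W, U, b)
```

Because the forget gate scales C_{t−1} multiplicatively instead of repeatedly pushing it through a squashing nonlinearity, the cell state can carry information across many steps.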

The data from Back_{ts} steps before time t are used as training data. When the data to be estimated are also used as input data for training, considering the existence of missing data, the data from t − Back_{ts} to t − Back_{ts} − 24 are used to estimate the data at time t. This means that if there is a gap of Back_{ts} steps of missing data, this part can be estimated from the data before the gap. Thus, the hyperparameter Back_{ts} only takes a value when the estimated data are also used as input data. For input data other than the estimated point, such as temperature and precipitation, the data from time t to t − 24 are used to estimate time t.
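The window construction above can be sketched as follows. The helper name `build_sample` and the dictionary layout are hypothetical; only the index arithmetic follows the text (lagged window for the estimated variable, current window for the other inputs).

```python
import numpy as np

def build_sample(series, exog, t, back_ts=24, width=24):
    """Assemble the input windows for estimating time t (hypothetical helper).

    series : the variable being imputed (e.g. Shioyabashi discharge); its
             window is shifted back by `back_ts` steps so that a gap of up
             to `back_ts` steps can be bridged by data before the gap.
    exog   : dict of other inputs (precipitation, temperature, ...); their
             windows run from t - width to t.
    """
    window = {"target_lagged": series[t - back_ts - width : t - back_ts + 1]}
    for name, values in exog.items():
        window[name] = values[t - width : t + 1]
    return window

# Toy hourly data: 500 steps of discharge and precipitation.
q = np.arange(500, dtype=float)
p = np.zeros(500)
sample = build_sample(q, {"precip": p}, t=400, back_ts=24)
# For t = 400 and back_ts = 24, the lagged discharge window covers
# q[352] .. q[376], i.e. it ends back_ts steps before t.
```

The windows here are taken inclusively, matching the phrasing "from t − Back_{ts} to t − Back_{ts} − 24"; the exact endpoint convention of the original implementation is an assumption.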

#### 2.3.2. Hyperparameter Settings and Data Training

The settings Back_{ts} = 24 and 168 assume missing periods of 1 day and 7 days, respectively. In each training, the values of dropout and recurrent dropout were assigned identically. The hyperparameter values were assigned by a trial-and-error approach in those 90 trainings. The estimated data were the discharge at Shioyabashi. Table 2 shows the input variable types for each case. For example, in the setting Back_{ts} = 24, Case 1 takes two input variables: the discharge data from t − Back_{ts} to t − Back_{ts} − 24 at Shioyabashi, and the precipitation from t to t − 24. Case 2 takes two input variables: the discharge data from t − Back_{ts} to t − Back_{ts} − 24 at Shioyabashi, and the discharge data from t to t − 24 at Sanpukuji. Case 5 takes one input variable: the discharge data from t − Back_{ts} to t − Back_{ts} − 24 at Shioyabashi. Additionally, Case 9 takes one input variable: the discharge data from t to t − 24 at Sanpukuji.
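Enumerating hyperparameter settings over the candidate values of Table 1 can be sketched with a simple grid. This is an illustration only: the paper assigned values by trial and error over 90 trainings rather than exhausting the full product, and Back_{ts} = 0 applies only to the cases that do not use the lagged Shioyabashi discharge as input.

```python
from itertools import product

# Candidate values from Table 1; Drp and Drp_r are assigned identically.
back_ts_values = [24, 168, 0]
hid_values = [20, 50, 100, 200]
drp_values = [0.0, 0.01, 0.05, 0.1]   # used for both Drp and Drp_r

settings = [
    {"back_ts": b, "hid": h, "drp": d, "drp_r": d}
    for b, h, d in product(back_ts_values, hid_values, drp_values)
]
# 3 * 4 * 4 = 48 combinations in the full grid (per input-variable case).
```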

The 5th percentile of the NSE (P_{5}) over N = 1–50 was evaluated. Even if the results are dispersed, P_{5} indicates the value that 95% of the results exceed.
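The NSE [31] and its 5th percentile over repeated trainings can be computed as below. The scores are synthetic numbers for illustration, not results from the paper.

```python
import numpy as np

def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of the squared error
    to the variance of the observations about their mean (1 is perfect)."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    return 1.0 - np.sum((observed - simulated) ** 2) / np.sum(
        (observed - observed.mean()) ** 2
    )

# Synthetic example: NSE scores from 100 repeated trainings of one setting.
rng = np.random.default_rng(1)
scores = 0.93 + 0.02 * rng.standard_normal(100)

# P_5: roughly 95% of the training results are better than this value.
p5 = np.percentile(scores, 5)
```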

#### 2.3.3. Traditional Method
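The traditional reference model, whose accuracy of 0.903 is used as the baseline in Section 3, is a linear regression. A minimal least-squares sketch on synthetic data is shown below; the actual regressors, fitting period, and coefficients of the paper's model are not reproduced here.

```python
import numpy as np

# Synthetic paired discharge records (the real model estimates the
# Shioyabashi discharge from observations at another station).
rng = np.random.default_rng(2)
q_san = rng.uniform(0.5, 5.0, 200)                     # Sanpukuji discharge
q_shio = 1.8 * q_san + 0.3 + 0.05 * rng.standard_normal(200)

# Least-squares fit of q_shio = a * q_san + b.
A = np.column_stack([q_san, np.ones_like(q_san)])
(a, b), *_ = np.linalg.lstsq(A, q_shio, rcond=None)

# Imputed values for a missing period at Shioyabashi.
q_est = a * q_san + b
```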

## 3. Results and Discussion

#### 3.1. Evaluation Metrics

#### 3.2. Number of Members for Ensemble Average

The variation of the P_{5} of the NSE over the number of ensemble members N = 1 to 50 for Case 2 is shown in Figure 8 as an example. The blue line shows the results with Back_{ts} = 24 and Hid = 20; the orange line, Back_{ts} = 24 and Hid = 200; and the red line, Back_{ts} = 168 and Hid = 100. For the blue line, where the number of hidden units (Hid) is small, the accuracy is generally low, and a P_{5} of NSE of 0.90 is the best obtained. For the orange and red lines, where Hid ≥ 100, increasing the number of ensemble members results in an accuracy of NSE > 0.92, which exceeds the 0.903 reference accuracy. In brief, the variation curves of the P_{5} of NSE become flat and hold their accuracy level once N ≥ 20. Thus, in this study, the training results were evaluated by the P_{5} of NSE at N = 30, leaving a margin of safety against the dispersion of different training results.

#### 3.3. Type of Input Variables

The P_{5} of NSE at N = 30 was compared to evaluate the influence of each training hyperparameter on the result. There are four Hid values for each input-variable case. Figure 9, Figure 10 and Figure 11 summarize the results of these four hyperparameter combinations with Drp = 0 and Drp_{r} = 0. Figure 9 and Figure 10 show the results of Cases 1–6 with Back_{ts} = 24, which assumes a 1-day missing period, and Back_{ts} = 168, which assumes a 7-day missing period, respectively. In Figure 9, Case 2, Case 3, and Case 4, which used the discharge data of both Shioyabashi and Sanpukuji as input data, were relatively good. Case 2 and Case 3 reach 0.939 and 0.947 in the median of the P_{5} of NSE, respectively, above the reference accuracy of 0.903, which is the accuracy of the linear regression model. Case 4, which added air temperature as an input, reaches 0.900 in the median of the P_{5} of NSE, slightly below the reference accuracy. The lower quartiles for Case 2 and Case 3 were 0.922 and 0.917, respectively, better than the reference accuracy; however, their minima, 0.896 and 0.860, fall slightly below it. It must be noted that Case 3, which used precipitation as one of the inputs, shows a wide variation in accuracy depending on the hyperparameter settings. In Figure 10, Case 2, Case 3, and Case 4 again have relatively good accuracy, as in Figure 9. Case 2 reaches 0.922 in the median of the P_{5} of NSE, above the reference accuracy, whereas the medians of Case 3 and Case 4 are 0.899 and 0.871, respectively, below it. The minimum for Case 2 is 0.904, which is above the reference accuracy. Thus, Case 2 is the appropriate input combination when Back_{ts} = 168.

The results of Case 7 to Case 11 with Back_{ts} = 0 are shown in Figure 11. Case 9, Case 10, and Case 11, which used the discharge data of Sanpukuji as input, indicated relatively good accuracies; however, their medians of the P_{5} of NSE, 0.877–0.891, were slightly below the reference accuracy. In Case 7, where only precipitation was used as input, the median of the P_{5} of NSE is 0.344. In Case 8, where only air temperature was used as input, the median of the P_{5} of NSE is 0.013, indicating that it is difficult to estimate the missing data. These results suggest the following about the input data combination: (i) both the Sanpukuji data and the Shioyabashi data are necessary; (ii) air temperature is not required; and (iii) precipitation may contribute to improved accuracy, but it can also cause poor accuracy depending on the hyperparameter settings.

The case using precipitation as input (Back_{ts} = 24, Hid = 20, Drp = 0, Drp_{r} = 0) and Case 8 (Back_{ts} = 24, Hid = 20, Drp = 0, Drp_{r} = 0) were chosen to investigate the impact of precipitation on the estimation results. The hydrographs of the observed discharge, precipitation, and both estimation results are shown in Figure 12. The blue line is the observed data, and the grey line is precipitation. The green and orange lines represent the discharge estimated with and without precipitation as input, respectively. The hydrographs indicate that both estimation results are responsive to precipitation events. When precipitation data were used for training, a trough tends to occur in the estimated discharge after a peak caused by precipitation. Additionally, the model could not estimate the discharge well when a relatively heavy precipitation event occurred. These are considered part of the reasons why precipitation may lower the accuracy.

#### 3.4. Dropout and Recurrent Dropout

The accuracy was compared for combinations of dropout (Drp) and recurrent dropout (Drp_{r}) with Hid = 20 to 200 and Back_{ts} = 24 for Case 3. The accuracy improved as Drp and Drp_{r} became smaller, and Drp = Drp_{r} = 0.00 gave the best lower quartile of the P_{5} of the NSE, 0.917. The maximum P_{5} of the NSE was 0.961 with Drp = Drp_{r} = 0.01 and Hid = 200. With Hid = 20, the minima were 0.860, 0.860, 0.825, and 0.771 for Drp = Drp_{r} = 0.00, 0.01, 0.05, and 0.10, respectively. With Drp = Drp_{r} = 0.00 or 0.01 and Hid > 50, the accuracies exceeded 0.920, which is above the reference accuracy. In brief, Drp = Drp_{r} = 0.00 shows the best results. The reason may be that higher Drp and Drp_{r} values drop more units from the linear transformations of the input and recurrent state. Since the LSTM model of this study has only a few units in the input and hidden layers, dropping units caused a shortage of the information necessary for training. In other studies, where the input and hidden layers have a large number of units, dropping units by dropout and recurrent dropout improved the training results [32,33]. Thus, if Hid is less than 200, Drp = Drp_{r} = 0.00 is appropriate, and if Hid is 200 or more, Drp = Drp_{r} = 0.01 is appropriate.
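The unit-shortage argument can be illustrated with standard inverted dropout, sketched standalone below (this is not the framework's internal implementation). At rate 0.10 a 20-unit layer loses on average only 2 units, but each surviving unit must carry a much larger share of the layer's information than in a 200-unit layer.

```python
import numpy as np

def inverted_dropout(h, rate, rng):
    """Zero each unit with probability `rate` and rescale the survivors
    by 1/(1 - rate), as standard inverted dropout does at training time."""
    keep = rng.random(h.shape) >= rate
    return h * keep / (1.0 - rate), keep

rng = np.random.default_rng(3)
small = np.ones(20)    # a hidden layer as small as Hid = 20
large = np.ones(200)   # a hidden layer as large as Hid = 200

out_small, keep_small = inverted_dropout(small, 0.10, rng)
out_large, keep_large = inverted_dropout(large, 0.10, rng)
# The same fraction of units is silenced on average, but the small layer
# has far fewer surviving units left to carry the signal.
```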

#### 3.5. Number of Hidden Units

Figure 14 shows the relationship between the number of hidden units (Hid = 20 to 200) and the P_{5} of NSE when Back_{ts} = 24 and Drp = Drp_{r} = 0.00. In Cases 2, 3, and 4, the accuracy improved as Hid increased. Cases 2 and 3 exceeded the reference accuracy of 0.903 when Hid > 50. However, when Hid > 100 the accuracies were almost unchanged: 0.947 for Case 2 and 0.957–0.959 for Case 3. Additionally, Figure 15 shows the relationship between the number of hidden units (Hid = 20 to 200) and the P_{5} of the NSE when Back_{ts} = 168 and Drp = Drp_{r} = 0.00. In Case 2, the accuracy slightly improved from 0.904 to 0.928 as Hid increased. In Case 4, the accuracy barely changed as Hid increased. In Case 3, on the other hand, Hid = 50 gave the best accuracy of 0.915, and the accuracy decreased as Hid increased beyond 50. In Case 3, with precipitation as an input, the estimated hydrograph can become jagged under the influence of the pulsed precipitation time series, which is why Case 3 does not always show the best accuracy. As a result, for Back_{ts} = 24, Hid = 100 is appropriate for both Cases 2 and 3, because the accuracies for Hid = 100 and 200 were almost the same and a higher Hid only lengthens the training time. Moreover, for Back_{ts} = 168, Hid = 100 is appropriate for Case 2, and Hid = 50 is appropriate for Case 3. However, in Case 3, where precipitation is used as input, care should be taken because the accuracy may decrease depending on the settings.

## 4. Conclusions

The appropriate hyperparameter setting included Drp = 0 and Drp_{r} = 0, with which it is possible to estimate with greater accuracy than the reference. The necessity of hyperparameter tuning was demonstrated, and the hyperparameter settings can serve as a reference for further research in relevant areas. Of course, this combination can be adjusted appropriately under specific experimental conditions. In future experiments, the amount of analysis data can be increased, for example with discharge data from more than two observation points, and the influence of precipitation and air temperature on the model performance can be analyzed further to improve the accuracy of the results.

## Supplementary Materials

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

- Gao, Y.; Merz, C.; Lischeid, G.; Schneider, M. A Review on Missing Hydrological Data Processing. Environ. Earth Sci. **2018**, 77, 47.
- Kojiri, T.; Panu, U.S.; Tomosugi, K. Complement Method of Observation Lack of Discharge with Pattern Classification and Fuzzy Inference. J. Jpn. Soc. Hydrol. Water Resour. **1994**, 7, 536–543.
- Tezuka, M.; Ohgushi, K. A Practical Method To Estimate Missing. In Proceedings of the 18th IAHR APD Congress, Jeju, Korea, 22–23 August 2012; pp. 547–548.
- Ding, Y.; Zhu, Y.; Feng, J.; Zhang, P.; Cheng, Z. Interpretable Spatio-Temporal Attention LSTM Model for Flood Forecasting. Neurocomputing **2020**, 403, 348–359.
- Li, W.; Kiaghadi, A.; Dawson, C. Exploring the Best Sequence LSTM Modeling Architecture for Flood Prediction. Neural Comput. Appl. **2020**, 33, 5571–5580.
- Le, X.H.; Ho, H.V.; Lee, G.; Jung, S. Application of Long Short-Term Memory (LSTM) Neural Network for Flood Forecasting. Water **2019**, 11, 1387.
- Taniguchi, J.; Kojima, T.; Sota, Y.; Hukumoto, S.; Satou, H.; Machida, Y.; Mikami, T.; Nagayama, M.; Nishikohri, T.; Watanabe, A. Application of Recurrent Neural Network for Dam Inflow Prediction. Adv. River Eng. **2019**, 25, 321–326.
- Hu, C.; Wu, Q.; Li, H.; Jian, S.; Li, N.; Lou, Z. Deep Learning with a Long Short-Term Memory Networks Approach for Rainfall-Runoff Simulation. Water **2018**, 10, 1543.
- Xiang, Z.; Yan, J.; Demir, I. A Rainfall-Runoff Model With LSTM-Based Sequence-to-Sequence Learning. Water Resour. Res. **2020**, 56, e2019WR025326.
- Sahoo, B.B.; Jha, R.; Singh, A.; Kumar, D. Long Short-Term Memory (LSTM) Recurrent Neural Network for Low-Flow Hydrological Time Series Forecasting. Acta Geophys. **2019**, 67, 1471–1481.
- Sudriani, Y.; Ridwansyah, I.; Rustini, H.A. Long Short Term Memory (LSTM) Recurrent Neural Network (RNN) for Discharge Level Prediction and Forecast in Cimandiri River, Indonesia. In Proceedings of the IOP Conference Series: Earth and Environmental Science; Institute of Physics Publishing: Bristol, UK, 2019; Volume 299.
- Lee, G.; Lee, D.; Jung, Y.; Kim, T.-W. Comparison of Physics-Based and Data-Driven Models for Streamflow Simulation of the Mekong River. J. Korea Water Resour. Assoc. **2018**, 51, 503–514.
- Bai, P.; Liu, X.; Xie, J. Simulating Runoff under Changing Climatic Conditions: A Comparison of the Long Short-Term Memory Network with Two Conceptual Hydrologic Models. J. Hydrol. **2021**, 592, 125779.
- Fan, H.; Jiang, M.; Xu, L.; Zhu, H.; Cheng, J.; Jiang, J. Comparison of Long Short Term Memory Networks and the Hydrological Model in Runoff Simulation. Water **2020**, 12, 175.
- Granata, F.; di Nunno, F. Forecasting Evapotranspiration in Different Climates Using Ensembles of Recurrent Neural Networks. Agric. Water Manag. **2021**, 255, 107040.
- Chen, Z.; Zhu, Z.; Jiang, H.; Sun, S. Estimating Daily Reference Evapotranspiration Based on Limited Meteorological Data Using Deep Learning and Classical Machine Learning Methods. J. Hydrol. **2020**, 591, 125286.
- Ferreira, L.B.; da Cunha, F.F. Multi-Step Ahead Forecasting of Daily Reference Evapotranspiration Using Deep Learning. Comput. Electron. Agric. **2020**, 178, 105728.
- Reimers, N.; Gurevych, I. Optimal Hyperparameters for Deep LSTM-Networks for Sequence Labeling Tasks. arXiv **2017**, arXiv:1707.06799.
- Hossain, M.D.; Ochiai, H.; Fall, D.; Kadobayashi, Y. LSTM-Based Network Attack Detection: Performance Comparison by Hyper-Parameter Values Tuning. In Proceedings of the 2020 7th IEEE International Conference on Cyber Security and Cloud Computing (CSCloud)/2020 6th IEEE International Conference on Edge Computing and Scalable Cloud (EdgeCom), New York, NY, USA, 1–3 August 2020; pp. 62–69.
- Yadav, A.; Jha, C.K.; Sharan, A. Optimizing LSTM for Time Series Prediction in Indian Stock Market. Procedia Comput. Sci. **2020**, 167, 2091–2100.
- Yi, H.; Bui, K.N. An Automated Hyperparameter Search-Based Deep Learning Model for Highway Traffic Prediction. IEEE Trans. Intell. Transp. Syst. **2020**, 22, 5486–5495.
- Kratzert, F.; Klotz, D.; Shalev, G.; Klambauer, G.; Hochreiter, S.; Nearing, G. Benchmarking a Catchment-Aware Long Short-Term Memory Network (LSTM) for Large-Scale Hydrological Modeling. Hydrol. Earth Syst. Sci. Discuss. **2019**, 2019, 1–32.
- Afzaal, H.; Farooque, A.A.; Abbas, F.; Acharya, B.; Esau, T. Groundwater Estimation from Major Physical Hydrology Components Using Artificial Neural Networks and Deep Learning. Water **2020**, 12, 5.
- Ayzel, G.; Heistermann, M. The Effect of Calibration Data Length on the Performance of a Conceptual Hydrological Model versus LSTM and GRU: A Case Study for Six Basins from the CAMELS Dataset. Comput. Geosci. **2021**, 149, 104708.
- Alizadeh, B.; Ghaderi Bafti, A.; Kamangir, H.; Zhang, Y.; Wright, D.B.; Franz, K.J. A Novel Attention-Based LSTM Cell Post-Processor Coupled with Bayesian Optimization for Streamflow Prediction. J. Hydrol. **2021**, 601, 126526.
- Kojima, T.; Weilisi; Ohashi, K. Investigation of Missing River Discharge Data Imputation Method Using Deep Learning. Adv. River Eng. **2020**, 26, 137–142.
- River Division of Gifu Prefectural Office. Ojima Dam. Available online: https://www.pref.gifu.lg.jp/page/67841.html (accessed on 7 January 2022).
- Kojima, T.; Shinoda, S.; Mahboob, M.G.; Ohashi, K. Study on Improvement of Real-Time Flood Forecasting with Rainfall Interception Model. Adv. River Eng. **2012**, 18, 435–440.
- Gers, F.A.; Schmidhuber, J.A.; Cummins, F.A. Learning to Forget: Continual Prediction with LSTM. Neural Comput. **2000**, 12, 2451–2471.
- Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. **1997**, 9, 1735–1780.
- Nash, J.E.; Sutcliffe, J.V. River Flow Forecasting through Conceptual Models Part I—A Discussion of Principles. J. Hydrol. **1970**, 10, 282–290.
- Semeniuta, S.; Severyn, A.; Barth, E. Recurrent Dropout without Memory Loss. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, Osaka, Japan, 11–16 December 2016; pp. 1757–1766.
- Gal, Y.; Ghahramani, Z. A Theoretically Grounded Application of Dropout in Recurrent Neural Networks. In Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS’16), Barcelona, Spain, 5–10 December 2016; pp. 1027–1035.

**Figure 1.** The Daihachiga River Basin and observation points. (**a**) Location of Gifu Prefecture; (**b**) location of the Daihachiga River Basin; (**c**) observation points of the Daihachiga River.

**Figure 8.** Examples of the relationship between the number of ensemble members (N) and the 5th percentile of NSE for Case 2. Blue line: Back_{ts} = 24, Hid = 20, Drp = 0, Drp_{r} = 0; orange line: Back_{ts} = 24, Hid = 200, Drp = 0, Drp_{r} = 0; red line: Back_{ts} = 168, Hid = 100, Drp = 0, Drp_{r} = 0.

**Figure 9.** Summarized training results for Case 1 to Case 6 when Back_{ts} = 24, Drp = 0, Drp_{r} = 0.

**Figure 10.** Summarized training results for Case 1 to Case 6 when Back_{ts} = 168, Drp = 0, Drp_{r} = 0.

**Figure 11.** Summarized training results for Case 7 to Case 11 when Back_{ts} = 0, Drp = 0, Drp_{r} = 0.

| Hyperparameter | Value 1 | Value 2 | Value 3 | Value 4 |
|---|---|---|---|---|
| Back_{ts} | 24 | 168 | 0 | |
| Hid | 20 | 50 | 100 | 200 |
| Drp | 0 | 0.01 | 0.05 | 0.1 |
| Drp_{r} | 0 | 0.01 | 0.05 | 0.1 |

| | Type of Input Variables | Number of Input Variables |
|---|---|---|
| Case 1 | Q_{shio} + P | 2 |
| Case 2 | Q_{shio} + Q_{san} | 2 |
| Case 3 | Q_{shio} + Q_{san} + P | 3 |
| Case 4 | Q_{shio} + Q_{san} + P + T | 4 |
| Case 5 | Q_{shio} | 1 |
| Case 6 | Q_{shio} + T | 2 |
| Case 7 | P | 1 |
| Case 8 | T | 1 |
| Case 9 | Q_{san} | 1 |
| Case 10 | Q_{san} + P | 2 |
| Case 11 | Q_{san} + T | 2 |

Q_{san}: discharge volume of Sanpukuji. Q_{shio}: discharge volume of Shioyabashi. P: precipitation. T: air temperature.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Weilisi; Kojima, T.
Investigation of Hyperparameter Setting of a Long Short-Term Memory Model Applied for Imputation of Missing Discharge Data of the Daihachiga River. *Water* **2022**, *14*, 213.
https://doi.org/10.3390/w14020213
