# Anomaly Detection Using a Sliding Window Technique and Data Imputation with Machine Learning for Hydrological Time Series

## Abstract

## 1. Introduction

## 2. Materials and Methods

#### 2.1. Background

#### 2.1.1. MAD as a Robust Scale Estimator
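
For reference, the median absolute deviation is conventionally defined as below. The constant $b$ makes the MAD a consistent estimator of the standard deviation for normally distributed data; whether this study applies it is assumed here rather than confirmed.

$$
\mathrm{MAD} = b \cdot \operatorname{median}_{i}\left( \left| x_i - \operatorname{median}_{j}(x_j) \right| \right), \qquad b = \frac{1}{\Phi^{-1}(3/4)} \approx 1.4826
$$

As a worked example, for the sample $\{1, 2, 3, 4, 100\}$ the median is 3 and the absolute deviations are $\{2, 1, 0, 1, 97\}$, whose median is 1, so $\mathrm{MAD} \approx 1.4826$. The sample standard deviation of the same data is about 43.6, because the single value 100 dominates it; the MAD is barely affected.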

#### 2.1.2. Bidirectional LSTM with Decreasing Weight Model

#### 2.1.3. An Architecture of Bidirectional LSTM Networks

#### 2.2. Anomaly Detection Methods

#### 2.2.1. Median with Fixed Threshold
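
A minimal sketch of a sliding-window median detector, assuming a window of odd length centred on each observation (truncated at the ends of the series) and flagging an observation when its absolute distance from the window median exceeds a fixed Threshold expressed in the units of the series. The function name `median_fixed_threshold` and the edge handling are illustrative assumptions; Window size and Threshold correspond to the columns of the same name in the parameter table (for example, 51 and 0.05 for station 1).

```python
from statistics import median


def median_fixed_threshold(series, window_size, threshold):
    """Flag series[t] when |series[t] - median of its window| > threshold.

    Hypothetical helper: the window is centred on t and truncated at both ends.
    """
    if window_size < 1 or window_size % 2 == 0:
        raise ValueError("window_size is assumed to be a positive odd integer")
    half = window_size // 2
    flags = []
    for t, value in enumerate(series):
        window = series[max(0, t - half): t + half + 1]
        flags.append(abs(value - median(window)) > threshold)
    return flags


levels = [1.20, 1.21, 1.22, 2.50, 1.23, 1.24, 1.25]  # made-up water levels with one spike
print(median_fixed_threshold(levels, window_size=5, threshold=0.05))
# [False, False, False, True, False, False, False]
```

Because the threshold is absolute, it has to be tuned per station, which is consistent with the different Threshold values (0.05 to 0.3) listed for this method in the parameter table.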

#### 2.2.2. Median Absolute Deviation (MAD)
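
A corresponding sketch of the MAD variant, assuming the same centred window, a robust score $|x_t - \operatorname{median}| / \mathrm{MAD}$ compared with Threshold, and MinMAD acting as a lower bound on the scaled MAD so that perfectly flat windows do not cause a division by zero. The helper name `mad_detector`, the default $b = 1.4826$, and this reading of MinMAD are assumptions.

```python
from statistics import median


def mad_detector(series, window_size, threshold, min_mad, b=1.4826):
    """Flag series[t] when its robust score in a centred window exceeds threshold.

    Hypothetical helper: score = |x_t - window median| / max(b * MAD, min_mad).
    """
    if window_size < 1 or window_size % 2 == 0:
        raise ValueError("window_size is assumed to be a positive odd integer")
    half = window_size // 2
    flags = []
    for t, value in enumerate(series):
        window = series[max(0, t - half): t + half + 1]
        centre = median(window)
        scaled_mad = b * median([abs(x - centre) for x in window])
        scale = max(scaled_mad, min_mad)  # floor keeps flat windows from dividing by zero
        flags.append(abs(value - centre) / scale > threshold)
    return flags


levels = [1.20, 1.21, 1.22, 2.50, 1.23, 1.24, 1.25]
print(mad_detector(levels, window_size=5, threshold=2, min_mad=0.01))
# [False, False, False, True, False, False, False]
```

Under this reading the cut-off adapts to the local spread, so Threshold is a dimensionless multiple of the MAD (1 to 2 in the parameter table) rather than a value in the units of the series.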

#### 2.3. Data Filling Methods

#### 2.3.1. Linear Interpolation
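
A minimal pure-Python sketch of gap filling by linear interpolation, assuming missing observations are marked as `None` and samples are equally spaced in time. Each interior gap is bridged by a straight line between the nearest known values; leading or trailing gaps are left unfilled. `linear_fill` is a hypothetical name.

```python
def linear_fill(series):
    """Replace interior None values by linear interpolation between known neighbours."""
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            frac = (i - left) / (right - left)  # position of i inside the gap, in (0, 1)
            filled[i] = series[left] + frac * (series[right] - series[left])
    return filled


print(linear_fill([None, 1.0, None, None, None, 5.0, None]))
# [None, 1.0, 2.0, 3.0, 4.0, 5.0, None]
```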

#### 2.3.2. Spline
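
The k and s entries for the spline in the parameter table look like a spline degree and a smoothing factor, which suggests a smoothing spline; that configuration is not reproduced here. As a simpler, self-contained illustration of spline-based filling, the sketch below fits a natural cubic interpolating spline (degree 3, no smoothing, zero second derivative at both ends) through the known points and evaluates it inside each gap. The function names and the `None` convention for missing values are assumptions.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a callable evaluating the natural cubic spline through (xs, ys); xs strictly increasing."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Tridiagonal system for second derivatives m[0..n]; rows 0 and n pin m to 0 (natural ends).
    sub, diag, sup, rhs = [0.0] * (n + 1), [1.0] * (n + 1), [0.0] * (n + 1), [0.0] * (n + 1)
    for i in range(1, n):
        sub[i] = h[i - 1]
        diag[i] = 2.0 * (h[i - 1] + h[i])
        sup[i] = h[i]
        rhs[i] = 6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, n + 1):
        w = sub[i] / diag[i - 1]
        diag[i] -= w * sup[i - 1]
        rhs[i] -= w * rhs[i - 1]
    m = [0.0] * (n + 1)
    m[n] = rhs[n] / diag[n]
    for i in range(n - 1, -1, -1):
        m[i] = (rhs[i] - sup[i] * m[i + 1]) / diag[i]

    def evaluate(x):
        j = min(max(bisect_right(xs, x) - 1, 0), n - 1)  # index of the segment containing x
        hj = h[j]
        left, right = x - xs[j], xs[j + 1] - x
        return (m[j] * right ** 3 / (6 * hj) + m[j + 1] * left ** 3 / (6 * hj)
                + (ys[j] / hj - m[j] * hj / 6) * right
                + (ys[j + 1] / hj - m[j + 1] * hj / 6) * left)

    return evaluate


def spline_fill(series):
    """Fill interior None values using a natural cubic spline through the known points."""
    known = [i for i, v in enumerate(series) if v is not None]
    filled = list(series)
    if len(known) < 2:
        return filled
    spline = natural_cubic_spline(known, [series[i] for i in known])
    for i in range(known[0] + 1, known[-1]):
        if filled[i] is None:
            filled[i] = spline(i)
    return filled


print(spline_fill([0.0, None, 1.0, None, 0.0]))
# [0.0, 0.6875, 1.0, 0.6875, 0.0]
```

Unlike the straight-line fill, which would give 0.5 at the two missing points, the spline bends through the gap; this helps when the series is smooth but can overshoot near abrupt changes.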

#### 2.3.3. Bidirectional LSTM with Decreasing Weight Model
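
The name of the model suggests two LSTMs, one reading the series forward up to a gap and one reading it backward from the far side, whose predictions for the missing values are merged with weights that decrease with distance from the side each model has seen. The sketch below covers only that merging step and assumes a linear weight schedule; the helper `blend_decreasing_weights` is hypothetical, and the LSTMs themselves (configured by the step and unit values in the parameter table, assumed here to be the input sequence length and the number of LSTM units) are not reproduced.

```python
def blend_decreasing_weights(forward_pred, backward_pred):
    """Merge forward and backward predictions for one gap with linearly decreasing weights.

    forward_pred[i]: prediction for gap position i from a model run past -> future.
    backward_pred[i]: prediction for the same position from a model run future -> past.
    """
    n = len(forward_pred)
    if n != len(backward_pred):
        raise ValueError("both predictions must cover the same gap")
    blended = []
    for i in range(n):
        # Forward weight falls from n/(n+1) at the first missing step to 1/(n+1) at the last;
        # the backward prediction receives the complementary weight.
        w_forward = (n - i) / (n + 1)
        blended.append(w_forward * forward_pred[i] + (1.0 - w_forward) * backward_pred[i])
    return blended


print(blend_decreasing_weights([1.0, 1.0, 1.0], [4.0, 4.0, 4.0]))
# [1.75, 2.5, 3.25]
```

With this schedule each model dominates where it has the most recent context, and the two estimates receive equal weight only at the middle of the gap.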

## 3. Experimental Analysis

#### 3.1. Data Gathering and Pre-Processing Step

#### 3.1.1. Test Data Generation for Anomaly Detection

#### 3.1.2. Test Data Generation for Data Filling

#### 3.2. Parameters and Training Configurations

#### 3.3. Evaluation Metrics
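
The anomaly-detection results are reported as F1-scores. A minimal sketch of the F1-score over per-timestep labels, treating "anomalous" (`True`) as the positive class, which is an assumption about how the labels are encoded; `f1_score` here is a standalone helper written for illustration, not scikit-learn's function of the same name.

```python
def f1_score(true_flags, predicted_flags):
    """F1-score with anomalous (True) as the positive class; 0.0 when there are no true positives."""
    if len(true_flags) != len(predicted_flags):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(true_flags, predicted_flags))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


truth = [False, True, True, False, True]
flags = [False, True, False, True, True]
print(round(f1_score(truth, flags), 4))
# 0.6667
```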

## 4. Results and Discussions

## 5. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest


**Figure 5.** Locations of the selected stations at the country level (**left**) and at a closer zoom level (**right**).

**Figure 9.** A comparison between the median with fixed threshold and MAD: (**a**) station 2 with the median with fixed threshold; (**b**) station 2 with MAD; (**c**) station 4 with the median with fixed threshold; (**d**) station 4 with MAD.

**Figure 10.** Example of the median with fixed threshold applied to additional data containing anomalous step changes.

**Figure 11.** A comparison of the results of (**a**) the linear interpolation method and (**b**) the spline method, applied to parts of the time series from station 1.

**Figure 12.** A comparison of the results of (**a**) the linear interpolation method, (**b**) the spline method, and (**c**) the bidirectional LSTM method for station 2.

Station | Location | Basin |
---|---|---|
1 | Subdistrict: Nong Bua; District: Baan Khai; Province: Rayong | East coast Thailand |
2 | Subdistrict: Bang Ya Phraek; District: Mueang Samut Sakhon; Province: Samut Sakhon | Tha Chin River |
3 | Subdistrict: Nong Ruea; District: Chumponburi; Province: Surin | Mun River |
4 | Subdistrict: Sai Mai; District: Sai Mai; Province: Bangkok | Chao Phraya River |

Station | Median with Fixed Threshold | MAD | Spline | Bidirectional LSTM |
---|---|---|---|---|
1 | Window size: 51; Threshold: 0.05 | Window size: 37; Threshold: 2; MinMAD: 0.01 | k: 2; s: 0.5 | step: 72; unit: 64 |
2 | Window size: 25; Threshold: 0.3 | Window size: 19; Threshold: 1; MinMAD: 0.01 | k: 4; s: 0.5 | step: 144; unit: 128 |
3 | Window size: 25; Threshold: 0.1 | Window size: 37; Threshold: 1.5; MinMAD: 0.01 | k: 3; s: 1 | step: 144; unit: 128 |
4 | Window size: 31; Threshold: 0.05 | Window size: 37; Threshold: 1.5; MinMAD: 0.01 | k: 4; s: 0.5 | step: 72; unit: 64 |

Station | Median with Fixed Threshold (F1-score) | MAD (F1-score) |
---|---|---|
1 | 0.9928 | 0.9944 |
2 | 0.8823 | 0.7804 |
3 | 0.9751 | 0.9745 |
4 | 0.9958 | 0.9967 |
Average | 0.9615 | 0.9365 |

**Table 4.** F1-score sensitivity of the MAD method to the Threshold and Window size parameters for station 1.

MAD with fixed Threshold (2): Window size | F1-score | MAD with fixed Window size (37): Threshold | F1-score |
---|---|---|---|
13 | 0.9189 | 2 | 0.9944 |
25 | 0.9861 | 2.5 | 0.9913 |
37 | 0.9944 | 3 | 0.9864 |
49 | 0.9922 | 3.5 | 0.9811 |

Station | Linear Interpolation | Spline | Bidirectional LSTM |
---|---|---|---|
1 | 0.0033 | 0.0027 | 0.0052 |
2 | 0.1675 | 0.1553 | 0.0438 |
3 | 0.0061 | 0.0064 | 0.0496 |
4 | 0.0026 | 0.0024 | 0.0175 |
Overall average | 0.0449 | 0.0417 | 0.0291 |
Average of stations 1, 3, and 4 | 0.0040 | 0.0038 | 0.0241 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Kulanuwat, L.; Chantrapornchai, C.; Maleewong, M.; Wongchaisuwat, P.; Wimala, S.; Sarinnapakorn, K.; Boonya-aroonnet, S.
Anomaly Detection Using a Sliding Window Technique and Data Imputation with Machine Learning for Hydrological Time Series. *Water* **2021**, *13*, 1862.
https://doi.org/10.3390/w13131862
