# Forecasting and Anomaly Detection in BEWS: Comparative Study of Theta, Croston, and Prophet Algorithms


## Abstract


## 1. Introduction

## 2. Materials and Methods

#### 2.1. Data

#### 2.2. Machine Learning Algorithms

#### 2.2.1. Theta Method

#### 2.2.2. Croston Method

The observed series is represented as

$$y_t = o_t z_t, \tag{1}$$

where $y_t$ is the actual observation, $o_t$ is a Bernoulli-distributed binary variable that equals one when demand occurs and zero otherwise, $z_t$ is the potential demand size with a conditional distribution (it is realised only when $o_t = 1$), and $t$ is the observation time.

Let $j_t = 1, \ldots, N$ index the successive demand intervals and demand sizes, where $N$ is the number of non-zero demands. The variable $q_{j_t}$ is the time elapsed since the previous non-zero observation, that is, the demand interval, and it therefore indicates when the next non-zero observation is expected. Croston [31] assumed that the probability of demand occurrence is constant between non-zero demands and that the mean demand size does not change during periods of zero demand. In this method, both demand sizes $z_{j_t}$ and demand intervals $q_{j_t}$ are forecast using SES [33], resulting in the following system of equations:

$$
\begin{aligned}
\hat{y}_t &= \frac{\hat{z}_{j_t}}{\hat{q}_{j_t}}, \\
\hat{z}_{j_t} &= \hat{z}_{j_t-1} + \alpha_z \left( z_{j_t-1} - \hat{z}_{j_t-1} \right), \\
\hat{q}_{j_t} &= \hat{q}_{j_t-1} + \alpha_q \left( q_{j_t-1} - \hat{q}_{j_t-1} \right),
\end{aligned}
\tag{2}
$$

where $\alpha_q$ and $\alpha_z$ are the smoothing parameters for intervals and sizes, respectively. The system of Equation (2) shows how each observation at time $t$ is mapped to the corresponding $j_t$-th element of the demand-size and demand-interval series. Croston's original formulation assumed that $\alpha_q = \alpha_z$, but Schultz [34] later proposed separate smoothing parameters, an approach supported by other researchers [35,36].
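To make the recursions of Equation (2) concrete, the sketch below implements the classic Croston method in plain Python with separate smoothing parameters for sizes and intervals; setting `alpha_z == alpha_q` recovers Croston's original single-parameter form. It is a minimal sketch under stated assumptions, not the implementation used in this study: the function name `croston`, the default value 0.1 for both parameters, and the initialisation with the first non-zero demand are all illustrative choices (initialisation schemes vary in the literature).

```python
import math
from typing import List, Tuple


def croston(y: List[float], alpha_z: float = 0.1,
            alpha_q: float = 0.1) -> Tuple[float, List[float]]:
    """Classic Croston forecast (hypothetical helper for illustration).

    alpha_z smooths demand sizes, alpha_q smooths demand intervals.
    Returns the per-period forecast z_hat / q_hat after the last observation
    and the one-step-ahead in-sample forecasts (NaN until the first demand).
    """
    z_hat = None   # smoothed demand size
    q_hat = None   # smoothed demand interval
    q = 1          # periods elapsed since the previous non-zero demand
    fitted: List[float] = []
    for value in y:
        # Forecast made before observing the current value.
        fitted.append(z_hat / q_hat if z_hat is not None else math.nan)
        if value != 0:  # o_t = 1: a demand occurred
            if z_hat is None:
                # Assumed initialisation: first non-zero size and its interval.
                z_hat, q_hat = float(value), float(q)
            else:
                # SES updates of size and interval, as in Equation (2).
                z_hat += alpha_z * (value - z_hat)
                q_hat += alpha_q * (q - q_hat)
            q = 1
        else:
            q += 1
    forecast = z_hat / q_hat if z_hat is not None else 0.0
    return forecast, fitted
```

For example, for the series `[0, 2, 0, 2, 0, 2]` the smoothed size and interval both remain at 2, so the per-period forecast is 1.0. The forecast is a demand rate per period rather than a prediction of when the next non-zero value will occur, which is the defining property of Croston-type methods.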

#### 2.2.3. Prophet Method

#### 2.3. Data Preprocessing

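The root mean squared error (RMSE) is defined as

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2},$$

where $y_i$ and $\hat{y}_i$ are the actual and predicted values, and $n$ is the number of samples.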
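A minimal plain-Python sketch of the RMSE above follows. Tables 1–3 report RMSE values plus 10%, which suggests a threshold of the form 1.1 × RMSE; the helper `exceeds_threshold` and its comparison rule (flag a window whose RMSE is strictly greater than the baseline plus the margin) are hypothetical and only illustrate how such a threshold could be applied, not the authors' detection logic.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error over n paired samples."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    squared = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    return math.sqrt(squared / n)


def exceeds_threshold(actual: Sequence[float], predicted: Sequence[float],
                      baseline_rmse: float, margin: float = 0.10) -> bool:
    """Hypothetical rule: flag when window RMSE > baseline RMSE * (1 + margin)."""
    return rmse(actual, predicted) > baseline_rmse * (1.0 + margin)
```

With a constant error of 1 over four samples the RMSE is exactly 1.0, so a window is flagged against a baseline of 0.9 (threshold 0.99) but not against a baseline of 1.0 (threshold 1.1).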

## 3. Results

## 4. Discussion and Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## Acronyms

Acronym | Definition |
---|---|
ARIMA (SARIMA) | (seasonal) autoregressive integrated moving average |
ARV | array of RMSE values |
BEWS | biological early warning systems |
DCC | data cleaning counter |
iForest | isolation forest |
LOF | local outlier factor |
RMSE | root mean squared error |
SES | simple exponential smoothing |
SVM | support vector machine |
TA | training array |
VVO | valve opening value |

## References

- Van Vliet, M.T.H.; Jones, E.R.; Flörke, M.; Franssen, W.H.P.; Hanasaki, N.; Wada, Y.; Yearsley, J.R. Global water scarcity including surface water quality and expansions of clean water technologies. Environ. Res. Lett. **2021**, 16, 024020.
- The Sustainable Development Goals Report 2022. Available online: https://unstats.un.org/sdgs/report/2023/The-Sustainable-Development-Goals-Report-2023.pdf (accessed on 20 March 2024).
- Wang, Z.; Walker, G.W.; Muir, D.C.; Nagatani-Yoshida, K. Toward a global understanding of chemical pollution: A first comprehensive analysis of national and regional chemical inventories. Environ. Sci. Technol. **2020**, 54, 2575–2584.
- Lemm, J.U.; Venohr, M.; Globevnik, L.; Stefanidis, K.; Panagopoulos, Y.; van Gils, J.; Posthuma, L.; Kristensen, P.; Feld, C.K.; Mahnkopf, J.; et al. Multiple stressors determine river ecological status at the European scale: Towards an integrated understanding of river status deterioration. Glob. Change Biol. **2021**, 27, 1962–1975.
- Kramer, K.J.M.; Botterweg, J. Aquatic biological early warning systems: An overview. In Bioindicators and Environmental Management; Jeffrey, D.W., Madden, B., Eds.; Academic Press Inc.: London, UK, 1991; pp. 95–126.
- Bae, M.J.; Park, Y.S. Biological early warning system based on the responses of aquatic organisms to disturbances: A review. Sci. Total Environ. **2014**, 466, 635–649.
- Haag, W.R.; Rypel, A.L. Growth and longevity in freshwater mussels: Evolutionary and conservation implications. Biol. Rev. **2011**, 86, 225–247.
- Hartmann, J.T.; Beggel, S.; Auerswald, K.; Stoeckle, B.C.; Geist, J. Establishing mussel behavior as a biomarker in ecotoxicology. Aquat. Toxicol. **2016**, 170, 279–288.
- Tran, D.; Ciret, P.; Ciutat, A.; Durrieu, G.; Massabuau, J.-C. Estimation of potential and limits of bivalve closure response to detect contaminants: Application to cadmium. Environ. Toxicol. Chem. **2003**, 22, 914–920.
- Aggarwal, C.C. Data Mining: The Textbook; Springer: Berlin/Heidelberg, Germany, 2015.
- Ahmad, S.; Lavin, A.; Purdy, S.; Agha, Z. Unsupervised real-time anomaly detection for streaming data. Neurocomputing **2017**, 262, 134–147.
- Chuwang, D.D.; Chen, W. Forecasting Daily and weekly passenger demand for urban rail transit stations based on a time series model approach. Forecasting **2022**, 4, 904–924.
- Menculini, L.; Marini, A.; Proietti, M.; Garinei, A.; Bozza, A.; Moretti, C.; Marconi, M. Comparing Prophet and deep learning to ARIMA in forecasting wholesale food prices. Forecasting **2021**, 3, 644–662.
- Stefenon, S.F.; Seman, L.O.; Mariani, V.C.; Coelho, L.d.S. Aggregating Prophet and seasonal trend decomposition for time series forecasting of Italian electricity spot prices. Energies **2023**, 16, 1371.
- Shen, J.; Valagolam, D.; McCalla, S. Prophet forecasting model: A machine learning approach to predict the concentration of air pollutants (PM2.5, PM10, O3, NO2, SO2, CO) in Seoul, South Korea. PeerJ **2020**, 8, e9961.
- Hasnain, A.; Sheng, Y.; Hashmi, M.Z. Time series analysis and forecasting of air pollutants based on Prophet forecasting model in Jiangsu Province, China. Front. Environ. Sci. **2022**, 10, 1044.
- Kramar, V.; Alchakov, V. Time-series forecasting of seasonal data using machine learning methods. Algorithms **2023**, 16, 248.
- Petropoulos, F.; Spiliotis, E. The wisdom of the data: Getting the most out of univariate time series forecasting. Forecasting **2021**, 3, 478–497.
- Jiao, Z.; Shan, X. A Bayesian Approach for Forecasting the Probability of Large Earthquakes Using Thermal Anomalies from Satellite Observations. Remote Sens. **2024**, 16, 1542.
- Grekov, A.N.; Kabanov, A.A.; Vyshkvarkova, E.V.; Trusevich, V.V. Anomaly detection in biological early warning systems using unsupervised machine learning. Sensors **2023**, 23, 2687.
- Grekov, A.N.; Vyshkvarkova, E.V.; Mavrin, A.S. Anomaly detection algorithm using the SARIMA model for the software of an automated complex for the aquatic environment biomonitoring. Artif. Intell. Decis. Mak. **2024**, 1, 52–67. (In Russian)
- Grekov, A.N.; Kuzmin, K.A.; Mishurov, V.Z. Automated early warning system for water environment based on behavioral reactions of bivalves. In Proceedings of the 2019 International Russian Automation Conference (RusAutoCon), Sochi, Russia, 8–14 September 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–5.
- Valletta, J.J.; Torney, C.; Kings, M.; Thornton, A.; Madden, J. Applications of machine learning in animal behavior studies. Anim. Behav. **2017**, 124, 203–220.
- Bertolini, C.; Capelle, J.; Royer, E.; Milan, M.; Witbaard, R.; Bouma, T.J.; Pastres, R. Using a clustering algorithm to identify patterns of valve-gaping behavior in mussels reared under different environmental conditions. Ecol. Inform. **2022**, 69, e101659.
- Meyer, P.G.; Cherstvy, A.G.; Seckler, H.; Hering, R.; Blaum, N.; Jeltsch, F.; Metzler, R. Directedeness, correlations, and daily cycles in springbok motion: From data via stochastic models to movement prediction. Phys. Rev. Res. **2023**, 5, 043129.
- Gnyubkin, V.F. An early warning system for aquatic environment state monitoring based on an analysis of mussel valve movement. Russ. J. Mar. Biol. **2009**, 35, 431–436.
- Borcherding, J. Ten years of practical experience with the Dreissena-Monitor, a biological early warning system for continuous water quality monitoring. Hydrobiologia **2006**, 556, 417–426.
- Assimakopoulos, V.; Nikolopoulos, K. The Theta model: A decomposition approach to forecasting. Int. J. Forecast. **2000**, 16, 521–530.
- Hyndman, R.J.; Billah, B. Unmasking the Theta method. Int. J. Forecast. **2003**, 19, 287–290.
- Fiorucci, J.A.; Pellegrini, T.R.; Louzada, F.; Petropoulos, F.; Koehler, A.B. Models for optimising the theta method and their relationship to state space models. Int. J. Forecast. **2016**, 32, 1151–1161.
- Croston, J.D. Forecasting and Stock Control for Intermittent Demands. Oper. Res. Q. **1972**, 23, 289–303.
- Svetunkov, I.; Boylan, J.E. iETS: State space model for intermittent demand forecasting. Int. J. Prod. Econ. **2023**, 265, 109013.
- Prestwich, S.D.; Tarim, S.A.; Rossi, R. Intermittency and obsolescence: A Croston method with linear decay. Int. J. Forecast. **2021**, 37, 708–715.
- Schultz, C.R. Forecasting and inventory control for sporadic demand under periodic review. J. Oper. Res. Soc. **1987**, 38, 453–458.
- Snyder, R.D. Forecasting sales of slow and fast moving inventories. Eur. J. Oper. Res. **2002**, 140, 684–699.
- Kourentzes, N. On intermittent demand model optimisation and selection. Int. J. Prod. Econ. **2014**, 156, 180–190.
- Teunter, R.; Syntetos, A.A.; Babai, M.Z. Intermittent demand: Linking forecasting to inventory obsolescence. Eur. J. Oper. Res. **2011**, 214, 606–615.
- Taylor, S.J.; Letham, B. Forecasting at scale. Am. Stat. **2018**, 72, 37–45.
- Harvey, A.C.; Peters, S. Estimation procedures for structural time series models. J. Forecast. **1990**, 9, 89–108.
- Fronzi, D.; Narang, G.; Galdelli, A.; Pepi, A.; Mancini, A.; Tazioli, A. Towards groundwater-level prediction using Prophet forecasting method by exploiting a high-resolution hydrogeological monitoring system. Water **2024**, 16, 152.
- Aminikhanghahi, S.; Cook, D.J. A survey of methods for time series change point detection. Knowl. Inf. Syst. **2017**, 51, 339–367.
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Müller, A.; Nothman, J.; Louppe, G.; et al. Scikit-learn: Machine learning in python. J. Mach. Learn. Res. **2011**, 12, 2825–2830.
- Seabold, S.; Perktold, J. Statsmodels: Econometric and statistical modeling with python. Proc. 9th Python Sci. Conf. **2010**, 57, 10–25080.
- Herzen, J.; Lässig, F.; Piazzetta, S.G.; Neuer, T.; Tafti, L.; Raille, G.; Van Pottelbergh, T.; Pasieka, M.; Skrodzki, A.; Huguenin, N.; et al. Darts: User-friendly modern machine learning for time series. J. Mach. Learn. Res. **2022**, 23, 1–6.
- Scriosteanu, A.; Criveanu, M.M. Reverse Logistics of Packaging Waste under the Conditions of a Sustainable Circular Economy at the Level of the European Union States. Sustainability **2023**, 15, 14727.
- De Oliveira, E.V.; Aragão, D.P.; Gonçalves, L.M.G. A New Auto-Regressive Multi-Variable Modified Auto-Encoder for Multivariate Time-Series Prediction: A Case Study with Application to COVID-19 Pandemics. Int. J. Environ. Res. Public Health **2024**, 21, 497.
- Mirpulatov, I.; Gasanov, M.; Matveev, S. Soil Dynamics and Crop Yield Modeling Using the MONICA Crop Simulation Model and Time Series Forecasting Methods. Agronomy **2023**, 13, 2185.
- Li, D.; Ma, J.; Rao, K.; Wang, X.; Li, R.; Yang, Y.; Zheng, H. Prediction of Rainfall Time Series Using the Hybrid DWT-SVR-Prophet Model. Water **2023**, 15, 1935.
- Neves, D.; Monteiro, M.; Felício, M.J. Inventory Improvement in Tyre Retail through Demand Forecasting. Eng. Proc. **2023**, 39, 1.
- Islam, M.K.; Hassan, N.M.S.; Rasul, M.G.; Emami, K.; Chowdhury, A.A. Forecasting of solar and wind resources for power generation. Energies **2023**, 16, 6247.
- Makridakis, S.; Spiliotis, E.; Assimakopoulos, V. The M4 Competition: 100,000 time series and 61 forecasting methods. Int. J. Forecast. **2020**, 36, 54–74.
- Spiliotis, E.; Assimakopoulos, V.; Makridakis, S. Generalizing the Theta method for automatic forecasting. Eur. J. Oper. Res. **2020**, 284, 26.
- Chowdari, K.K.; Barma, S.D.; Bhat, N.; Girisha, R.; Gouda, K.C. Evaluation of ARIMA, Facebook Prophet, and a boosting algorithm framework for monthly precipitation prediction of a semi-arid district of north Karnataka, India. In Proceedings of the Fourth International Conference on Emerging Research in Electronics, Computer Science, and Technology (ICERECT), Mandya, India, 26–27 December 2022; pp. 1–5.
- Xiao, Q.; Zhou, L.; Xiang, X.; Liu, L.; Liu, X.; Li, X.; Ao, T. Integration of hydrological model and time series model for improving the runoff simulation: A case study on BTOP modeling in Zhou River Basin, China. Appl. Sci. **2022**, 12, 6883.
- Bolick, M.M.; Post, C.J.; Naser, M.Z.; Forghanparast, F.; Mikhailova, E.A. Evaluating urban stream flooding with machine learning, LiDAR, and 3D Modeling. Water **2023**, 15, 2581.

**Figure 1.** Biological early warning system diagram (**a**) and diagram showing how the mussels are attached to the block (**b**).

**Figure 2.** Seasonal component obtained after decomposition of the averaged bivalve activity data, shown over a two-month interval (**a**) and for the period 10–21 April 2017 (**b**).

**Figure 4.** Results of choosing a fixed RMSE threshold: (**a**) false positives with the Theta method; (**b**) a false positive with the Prophet method, 1 min; and (**c**) normal operation of the Prophet model, 20 min.

**Figure 5.**Detection time of anomaly 1 (19 March 2017) by three methods, with 20 min averaging and three prediction points.

**Table 1.**The RMSE values plus 10% for different averaging times and the number of prediction points using the Theta method.

Averaging Time | Number of Prediction Points | Forecasting Horizon (min) | Training Sample Size: 1 Day | 2 Days | 3 Days | 4 Days | 5 Days |
---|---|---|---|---|---|---|---|
10 s | 6 | 1 | * | * | * | * | * |
1 min | 6 | 6 | ** | ** | ** | ** | ** |
5 min | 6 | 30 | 0.313 | 0.314 | 0.314 | 0.314 | 0.286 |
10 min | 1 | 10 | ** | ** | ** | ** | ** |
10 min | 2 | 20 | 0.316 | 0.314 | 0.315 | 0.316 | 0.275 |
10 min | 3 | 30 | 0.295 | 0.292 | 0.256 | 0.253 | 0.264 |
10 min | 6 | 60 | 0.311 | 0.309 | 0.213 | 0.206 | 0.209 |
20 min | 3 | 60 | 0.294 | 0.289 | 0.292 | 0.275 | 0.286 |
30 min | 2 | 60 | 0.243 | 0.233 | 0.231 | 0.231 | 0.264 |
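In Tables 1–3, the forecasting horizon equals the averaging time multiplied by the number of prediction points (for example, 20 min × 3 points = 60 min). The sketch below reproduces that arithmetic for the rows of Table 1 and shows one simple way to average raw readings into non-overlapping blocks. The helper names and the block-averaging scheme (an incomplete trailing block is discarded) are assumptions for illustration, not the authors' preprocessing code.

```python
from typing import List, Tuple

# (averaging time in seconds, number of prediction points) for each row of Table 1
TABLE_ROWS: List[Tuple[int, int]] = [
    (10, 6), (60, 6), (300, 6), (600, 1), (600, 2),
    (600, 3), (600, 6), (1200, 3), (1800, 2),
]


def block_average(samples: List[float], factor: int) -> List[float]:
    """Average non-overlapping blocks of `factor` samples (hypothetical helper).

    An incomplete trailing block is dropped rather than averaged.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_blocks = len(samples) // factor
    return [sum(samples[i * factor:(i + 1) * factor]) / factor
            for i in range(n_blocks)]


def forecast_horizon_min(averaging_s: int, n_points: int) -> float:
    """Forecast horizon in minutes: averaging interval times prediction points."""
    return averaging_s * n_points / 60


for averaging_s, n_points in TABLE_ROWS:
    print(f"{averaging_s:>5} s x {n_points} points -> "
          f"{forecast_horizon_min(averaging_s, n_points):g} min")
```

For instance, if raw readings were taken every 10 s (an assumption here), `block_average(readings, 60)` would produce 10 min averages.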

**Table 2.**The RMSE values plus 10% for different averaging times and the number of prediction points using the Croston method.

Averaging Time | Number of Prediction Points | Forecasting Horizon (min) | Training Sample Size: 1 Day | 2 Days | 3 Days | 4 Days | 5 Days |
---|---|---|---|---|---|---|---|
10 s | 6 | 1 | ** | ** | ** | ** | ** |
1 min | 6 | 6 | ** | ** | ** | ** | ** |
5 min | 6 | 30 | 0.322 | 0.322 | 0.322 | 0.322 | 0.314 |
10 min | 1 | 10 | 0.363 | 0.361 | 0.361 | 0.361 | 0.361 |
10 min | 2 | 20 | 0.41 | 0.41 | 0.41 | 0.41 | 0.41 |
10 min | 3 | 30 | 0.313 | 0.313 | 0.292 | 0.292 | 0.292 |
10 min | 6 | 60 | 0.381 | 0.381 | 0.481 | 0.481 | 0.481 |
20 min | 3 | 60 | 0.371 | 0.571 | 0.369 | 0.371 | 0.373 |
30 min | 2 | 60 | 0.52 | 0.514 | 0.51 | 0.52 | 0.508 |

**Table 3.**The RMSE values plus 10% for different averaging times and the number of prediction points using the Prophet method.

Averaging Time | Number of Prediction Points | Forecasting Horizon (min) | Training Sample Size: 1 Day | 2 Days | 3 Days | 4 Days | 5 Days |
---|---|---|---|---|---|---|---|
10 s | 6 | 1 | ** | ** | ** | ** | ** |
1 min | 6 | 6 | ** | ** | ** | ** | ** |
5 min | 6 | 30 | 0.281 | 0.361 | 0.425 | 0.491 | 0.578 |
10 min | 3 | 30 | 0.291 | 0.365 | 0.491 | 0.541 | 0.629 |
10 min | 6 | 60 | 0.192 | 0.323 | 0.381 | 0.422 | 0.496 |
20 min | 3 | 60 | 0.191 | 0.405 | 0.415 | 0.486 | 0.534 |
30 min | 2 | 60 | 0.264 | 0.273 | 0.290 | 0.292 | 0.284 |

**Table 4.**Anomaly detection time by three algorithms at different data averaging times and number of prediction points.

Averaging Time, Number of Prediction Points | Theta | Croston | Prophet |
---|---|---|---|
Anomaly 1 (19 March 2017) |  |  |  |
5 min, 6 points | 18:35 | 18:35 | 18:35 |
10 min, 1 point | 18:40 |  |  |
10 min, 2 points | 18:30 | 18:30 |  |
10 min, 3 points | 18:40 | 18:40 | 18:40 |
10 min, 6 points | 19:10 | 18:10 | 18:10 |
20 min, 3 points | 19:20 | 18:20 | 17:20 |
30 min, 2 points | 18:30 | 19:30 | 18:30 |
Anomaly 2 (14 April 2017) |  |  |  |
5 min, 6 points | 12:05 | 12:05 | 12:05 |
10 min, 1 point | 12:30 |  |  |
10 min, 2 points | 12:30 | 12:30 |  |
10 min, 3 points | 12:40 | 12:40 | 12:40 |
10 min, 6 points | 12:10 | 12:10 | 12:10 |
20 min, 3 points | 12:20 | 12:20 | 12:20 |
30 min, 2 points | 12:30 | 12:30 | 12:30 |
Anomaly 3 (24 April 2017) |  |  |  |
5 min, 6 points | 17:35 | 17:35 | 17:35 |
10 min, 1 point | 17:50 |  |  |
10 min, 2 points | 17:50 | 17:50 |  |
10 min, 3 points | 17:40 | 17:40 | 17:40 |
10 min, 6 points | 17:10 | 17:10 | 17:10 |
20 min, 3 points | 17:20 | 17:20 | 17:20 |
30 min, 2 points | 17:30 | 17:30 | 17:30 |

Averaging Time, Number of Prediction Points | Theta | Croston | Prophet | SARIMA |
---|---|---|---|---|
10 s, 6 points | 4 h | 40 min | 1 h 10 min | 2 h 22 min |
1 min, 6 points | 13 min | 1 min 25 s | 15 min | 1 h 43 min |
5 min, 6 points | 3 min 36 s | 40 s | 12 min | 1 h 40 min |
10 min, 1 point | 3 min 33 s | 55 s | 12 min | 1 h 27 min |
10 min, 2 points | 1 min 54 s | 37 s | 10 min | 1 h 10 min |
10 min, 3 points | 1 min 20 s | 33 s | 9 min | 1 h 2 min |
10 min, 6 points | 46 s | 27 s | 7 min | 52 min |
20 min, 3 points | 35 s | 23 s | 6 min | 50 min |
30 min, 2 points | 29 s | 21 s | 5 min | 44 min |

**Table 6.**Anomaly detection time for Anomaly 1 (19 March 2017) by Prophet and unsupervised machine learning algorithms.

Method | Anomaly Detection Time | Model Setting (Averaging Time) |
---|---|---|
Prophet | 17:20 | 20 min |
Elliptic envelope | 18:30 | 15 min |
iForest | 18:15 | 15 min |
LOF | 18:35 | 5 min |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Grekov, A.N.; Vyshkvarkova, E.V.; Mavrin, A.S.
Forecasting and Anomaly Detection in BEWS: Comparative Study of Theta, Croston, and Prophet Algorithms. *Forecasting* **2024**, *6*, 343-356.
https://doi.org/10.3390/forecast6020019
