Improved Deep Learning Predictions for Chlorophyll Fluorescence Based on Decomposition Algorithms: The Importance of Data Preprocessing
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Sites and Data
2.2. Decomposition-Based Deep Learning Model Development
2.2.1. The Multi-Decomposition Architecture
2.2.2. Wavelet Transformation Analysis
2.2.3. Ensemble Empirical Mode Decomposition (EEMD)
- (1) Add random white noise, $n_i(t)$, to the original time series, $f(t)$, to obtain the noise-added series $f_i(t) = f(t) + n_i(t)$, where $i$ indexes the trial.
- (2) Use the EMD algorithm to decompose the noise-added series into $K$ IMFs, $c_{ij}(t)$ ($j = 1, 2, \ldots, K$), and a residual, $r_i(t)$.
- (3) Repeat steps (1) and (2) $N$ times, each time with a different realization of Gaussian white noise, and determine the corresponding IMFs.
- (4) Average the corresponding IMFs over the $N$ trials to mitigate the impact of the introduced white noise on the original signal. The $j$-th IMF component is as follows:
  $$c_j(t) = \frac{1}{N}\sum_{i=1}^{N} c_{ij}(t)$$
- (5) Finally, the original series, $f(t)$, is decomposed by EEMD, which can be expressed as follows:
  $$f(t) = \sum_{j=1}^{K} c_j(t) + r(t)$$
  where $r(t)$ is the residual averaged over the $N$ trials.
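The five EEMD steps can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `emd` helper is a deliberately crude stand-in that builds envelopes by linear interpolation and uses a fixed number of sifting passes (standard EMD uses cubic-spline envelopes and a stopping criterion), and the names `eemd`, `n_trials`, `n_imfs`, and `noise_std` are hypothetical.

```python
import numpy as np

def _local_extrema(x):
    """Indices of interior local maxima and minima of a 1-D array."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima

def emd(x, n_imfs, n_sift=10):
    """Crude EMD (assumption: linear envelopes, fixed sifting count)."""
    t = np.arange(len(x))
    residual = np.asarray(x, dtype=float).copy()
    imfs = np.zeros((n_imfs, len(x)))
    for j in range(n_imfs):
        maxima, minima = _local_extrema(residual)
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few extrema left: remaining IMFs stay zero
        h = residual.copy()
        for _ in range(n_sift):
            maxima, minima = _local_extrema(h)
            if len(maxima) < 2 or len(minima) < 2:
                break
            upper = np.interp(t, maxima, h[maxima])
            lower = np.interp(t, minima, h[minima])
            h = h - 0.5 * (upper + lower)  # subtract the envelope mean
        imfs[j] = h
        residual = residual - h
    return imfs, residual

def eemd(f, n_trials=50, n_imfs=4, noise_std=0.2, seed=0):
    """Ensemble-average EMD results over noise-added copies of f."""
    f = np.asarray(f, dtype=float)
    rng = np.random.default_rng(seed)
    sigma = noise_std * np.std(f)  # noise scaled to the signal (assumption)
    imf_sum = np.zeros((n_imfs, len(f)))
    res_sum = np.zeros(len(f))
    for _ in range(n_trials):                      # step (3): repeat N times
        noisy = f + rng.normal(0.0, sigma, len(f))  # step (1): add white noise
        imfs, res = emd(noisy, n_imfs)              # step (2): EMD of noisy series
        imf_sum += imfs
        res_sum += res
    return imf_sum / n_trials, res_sum / n_trials   # step (4): ensemble mean

# Step (5): the averaged IMFs plus the averaged residual approximate f(t).
t = np.linspace(0.0, 1.0, 400)
f = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)
imfs, res = eemd(f)
reconstruction = imfs.sum(axis=0) + res
```

Because each trial's IMFs and residual sum exactly to the noisy series, the reconstruction differs from `f` only by the trial-averaged noise, which shrinks as `n_trials` grows.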
2.2.4. Singular Spectral Analysis
2.2.5. Convolutional Neural Network
2.2.6. Long Short-Term Memory
2.3. Model Implementation
2.4. Evaluation Metrics
3. Results and Discussion
3.1. The Process of Different Decomposition-Based Algorithms for Chlorophyll Fluorescence Data
3.2. Evaluating the Predictive Performance of Deep Learning Based on Multi-Decomposition Process
3.3. Comparing the Effectiveness of Different Decomposition Approaches in Forecasting HABs
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Huisman, J.; Codd, G.A.; Paerl, H.W.; Ibelings, B.W.; Verspagen, J.M.H.; Visser, P.M. Cyanobacterial blooms. Nat. Rev. Microbiol. 2018, 16, 471–483.
2. Shan, K.; Ouyang, T.; Wang, X.; Yang, H.; Zhou, B.; Wu, Z.; Shang, M. Temporal prediction of algal parameters in Three Gorges Reservoir based on highly time-resolved monitoring and long short-term memory network. J. Hydrol. 2022, 605, 127304.
3. Thyssen, M.; Tarran, G.A.; Zubkov, M.V.; Holland, R.J.; Gregori, G.; Burkill, P.H.; Denis, M. The emergence of automated high-frequency flow cytometry: Revealing temporal and spatial phytoplankton variability. J. Plankton Res. 2008, 30, 333–343.
4. Yan, Y.; Bao, Z.; Shao, J. Phycocyanin concentration retrieval in inland waters: A comparative review of the remote sensing techniques and algorithms. J. Great Lakes Res. 2018, 44, 748–755.
5. Cheng, K.H.; Chan, S.N.; Lee, J.H.W. Remote sensing of coastal algal blooms using unmanned aerial vehicles (UAVs). Mar. Pollut. Bull. 2020, 152, 110889.
6. Bertone, E.; Burford, M.A.; Hamilton, D.P. Fluorescence probes for real-time remote cyanobacteria monitoring: A review of challenges and opportunities. Water Res. 2018, 141, 152–162.
7. Shin, J.; Yoon, S.; Kim, Y.; Kim, T.; Go, B.; Cha, Y. Effects of class imbalance on resampling and ensemble learning for improved prediction of cyanobacteria blooms. Ecol. Inform. 2021, 61, 101202.
8. Cui, Z.D.; Du, D.P.; Zhang, X.L.; Yang, Q. Modeling and Prediction of Environmental Factors and Chlorophyll a Abundance by Machine Learning Based on Tara Oceans Data. J. Mar. Sci. Eng. 2022, 10, 1749.
9. Deng, T.; Chau, K.-W.; Duan, H.-F. Machine learning based marine water quality prediction for coastal hydro-environment management. J. Environ. Manag. 2021, 284, 112051.
10. Recknagel, F.; Orr, P.T.; Bartkow, M.; Swanepoel, A.; Cao, H. Early warning of limit-exceeding concentrations of cyanobacteria and cyanotoxins in drinking water reservoirs by inferential modelling. Harmful Algae 2017, 69, 18–27.
11. Shen, J.; Qin, Q.; Wang, Y.; Sisson, M. A data-driven modeling approach for simulating algal blooms in the tidal freshwater of James River in response to riverine nutrient loading. Ecol. Model. 2019, 398, 44–54.
12. Segura, A.M.; Piccini, C.; Nogueira, L.; Alcantara, I.; Calliari, D.; Kruk, C. Increased sampled volume improves Microcystis aeruginosa complex (MAC) colonies detection and prediction using Random Forests. Ecol. Indic. 2017, 79, 347–354.
13. Xia, R.; Wang, G.S.; Zhang, Y.; Yang, P.; Yang, Z.W.; Ding, S.; Jia, X.B.; Yang, C.; Liu, C.J.; Ma, S.Q.; et al. River algal blooms are well predicted by antecedent environmental conditions. Water Res. 2020, 185, 129583.
14. Faruk, D.O. A hybrid neural network and ARIMA model for water quality time series prediction. Eng. Appl. Artif. Intell. 2010, 23, 586–594.
15. Vincon-Leite, B.; Casenave, C. Modelling eutrophication in lake ecosystems: A review. Sci. Total Environ. 2019, 651, 2985–3001.
16. Williams, R.J.; Zipser, D. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks. Neural Comput. 1989, 1, 270–280.
17. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
18. Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078.
19. Hill, P.R.; Kumar, A.; Temimi, M.; Bull, D.R. HABNet: Machine learning, remote sensing-based detection of harmful algal blooms. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 3229–3239.
20. Zheng, L.; Wang, H.P.; Liu, C.; Zhang, S.R.; Ding, A.Z.; Xie, E.; Li, J.; Wang, S.R. Prediction of harmful algal blooms in large water bodies using the combined EFDC and LSTM models. J. Environ. Manag. 2021, 295, 109027.
21. Cao, H.; Han, L.; Li, L. A deep learning method for cyanobacterial harmful algae blooms prediction in Taihu Lake, China. Harmful Algae 2022, 113, 102189.
22. Pyo, J.; Duan, H.; Baek, S.; Kim, M.S.; Jeon, T.; Kwon, Y.S.; Lee, H.; Cho, K.H. A convolutional neural network regression for quantifying cyanobacteria using hyperspectral imagery. Remote Sens. Environ. 2019, 233, 111350.
23. Muharemi, F.; Logofatu, D.; Leon, F. Machine learning approaches for anomaly detection of water quality on a real-world data set. J. Inf. Telecommun. 2019, 3, 294–307.
24. Xiao, X.; He, J.Y.; Huang, H.M.; Miller, T.R.; Christakos, G.; Reichwaldt, E.S.; Ghadouani, A.; Lin, S.P.; Xu, X.H.; Shi, J.Y. A novel single-parameter approach for forecasting algal blooms. Water Res. 2017, 108, 222–231.
25. Wang, L.P.; Zheng, B.H. Prediction of chlorophyll-a in the Daning River of Three Gorges Reservoir by principal component scores in multiple linear regression models. Water Sci. Technol. 2013, 67, 1150–1158.
26. Liu, M.Y.; He, J.Y.; Huang, Y.Z.; Tang, T.; Hu, J.; Xiao, X. Algal bloom forecasting with time-frequency analysis: A hybrid deep learning approach. Water Res. 2022, 219, 118591.
27. Buyuksahin, U.C.; Ertekin, S. Improving forecasting accuracy of time series data using a new ARIMA-ANN hybrid method and empirical mode decomposition. Neurocomputing 2019, 361, 151–163.
28. Qian, Z.; Pei, Y.; Zareipour, H.; Chen, N. A review and discussion of decomposition-based hybrid models for wind energy forecasting applications. Appl. Energy 2019, 235, 939–953.
29. Zhu, Y.; Gao, Y.; Wang, Z.; Cao, G.; Wang, R.; Lu, S.; Li, W.; Nie, W.; Zhang, Z. A Tailings Dam Long-Term Deformation Prediction Method Based on Empirical Mode Decomposition and LSTM Model Combined with Attention Mechanism. Water 2022, 14, 1229.
30. Luo, L.; Zhang, Y.; Dong, W.; Zhang, J.; Zhang, L. Ensemble Empirical Mode Decomposition and a Long Short-Term Memory Neural Network for Surface Water Quality Prediction of the Xiaofu River, China. Water 2023, 15, 1625.
31. Apaydin, H.; Sattari, M.T.; Falsafian, K.; Prasad, R. Artificial intelligence modelling integrated with Singular Spectral analysis and Seasonal-Trend decomposition using Loess approaches for streamflow predictions. J. Hydrol. 2021, 600, 126506.
32. Azimi, S.; Dariane, A.B. Streamflow forecasting by combining neural networks and fuzzy models using advanced methods of input variable selection. J. Hydroinform. 2018, 20, 520–532.
33. Wang, J.-H.; Li, C.; Xu, Y.-P.; Li, S.-Y.; Du, J.-S.; Han, Y.-P.; Hu, H.-Y. Identifying major contributors to algal blooms in Lake Dianchi by analyzing river-lake water quality correlations in the watershed. J. Clean. Prod. 2021, 315, 128144.
34. Liu, G.; Liu, Z.; Chen, F.; Zhang, Z.; Gu, B.; Smoak, J.M. Response of the cladoceran community to eutrophication, fish introductions and degradation of the macrophyte vegetation in Lake Dianchi, a large, shallow plateau lake in southwestern China. Limnology 2013, 14, 159–166.
35. Wu, Y.; Li, L.; Zheng, L.; Dai, G.; Ma, H.; Shan, K.; Wu, H.; Zhou, Q.; Song, L. Patterns of succession between bloom-forming cyanobacteria Aphanizomenon flos-aquae and Microcystis and related environmental factors in large, shallow Dianchi Lake, China. Hydrobiologia 2016, 765, 1–13.
36. Labat, D. Recent advances in wavelet analyses: Part 1. A review of concepts. J. Hydrol. 2005, 314, 275–288.
37. Kim, M.E.; Shon, T.S.; Shin, H.S. Forecasting algal bloom (chl-a) on the basis of coupled wavelet transform and artificial neural networks at a large lake. Desalin. Water Treat. 2013, 51, 4118–4128.
38. Huang, N.E.; Shen, Z.; Long, S.R.; Wu, M.; Shih, H.H.; Zheng, Q.N.; Yen, N.C.; Tung, C.C.; Liu, H.H. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 1998, 454, 903–995.
39. Zhang, C.; Fu, S.; Ou, B.; Liu, Z.; Hu, M. Prediction of Dam Deformation Using SSA-LSTM Model Based on Empirical Mode Decomposition Method and Wavelet Threshold Noise Reduction. Water 2022, 14, 3380.
40. Wu, Z.; Huang, N.E. Ensemble empirical mode decomposition: A noise-assisted data analysis method. Adv. Adapt. Data Anal. 2009, 1, 1–41.
41. Golyandina, N. Particularities and commonalities of singular spectrum analysis as a method of time series analysis and signal processing. Wiley Interdiscip. Rev.-Comput. Stat. 2020, 12, 4.
42. Cui, Z.; Qing, X.; Chai, H.; Yang, S.; Zhu, Y.; Wang, F. Real-time rainfall-runoff prediction using light gradient boosting machine coupled with singular spectrum analysis. J. Hydrol. 2021, 603, 127124.
43. Seo, Y.; Kim, S.; Kisi, O.; Singh, V.P. Daily water level forecasting using wavelet decomposition and artificial intelligence techniques. J. Hydrol. 2015, 520, 224–243.
44. Sahay, R.R.; Srivastava, A. Predicting Monsoon Floods in Rivers Embedding Wavelet Transform, Genetic Algorithm and Neural Network. Water Resour. Manag. 2013, 28, 301–317.
45. Anctil, F.; Tape, D.G. An exploration of artificial neural network rainfall-runoff forecasting combined with wavelet decomposition. J. Environ. Eng. Sci. 2004, 3 (Suppl. S1), S121–S128.
46. Tahroudi, M.N.; Mirabbasi, R. Frequency decomposition associated with machine learning algorithms and copula modeling for river flow prediction. Stoch. Environ. Res. Risk Assess. 2023, 37, 2897–2918.
47. Luo, S.; Zhang, M.; Nie, Y.; Jia, X.; Cao, R.; Zhu, M.; Li, X. Forecasting of monthly precipitation based on ensemble empirical mode decomposition and Bayesian model averaging. Front. Earth Sci. 2022, 10, 926067.
48. Yuan, R.; Cai, S.; Liao, W.; Lei, X.; Zhang, Y.; Yin, Z.; Ding, G.; Wang, J.; Xu, Y. Daily Runoff Forecasting Using Ensemble Empirical Mode Decomposition and Long Short-Term Memory. Front. Earth Sci. 2021, 9, 621780.
49. Chen, Y.-C.; Yeh, H.-C.; Kao, S.-P.; Wei, C.; Su, P.-Y. Water Level Forecasting in Tidal Rivers during Typhoon Periods through Ensemble Empirical Mode Decomposition. Hydrology 2023, 10, 47.
50. Ali, M.; Prasad, R.; Xiang, Y.; Yaseen, Z.M. Complete ensemble empirical mode decomposition hybridized with random forest and kernel ridge regression model for monthly rainfall forecasts. J. Hydrol. 2020, 584, 124647.
51. Apaydin, H.; Sibtain, M. A multivariate streamflow forecasting model by integrating improved complete ensemble empirical mode decomposition with additive noise, sample entropy, Gini index and sequence-to-sequence approaches. J. Hydrol. 2021, 603, 126831.
52. Rezaie-Balf, M.; Kim, S.; Fallah, H.; Alaghmand, S. Daily river flow forecasting using ensemble empirical mode decomposition based heuristic regression models: Application on the perennial rivers in Iran and South Korea. J. Hydrol. 2019, 572, 470–485.
53. Unnikrishnan, P.; Jothiprakash, V. Hybrid SSA-ARIMA-ANN Model for Forecasting Daily Rainfall. Water Resour. Manag. 2020, 34, 3609–3623.
54. Zhang, Q.; Wang, B.-D.; He, B.; Peng, Y.; Ren, M.-L. Singular Spectrum Analysis and ARIMA Hybrid Model for Annual Runoff Forecasting. Water Resour. Manag. 2011, 25, 2683–2703.
Location | Sample Size | Mean | Standard Deviation | Min | Median | Max |
---|---|---|---|---|---|---|
Duanqiao | 4081 | 8.46 | 6.19 | 0.32 | 6.95 | 55 |
Caohai Center | 4081 | 14.41 | 9.44 | 0.49 | 13 | 60 |
Parameter Name | CNN | LSTM |
---|---|---|
Time Lag | 10 | 10 |
Hidden Size | 16 | 16 |
Learning Rate | 0.001 | 0.002 |
Epochs | 100 | 50 |
Batch Size | 32 | 32 |
Activation Function | ReLU | ReLU |
Kernel Size | 3 | N/A |
Input Size | 1 | 1 |
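A time lag of 10 with an input size of 1 suggests that each training sample is a window of 10 consecutive univariate observations. The sketch below shows one way such samples could be built, assuming one-step-ahead prediction; the forecasting horizon is not stated in this excerpt, and the function name `make_lagged_samples` is hypothetical.

```python
def make_lagged_samples(series, time_lag=10):
    """Split a 1-D series into (input window, next value) pairs."""
    inputs, targets = [], []
    for start in range(len(series) - time_lag):
        # the window holds `time_lag` consecutive past observations
        inputs.append(list(series[start:start + time_lag]))
        # the target is the observation immediately after the window (assumption)
        targets.append(series[start + time_lag])
    return inputs, targets

# Toy series of 15 points yields 15 - 10 = 5 samples.
windows, next_values = make_lagged_samples(list(range(15)), time_lag=10)
```

Under this construction, a series of 4081 observations would yield 4071 samples per site before any train/test split.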
Models | Training RMSE (μg/L) | Training MAE (μg/L) | Training R² | Test RMSE (μg/L) | Test MAE (μg/L) | Test R² |
---|---|---|---|---|---|---|
LSTM | 3.435 | 2.324 | 0.813 | 5.282 | 3.525 | 0.738 |
EEMD-LSTM | 2.143 (+Δ37.6%) | 1.475 (+Δ36.6%) | 0.927 (+Δ14.0%) | 3.912 (+Δ26.0%) | 2.383 (+Δ32.4%) | 0.857 (+Δ16.0%) |
WT-LSTM | 1.255 (+Δ63.5%) | 0.888 (+Δ61.8%) | 0.975 (+Δ19.9%) | 2.198 (+Δ58.4%) | 1.405 (+Δ60.1%) | 0.955 (+Δ29.3%) |
SSA-LSTM | 0.740 (+Δ78.5%) | 0.491 (+Δ78.9%) | 0.991 (+Δ21.9%) | 1.518 (+Δ71.3%) | 0.855 (+Δ75.7%) | 0.978 (+Δ32.5%) |
CNN | 3.405 | 2.323 | 0.816 | 5.364 | 3.538 | 0.730 |
EEMD-CNN | 2.122 (+Δ37.7%) | 1.477 (+Δ36.4%) | 0.929 (+Δ13.8%) | 3.719 (+Δ30.7%) | 2.461 (+Δ30.4%) | 0.870 (+Δ19.2%) |
WT-CNN | 1.499 (+Δ56.0%) | 1.035 (+Δ55.4%) | 0.964 (+Δ18.1%) | 2.635 (+Δ50.9%) | 1.742 (+Δ50.8%) | 0.935 (+Δ28.0%) |
SSA-CNN | 1.266 (+Δ62.8%) | 0.900 (+Δ61.3%) | 0.975 (+Δ19.4%) | 1.927 (+Δ64.1%) | 1.361 (+Δ61.5%) | 0.965 (+Δ32.2%) |
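The (+Δ…%) entries in the table above appear to be the relative improvement over the corresponding model without decomposition (LSTM or CNN). This reading is inferred from the numbers rather than stated in this excerpt, and the helper name `relative_improvement` is hypothetical. A short check against the training-set values for LSTM and EEMD-LSTM:

```python
def relative_improvement(baseline, value, higher_is_better=False):
    """Percentage improvement of `value` over `baseline`."""
    # For error metrics (RMSE, MAE) a decrease is an improvement;
    # for R2 an increase is an improvement.
    change = (value - baseline) if higher_is_better else (baseline - value)
    return 100.0 * change / baseline

rmse_gain = round(relative_improvement(3.435, 2.143), 1)
r2_gain = round(relative_improvement(0.813, 0.927, higher_is_better=True), 1)
print(rmse_gain)  # 37.6
print(r2_gain)    # 14.0
```

Some tabulated percentages differ from this calculation in the last digit, presumably because they were computed from unrounded metric values.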
Models | Training RMSE (μg/L) | Training MAE (μg/L) | Training R² | Test RMSE (μg/L) | Test MAE (μg/L) | Test R² |
---|---|---|---|---|---|---|
LSTM | 2.381 | 1.588 | 0.739 | 3.624 | 2.295 | 0.723 |
EEMD-LSTM | 1.281 (+Δ46.2%) | 0.913 (+Δ42.5%) | 0.924 (+Δ25.1%) | 2.378 (+Δ34.4%) | 1.452 (+Δ36.7%) | 0.881 (+Δ21.8%) |
WT-LSTM | 0.855 (+Δ64.1%) | 0.595 (+Δ62.6%) | 0.966 (+Δ30.8%) | 1.633 (+Δ54.9%) | 0.942 (+Δ58.9%) | 0.944 (+Δ30.5%) |
SSA-LSTM | 0.650 (+Δ72.7%) | 0.543 (+Δ65.8%) | 0.981 (+Δ32.7%) | 1.092 (+Δ69.9%) | 0.702 (+Δ69.4%) | 0.975 (+Δ34.8%) |
CNN | 2.444 | 1.680 | 0.725 | 3.436 | 2.251 | 0.751 |
EEMD-CNN | 1.281 (+Δ47.6%) | 0.931 (+Δ44.6%) | 0.924 (+Δ27.6%) | 2.554 (+Δ25.7%) | 1.614 (+Δ28.3%) | 0.863 (+Δ14.8%) |
WT-CNN | 1.309 (+Δ46.4%) | 1.116 (+Δ33.6%) | 0.921 (+Δ27.1%) | 1.699 (+Δ50.6%) | 1.284 (+Δ43.0%) | 0.939 (+Δ25.0%) |
SSA-CNN | 0.858 (+Δ64.9%) | 0.691 (+Δ58.8%) | 0.966 (+Δ33.3%) | 1.513 (+Δ56.0%) | 1.025 (+Δ54.5%) | 0.952 (+Δ26.7%) |
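For readers reproducing the three metrics reported in these tables, a minimal sketch using the standard definitions follows. It assumes R² is the coefficient of determination; the paper may instead use the squared correlation coefficient, which this excerpt does not specify. The function names are illustrative.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error, in the units of the data."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def mae(observed, predicted):
    """Mean absolute error, in the units of the data."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n

def r2(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Toy example: one of four predictions is off by 1.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.0, 2.0, 3.0, 5.0]
scores = (rmse(obs, pred), mae(obs, pred), r2(obs, pred))
```

RMSE penalizes large errors more heavily than MAE, which is why the two can rank models differently when a few bloom peaks are badly missed.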
Methods | Advantages | Disadvantages | Variables |
---|---|---|---|
Wavelet Transform (WT) | Rigorous mathematical theory. Appropriate for steady-frequency and nearly periodic signals. | Requires prior specification of the wavelet basis and parameters. Separates the modes. Not suitable for highly non-stationary signals. | Water level [43]; Algal bloom [26]; Precipitation [44]; Rainfall runoff [45]; River flow [46] |
Ensemble Empirical Mode Decomposition (EEMD) | Fully data-driven. Addresses the mode-mixing issue. Suitable for both non-linear and non-stationary signals. Fully adaptive through its intrinsic mode functions (IMFs). | Lacks rigorous mathematical theory. Residual added noise remains in the reconstructed signal. Computationally expensive. | Precipitation [47]; Daily runoff [48]; Water level [49]; Rainfall [50]; Streamflow [51]; River flow [52]; Water quality [30] |
Singular Spectrum Analysis (SSA) | Rigorous mathematical theory. Decomposes a time series into distinct components. | Parameters must be fine-tuned to isolate each component. | Streamflow [31]; Rainfall [53]; Runoff [54]; Rainfall runoff [42] |
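To make the SSA row concrete, the sketch below shows basic SSA in NumPy: embed the series in a trajectory matrix, take its singular value decomposition, and turn each rank-one term back into a series by diagonal averaging. It is an illustration of the general technique, not the configuration used in this paper; the function name `ssa_decompose` and the window length of 20 are assumptions.

```python
import numpy as np

def ssa_decompose(x, window):
    """Return one reconstructed series per singular value of the trajectory matrix."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    k = n - window + 1
    # Trajectory (Hankel) matrix: column j holds x[j : j + window].
    traj = np.column_stack([x[j:j + window] for j in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    components = np.empty((len(s), n))
    for m in range(len(s)):
        elementary = s[m] * np.outer(u[:, m], vt[m])
        # Diagonal averaging: mean over each anti-diagonal (i + j = const).
        flipped = elementary[::-1]
        components[m] = [flipped.diagonal(off).mean()
                         for off in range(-window + 1, k)]
    return components

# Toy series: linear trend plus an oscillation.
t = np.arange(100)
series = 0.05 * t + np.sin(2 * np.pi * t / 12)
parts = ssa_decompose(series, window=20)
```

The components sum back to the original series; in practice they are grouped (for example, leading components as trend, paired components as oscillations), and the choice of window length and grouping is the fine-tuning the table refers to.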
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wang, L.; Xie, M.; Pan, M.; He, F.; Yang, B.; Gong, Z.; Wu, X.; Shang, M.; Shan, K. Improved Deep Learning Predictions for Chlorophyll Fluorescence Based on Decomposition Algorithms: The Importance of Data Preprocessing. Water 2023, 15, 4104. https://doi.org/10.3390/w15234104