Sustainable Sewage Treatment Prediction Using Integrated KAN-LSTM with Multi-Head Attention
Abstract
1. Introduction
2. Methods
2.1. Time-Series Lag Analysis of Indicators in Sewage Treatment
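As a concrete illustration of the lag analysis (following the lagged-correlation idea of [11,12,19], not the authors’ exact procedure), the sketch below shifts one indicator against another and reports the lag with the strongest Pearson correlation; the function name and maximum lag are our own choices:

```python
import numpy as np

def lagged_pearson(x: np.ndarray, y: np.ndarray, max_lag: int = 10):
    """Pearson correlation between x and y shifted by 0..max_lag steps.
    The lag with the largest |correlation| suggests how long an upstream
    indicator takes to influence a downstream one (a sketch of the lag
    analysis, not the authors' code)."""
    corrs = []
    for lag in range(max_lag + 1):
        a = x[: len(x) - lag]   # upstream indicator, truncated
        b = y[lag:]             # downstream indicator, shifted by `lag`
        corrs.append(np.corrcoef(a, b)[0, 1])
    best_lag = int(np.argmax(np.abs(corrs)))
    return best_lag, corrs
```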
2.2. Integrated KAN-LSTM Model
2.2.1. Time-Series Prediction Framework
- (1) Long-term dependency capture via LSTM’s gated memory mechanism.
- (2) Dynamic feature weighting through multi-head attention.
2.2.2. KAN Layer (Kolmogorov–Arnold Network Layer)
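Since the KAN layer replaces fixed activations with learnable univariate functions on each input–output edge [13], a minimal sketch helps fix ideas. The version below approximates the B-spline edge functions with Gaussian radial basis functions (in the spirit of BSRBF-KAN [22]); it is a simplified stand-in, not the authors’ implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KANLayer(nn.Module):
    """Simplified KAN-style layer: each input feature passes through a
    learnable 1-D function built from Gaussian radial basis functions
    (an RBF stand-in for the B-spline edge functions of Liu et al.)."""
    def __init__(self, in_dim: int, out_dim: int, n_basis: int = 8,
                 grid_min: float = -2.0, grid_max: float = 2.0):
        super().__init__()
        # Fixed RBF centres spread over the (normalized) input range.
        self.register_buffer("centers", torch.linspace(grid_min, grid_max, n_basis))
        self.gamma = ((grid_max - grid_min) / (n_basis - 1)) ** -2
        # One coefficient per (input, basis, output) edge function.
        self.coef = nn.Parameter(torch.randn(in_dim, n_basis, out_dim) * 0.1)
        # Residual linear path, as in common KAN implementations.
        self.base = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_dim) -> basis activations (batch, in_dim, n_basis)
        phi = torch.exp(-self.gamma * (x.unsqueeze(-1) - self.centers) ** 2)
        # Sum the per-edge functions into each output unit.
        spline = torch.einsum("bik,iko->bo", phi, self.coef)
        return self.base(F.silu(x)) + spline
```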
2.2.3. LSTM
- (1) Forget gate: determines how much past information should be forgotten (the gate equations are given after this list).
- (2) Input gate: determines the importance of the current input.
- (3) Cell-state update: combines information from the forget and input gates.
- (4) Output gate: determines which information should be passed to the hidden state.
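These gates follow the standard LSTM formulation [8], where σ is the logistic sigmoid, ⊙ the element-wise product, and [h_{t−1}, x_t] the concatenation of the previous hidden state with the current input:

```latex
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) && \text{(forget gate)}\\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right),\qquad
\tilde{C}_t = \tanh\left(W_C\,[h_{t-1}, x_t] + b_C\right) && \text{(input gate)}\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{(cell-state update)}\\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right),\qquad
h_t = o_t \odot \tanh(C_t) && \text{(output gate)}
\end{aligned}
```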
2.2.4. Multi-Head Attention Mechanism
- (1) Enhancing long-term dependencies: compared with LSTM alone, the attention mechanism captures long-range dependencies directly and mitigates the vanishing-gradient problem.
- (2) Distinct feature-weight allocation: multiple attention heads can focus on different time steps, improving the model’s representation ability.
- (3) Improved nonlinear modeling capability: combined with the KAN layer, the multi-head attention mechanism further enhances nonlinear mapping and improves prediction accuracy (see the sketch after this list).
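Putting the pieces together, the sketch below shows one plausible wiring of the integrated pipeline (KAN feature encoding → LSTM → multi-head self-attention → dense regression head), reusing the KANLayer sketched in Section 2.2.2. The layer sizes mirror the hyperparameter table in Section 3.3, but the exact wiring is our reading of the architecture, not the authors’ released code:

```python
import torch
import torch.nn as nn

class IntegratedKANLSTM(nn.Module):
    """Sketch of the integrated KAN-LSTM with multi-head attention.
    `KANLayer` is the simplified RBF-based layer from Section 2.2.2."""
    def __init__(self, n_features: int, embed: int = 64,
                 dense: int = 256, heads: int = 8, dropout: float = 0.2):
        super().__init__()
        self.kan = KANLayer(n_features, embed)   # nonlinear feature encoding
        self.lstm = nn.LSTM(embed, embed, batch_first=True)
        self.attn = nn.MultiheadAttention(embed, heads,
                                          dropout=dropout, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(embed, dense), nn.ReLU(),
            nn.Dropout(dropout), nn.Linear(dense, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, look_back, n_features)
        b, t, f = x.shape
        z = self.kan(x.reshape(b * t, f)).reshape(b, t, -1)
        h, _ = self.lstm(z)            # temporal encoding of each window
        a, _ = self.attn(h, h, h)      # self-attention re-weights time steps
        return self.head(a[:, -1, :])  # one-step-ahead prediction
```

In this reading, the attention block re-weights the LSTM’s hidden states across the look-back window before the dense head produces the one-step-ahead prediction.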
2.2.5. Model Interpretability
3. Experimental Results and Analysis
3.1. Experimental Data
3.1.1. Data Normalization
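The normalization literature cited here [25,27] centres on min–max scaling; assuming that choice, each indicator x is rescaled to [0, 1] before training, and predictions are mapped back to physical units by inverting the same transform:

```latex
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}
```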
3.1.2. Data Partitioning
3.1.3. Sequence Construction
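A minimal sketch of the sequence construction, assuming a sliding window of the 10-step look-back from Section 3.3 paired with the next-step value of the target indicator (the function and argument names are ours):

```python
import numpy as np

def make_sequences(data: np.ndarray, look_back: int = 10, target_col: int = 0):
    """Slide a window of `look_back` past steps over the (time, features)
    array and pair each window with the next-step target value."""
    X, y = [], []
    for i in range(len(data) - look_back):
        X.append(data[i : i + look_back])          # (look_back, n_features)
        y.append(data[i + look_back, target_col])  # next-step target
    return np.asarray(X), np.asarray(y)
```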
3.1.4. Data Representation
3.2. Data Source
3.3. Model Training and Parameter Settings
3.4. Model Evaluation Metrics
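The three metrics reported below have their standard definitions [28,29,30], where y_i is the observed value, ŷ_i the prediction, and ȳ the mean of the observations:

```latex
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|,\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2},\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
```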
3.5. Analysis of Sewage Treatment Data Results
3.6. Ablation Study
3.6.1. Experimental Setup
3.6.2. Results and Analysis
- (1) Augmenting the baseline LSTM with multi-head attention consistently improves predictive performance across all evaluation metrics. Most strikingly, the R² for pH prediction rises from 30.13% to 53.67%. These results confirm the attention mechanism’s ability to identify and prioritize critical temporal patterns in sewage treatment process data.
- (2) The comparison between LSTM with attention and LSTM with KAN reveals the latter’s distinct advantage in modeling performance. For FOss prediction, incorporating the KAN layer reduces the MAE from 0.92 to 0.79 and the RMSE from 1.30 to 1.08. These results highlight the KAN layer’s capacity for capturing complex nonlinear patterns through its dynamic B-spline transformations, outperforming the attention mechanism alone in feature representation.
- (3) The comparison between LSTM+KAN and the complete integrated KAN-LSTM (with additional multi-head attention) reveals the added value of the attention mechanism. Most notably, for TWTN prediction, incorporating attention improves the MAE from 1.69 to 1.41 and the R² from 78.56% to 80.12%. These gains confirm that attention provides critical complementary functionality to KAN’s nonlinear modeling, enabling dynamic weighting of temporal features in sewage treatment processes.
3.7. Discussion
4. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition
---|---
LSTM | Long short-term memory
KAN | Kolmogorov–Arnold network
FOss | Final sedimentation basin outflow suspended solid
TWT | Treated water transparency
TWTN | Treated water total nitrogen
MAE | Mean absolute error
RMSE | Root mean squared error
SDGs | Sustainable Development Goals
References
- Van Haandel, A.C.; Lettinga, G. Anaerobic sewage treatment. In A Practical Guide for Regions with a Hot Climate; John Wiley and Sons: London, UK, 1994.
- Jin, L.; Zhang, G.; Tian, H. Current state of sewage treatment in China. Water Res. 2014, 66, 85–98.
- Shi, X.; Chen, Z.; Wang, H.; Yeung, D.Y.; Wong, W.K.; Woo, W.C. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In Advances in Neural Information Processing Systems 28 (NIPS 2015); Curran Associates, Inc.: Red Hook, NY, USA, 2015; Volume 28.
- Box, G.E.; Pierce, D.A. Distribution of residual autocorrelations in autoregressive-integrated moving average time series models. J. Am. Stat. Assoc. 1970, 65, 1509–1526.
- Hearst, M.A.; Dumais, S.T.; Osuna, E.; Platt, J.; Scholkopf, B. Support vector machines. IEEE Intell. Syst. Their Appl. 1998, 13, 18–28.
- Hamilton, J.D. Time Series Analysis; Princeton University Press: Princeton, NJ, USA, 2020.
- Xu, Y.; Mou, L.; Li, G.; Chen, Y.; Peng, H.; Jin, Z. Classifying relations via long short term memory networks along shortest dependency paths. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 1785–1794.
- Yu, Y.; Si, X.; Hu, C.; Zhang, J. A review of recurrent neural networks: LSTM cells and network architectures. Neural Comput. 2019, 31, 1235–1270.
- Lipton, Z.C.; Berkowitz, J.; Elkan, C. A critical review of recurrent neural networks for sequence learning. arXiv 2015, arXiv:1506.00019.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
- Zheng, J.; Li, H.; Suzuki, G.; Shioya, H. Time Series Lag Analysis of Indicators in Sewage Treatment Process. In Proceedings of the 2024 IEEE 13th Global Conference on Consumer Electronics (GCCE), Kitakyushu, Japan, 29 October–1 November 2024; IEEE: New York, NY, USA, 2024; pp. 377–378.
- Angeler, D.G.; Viedma, O.; Moreno, J. Statistical performance and information content of time lag analysis and redundancy analysis in time series modeling. Ecology 2009, 90, 3245–3257.
- Liu, Z.; Wang, Y.; Vaidya, S.; Ruehle, F.; Halverson, J.; Soljačić, M.; Hou, T.Y.; Tegmark, M. KAN: Kolmogorov-Arnold networks. arXiv 2024, arXiv:2404.19756.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems 30 (NIPS 2017); Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30.
- Wang, R.F.; Su, W.H. The application of deep learning in the whole potato production chain: A comprehensive review. Agriculture 2024, 14, 1225.
- Chen, Y.; Pock, T. Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1256–1272.
- Pan, C.H.; Qu, Y.; Yao, Y.; Wang, M.J.S. HybridGNN: A Self-Supervised Graph Neural Network for Efficient Maximum Matching in Bipartite Graphs. Symmetry 2024, 16, 1631.
- Qin, Y.M.; Tu, Y.H.; Li, T.; Ni, Y.; Wang, R.F.; Wang, H. Deep Learning for Sustainable Agriculture: A Systematic Review on Applications in Lettuce Cultivation. Sustainability 2025, 17, 3190.
- Cohen, I.; Huang, Y.; Chen, J.; Benesty, J. Pearson correlation coefficient. In Noise Reduction in Speech Processing; Springer: Berlin/Heidelberg, Germany, 2009; pp. 1–4.
- Zhang, Y.; Suzuki, G.; Shioya, H. Prediction and detection of sewage treatment process using N-BEATS autoencoder network. IEEE Access 2022, 10, 112594–112608.
- Gardner, M.W.; Dorling, S.R. Artificial neural networks (the multilayer perceptron)—A review of applications in the atmospheric sciences. Atmos. Environ. 1998, 32, 2627–2636.
- Ta, H.T. BSRBF-KAN: A combination of B-splines and Radial Basis Functions in Kolmogorov-Arnold Networks. arXiv 2024, arXiv:2406.11173.
- Ullah, A.; Muhammad, K.; Del Ser, J.; Baik, S.W.; de Albuquerque, V.H.C. Activity recognition using temporal optical flow convolutional features and multilayer LSTM. IEEE Trans. Ind. Electron. 2018, 66, 9692–9702.
- Tao, C.; Gao, S.; Shang, M.; Wu, W.; Zhao, D.; Yan, R. Get The Point of My Utterance! Learning Towards Effective Responses with Multi-Head Attention Mechanism. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018; pp. 4418–4424.
- Singh, D.; Singh, B. Investigating the impact of data normalization on classification performance. Appl. Soft Comput. 2020, 97, 105524.
- Quackenbush, J. Microarray data normalization and transformation. Nat. Genet. 2002, 32, 496–501.
- Ali, P.J.M.; Faraj, R.H.; Koya, E. Data normalization and standardization: A technical report. Mach. Learn Tech. Rep. 2014, 1, 1–6.
- Willmott, C.J.; Matsuura, K. Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Clim. Res. 2005, 30, 79–82.
- Chai, T.; Draxler, R.R. Root mean square error (RMSE) or mean absolute error (MAE)?—Arguments against avoiding RMSE in the literature. Geosci. Model Dev. 2014, 7, 1247–1250.
- Nakagawa, S.; Schielzeth, H. A general and simple method for obtaining R² from generalized linear mixed-effects models. Methods Ecol. Evol. 2013, 4, 133–142.
- Greff, K.; Srivastava, R.K.; Koutník, J.; Steunebrink, B.R.; Schmidhuber, J. LSTM: A search space odyssey. IEEE Trans. Neural Netw. Learn. Syst. 2016, 28, 2222–2232.
- Genet, R.; Inzirillo, H. TKAN: Temporal Kolmogorov-Arnold networks. arXiv 2024, arXiv:2405.07344.
- Oreshkin, B.N.; Carpov, D.; Chapados, N.; Bengio, Y. N-BEATS: Neural basis expansion analysis for interpretable time series forecasting. arXiv 2019, arXiv:1905.10437.
- Bai, S.; Kolter, J.Z.; Koltun, V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv 2018, arXiv:1803.01271.
- Cui, K.; Camalan, S.; Li, R.; Pauca, V.P.; Alqahtani, S.; Plemmons, R.; Silman, M.; Dethier, E.N.; Lutz, D.; Chan, R. Semi-supervised change detection of small water bodies using RGB and multispectral images in peruvian rainforests. In Proceedings of the 2022 12th Workshop on Hyperspectral Imaging and Signal Processing: Evolution in Remote Sensing (WHISPERS), Rome, Italy, 13–16 September 2022; IEEE: New York, NY, USA, 2022; pp. 1–5.
- Tsaregorodtsev, A.; Garonne, V.; Stokes-Rees, I. DIRAC: A scalable lightweight architecture for high throughput computing. In Proceedings of the Fifth IEEE/ACM International Workshop on Grid Computing, Pittsburgh, PA, USA, 8 November 2004; IEEE: New York, NY, USA, 2004; pp. 19–25.
- Guppy, L.; Mehta, P.; Qadir, M. Sustainable development goal 6: Two gaps in the race for indicators. Sustain. Sci. 2019, 14, 501–513.
- Cosgrove, W.J.; Loucks, D.P. Water management: Current and future challenges and research directions. Water Resour. Res. 2015, 51, 4823–4839.
- Camalan, S.; Cui, K.; Pauca, V.P.; Alqahtani, S.; Silman, M.; Chan, R.; Plemmons, R.J.; Dethier, E.N.; Fernandez, L.E.; Lutz, D.A. Change detection of amazonian alluvial gold mining using deep learning and sentinel-2 imagery. Remote Sens. 2022, 14, 1746.
- Yao, M.; Huo, Y.; Tian, Q.; Zhao, J.; Liu, X.; Wang, R.; Xue, L.; Wang, H. FMRFT: Fusion mamba and DETR for query time sequence intersection fish tracking. arXiv 2024, arXiv:2409.01148.
Item | Unit | Abbreviation
---|---|---
Inflow water biochemical oxygen demand | | IF-BOD
Mixed liquor suspended solids | | MLSS
Dissolved oxygen | | DO
Return sludge concentration | | RSC
Sludge settling velocity | % | SV
Reaction tank pH | - | PH
Preliminary sedimentation basin outflow suspended solid | | Foss
Final sedimentation basin outflow suspended solid | | FOss
Inflow water volume | | IWV
Water temperature | °C | WT
Return sludge amount | | RSV
Aeration air volume | | AAV
Treated water transparency | cm | TWT
Treated water biochemical oxygen demand | | TW-BOD
Treated water suspended solid | | TW-SS
Treated water total phosphorus | | TW-T-P
Treated water total nitrogen | | TW-T-N
Parameter | Value | Description
---|---|---
Look-back window | 10 | Number of past time steps used for prediction
Embedding size | 64 | Feature dimensionality for input encoding
Dense-layer size | 256 | Hidden-layer size in the fully connected network
Attention heads | 8 | Number of attention heads
Dropout rate | 0.2 | Probability of dropping neurons to prevent overfitting
Learning rate | 0.001 | Initial learning rate for the optimizer
Batch size | 32 | Number of samples processed per training step
Epochs | 200 | Total number of passes over the training set
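As a usage sketch, these settings map onto a conventional training loop as follows. The table does not name an optimizer, so Adam is assumed; `normalized_data` stands for the scaled indicator array from Section 3.1.1, `make_sequences` is the window builder sketched in Section 3.1.3, and 17 input features matches the indicator table above:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Build windowed sequences (look-back 10) from the normalized data.
X, y = make_sequences(normalized_data, look_back=10)
train_loader = DataLoader(
    TensorDataset(torch.as_tensor(X, dtype=torch.float32),
                  torch.as_tensor(y, dtype=torch.float32)),
    batch_size=32, shuffle=True)

model = IntegratedKANLSTM(n_features=17)   # 17 monitored indicators
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # Adam assumed
loss_fn = nn.MSELoss()

for epoch in range(200):
    for xb, yb in train_loader:            # mini-batches of 32 windows
        optimizer.zero_grad()
        loss = loss_fn(model(xb).squeeze(-1), yb)
        loss.backward()
        optimizer.step()
```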
Model | Metric | pH | FOss | TWT | TWTN
---|---|---|---|---|---
Integrated KAN-LSTM | MAE | 1.81 | | |
 | RMSE | 1.96 | | |
LSTM [31] | MAE | 0.07 | 1.06 | 2.36 | 2.21
 | RMSE | 0.11 | 1.67 | 2.97 | 2.87
KAN [32] | MAE | 0.11 | 1.62 | 2.30 | 2.09
 | RMSE | 0.19 | 1.89 | 2.78 | 2.92
N-BEATS [33] | MAE | 0.07 | 1.05 | 1.85 |
 | RMSE | 0.09 | 1.85 | 2.61 | 2.17
 | R² | | | |
TCN [34] | MAE | 1.18 | 1.80 | 1.53 |
 | RMSE | 0.11 | 1.92 | 2.51 |
Model | Metric | pH | FOss | TWT | TWTN
---|---|---|---|---|---
LSTM-only | MAE | 0.07 | 1.06 | 2.36 | 2.21
 | RMSE | 0.11 | 1.67 | 2.97 | 2.87
 | R² | 30.13% | 95.01% | 78.62% | 73.31%
LSTM + Attention | MAE | 0.07 | 0.92 | 2.09 | 1.86
 | RMSE | 0.09 | 1.30 | 2.72 | 2.42
 | R² | 53.67% | 95.05% | 81.45% | 76.23%
LSTM + KAN | MAE | | 0.79 | 1.95 | 1.69
 | RMSE | | 1.08 | 2.61 | 2.14
 | R² | 58.21% | 95.09% | 83.24% | 78.56%
Integrated KAN-LSTM | MAE | | | | 1.41
 | RMSE | | | |
 | R² | | | | 80.12%