Application of Machine Learning Models in Optimizing Wastewater Treatment Processes: A Review

Zamfir, Florin-Stefan; Carbureanu, Madalina; Mihalache, Sanda Florentina

doi:10.3390/app15158360

Open Access Review

by Florin-Stefan Zamfir *, Madalina Carbureanu * and Sanda Florentina Mihalache

Department of Automatic Control, Computers, and Electronics, Faculty of Mechanical and Electrical Engineering, Petroleum-Gas University of Ploiesti, 39 Bucharest Avenue, 100680 Ploiesti, Romania

* Authors to whom correspondence should be addressed.

Appl. Sci. 2025, 15(15), 8360; https://doi.org/10.3390/app15158360

Submission received: 1 July 2025 / Revised: 22 July 2025 / Accepted: 25 July 2025 / Published: 27 July 2025

(This article belongs to the Special Issue Advanced Technology and Applications of Artificial Intelligence in Wastewater Treatment)


Abstract

The treatment processes in a wastewater treatment plant (WWTP) are known for their complexity and highly nonlinear behavior, which makes them challenging to analyze, model, and, especially, control. This review studies how machine learning (ML), with a focus on deep learning (DL) techniques, can be applied to optimize the treatment processes of WWTPs, highlighting the case studies that propose ML and DL methods that directly address this issue. It examines the systematic application of ML and DL in optimizing the wastewater treatment processes of an industrial plant, including the modeling of complex physical–chemical processes, real-time monitoring and prediction of critical wastewater quality indicators, reduction of chemical reactant consumption, minimization of plant energy consumption, prediction of plant effluent quality, and development of data-driven models that support the decision-making process. For the detailed analysis, 87 articles were included from an initial set of 324, using criteria such as wastewater combined with ML, DL, and artificial intelligence (AI), for articles published in 2010 or later. Of the initial set of 324 scientific articles, 300 were identified using Litmaps, obtained from five important scientific databases, all focusing on the specific problem proposed for investigation. Thus, this paper identifies gaps in the current research, discusses ML and DL algorithms in the context of optimizing wastewater treatment processes, and identifies future directions for optimizing these processes through data-driven methods. Compared with traditional models, AI models (ML, DL, hybrid and ensemble models, digital twins, IoT, etc.) demonstrated significant advantages in wastewater quality indicator prediction and forecasting, in energy consumption forecasting, in temporal pattern recognition, and in optimal interpretability for regulatory compliance. Integrating advanced ML and DL technologies into the various processes involved in wastewater treatment improves the plant systems’ predictive capabilities and ensures a higher level of compliance with environmental standards.

Keywords:

artificial intelligence; deep learning; nonlinear; quality; analysis; prediction; challenges; environment

1. Introduction

Regarding the global and regional context of wastewater treatment infrastructure, over 23,000 municipal WWTPs operate in Europe, serving approximately five hundred million people and treating approximately fifty billion cubic meters of wastewater annually. In Romania, there are around four hundred municipal WWTPs (including the industrial ones), the most representative being those in Bucharest (with a capacity of 6.2 m³/s), Timisoara, and Cluj-Napoca. To comply with the requirements of the European Union Water Framework Directive, these WWTPs are under high pressure to improve their efficiency: at present, they consume between one and three percent of national electricity, the cost of the necessary reactants is between twenty-five and fifty percent of the plant budget, and the maintenance cost is between twenty and thirty percent of the budget [1,2,3]. The attempt to apply AI techniques (ML models) to optimize the treatment processes of a WWTP is therefore justified by the current situation, which leaves considerable room for improvement.

In the literature, several reviews have previously analyzed the integration of AI into the optimization of wastewater treatment processes, each with its own focus and limitations. Lowe et al. [4] review ML applications in drinking water and wastewater treatment monitoring systems, but do not present the systematic methodology used and focus on monitoring rather than on optimization. Alprol et al. [5] review the AI technologies used in wastewater treatment and highlight current trends, but without a systematic review protocol or a comprehensive performance comparison of ML methods [6]. Capodaglio and Callegari [7] analyze the potential and limitations of AI technologies in wastewater applications, but without systematic literature mapping or quantitative analysis of research trends. The main gap in the existing literature is therefore that these reviews lack a systematic methodology following established guidelines (PRISMA) or miss relevant studies. They also combine drinking water with wastewater treatment applications (which are not analyzed separately) and focus on monitoring rather than on optimization. In addition, they present limited comparative analyses of ML methods and give insufficient attention to the practical implementation of ML models and to the challenges of integrating them with the usually outdated infrastructure of industrial WWTPs. Because DL and hybrid approaches have advanced rapidly (2020–2025), these techniques have not been analyzed in depth in previous studies.

This article explores the implications of using artificial intelligence, specifically ML, a subdomain of AI, in optimizing wastewater treatment processes from an industrial WWTP. The documentation level of the specialized literature regarding the application of ML methods in the wastewater treatment domain is assessed, as this topic is important due to the challenges in optimizing the wastewater treatment processes in a plant. For instance, most of the treatment processes from a WWTP are very complex, having dynamic and highly nonlinear behavior (a suggestive example in this sense is the wastewater pH neutralization process), making them very difficult to analyze, model, control, and optimize. Thus, works from specialized literature that are aimed at optimizing wastewater treatment processes using ML methods are identified.

To comply with stringent environmental regulations, WWTPs face a series of problems and challenges, such as the complexity of the wastewater treatment processes (dynamic and highly nonlinear process behavior), difficulties in ensuring real-time control of chemical and biological processes through traditional methods (which leads to high energy consumption and reactant usage), a lack of specialized and highly trained human plant operators (which leads to wrong or ineffective decisions), the inability to predict and prevent plant equipment failures and process disturbances (reactive rather than proactive actions), old plant infrastructure (which calls for modern control systems and equipment), the impact of climate change (which requires improved and more efficient control and prevention tools), energy recovery problems, and public pressure regarding environmental protection. These challenges are becoming more critical as wastewater flow rates increase due to population growth, regulations on plant effluent quality become more restrictive, WWTPs risk sanctions if they pollute the environment, and the need for sustainable and cost-effective treatment solutions grows.

Nowadays, ML methods (especially DL methods), such as Convolutional Neural Networks (CNNs), Random Forest (RF), and Long Short-Term Memory (LSTM) networks, are used in the prediction of various wastewater quality parameters, in the models accuracy improvement through the identification of relevant data features, in the management of time-series data, in the identification of complex relationships in data, in the monitoring of dynamic variations in wastewater parameters, and so on [8,9]. In addition, integrating Internet of Things (IoT) technologies with ML methods increases the ML models’ real-time monitoring abilities, thus being an intriguing subject for future research [9,10].

Five major databases (Web of Science, Scopus, IEEE Xplore, PubMed, and arXiv) were used to perform a detailed analysis, which includes studies published in 2010 or later that employ ML for optimizing wastewater treatment processes.

We have defined three main research questions (presented in Section 3.2. Objectives and Purpose of the Review). In this way, an in-depth analysis of ML applications in wastewater treatment optimization (including keyword analysis and analysis of the geographical and temporal distribution of publications) was achieved; the main uses of AI technologies in different contexts were investigated (real-time monitoring and control, predictive modeling of wastewater quality parameters, process optimization, etc.); and the essential challenges, limitations, and gaps encountered in the use of ML/DL methods in the wastewater treatment domain were identified.

The paper is structured in six sections. Section 2 presents the wastewater treatment processes in a WWTP, the technologies used in this domain, the challenges in wastewater treatment, and the performance indicators used in wastewater treatment. Section 3 presents the methodology used to identify gaps in the literature, and Section 4 classifies the ML/DL methods used in the optimization of wastewater treatment processes. Section 5 presents the results obtained, and Section 6 presents the conclusions.

In this study, the PRISMA 2020 guidelines are strictly applied to ML applications in wastewater treatment, alongside Litmaps citation network analysis for literature exploration. Previous reviews generally relied on traditional database searches without systematic citation network maps or standardized reporting. Whereas typical literature surveys concentrate on subcategories of ML, we offer a comprehensive overview that includes classical ML, state-of-the-art deep learning, hybrids, and optimization-based approaches under one systematic framework. Moreover, this literature review focuses on the integration of IoT and digital twins with ML models for real-time wastewater optimization. Unlike earlier literature reviews, which mostly pay attention to the performance of the algorithms, this study emphasizes real-world deployment issues, resource needs, and practical implementation obstacles, along with how readily they can be mitigated. The emphasis in this article on 2020–2025 literature highlights the recent progress in transformers, explainable AI, and edge computing, which were not explicitly treated in previous surveys.

2. Wastewater Treatment Processes in an Industrial Wastewater Treatment Plant

2.1. Stages of the Wastewater Treatment Process

The treatment processes from a WWTP are vital in obtaining a quality plant effluent that does not negatively affect the plant, as well as in the efficient removal of inorganic or organic pollutants. These are complex and, in the majority of cases, highly nonlinear dynamic processes (as in wastewater pH neutralization, activated sludge, etc.), which consist of three treatment stages: the mechanical (physical) stage, the chemical stage, and the biological stage.

In the mechanical stage, the removal of solids (total suspended solids—TSS) takes place through gravitational separation, sedimentation, and pre-aeration [11]. Additionally, fats and oils are separated using fat separators and decanters.

In the chemical stage of the plant, the wastewater pH neutralization process occurs (the removal of excess acidity or alkalinity from the treated wastewater through the use of a chemical reagent of opposite character, such as hydrated lime Ca(OH)₂ or sulfuric acid H₂SO₄). Chemical precipitation also takes place in this stage (the removal of heavy metals, phosphates, sulfides, fluorides, nitrates, etc.) through the use of a set of reagents (such as aluminum sulfate Al₂(SO₄)₃, aluminum chloride AlCl₃, ferric chloride FeCl₃, ferric sulfate Fe₂(SO₄)₃, hydroxide (OH⁻), and sulfides (S²⁻)), together with fine particle coagulation and flocculation. Chemical oxidation and reduction are other essential processes of the chemical stage, in which toxic pollutants (such as chromium, chlorine, hypochlorite, hydrogen peroxide, and nitrite) are transformed, using different oxidants (like chlorine and hydrogen peroxide), into less harmful compounds that are more easily removed. In addition, chemical oxidation brings about BOD (Biochemical Oxygen Demand) and COD (Chemical Oxygen Demand) reduction, ammonia removal, oxidation of non-biodegradable organic compounds, reduction of the residual organic matter concentration, oxidation of iron sulfate, etc. [12,13].
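The nonlinearity of pH neutralization mentioned above can be illustrated with a minimal sketch. It assumes an idealized monoprotic strong acid dosed with a strong base at 25 °C, which is a simplification (sulfuric acid and hydrated lime do not behave this simply in real wastewater); the function name `ph_after_dosing` and all concentrations are hypothetical and chosen only for illustration.

```python
import math

KW = 1.0e-14  # ionic product of water at 25 degrees C (assumed temperature)


def ph_after_dosing(acid_molarity, acid_volume_l, base_molarity, base_volume_l):
    """pH of an idealized strong acid after a strong base has been added."""
    total_volume = acid_volume_l + base_volume_l
    # Net excess of acid (positive) or base (negative), in mol/L.
    excess = (acid_molarity * acid_volume_l
              - base_molarity * base_volume_l) / total_volume
    # Charge balance [H+] - KW/[H+] = excess gives a quadratic in [H+].
    root = math.sqrt(excess * excess + 4.0 * KW)
    if excess >= 0.0:
        h = (excess + root) / 2.0
    else:
        # Solve for [OH-] instead to avoid subtracting nearly equal numbers.
        oh = (-excess + root) / 2.0
        h = KW / oh
    return -math.log10(h)


if __name__ == "__main__":
    # 1 L of 0.01 M acid dosed with 1 M base: equivalence is at 10 mL.
    for dose_ml in (0.0, 5.0, 9.0, 9.9, 10.0, 10.1, 11.0):
        ph = ph_after_dosing(0.01, 1.0, 1.0, dose_ml / 1000.0)
        print(f"{dose_ml:5.1f} mL base -> pH {ph:.2f}")
```

Under these assumptions, a change of only 0.2 mL of reagent around the equivalence point moves the pH by several units, while the same change far from equivalence has almost no effect. This is one reason a fixed-gain controller is hard to tune for this process.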

The third and final stage (before the plant effluent is discharged into the emissary) is the biological one, which includes organic matter degradation and nutrient removal. It relies on the metabolic activity of microorganisms (such as bacteria, fungi, and protozoa), which is considered the most cost-effective biological treatment procedure [11,14].

Biological treatment is carried out through both aerobic and anaerobic processes. In industrial applications, anaerobic biological treatment is frequently used for highly organically loaded wastewater, which is treated using Upflow Anaerobic Sludge Blanket (UASB) reactors, anaerobic membrane bioreactors, and anaerobic fluidized bed reactors. This approach offers several advantages, including reduced plant operational costs, energy recovery, and reduced sludge production. The use of aerobic or anaerobic biological treatment depends on the characteristics of the treated wastewater (BOD or COD concentration, etc.), space constraints, and energy requirements [15,16,17].

In addition to conventional biological treatment methods, advanced oxidation processes (AOPs) are sometimes employed, which produce reactive oxygen species for the degradation of pollutants [18].

For treating industrial wastewater that contains non-biodegradable or toxic organic compounds, AOP technology (utilizing highly reactive hydroxyl radicals, referred to as •OH) is employed to achieve the complete mineralization of these difficult-to-treat organic pollutants through various mechanisms, including ozonation, Fenton oxidation, UV-based photolysis, and electrochemical methods [19].

Figure 1 presents a block diagram of a WWTP from the authors’ point of view.

As can be observed in Figure 1, the input is represented by the plant influent, which is wastewater with specific characteristics (such as pH, TSS, COD, BOD, extractables, chlorides, phenols, etc.). In the mechanical treatment subsystem, the wastewater undergoes gravitational separation, fat flotation, and sedimentation processes.

In addition, flotation processes (such as dissolved air flotation, which are often used in conjunction with fat separators and decanters) are employed to separate fats, TSS, and oils. The effectiveness of this method depends on particle size, density, and the chemicals added (coagulants and flocculants) to achieve particle aggregation [20].

By using grills and screens, large bodies and suspended solids are removed. Subsequently, oils, greases, and other substances lighter than water are removed from the treated water using separators. Sand separators and decanters are used for separating suspended solids. The obtained treated water flow rate (at the mechanical stage output) is applied to the treatment processes related to the second subsystem (chemical treatment), such as pH neutralization, precipitation, coagulation, flocculation, oxidation, and reduction.

Additionally, at this stage chemical precipitation of wastewater occurs (removing heavy metals, sulfides, nitrates, phosphates, etc.) through the use of specific chemical reagents, including aluminum sulfate (Al₂(SO₄)₃), ferric chloride (FeCl₃), and hydroxides (OH⁻). This step is followed by coagulation, which involves mixing the wastewater with coagulants, and then flocculation, the second stage of coagulation, during which particles are aggregated into larger flocs [6].

The reagents (neutralization agents, coagulants, and flocculants) are prepared and dosed in the mixing–reaction tank using dosing pumps. The mixing–reaction tank, where the chemical stage processes take place, consists of two compartments: the basin where the wastewater to be treated is mixed with the chemical reagents, and the reaction basin where the chemical reactions corresponding to the chemical processes occur. The output of the mixing–reaction tank becomes the input for the flotation decanter, where sedimentation and flocculation of particles and sludge occur [20].

The chemically treated water enters the aeration basin of the biological subsystem. In this basin, the necessary nutrients are dosed to support the development of microorganisms, and blowers provide the oxygen required for the degradation of organic matter.

Bacteria need organic matter (to feed) and oxygen (to breathe) to multiply rapidly (creating more biomass) and to clean the wastewater. The bacteria are more active when they receive more oxygen (supplied to the wastewater through aeration); thus, the pollutants in the wastewater are removed faster. The bacterial communities (which form the activated sludge) consume BOD and COD [6].

In the final decanter, the suspensions resulting from the biological treatment are retained, and the sludge is recirculated to the aeration basin. The treated water that is discharged into the plant effluent and the excess sludge represent the outputs of the biological step.

2.2. Traditional and AI-Based Technologies Used in Wastewater Treatment

A study of the treatment processes in Romanian industrial WWTPs shows that traditional wastewater treatment is usually based on classical control, optimization, monitoring, and prediction technologies that have been refined over time. For instance, as control systems, the majority of WWTPs use SCADA systems for basic monitoring and data gathering (in some WWTPs, in recent years, SCADA systems have been replaced with PLC controllers due to their advantages), while PID (Proportional-Integral-Derivative) controllers are used for classical control loops (such as wastewater pH and dissolved oxygen control). The plant manual, based on human operator experience gained over time, is also used, along with standard operating procedures. As optimization tools, statistical methods (for historical data analysis), trial-and-error parameter tuning, reactive maintenance (in the case of equipment failure), and mechanical adjustments for energy consumption reduction are typically employed. For monitoring and prediction, traditional methods include laboratory chemical tests (for instance, to evaluate plant effluent quality), manual sampling, trend analysis (for plant performance evaluation), and offline analysis, among others. All these traditional technologies come with a set of limitations, such as their inability to handle complex and highly nonlinear processes (as the majority of WWTP processes are), the dependence on the know-how of the human plant operator, excessive usage of energy and chemical reactants (which leads to supplementary and unnecessary costs), the inability to effectively prevent process disturbances and plant equipment failures, and limited real-time optimization resources.
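As an illustration of the classical PID loops mentioned above, the following is a minimal sketch of a discrete PID controller regulating dissolved oxygen. The first-order aeration model, the gains, and the names `PID` and `simulate_do_loop` are assumptions made for this example and do not describe any specific plant.

```python
class PID:
    """Textbook discrete PID controller with output clamping."""

    def __init__(self, kp, ki, kd, dt, out_min, out_max):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0
        self.prev_error = None

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        if self.prev_error is None:
            derivative = 0.0  # no derivative action on the first sample
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        output = (self.kp * error + self.ki * self.integral
                  + self.kd * derivative)
        # A blower cannot run below 0 or above its maximum.
        return max(self.out_min, min(self.out_max, output))


def simulate_do_loop(setpoint=2.0, steps=100):
    """Toy process (assumed): d(DO)/dt = 0.5 * blower - 0.1 * DO."""
    controller = PID(kp=2.0, ki=0.5, kd=0.0, dt=1.0, out_min=0.0, out_max=10.0)
    do_level = 0.0  # mg/L, hypothetical starting value
    history = []
    for _ in range(steps):
        blower = controller.update(setpoint, do_level)
        do_level += controller.dt * (0.5 * blower - 0.1 * do_level)
        history.append(do_level)
    return history


if __name__ == "__main__":
    trajectory = simulate_do_loop()
    print(f"DO after {len(trajectory)} steps: {trajectory[-1]:.3f} mg/L")
```

The fixed gains work here only because the toy process is linear. On a strongly nonlinear loop such as pH neutralization, the same gains would be too aggressive in some operating regions and too sluggish in others, which is the limitation the paragraph above attributes to traditional control.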

From a chemical point of view, Li, N. et al. [21] removed heavy metal ions (Cu(II)) from wastewater by adsorption on biochar chemically modified with silicon (Si) and manganese (Mn) (b-BC), obtained by hydrothermal carbonization of corn stalks. Adsorption on Si- and Mn-modified biochar (b-BC) is an innovative wastewater treatment method (especially for wastewaters contaminated with heavy metal ions such as copper, lead, mercury, cadmium, and chromium) that combines nanotechnology applied to natural materials with high adsorption efficiency and ecological sustainability, making it a good candidate for advanced industrial wastewater treatment applications.

The work of Abdykadyrov, A. et al. [22] presents a wastewater treatment system, including the design, testing, and theoretical modeling of a pilot ozonator (based on a special high-frequency electric discharge) capable of destroying pathogenic microorganisms in biologically contaminated surface water where they exceed the maximum permissible concentration. The proposed ozonator with high-frequency electric discharge is an advanced and efficient method for industrial wastewater treatment and an ideal solution for rigorous treatment of WWTP effluent before discharge into the environment or reuse.

In the work by Lu, D. et al. [23], an innovative modeling framework was introduced to elucidate nitrate nitrogen migration within heterogeneous vadose zones, addressing key challenges in simulating non-equilibrium pollutant transport at the watershed scale. The proposed Ensemble improved Stream Tube Model (ESTM), which integrates Pearson Type III distributions for robustly characterizing the heterogeneity of the transport parameters (v, D) and includes nitrate degradation and adsorption functions, outperformed traditional models in simulating early breakthrough and tailing. ESTM is an innovative, robust, and adaptable tool for understanding and predicting pollutant fate in heterogeneous subsurface environments, with practical implications for groundwater resource management and pollution control.

To overcome the limitations of traditional wastewater treatment technologies and to make the transition from reactive to proactive plant management, the integration of AI and of advanced ML models (artificial neural networks—ANNs and DL techniques) provides a set of advantages in the optimization of treatment processes from a WWTP, such as in predicting plant effluent quality parameters, in modeling the complex and variable treatment processes behavior, in improving the plant operational control and efficiency overall, and in improving the predictive capabilities of plant systems and operational efficiency [2,4,10,11].

Through the usage of advanced ML techniques, complex and nonlinear relationships that characterize the treatment processes can be optimized under varying conditions, ensuring an improved prediction accuracy (such as the adjustment of carbon source dosage and dissolved oxygen levels), necessary in the decision-making process of the plant’s human operator [24,25]. For reducing the impact of a plant wastewater treatment process on the environment (greenhouse gas emissions), ML techniques can be used to develop prediction models (such as carbon emission prediction models). They can improve the design of WWTP treatment facilities [26].

Thus, AI models like fuzzy logic (FL) and ANNs have been successfully applied to model the complex and highly nonlinear behavior of wastewater treatment processes, improving the prediction accuracy and the plant’s operational efficiency [27]. The usage of Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, and Gated Recurrent Unit (GRU) networks, models that belong to the DL subdomain, comes with an improved prediction accuracy of the wastewater quality parameters (like pH, dissolved oxygen, turbidity, etc.), a better prediction of the plant effluent quality, an improved performance regarding the nonlinear data and pattern handling, and a superior performance in capturing temporal dependencies, all of these benefits leading to improved plant operational control strategies [8,24,28,29,30]. According to Köksal et al. [27], these AI models, especially ANNs, have been successfully applied in the estimation of wastewater quality parameters like pH, conductivity, and salinity, with high predictive accuracy. According to Da Costa et al. [31], ANNs were successfully used in wastewater treatment optimization, especially in pollution control processes, in situations where traditional statistical models have not provided the desired results [32]. In work by Xu et al. [33], CNNs (because of their ability to capture temporal dependencies) were used to achieve the prediction of wastewater pH, research that demonstrated that a CNN with six convolutional layers and three pooling layers, combined with a sliding window method, effectively predicts pH values between 7.2 and 8.6 units, ensuring compliance with the imposed standards [8].
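The sliding window method mentioned above turns a single sensor series into supervised training samples. The sketch below shows only this preprocessing step, not the CNN itself; the function name `sliding_windows`, the window length, and the pH readings are hypothetical.

```python
def sliding_windows(series, window, horizon=1):
    """Split a time series into (input window, future target) pairs.

    window:  number of consecutive past readings fed to the model
    horizon: how many steps ahead the target lies (1 = next reading)
    """
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must be positive")
    samples = []
    last_start = len(series) - window - horizon
    for start in range(last_start + 1):
        inputs = series[start:start + window]
        target = series[start + window + horizon - 1]
        samples.append((inputs, target))
    return samples


if __name__ == "__main__":
    ph_readings = [7.2, 7.4, 7.5, 7.9, 8.1, 8.6]  # hypothetical pH values
    for inputs, target in sliding_windows(ph_readings, window=3):
        print(inputs, "->", target)
```

Each pair can then be fed to any sequence model. A longer window gives the model more history per sample at the cost of fewer samples from the same series.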

According to Alprol et al. [5] and Rodríguez-Alonso et al. [34], the use of Recurrent Neural Networks (RNNs), particularly Long Short-Term Memory (LSTM) networks, has proven effective in time-series data analysis, in the prediction of WWTP equipment failures, in the prediction of climate variables over time, and overall in the optimization of treatment processes. The integration of DL with LSTM networks (valued for their ability to manage time-series data and track the dynamic variations of wastewater parameters) has been analyzed in terms of its potential uses, such as the prediction of essential wastewater quality parameters (for example, COD and pH) [9].

In addition, the use of hybrid models, such as adaptive neuro-fuzzy inference systems (ANFIS), combined with ANNs and fuzzy logic (FL), opens a new research direction that evaluates the potential of using treated wastewater as a water resource for agricultural reuse, a viable solution for the sustainable management of water resources [27,35,36,37]. This combination of ML models (ANFIS, ANNs, and FL) was also applied to optimize the treatment processes of a plant: predicting the plant effluent quality, increasing the pollutant removal efficiency, reducing the plant operational costs, and overall improving wastewater quality prediction as well as the reliability and interpretability of system predictions [27,38,39,40]. According to [27], these hybrid models have significantly improved the prediction reliability and interpretability of WWTP systems, providing practical applications for agricultural reuse and wastewater management. According to Huang et al. [41], the combination of ML and DL techniques has shown promising results in handling multidimensional and large-scale data and in improving the prediction accuracy for various parameters (such as nitrous oxide emissions). Hybrid DL models, which combine CNNs, LSTM, and GPR (Gaussian Process Regression), were used for predicting water quality parameters (total dissolved solids, electrical conductivity, etc.) in coastal aquifers [42]. According to Xu, A. et al., Meng and Zhang, and Xu, X. et al. [8,9,33], hybrid methods that combine the advantages of LSTM networks and CNNs (the ability to model nonlinear data, robustness in capturing intricate patterns in the data, efficient management of time-series data, the ability to capture long-term dependencies, etc.) were successfully used for wastewater pH prediction, for accurate time-series forecasting in wastewater treatment processes, for increasing WWTP operational efficiency, and for ensuring the environmental compliance of the wastewater treatment processes. Because of their ability to process data in parallel, CNNs require less computational power than LSTM networks (which, on the other hand, ensure superior accuracy in scenarios with significant temporal dependencies), an essential advantage for real-time applications [9].

The usage of explainable AI (such as CNNs, Support Vector Machines—SVMs, ANNs, LSTM networks, GRUs, decision trees, linear and logistic regression, rule-based models, RFs, eXtreme Gradient Boosting—XGBoost, SHapley Additive exPlanations—SHAP, integrated gradients, LIME—Local Interpretable Model Agnostic Explanations, Grad-CAM—Gradient-weighted Class Activation Mapping, etc.), such as explainable ML and DL, reduces the discrepancy between human model interpretability and complex model predictions, thus gaining the trust of the plant engineers and decision entities [24,30,43,44,45,46].

Also, the usage of statistical methods (complementary to AI techniques), such as fractional factorial designs (FFDs) for wastewater treatment process optimization, led to operational downtime minimization and to the reduction of experimental usage on a large scale, being very useful tools in modelling the complex physicochemical processes from a WWTP and also in scenario development [47]. Moreover, the combination of FFDs with Belief Functions (BFs) and Support Vector Machine (SVM) learning comes with a set of improvements in the optimization of recycling processes (an example in this sense being the processes from a paper mill) [47].
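To show how a fractional factorial design reduces the number of experimental runs, here is a minimal sketch of a two-level half-fraction design in which the last factor is generated as the product of the others. The function name `half_fraction_design` and the factor names in the example are hypothetical, and the cited study may have used a different design.

```python
from itertools import product


def half_fraction_design(n_base_factors):
    """Two-level half-fraction design with coded levels -1 and +1.

    The base factors form a full factorial; one extra factor is set to
    the product of the base levels (for two base factors, C = A * B).
    """
    runs = []
    for levels in product((-1, 1), repeat=n_base_factors):
        generated = 1
        for level in levels:
            generated *= level
        runs.append(levels + (generated,))
    return runs


if __name__ == "__main__":
    # Hypothetical factors: coagulant dose, pH setpoint, mixing time.
    print("dose  pH  mixing")
    for run in half_fraction_design(2):
        print("  ".join(f"{level:+d}" for level in run))
```

With three factors, this design needs 4 runs instead of the 8 of a full factorial, at the price of confounding the third factor's main effect with the interaction of the first two.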

Because they can handle complex, nonlinear treatment process behavior (such as wastewater pH neutralization) and very large plant databases, and also due to their adaptability to changing conditions, ML and DL techniques such as ANNs, SVMs, LSTM, CNNs, and RFs have been widely used for water quality prediction, and also for wastewater treatment optimization (the reducing of energy consumption and alignment with environmental standards) [5,43,48]. Also, these ML and DL models were successfully applied in predicting membrane fouling in membrane bioreactors (MBRs), optimizing their filtration processes, and modelling the fouling dynamics [43,49].

A new approach consists of using an advanced DL method (a node-level capsule graph neural network (NLCGNN) that uses the hermit crab optimization algorithm to fine-tune the model parameters) in the treatment of wastewater supplied by a paper mill, improving the prediction accuracy of critical quality indicators (such as COD), the model performance, and the real-time monitoring process [50].

The NLCGNN treats the wastewater treatment process for a paper mill as a connected network, where each network node represents different treatment stages (mechanical, chemical, or biological), and the network connections represent the wastewater flow rates between stages. In this way, the NLCGNN captures the complex relationships between wastewater quality parameters (such as pH, COD, BOD, TSS, etc.). The NLCGNN learns how upstream paper production affects downstream treatment, predicting fiber buildup in a stage (causing high COD values in subsequent stages) and optimizing the entire treatment chain (improving real-time monitoring, COD control, and reducing chemical consumption, among other benefits) [36].
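The graph view described above, with stages as nodes and flow rates as connections, can be sketched with plain dictionaries. This is not the NLCGNN: it shows only one flow-weighted neighbor aggregation step, the basic operation that graph neural networks build on. All stage names, flow rates, COD values, and the `self_weight` parameter are hypothetical.

```python
# Directed edges (upstream stage, downstream stage) -> flow rate in m3/h.
FLOWS = {
    ("mechanical", "chemical"): 120.0,
    ("chemical", "biological"): 115.0,
}

# One feature per node: COD measured at each stage, in mg/L.
COD = {"mechanical": 900.0, "chemical": 600.0, "biological": 150.0}


def upstream_weighted_mean(node, features, flows):
    """Flow-weighted mean of the feature over all stages feeding `node`."""
    total_flow = 0.0
    weighted_sum = 0.0
    for (source, target), flow in flows.items():
        if target == node:
            total_flow += flow
            weighted_sum += flow * features[source]
    return weighted_sum / total_flow if total_flow > 0.0 else None


def propagate(features, flows, self_weight=0.5):
    """Blend each node's own value with what arrives from upstream."""
    updated = {}
    for node, value in features.items():
        incoming = upstream_weighted_mean(node, features, flows)
        if incoming is None:  # no upstream stage, keep the own value
            updated[node] = value
        else:
            updated[node] = self_weight * value + (1.0 - self_weight) * incoming
    return updated


if __name__ == "__main__":
    print(propagate(COD, FLOWS))
```

In a trained graph network, the fixed `self_weight` blend would be replaced by learned transformations. The structural idea that a downstream stage's representation depends on upstream conditions stays the same.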

Although these models need large amounts of data, ML and DL methods have the potential to optimize wastewater treatment processes from a plant, for instance, through the reduction of plant operational costs (minimizing the number of laboratory measurements and costs with chemical reactants, etc.) and through the real-time monitoring of the plant treatment processes [27,51]. Because of these models’ black-box nature (the mechanisms between the input and the output features are not presented), the plant operators are a bit skeptical about using these models, preferring the classical models [10]. The integration of AI with an existing WWTP control system, like SCADA (Supervisory Control and Data Acquisition), is a desideratum because it can optimize the plant wastewater treatment processes and fault detection, but the high implementation costs represent a barrier to the widespread adoption of this solution [7]. According to Gulshin and Kuzina [12], ML and DL techniques are typically applied to enhance the real-time monitoring capabilities of SCADA systems (through the integration of data from various sensors) to achieve more accurate control and management of wastewater treatment processes.

For improving the optimization, real-time monitoring, and the control of the processes within a WWTP, AI models can be integrated with IoT technologies, allowing more precise and timely interventions, improving the collection process of data and real-time analysis, and providing more accurate data (through advanced sensors) for climate models [10,52,53]. In addition, the integration of AI models with IoT technologies allows the creation of intelligent networks dedicated to WWTP monitoring, in which the data supplied by distributed sensors are analyzed to detect anomalies in the plant operation and infrastructure, and to predict unwanted pollution events [43,54].

Machine learning has been successfully implemented in various environmental and engineering fields, such as marine applications, showing its portability for the prediction of complex phenomena [55].

Therefore, the use of advanced technologies in conjunction with traditional ones (such as microbial analysis) in the treatment processes of a WWTP enhances pollutant removal and, simultaneously, leads to the development of sustainable and environmentally friendly treatment technologies. These technologies come with a set of advantages, such as improved control of plant treatment processes and, overall, an enhanced WWTP performance [9,38]. To fully achieve the potential of AI models in optimizing wastewater treatment processes, it is essential to consider all the advantages of AI in addressing the problem, as well as its limitations (such as data requirements, integration costs, complexity, and model transparency).

2.3. Performance Indicators Used in Wastewater Treatment

As mentioned before, AI models such as ML models and ANNs can efficiently predict the WWTP effluent quality (for instance, making accurate predictions of COD and BOD) and optimize the plant treatment processes because they can handle the complex and highly nonlinear relationships between the processes’ key features [39,56,57]. An example is presented in Otálora et al. [56], where a type of ANN named shallow neural network (SNN) and an ML model, Random Forest (RF), achieved high modelling accuracy (for certain pollutants, the correlation coefficient reached up to 96%). An SNN is characterized by its architecture (usually an input layer, one or two hidden layers with between 10 and 100 neurons each, and one output layer), its short training time (minutes to hours), and its higher interpretability (its connections can be traced).

It is a well-known fact that ML and DL models usually require a large amount of high-quality data for training, testing, and validation purposes, this being a problem that every researcher faces in their research work due to financial limitations and data collection and management constraints [5]. In this domain of wastewater treatment processes, it is very difficult to access a large amount of data from a WWTP, so the performance of some DL models, like CNNs and Recurrent Neural Networks (RNNs), using insufficient datasets is characterized by low accuracy and performance [58].

For assessing AI-based systems in wastewater treatment process optimization (operational efficiency and environmental impact), key performance indicators (KPIs) that encompass a range of statistical metrics are used. A key KPI for AI models is their predictive accuracy, which is measured using a set of statistical metrics, including mean square error (MSE), correlation coefficient (r), and root mean square error (RMSE). In Gulshin and Kuzina, Jafar et al., and Otálora et al. [12,39,56], these statistical metrics are applied to measure how well the AI models can predict the quality parameters (COD and BOD5) of a plant effluent.
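As a minimal illustration of how these accuracy metrics are computed, the sketch below evaluates MSE, RMSE, and the correlation coefficient r for a pair of observed and predicted series. The COD values and all function names are hypothetical and chosen only for the example; they are not taken from the cited studies.

```python
import math


def mse(observed, predicted):
    """Mean square error: the average of the squared residuals."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error, in the same units as the measured quantity."""
    return math.sqrt(mse(observed, predicted))


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observations and predictions."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


# Hypothetical effluent COD values in mg/L (illustrative only).
cod_observed = [40.0, 42.0, 38.0, 45.0]
cod_predicted = [41.0, 41.0, 39.0, 44.0]

cod_mse = mse(cod_observed, cod_predicted)
cod_rmse = rmse(cod_observed, cod_predicted)
cod_r = pearson_r(cod_observed, cod_predicted)
```

On this toy series every prediction is off by exactly 1 mg/L, so MSE and RMSE both equal 1.0, while r remains below 1 because the predictions are not an exact linear function of the observations.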

Another important KPI is the ability of AI-based systems to minimize a WWTP’s energy consumption and chemical reactant use, with significant benefits in terms of reduced costs and improved environmental protection. In Yu and Li [37], the economic and environmental benefits of using AI-based systems for optimizing wastewater treatment processes were demonstrated, as their use resulted in a decrease in plant energy consumption of up to 15% and a reduction in the necessary chemical reactants of up to 14%. A further KPI is the implementation of predictive maintenance strategies in the WWTP to predict plant equipment failures (downtime was reduced by 18%) and to extend the lifespan of the plant infrastructure by analyzing historical sensor data to anticipate potential equipment malfunctions (maintenance costs were reduced by 10%) [34,51].

In this sense, Rodríguez-Alonso et al. [21] present the integration of predictive maintenance services with digital twin platforms for forecasting the status of water dosage pump motors, thereby preventing process shutdowns and the accidental discharge of untreated wastewater into the plant effluent through timely interventions. The implementation of AI-powered predictive maintenance in a WWTP yields significant cost savings (by preventing sudden failures of plant equipment, facilitating more efficient planning of maintenance tasks, and optimizing resource reuse, among other benefits) and notable environmental benefits [51,59]. As demonstrated through a pilot study at an industrial WWTP, all of these benefits lead to a reduction in the plant’s annual total operation cost of up to 12% [51].

Another KPI is that AI can improve wastewater treatment sustainability (through resource recovery), and it can reduce the impact of a WWTP’s greenhouse gas emissions on the environment. Studies have shown that the optimization of a plant treatment process leads to a reduction of plant emissions by up to 18% [51]. The reduction of greenhouse gas emissions is achieved through the real-time control of the operational parameters (such as chemical reactant dosage, aeration rates, etc.) that are dynamically adjusted to comply with the requirements of the treatment processes, thus avoiding unnecessary tasks and consumption of resources [51].

The robustness of AI models, their adaptability to different input conditions, and their ability to maintain high performance across various scenarios are other important KPIs [27]. Another important indicator is the ability of AI models to handle the complex and highly nonlinear processes of a WWTP, with real benefits for prediction accuracy, process control, and the decision-making of the plant’s human operators [39].

All these KPIs are useful tools for evaluating the performance of AI-based systems used in a WWTP, ensuring that they adhere to the applicable regulatory standards and contribute to environmental protection.

2.4. Challenges in Wastewater Treatment

Daily and seasonal changes in flow rate cause several problems in wastewater treatment plants: wastewater flow rates at the WWTP inlet peak in the morning and are low at night, flow rates vary rapidly, pumps cycle frequently, rainfall infiltration dilutes the wastewater (reducing the biological treatment efficiency), hot weather increases bacterial activity (reducing dissolved oxygen), and cold weather slows biological processes (requiring higher chemical reactant dosages). Climatic conditions add further problems, including heavy rainfall that causes hydraulic overload, temperature fluctuations (disrupting nitrification or denitrification processes), and storm events that generate suspended solids that block the grilles, screens, and mechanical stages. Industrial or accidental variations cause problems such as plant discharge spikes that kill the beneficial bacteria (the biological activity needs weeks to recover), chemical spills that produce a pH shock and upset the entire biological stage, the immediate shutdown of the factory, etc. All these critical problems force plant operators into reactive and hasty actions (such as emergency reactant dosing, which raises plant costs) and lead to equipment stress and, usually, violations of environmental regulations.

The optimization of the treatment processes in a WWTP (especially the control of the necessary reactant flow rates, the reduction of plant operational costs, the reduction of greenhouse gas emissions, etc.) presents many challenges due to the complexity and highly nonlinear behavior of the processes involved.

A significant challenge is that analyzing, modelling, predicting, and controlling such complex and highly nonlinear treatment processes with ML and DL methods requires very large amounts of high-quality data for training, validation, and testing, supplied by the plant monitoring systems and sensors. In many cases, access to the WWTP databases is not allowed or is very difficult because of the plant’s operating regulations; in the best case (when the plant database is available), the data are often inconsistent (missing or noisy) because of sensor malfunctions or measurement errors [43].

Using ML methods with insufficient or poor-quality data (noisy or missing data) leads to misleading predictions of the quality parameters that describe the plant processes, suboptimal process control, higher operating costs, wrong decisions, a plant effluent that does not comply with the imposed standards, etc. [43,60]. To address this situation (lack of data or poor-quality data), the WWTP can invest in data collection and management infrastructure (quality sensors, high-performance systems for real-time data acquisition and monitoring, standardized data formats and protocols, etc.), but this is another challenge because it imposes additional costs [43].

Additionally, the usage of ML methods in the optimization of wastewater treatment processes is usually not very welcomed by the WWTP human operators (lack of trust), due to the lack of transparency and to the black-box nature of these methods (such as ANNs), which makes them very hard to interpret and also to implement [43]. A solution to this problem can be the development of explainable ML and DL models that come with a set of facilities (such as visualization tools, a user manual, model algorithms explained in detail, graphs, etc.) dedicated to the WWTP human operators to make them more understandable and accessible for the plant staff [43]. On the other hand, the development of other explainable AI techniques (such as linear and logistic regression, decision trees, rule-based models, RF, eXtreme Gradient Boosting—XGBoost, SHapley Additive exPlanations—SHAP, integrated gradients, Gradient-weighted Class Activation Mapping—Grad-CAM, etc.) goes beyond the optimization of wastewater treatment processes, bringing real benefits to the interpretability of complex climate models, making things more accessible for researchers and decision makers [4,61]. The success of explainable ML and DL models (due to the improved data processing power and pattern recognition abilities) in the improvement of climate models’ accuracy and prediction suggests promising results in this domain [4,46].

Another challenge is reducing plant energy consumption and operational costs, which in most countries is a significant problem due to the use of traditional treatment methods (sedimentation, filtration, coagulation, flocculation, etc.) [9]. According to Shahouni et al. [49], AI-based systems can predict the presence of micropollutants in wastewater, predict membrane fouling (essential for maintaining wastewater purification efficiency), and identify patterns and trends in large databases that traditional methods may miss.

On the other hand, because of the fluctuating nature of the plant influent (changes in the input flow rate, such as daily and seasonal variations, variations caused by weather conditions, industrial or accidental variations, etc.), the WWTP must continuously adapt to these variations (using adaptive learning techniques such as reinforcement learning, ANNs, transfer learning, online learning, etc.) through real-time monitoring and control systems, to maintain plant efficiency and to ensure an effluent that complies with the imposed standards [24,43].

Moreover, removing specific pollutants (such as nitrous oxide emissions (N₂O), hydrogen sulfide (H₂S), carbon dioxide (CO₂), COD, BOD, ammonium (NH₄⁺), nitrates (NO₃⁻) and nitrites (NO₂⁻), phosphates (PO₄³⁻), heavy metals, etc.) often requires advanced predictive models because of the dynamic and highly nonlinear behavior of the wastewater treatment process (for example, hybrid models that integrate multiple AI techniques, such as ANNs, ANFIS, and FL, evolutionary algorithms and metaheuristics, etc.), which raises further model-specific issues and challenges [41].

Additional challenges are given by the use of green materials necessary for adsorption, filtration, catalysis, or support for microorganisms (biochar, activated carbon from natural sources, reused agricultural residues, etc.) in the wastewater treatment processes, which raises financial and technical problems, as well as the need for lifecycle studies to ensure sustainability. So, the usage of AI techniques extends beyond wastewater treatment optimization, having an essential role in green material design and optimization for wastewater treatment, thus promoting the use of advanced solutions for environmental management and protection [62].

The integration of ML and DL models into the existing control systems (SCADA or programmable logic controllers—PLC) in a WWTP imposes the updating of existing software and hardware, and also plant human operator and engineer training (to develop the necessary skills to implement and manage plant AI-based systems effectively), processes that can be time-consuming and very expensive, this being another challenge in the analyzed domain [43].

Another challenge consists of integrating AI (ML and DL) models with IoT technologies to create intelligent networks dedicated to monitoring WWTPs located in rural or remote areas, where traditional monitoring methods are ineffective or infeasible [10,54].

Overall, the complex, dynamic, and highly nonlinear behavior of the wastewater treatment processes from an industrial plant imposes the usage of modern approaches to analyze, model, predict, and especially to control them, to optimize and to make more efficient the entire plant operating process [52,63].

Multi-objective models have been proposed to deal with more complex trade-offs in water systems, such as maximizing resilience for given costs [3].

All these challenges necessitate continuous research and development in this field, particularly in model interpretability, data management, and the development of sustainable wastewater treatment technologies that ensure a performant wastewater treatment process. However, AI-based systems, once implemented in a WWTP, can significantly improve the efficiency of wastewater treatment processes, thereby contributing to environmental protection (by providing a quality effluent) and resource recovery.

So, AI techniques (such as ML and DL) are a very useful tool for the optimization of wastewater treatment processes, due to their ability to process the data supplied by a WWTP, to achieve quality predictions for very important parameters, in discovering the hidden relationships between these parameters, and so on, despite the challenges that inherently arise, such as the need for a very large amount of data (very large databases, which must be very well structured) and the interpretability of complex models [27,31,62].

3. Materials and Methods

3.1. Methodology Used for Machine Learning Models Applied in Wastewater Treatment Processes

The documentation phase began by identifying 300 works. To make the selection process relatively straightforward, the Litmaps tool was used for mapping and sorting academic literature (Figure 2). Various criteria and keyword combinations were specified, such as “wastewater + machine learning”, “wastewater + deep learning”, and “wastewater + artificial intelligence”. The starting year of papers was selected as 2010.

Figure 3 graphs the connections formed through citations over time between research articles. Each point on the graph represents a distinct paper. They are arranged horizontally from left to right in order of recency; those placed to the left are the oldest, and those on the right are the newest. Vertically on the graph, the papers cited the most often (and which, therefore, can be inferred as being the most influential or at least the most recognizable within certain scholarly circles) are placed higher.

Each dot size reflects the influence of a given paper. Larger dots represent works that have been cited more often. The lines between the dots represent citation relationships; when one paper references another, a line is drawn between the two. There is usually a significant amount of ongoing research activity in denser parts of the graph, where the dots are closely packed and heavily connected. These parts of the graph can often be interpreted as active fields or subfields where researchers frequently reference each other’s work.

Figure 3 shows how densely interconnected papers are in the citation connectivity map, focusing on the extent to which these papers mutually reference one another. When looking at how interconnected these papers are, one sees that some are so interconnected that, when mapped out, they practically encompass a whole citation network unto themselves. Moreover, there is a network of significance: a collection of the most influential papers in a field and the extent to which they form a citation web.

Figure 4 is another citation network, but it shows explicitly age-adjusted citation influence, meaning it accounts for how recently a paper was published. The vertical axis now represents more citations adjusted for the paper’s age, which helps highlight emerging influential work, even if it has not had many years to accumulate raw citations.

A few papers from 2025 are at the top right. They are the most recent and have the highest age-adjusted citation impact, meaning they have gained many citations very quickly compared to others. That suggests those papers have become highly influential quickly, possibly a breakthrough or a trend-setting work in their field.

Older foundational works such as Pai (2011) and Chen (2016) remain visible. However, they are positioned much lower now because their citation counts are expected to be higher due to age, so they do not stand out as much under age-adjusted metrics.
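The effect of age adjustment can be illustrated with a short sketch. The source does not state the exact formula used by the mapping tool, so the normalization below (citations divided by years since publication) is only an assumed, commonly used variant, and the two papers and their citation counts are hypothetical.

```python
def citations_per_year(total_citations, publication_year, reference_year):
    """Assumed age adjustment: citations divided by years since publication (at least 1)."""
    age_years = max(reference_year - publication_year, 1)
    return total_citations / age_years


# Hypothetical papers; labels and counts are illustrative only.
papers = [
    {"label": "older paper", "year": 2011, "citations": 140},
    {"label": "recent paper", "year": 2024, "citations": 30},
]

for paper in papers:
    paper["per_year"] = citations_per_year(paper["citations"], paper["year"], 2025)

# Rank by the age-adjusted value instead of the raw citation count.
ranked = sorted(papers, key=lambda p: p["per_year"], reverse=True)
```

Under this assumption the older paper has more raw citations (140 versus 30) but a lower age-adjusted value (10 versus 30 per year), which mirrors how recent works can rise above foundational ones in an age-adjusted view.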

Figure 5 shows the temporal distribution of the selected works. Machine learning is being adopted in engineering much faster now than in the past, and wastewater treatment appears to be part of this rapid growth. This may be because wastewater treatment is a particularly suitable application due to the following:

Complex, nonlinear process dynamics
Large datasets from sensor monitoring
Need for predictive control and optimization
Regulatory pressure for improved efficiency

Table 1 shows the main journals that contain the articles mentioned in Figure 3 and Figure 4. The 300 articles were published in 144 journals; from these, the journals that contained three or more articles were selected. High-impact journals (e.g., Nature, Science, Water Research) do not look only for solutions to an environmental problem but also weigh other considerations (groundbreaking discoveries, global environmental impacts), while still accommodating ML applied to wastewater treatment. On the other hand, the Journal of Water Process Engineering and Water focus on water treatment technologies and are expected to be more receptive to ML applications in this specialized field. Because they specifically address the water treatment engineering community, they are sought-after venues for researchers studying ML applications in WWTPs.

The following section investigates the selection process, from an extensive collection of 300 works to the final count of 80. These 80 papers contain predictions about processes related to wastewater treatment. PRISMA 2020 guidelines were used to represent this selection process visually. These guidelines are a widely accepted standard for clarity and thoroughness in reporting systematic reviews. The PRISMA 2020 (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow chart is a comprehensive and standardized framework that documents the literature search and selection of studies [64]. It helps organize the data and clarifies how systematic review authors choose studies and why they include (or exclude) certain studies.

The sections in Figure 6 are presented in detail below. The three sections represent stages of the selection process. The articles were selected from databases and with the help of other methods.

1.

Identification Section

Left side (Databases and Registers):
- 300 records were identified from 5 databases (Web of Science, Scopus, IEEE Xplore, PubMed, arXiv).
- No records (0) came from registers
- 0 duplicate records were removed (using Litmaps, no duplicate records were selected)
- No records were excluded by automation tools or for other reasons

Right side (Other Methods):
- 5 records were identified from websites
- No records (0) came from organizations
- 19 records were found through citation searching

2.

Screening Section

Database path:
- 300 records were screened (after removing duplicates)
- This stage excluded 37 records. These were papers that were not about wastewater and did not mention ML/DL, the topics required for inclusion.
- 263 reports were sought for retrieval (full texts)
- 53 reports could not be obtained (full texts were inaccessible)
- 210 reports were assessed for eligibility.
Other methods path:
- 24 reports were pursued for retrieval
- 11 reports could not be retrieved
- 13 reports were assessed for eligibility
Exclusion Reasons (Eligibility Assessment)
- From the database path, reports were excluded for these reasons:
  - 68 were not about wastewater prediction
  - 29 did not use ML/DL models
  - 11 had insufficient validation content
  - 12 were duplicate or redundant analyses
  - 6 used outdated technology
  - 4 had methodological flaws
- Six reports were excluded from other methods (duplicate or redundant analyses).

3.

Included Section

In total, the review encompassed 80 articles from databases and registers and 7 through alternative means.
Final articles: 87.
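
The counts reported in the three sections above can be cross-checked with simple arithmetic. The following sketch (variable and function names are illustrative) reproduces the flow from identified records to included articles for both paths.

```python
def prisma_flow(identified, excluded_at_screening, not_retrieved, excluded_at_eligibility):
    """Return (sought for retrieval, assessed for eligibility, included)."""
    sought = identified - excluded_at_screening
    assessed = sought - not_retrieved
    included = assessed - excluded_at_eligibility
    return sought, assessed, included


# Eligibility exclusions on the database path, as listed above.
db_exclusions = {
    "not about wastewater prediction": 68,
    "did not use ML/DL models": 29,
    "insufficient validation content": 11,
    "duplicate or redundant analyses": 12,
    "outdated technology": 6,
    "methodological flaws": 4,
}

# Database path: 300 identified, 37 excluded at screening, 53 not retrieved.
db_sought, db_assessed, db_included = prisma_flow(300, 37, 53, sum(db_exclusions.values()))

# Other methods: 5 (websites) + 19 (citation searching), 11 not retrieved, 6 excluded.
other_sought, other_assessed, other_included = prisma_flow(5 + 19, 0, 11, 6)

total_included = db_included + other_included
```

The intermediate values match the reported ones: 263 sought and 210 assessed on the database path, 24 sought and 13 assessed through other methods, and 80 + 7 = 87 included articles.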

Figure 7 shows the citation relationships of 80 academic articles that were selected from a larger set of 300. Publications are organized along the horizontal axis according to their publication dates. The ones on the right are the most recent, and the ones on the left are the oldest. The vertical axis indicates how frequently these publications have been cited, with the ones higher up being referenced more often.

This view shows a general downward slope from left to right, which is expected: older papers have had more time to accumulate citations, so papers further to the right should, in principle, be the least cited because they are the newest. Yet two papers sit relatively high on the y-axis even though their position on the x-axis indicates that they are recent.

Similar to Table 1, Table 2 includes selected journals that feature more than two papers on wastewater treatment processes, as identified from articles that have passed the screening process.

3.2. Objectives and Purpose of the Review

The main aim of this systematic review is to identify and assess the machine learning and deep learning methods currently used in wastewater treatment processes, with a particular focus on their efficacy, the difficulties of implementing them, and how they could improve treatment operations.

This review answers three principal research queries, which serve as a compass for navigating a thorough analysis of artificial intelligence uses in wastewater management.

The primary research question is pinpointing which ML methods are most effectively used in wastewater treatment optimization (which required an in-depth analysis of the specialized literature using Web of Science, Scopus, IEEE Xplore, PubMed, and arXiv). This means an in-depth look at not just supervised learning methods but also at hybrid and ensemble models that appear to be used more in the last few years.

The second research question investigates the main uses of these AI technologies, especially in the contexts of the following:

Real-time monitoring and control systems;
Predictive modeling for key wastewater quality parameters like COD and BOD;
Process optimization and automation;
The integration of these systems with IoT and digital twin technologies.

The third research question targets the essential challenges, limitations, and gaps encountered in the use of ML/DL for wastewater treatment (which required a critical analysis of the literature). These include issues with the quality and availability of data, issues with the complex relationships between data and with the dynamic and highly nonlinear behavior of the treatment processes that lead to an under-analysis of ML algorithms in the wastewater treatment processes optimizing domain, questions of model interpretability and transparency, integration problems with existing infrastructure, and attitudinal challenges in the form of regulatory and operational acceptance.

This systematic review, following PRISMA guidelines, employed a comprehensive search strategy across several databases.

The evolution and interconnectedness of research in this field are revealed via bibliometric analysis, showing the progression from basic AI applications to advanced hybrid models; identification of key papers that have defined the trajectory of the field; unaffiliated research that distinguishes between monitoring, prediction, forecasting, control, and optimizing dimensions of the work; and apparent movement toward real-time applications and IoT integration.

4. Classification of ML Models Applied in the Optimization of Wastewater Treatment Processes

As presented in Section 2.2 and Section 2.4, AI techniques such as ML and DL (supervised learning: classification, regression; unsupervised learning: clustering, dimensionality reduction) and hybrid models (such as ANNs, ANFIS, FL, CNNs, LSTM, genetic algorithms (GAs), etc.) have been effectively applied in the optimization of wastewater treatment processes; in modeling the complex, dynamic, and highly nonlinear behavior of these processes; in facilitating the analysis of complex datasets; in accurately predicting the relationships between different pollutants and key effluent parameters (such as wastewater pH), with real benefits for the efficiency of plant operation and the accuracy of environmental monitoring systems; in the detection and monitoring of environmental pollutants; in improving the real-time monitoring capabilities of SCADA systems; in enhancing the accuracy of climate models and predictions (owing to their capability to handle large datasets and to identify complex patterns); and so on [7,31,49,65].

Another important aspect is that the use of these AI techniques (advanced techniques for data analysis and predictive modelling) not only brings real benefits for optimal WWTP operation (minimization of resource and energy consumption, efficient pollutant removal, etc.) but also aligns plants with global standards for environmental protection through the classification of the usability of treated wastewater, a continuously growing trend [7,27,65]. The continuous development of ML and DL models provides, besides the optimization of wastewater treatment processes, valuable insights for decision-making and regulation development, while addressing critical problems such as pollution control and resource management [28,31,41]. The use of ML and DL methods in the optimization of wastewater treatment processes not only helps solve pressing environmental problems but also opens the way for innovative solutions. Next, a short description of the ML models applied in wastewater treatment process optimization is provided.

A hierarchical grid displaying the structure of supervised learning and unsupervised learning techniques for wastewater treatment applications is depicted in Figure 8. In this chapter, the supervised learning methods are classified into two main groups: Traditional ML and Deep Learning techniques.

4.1. ML Models Applied in the Optimization of Wastewater Treatment Processes

4.1.1. Support Vector Machines (SVMs)

SVMs are a representative ML technique (because, in high-dimensional feature spaces, they can identify the optimal hyperplanes necessary for class separation) that is usually applied to regression and classification tasks [66]. According to Pandya et al. and Guo et al. [52,67], for classification tasks, SVMs use a kernel function that facilitates class separation (by projecting data into higher-dimensional spaces) and find a hyperplane that maximizes the margin between classes, which makes them a very useful tool when the data are not linearly separable. For regression tasks (usually linear regression), SVMs use a similar idea: they map the input data into a higher-dimensional space in order to predict a continuous output [29,56,68].
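
The idea of projecting data into a higher-dimensional space can be illustrated with a toy example. The sketch below does not train an SVM; it only shows, under the assumption of a hand-picked feature map x -> (x, x²) and hypothetical one-dimensional data, how points that no single threshold can separate become separable after the mapping, and how a kernel function returns the same inner product without building the mapped vectors explicitly.

```python
def feature_map(x):
    """Explicit map from a 1-D input to a 2-D feature space: x -> (x, x^2)."""
    return (x, x * x)


def kernel(a, b):
    """Kernel equal to the inner product of feature_map(a) and feature_map(b)."""
    return a * b + (a * a) * (b * b)


def separable_by_threshold(values, labels):
    """True if one threshold places all class-0 values on one side and all class-1 values on the other."""
    zeros = [v for v, y in zip(values, labels) if y == 0]
    ones = [v for v, y in zip(values, labels) if y == 1]
    return max(zeros) < min(ones) or max(ones) < min(zeros)


# Hypothetical toy data: class 1 lies on both sides of class 0.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [1, 0, 0, 1]

# In the original 1-D space no threshold separates the classes.
separable_in_input_space = separable_by_threshold(xs, ys)

# Along the second mapped coordinate (x^2) a single threshold, e.g. 2.5, suffices.
second_coordinate = [feature_map(x)[1] for x in xs]
separable_in_feature_space = separable_by_threshold(second_coordinate, ys)
```

In practice, SVM implementations choose the kernel (for example, polynomial or radial basis function) rather than an explicit map, and the separating hyperplane is found by margin maximization instead of the simple threshold check used here.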

Although they are sensitive to the kernel choice and computationally demanding, SVMs are recommended because of the advantages they offer, such as flexibility, adaptability, high accuracy and robust performance, a solid theoretical foundation, efficient handling of high-dimensional data and of datasets with a large number of features, identification of an optimal hyperplane that maximizes the margin between classes (ensuring good results even when the data are not linearly separable), the ability to work with nonlinear and non-separable data, robustness to outliers, successful application in multi-class classification tasks (due to their combination with other algorithms and to the use of kernel techniques), and so on [52,66,67,69].

SVMs also have a set of limitations: they are computationally intensive and less efficient than other methods (training time increases with very large datasets), they have problems with imbalanced datasets (samples tend to be classified into the majority class), and selecting the right kernel and model hyperparameters is a challenging and time-consuming process (it usually requires expert knowledge and cross-validation), etc. [47,67].

A dataset is said to be imbalanced when its classes or groups are not evenly represented. For example, normal operating conditions may account for 95% of the data, whereas equipment failures, alarm conditions, or pollutant exceedances account for only 5%. This imbalance is typical of wastewater treatment applications: normal operational data are much more prevalent than emergency or failure data, some pollutant concentration ranges are much more common than extreme concentrations, and more data exist for routine treatment processes (normal biological treatment) than for special cases (chemical precipitation events). The imbalance is problematic for machine learning because most models are naturally biased toward the majority class, which is undesirable when the rare events are the ones that are crucial to detect (for example, equipment failures or compliance violations). Specialized methods are therefore necessary, such as oversampling the minority class, undersampling the majority class, cost-sensitive learning algorithms, or ensemble methods designed for imbalanced data.
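
Two of the remedies named above can be sketched in a few lines. The example below assumes a hypothetical dataset with the 95%/5% split described in the text; it computes class weights with the n_samples / (n_classes × class_count) heuristic (the same rule scikit-learn documents for its "balanced" class weights) and performs plain random oversampling of the minority class. The function names and placeholder feature values are illustrative.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority samples until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, count in counts.items():
        pool = [s for s, y in zip(samples, labels) if y == cls]
        for _ in range(target - count):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


# Hypothetical records: 95 "normal" and 5 "fault" observations.
labels = ["normal"] * 95 + ["fault"] * 5
samples = list(range(100))  # placeholder feature values

weights = balanced_class_weights(labels)
balanced_samples, balanced_labels = random_oversample(samples, labels)
```

With these weights a misclassified fault record counts 10.0 / 0.526 ≈ 19 times as much as a misclassified normal record in a cost-sensitive loss. Oversampling should be applied only to the training split, so that duplicated records do not leak into validation or test data.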

SVMs were successfully applied in various domains, such as the analysis of water quality (demonstrating high accuracy in water sample classification and in the prediction of water quality indices), membrane flux modeling in water treatment processes (demonstrating high predictive accuracy, with values for the coefficient of determination R² higher than 0.99), the classification of water quality (97% accuracy, surpassing methods such as K-Nearest Neighbors (KNNs) and RF), and in complex classification tasks such as text and object recognition [40,49,52,66,70].

Weighing the method’s advantages and disadvantages, it can be said that SVM models remain a powerful ML tool, especially regarding their high accuracy and robust performance in complex classification and regression tasks.

4.1.2. Decision Trees (DTs) and Ensemble Models (EMs)

DTs and EMs are powerful ML tools. EMs include Random Forest (RF) and Gradient Boosting (GB) techniques, the latter with implementations such as XGBoost (Extreme Gradient Boosting) and LightGBM (Light Gradient Boosting Machine), each one with its strengths and applications.

DTs are a supervised ML technique that uses a tree-like decision model, successfully applied in water quality prediction and environmental monitoring because of their simplicity and interpretability, although they are prone to overfitting (when used with complex datasets) [29,52]. A solution to this problem is represented by the usage of EMs such as RFs and GB. Because DTs can handle numerical and categorical data, they are useful tools for modelling the behavior of wastewater treatment processes [52].

RFs are an EM method based on DTs that overcomes the overfitting problem by combining multiple decision trees during the training process, thus reducing the variance and improving model robustness and accuracy [69,71,72]. Because they are effective in handling nonlinear relationships and interactions between features without data distribution assumptions, RFs are successfully applied in areas such as wastewater treatment and environmental monitoring [71,73].

The GB technique builds models sequentially, with each new model correcting the errors of the previous ones. Its most used implementations are XGBoost, which outperforms other models because of its efficiency and scalability, and LightGBM, which was optimized for very large datasets and high-dimensional data [69,72,74]. XGBoost and LightGBM are suitable ML methods for complex prediction tasks (such as sludge production in WWTPs) and provide superior predictive performance compared with traditional ML methods [74]. Although it requires complex and detailed tuning and can be computationally expensive and time-consuming, XGBoost is known for its high accuracy, its ability to handle missing data and large datasets (it uses a parallel tree boosting method), and its regularization and scalability facilities [49,75]. LightGBM, on the other hand, uses a histogram-based decision tree algorithm that discretizes feature values and focuses only on discrete boundaries. This allows it to process very large datasets quickly, to achieve real-time prediction by updating existing models with new samples, and to ensure faster training and prediction times with low storage requirements [67]. In addition, because it uses the Gradient-based One-Side Sampling (GOSS) algorithm, LightGBM prioritizes data samples with larger gradients, in order to ensure classification accuracy while reducing data volume [67].
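The sequential error-correcting idea behind gradient boosting can be illustrated with a deliberately small sketch. It uses one-split regression stumps and a squared-error loss on a single feature; it is not the XGBoost or LightGBM algorithm, and the function names, learning rate, and sine-shaped toy target are assumptions made only for illustration.

```python
import math


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


def fit_stump(xs, residuals):
    """Best single-split regression stump on one feature (least squares)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # candidate thresholds; the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def boost(xs, ys, rounds=20, lr=0.3):
    """Each round fits a stump to the current residuals and adds a damped correction."""
    base = sum(ys) / len(ys)
    stumps, preds = [], [base] * len(ys)
    history = [mse(ys, preds)]
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + lr * stump(x) for x, p in zip(xs, preds)]
        history.append(mse(ys, preds))
    return (lambda x: base + lr * sum(s(x) for s in stumps)), history


# Hypothetical nonlinear target, standing in for a process variable.
xs = [i / 10 for i in range(30)]
ys = [math.sin(x) for x in xs]
predict, history = boost(xs, ys)
print(history[0], history[-1])  # training error before and after boosting
```

For squared error with a learning rate between 0 and 1, the training error cannot increase from one round to the next, which is why production implementations add regularization and validation-based early stopping to control overfitting.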

According to More et al. and Shao et al. [71,74], XGBoost and LightGBM are very efficient at fitting nonlinear data, being preferred (because of their superior predictive capabilities) over traditional ML techniques, the choice of one of these two algorithms depending on the need for speed versus the need for precision, the task requirements, and the available computational resources.

While DTs supply a simple and interpretable modelling approach, EM methods such as RFs and GB improve the model’s predictive performance by overcoming DTs’ limitations, making them a valuable tool in modelling complex and high-dimensional datasets.

4.1.3. K-Nearest Neighbors (KNNs) and Regression Trees (RTs)

KNNs and RTs are ML techniques used for classification and regression purposes, each with a set of advantages and limitations regarding wastewater quality prediction and optimization. KNN is a lazy learning algorithm: to classify a sample, it assigns the majority class among the sample’s k nearest neighbors, which are identified using the Euclidean distance metric. Although they can be computationally intensive and sensitive to the features’ scale, KNNs are efficient for nonlinear data distributions [41,69,76]. Despite their limitation regarding the handling of high-dimensional data, KNNs were successfully applied in environmental optimization processes, environmental monitoring, water quality prediction and modeling, and so on [52,66,76].
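The KNN decision rule described above (Euclidean distance followed by a majority vote) and its sensitivity to feature scale can be shown with a short sketch. The two features (pH and COD in mg/L), the class labels, and all numeric values below are hypothetical; standardization is included because, without it, the COD column would dominate the distance.

```python
import math
from collections import Counter


def standardize(rows):
    """Return z-scored rows plus a function that applies the same scaling to new samples."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]

    def scale(row):
        return [(v - m) / s for v, m, s in zip(row, means, stds)]

    return [scale(r) for r in rows], scale


def knn_predict(train_x, train_y, query, k=3):
    """Majority class among the k training samples closest to the query."""
    nearest = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]


# Hypothetical effluent samples: [pH, COD in mg/L].
raw_x = [[7.0, 40], [7.2, 55], [6.9, 48], [7.1, 300], [6.8, 320], [7.3, 280]]
train_y = ["compliant", "compliant", "compliant", "exceed", "exceed", "exceed"]

train_x, scale = standardize(raw_x)
print(knn_predict(train_x, train_y, scale([7.0, 290])))
print(knn_predict(train_x, train_y, scale([7.0, 50])))
```

Because every prediction scans the whole training set, the cost grows with the number of stored samples, which is the computational burden mentioned above.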

Being robust to overfitting, RTs (such as those used in RFs) are based on the decision tree algorithms (the data are divided into data subsets; the multiple trees’ outputs are aggregated) for the continuous output prediction process, with this reducing the data variance and improving the prediction accuracy [71,77]. Maybe the most important of RTs’ advantages is their ability to work with linear and nonlinear relationships between data and their interpretability, thus surpassing models like KNNs when it comes to complex prediction tasks [52,71].

KNNs (because of their efficiency in working with complex wastewater datasets) and RTs (preferred for their interpretability, scalability, and ability to model intricate data patterns) have been successfully applied in wastewater treatment optimization processes (such as wastewater quality prediction and management, resources and energy consumption reduction), in groundwater quality prediction, and so on, with the selection of one of these two models depending on the process characteristics [52,69,74,77,78]. In addition, KNNs’ flexibility is a very important advantage in wastewater optimization, because, in a WWTP, the inputs are highly variable and influenced by many factors. Also, KNNs’ ability to work efficiently with very large datasets and their robustness to noisy data make them a useful tool for prediction tasks and real-time monitoring of the WWTP processes [52].

Regarding RTs, because they are easy to use and interpret, and because they provide a visual representation of the decision-making process, they ensure a good and in-depth understanding of different inputs’ influence on the plant outputs, in order to provide informed human operators’ decisions. Also, their ability to model nonlinear relationships (due to the complex interactions between physical, chemical, and biological WWTP processes) without extensive data preprocessing makes them a very good tool for wastewater treatment optimization [52].

Overall, because they can handle various data types and provide interpretable results, KNNs and RTs can be integrated with other ML techniques in order to optimize WWTP operation (improved prediction accuracy, higher operational plant efficiency, enhanced wastewater quality management, better prediction of key wastewater parameters, etc.) and to ensure compliance with environmental regulations [5]. In Table 3, the main advantages, disadvantages, and applications of ML models in wastewater treatment optimization processes are presented.

4.2. Deep Learning Models (DL, Subfield of ML) Applied in the Optimization of Wastewater Treatment Processes

4.2.1. Artificial Neural Networks (ANNs)

ANNs are used to solve various problems that can occur during the operation of a WWTP. Some of their specific applications are as follows:

The prediction of plant effluent quality (achieving real-time prediction of wastewater quality indicators, preventing pollution of the plant emissary, and assuring compliance with the imposed regulations);
The optimization of the wastewater treatment process parameters (assuring an optimal chemical reactant dosage and reduced energy consumption);
The prediction of plant equipment failure (preventing treatment process interruptions caused by the failure of reagent dosage pumps, blowers, transducers, sensors, etc.);
The forecasting of sludge production (a high volume of resulting sludge can raise storage and recycling problems and also supplementary costs);
Real-time control of treatment processes (ANN-based controllers are usually a viable alternative to manual control, since the controller parameters are tuned in real time).

ANNs are versatile computational models used for predictive and analytical tasks. Types such as Multilayer Perceptron (MLP) and Radial Basis Function (RBF) networks are mainly used for classification and regression purposes.

An MLP is a type of ANN with a single input layer, one or more hidden layers (whose neurons use a nonlinear activation function), and one output layer. It uses a feedforward architecture and is trained using backpropagation, which adjusts the network weights to minimize the error through gradient descent [31,67,71]. Because of their flexibility and predictive power, MLPs are usually used in environmental data analysis (for trend prediction) and for improving treatment processes [31].

Deep Neural Networks (DNNs) are an extension of MLPs with multiple hidden layers. They use a nonlinear activation function (ReLU, Rectified Linear Unit) to improve the network prediction accuracy (in the case of complex datasets) and, in order to prevent overfitting, are trained using unsupervised learning models [39,72].

Although they are used for the same tasks, MLP and RBF networks differ in their architectures (MLPs can have one or more hidden layers, while RBFs usually have just one hidden layer), in the type of activation functions used in the hidden layers (ReLU is usually used by MLPs and a Gaussian radial basis activation function by RBFs), and in their training methodologies. These differences leave their mark on the network’s performance, with RBFs usually exceeding MLPs because of their high speed and efficient handling of nonlinear data.

In order to compare the ANNs’ performance with other models, specific metrics such as the coefficient of determination (R²) and mean squared error (MSE) [27,31] are used.
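For reference, the two metrics named above, together with the RMSE and MAE that are also common in this literature, follow directly from their standard definitions. The observed and predicted values below are hypothetical and serve only to make the arithmetic checkable by hand.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed vs. predicted effluent values.
observed = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 8.0]
print(mse(observed, predicted), rmse(observed, predicted),
      mae(observed, predicted), r2(observed, predicted))
```

MSE, RMSE, and MAE depend on the units and range of the target variable, whereas R² is dimensionless, which is one reason results reported with different metrics are hard to compare across studies.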

ANNs have been successfully applied in the wastewater treatment domain, such as in the prediction of essential pollutants and in the optimization of control strategies [39,43,79]. The usage of ANNs with GAs is a hybrid solution (GA-ANN) that uses evolutionary principles in order to improve the solution development process, highlighting ANNs’ adaptability in solving complex problems, such as those from wastewater treatment process optimization [31].

4.2.2. Recurrent Neural Networks (RNNs)

RNNs such as LSTM and GRU models are essential in time-series modeling because of their ability in processing sequential data in an efficient way, and in capturing long-term dependencies.

Designed to overcome the limitations of classical RNNs (the vanishing gradient problem), the LSTM architecture uses gating mechanisms (a special cell structure with forget, input, and output gates) that allow it to retain information over long data sequences and to learn and remember long-term dependencies. These abilities are essential in modeling the dynamic and highly nonlinear behavior of wastewater treatment processes, and in efficient time-series modelling for wastewater management [9]. Additionally, the use of LSTM networks with CNNs (CNN-LSTM) has demonstrated its utility in handling the intricate temporal dynamics of wastewater treatment systems [33,80].
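The role of the forget, input, and output gates can be made concrete with a single forward step of a standard LSTM cell. This is a sketch of the textbook formulation, not of any specific model from the cited studies; the weights are arbitrary constants rather than trained values, and the helper `make_gate` and the three-value input sequence are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, U, b, x, h):
    """Compute W x + U h + b for one gate; W and U are lists of rows."""
    return [
        sum(w * xi for w, xi in zip(W[j], x))
        + sum(u * hi for u, hi in zip(U[j], h))
        + b[j]
        for j in range(len(b))
    ]


def lstm_step(x, h_prev, c_prev, params):
    """One time step: the gates decide what to forget, what to write, and what to expose."""
    f = [sigmoid(z) for z in affine(*params["forget"], x, h_prev)]
    i = [sigmoid(z) for z in affine(*params["input"], x, h_prev)]
    o = [sigmoid(z) for z in affine(*params["output"], x, h_prev)]
    g = [math.tanh(z) for z in affine(*params["candidate"], x, h_prev)]
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]  # cell state
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]                     # hidden state
    return h, c


def make_gate(hidden, inputs, weight, bias):
    """Hypothetical helper: constant-valued (untrained) parameters for one gate."""
    W = [[weight] * inputs for _ in range(hidden)]
    U = [[weight] * hidden for _ in range(hidden)]
    return W, U, [bias] * hidden


params = {
    "forget": make_gate(2, 1, 0.1, 1.0),     # positive bias: keep memory by default
    "input": make_gate(2, 1, 0.1, 0.0),
    "output": make_gate(2, 1, 0.1, 0.0),
    "candidate": make_gate(2, 1, 0.1, 0.0),
}
h, c = [0.0, 0.0], [0.0, 0.0]
for reading in [0.2, 0.4, 0.6]:  # hypothetical normalized sensor readings
    h, c = lstm_step([reading], h, c, params)
print(h, c)
```

The cell state is updated additively (old state scaled by the forget gate plus new candidate scaled by the input gate), and this additive path is what lets gradients survive over long sequences.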

Overall, LSTM networks outperform models like Deep Belief Networks (DBNs) and CNN-LSTM hybrids in terms of COD trend prediction and in handling time-series data essential for wastewater quality prediction [9].

A simpler variant of LSTM networks is GRUs, which ensure faster training times and reduced risk of overfitting [24]. Because of their robustness in handling time-series data that present complex temporal patterns and their ability to recognize patterns over long data sequences (as in the LSTM case), both GRUs and LSTM networks have been successfully applied in equipment failure forecasting and in water quality prediction [24,34,81]. According to Xu, X. et al. and Wongburi and Park [33,80], LSTM application in predicting essential wastewater parameters (such as nitrous oxide (N₂O) and COD) in wastewater treatment optimization, and in ensuring regulatory compliance, showed superior performance (significantly better prediction accuracy and stability) compared with classical RNNs. The integration of LSTM networks (due to their ability to model complex temporal dependencies and nonlinearities) with IoT has been successfully applied in enhancing the WWTPs’ operational efficacy, in improving real-time monitoring systems, and in better predictive maintenance [9]. Overall, due to their advantages, LSTM networks are a useful tool in the wastewater treatment domain, where accurate predictions can improve process optimization and environmental protection [9,33].

4.2.3. Convolutional Neural Networks (CNNs)

Because of their hierarchical structure, which includes convolutional layers (through which filters are applied to the input data), pooling layers (for dimensionality reduction), and fully connected layers (by which extracted features are interpreted), CNNs are suitable for various tasks in WWTPs, such as detecting anomalies in pollutant levels supplied by the plant sensors, monitoring water quality (distinguishing between clean and dirty water using reflectance images as inputs), and real-time and extensive monitoring applications [5,51]. In addition, the integration of CNNs with other models, such as the multi-kernel radial basis function neural network (MKRBFNN), has automated the extraction of features from influent parameters (eliminating manual feature selection), improved the prediction accuracy in WWTPs (the model’s efficacy in capturing complex relationships), and reduced overfitting (ensuring the efficient handling of noisy data) [11].
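The convolution and pooling stages can be illustrated on a one-dimensional sensor signal. The sketch below applies one hand-picked filter instead of learned filters, and the signal and kernel values are hypothetical; in a trained CNN the kernel weights would be learned from data.

```python
def conv1d(signal, kernel):
    """Valid cross-correlation, which is what CNN 'convolution' layers compute."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def relu(values):
    return [max(0.0, v) for v in values]


def max_pool(values, size):
    """Non-overlapping max pooling: keep the strongest response in each window."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


# Hypothetical sensor trace with a single spike at index 3.
signal = [1.0, 1.0, 1.0, 5.0, 1.0, 1.0, 1.0, 1.0]
kernel = [-1.0, 2.0, -1.0]  # responds to a value that exceeds both of its neighbors

feature_map = relu(conv1d(signal, kernel))
pooled = max_pool(feature_map, size=2)
print(feature_map)  # [0.0, 0.0, 8.0, 0.0, 0.0, 0.0]
print(pooled)       # [0.0, 8.0, 0.0]
```

Pooling halves the length of the feature map while preserving the spike response, which is the dimensionality reduction mentioned above.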

On the other hand, CNNs can efficiently handle missing or noisy data supplied by WWTP sensors and can automatically extract and interpret process features from complex and very large databases, these being very important skills in wastewater quality prediction and environmental monitoring.

According to Jamshidzadeh et al. [42], CNNs can be integrated with Gaussian Process Regression (GPR) in order to improve the prediction accuracy and ensure interval predictions, which are very useful when dealing with data uncertainty. The combination of CNNs with GPR and LOST (Long Short-Term Memory Neural Networks) is a viable solution to improve the robustness of predictions in dynamic and highly nonlinear processes and in solving data noise and variability problems [42]. Also, CNNs, through the use of a binomial event discriminator, are capable of identifying critical anomalies (significant patterns) in sensor data fluctuations, which has been successfully applied in water supply systems [82]. CNNs combined with Deep Neural Networks (DNNs) are another viable solution for efficiently handling noisy data, using the abilities of DNNs as a powerful feature extraction process and for pattern recognition [58].

Beyond water quality monitoring, such applications based on CNNs are also used in multiple other applications, such as waste management and waste classification systems, highlighting their versatility in environmental applications [83].

Therefore, through the integration of CNNs with other models and due to their feature extraction abilities, the domain knowledge (the missing or noisy data supplied by sensors) is efficiently used and managed, ensuring accurate and reliable predictions and environmental monitoring.

4.2.4. Autoencoders and Deep Belief Networks (DBNs)

Autoencoders and DBNs are important architectures belonging to the DL domain, each with specific characteristics and applications. So, the autoencoders (which contain an encoder that compresses the input and a decoder that reconstructs the input) are an ANN type, used to learn efficient codifications from input data in order to reduce the dimensionality and to improve feature learning [39,58]. DBNs are a class of generative ANN model whose architecture is composed of multiple layers of stochastic variables. This type of network is used because of its ability to model complex distributions, having various applications in the environmental domain (the prediction of complex relationships through data patterns identification) [5].

While DBNs are more oriented to understanding the data distribution, autoencoders focus on reconstructing the model inputs. This versatility makes autoencoders suitable for various types of environmental applications, such as anomaly detection, sensor data compression and key feature extraction, prediction of water quality variations or pollutant concentrations, accurate real-time monitoring and prediction of water quality and treatment efficiency, environmental monitoring, and complex prediction tasks in WWTPs, and they continue to evolve and improve [84].

4.2.5. Hybrid Deep Learning (HDL) Approaches

HDL approaches that combine DL with optimization or statistical models have been explored in order to improve the model’s prediction accuracy and to increase the model’s robustness in different domains (such as wastewater treatment processes optimization), being a useful tool for dimensionality reduction and feature extraction in high nonlinearity handling (DL combined with Partial Least Squares Regression—PLSR). A hybrid model that combines the advantages of CNNs (feature extraction ability) and LSTM networks (temporal sequence learning) with GPR (interval prediction strength) was used in order to predict water quality parameters, such as conductivity and total dissolved solids (TDS), demonstrating superior performance compared with the traditional ML methods [42].

Combining the advantages of CNNs with those of LSTM networks proved to supply superior prediction accuracy for essential water parameters (such as dissolved oxygen levels) compared with traditional ML models, like Support Vector Regression (SVR) and DTs [28]. Additionally, the combination of DL with EM methods, such as RNNs with RFs, was proved to maximize the method’s predictive abilities (by capturing temporal dependencies and feature relationships), with benefits regarding the model’s accuracy and robustness for environmental monitoring tasks [26].

The usage of hybrid models highlights the potential of integrating DL with optimization algorithms and statistical models in order to overcome prediction challenges, supplying a flexible and powerful tool for different applications. An example in this sense is the CNNE-LOST-GPRE (Convolutional Neural Networks–Long Short-Term Memory Neural Networks–Gaussian Process Regression) model, which uses an optimization algorithm (the rat swarm algorithm) to improve the accuracy and adaptability of these hybrid models and to achieve superior performance in predictive tasks, making these models more suitable for real-time, complex, and nonlinear applications [42].

Recently, there has been a successful development in fuzzy neural networks for water quality prediction, and interval type-2 fuzzy neural networks show better predictive performance considering adaptive membership functions [85].

Statistical techniques (such as cross-validation and grid search) and statistical metrics (Root Mean Square Error—RMSE, Mean Absolute Error—MAE) are also integrated in DL methods in order to streamline the hyper parameter fine tuning process; reduce overfitting risk; select the most effective model; ensure high accuracy in complex prediction tasks; and to obtain more accurate, reliable, and interpretable models necessary for water quality prediction and for environmental monitoring [28,29,58].
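The interplay of cross-validation and grid search can be sketched with a deliberately simple model. Here the tuned hyperparameter is the number of neighbors of a one-dimensional nearest-neighbor regressor, scored by cross-validated RMSE; the model, the noisy linear toy data, and the candidate grid are assumptions chosen to keep the example self-contained, not a procedure from the cited works.

```python
import math
import random


def k_fold_indices(n, k):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


def knn_regress(train, query_x, k):
    """Average target of the k training points closest to query_x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - query_x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def cv_rmse(data, k_neighbors, n_folds=5):
    """RMSE over all held-out predictions for one hyperparameter value."""
    squared = []
    for train_idx, test_idx in k_fold_indices(len(data), n_folds):
        train = [data[j] for j in train_idx]
        for j in test_idx:
            x, y = data[j]
            squared.append((y - knn_regress(train, x, k_neighbors)) ** 2)
    return math.sqrt(sum(squared) / len(squared))


def grid_search(data, grid):
    """Score every candidate and return the one with the lowest cross-validated RMSE."""
    scores = {k: cv_rmse(data, k) for k in grid}
    return min(scores, key=scores.get), scores


# Hypothetical noisy linear relationship between an input and a quality indicator.
rng = random.Random(1)
data = [(float(i), 2.0 * i + rng.gauss(0.0, 1.0)) for i in range(30)]
grid = [1, 3, 5, 7]
best_k, scores = grid_search(data, grid)
print(best_k, scores)
```

The interleaved folds used here ignore temporal order; for plant time series, blocked or forward-chaining splits are usually more appropriate, since training on samples that follow the test period can overstate accuracy.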

This combination of DL models with statistical ones also facilitates the development of models that are capable of handling the complex relationships of real-world data, models that are computationally efficient [86].

In Table 4, the main DL models’ advantages, disadvantages, and applications in the optimization of wastewater treatment processes are presented.

4.2.6. Advanced AI Techniques for Dynamic Optimization

Reinforcement learning (RL) and graph neural networks (GNNs) are some of the advanced AI techniques that can be deployed in wastewater treatment plants in order to save energy. The two methods can be combined to achieve both real-time control (RL) and system-level optimization (GNN).

Reinforcement learning is an ML method in which the agent learns the best control policies through environmental interaction. Agents are rewarded or punished according to the consequences of their actions. With continuous multi-objective RL, RL agents can be trained to operate treatment plants in real time, with instantaneous decisions on aeration rates, the addition of chemicals, and sludge retention times, to achieve long-term operation goals and respond to variable influent characteristics and environmental conditions. Recent investigations reveal a cost reduction of between 12% and 937%, depending on three diverse weather cases, with RL agents for system control with respect to traditional PID controllers [87].
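The reward-driven learning loop described above can be illustrated with tabular Q-learning on a toy problem. Everything about the environment below is hypothetical: two dissolved-oxygen states, two aeration actions, and rewards that trade energy cost against a penalty for low oxygen. To keep the result deterministic, the sketch sweeps over every state-action pair instead of sampling transitions with an exploring agent.

```python
# Hypothetical deterministic toy plant: (state, action) -> (next_state, reward).
# Rewards are negative costs: energy use plus a penalty when oxygen stays low.
TRANSITIONS = {
    ("low_DO", "aerate_high"): ("ok_DO", -2.0),
    ("low_DO", "aerate_low"): ("low_DO", -5.0),
    ("ok_DO", "aerate_high"): ("ok_DO", -2.0),
    ("ok_DO", "aerate_low"): ("low_DO", -1.0),
}
STATES = ["low_DO", "ok_DO"]
ACTIONS = ["aerate_low", "aerate_high"]


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Q-learning rule: move Q(s, a) toward reward + gamma * max_a' Q(s', a')."""
    best_next = max(q[next_state].values())
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])


def train(sweeps=200):
    q = {s: {a: 0.0 for a in ACTIONS} for s in STATES}
    for _ in range(sweeps):
        for (state, action), (next_state, reward) in TRANSITIONS.items():
            q_update(q, state, action, reward, next_state)
    return q


q_table = train()
policy = {s: max(q_table[s], key=q_table[s].get) for s in STATES}
print(policy)
```

In this toy setting the learned policy aerates strongly only when oxygen is low and saves energy otherwise. A real plant has continuous states, delayed effects, and several competing objectives, which is why the studies cited above rely on function approximation and multi-objective formulations instead of a lookup table.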

Graph neural networks are employed to characterize relationships and dependencies in complex interconnected systems. This feature makes them suitable for WWTPs, where various operations are interconnected through process flows and feedback control. GNNs can naturally capture spatial dependency and temporal representation. A specific model, the NLCGNN, has also been developed, which is used to predict parameters such as COD, BOD, and TSS [88]. RL-GNN integration is an important step towards the dynamic optimization of wastewater treatment and can solve three problems:

How to perform adaptive control under uncertainty;
Multi-timescale optimization;
Scalability of dynamic optimization.
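The way a GNN propagates information along process connections can be hinted at with one untrained neighborhood-aggregation step. This is only a sketch of the message-passing idea under simplifying assumptions: it is not the NLCGNN model, it has no learned weights, and the process units, the recycle edge, and the COD values are hypothetical.

```python
def aggregate_step(features, edges, self_weight=0.5):
    """Mix each node's value with the mean of the nodes that feed into it."""
    incoming = {node: [] for node in features}
    for src, dst in edges:
        incoming[dst].append(features[src])  # messages follow the process flow
    updated = {}
    for node, value in features.items():
        messages = incoming[node]
        if messages:
            mean_msg = sum(messages) / len(messages)
            updated[node] = self_weight * value + (1 - self_weight) * mean_msg
        else:
            updated[node] = value  # no upstream unit: keep the node's own value
    return updated


# Hypothetical plant graph with COD (mg/L) as the single node feature.
cod = {"influent": 400.0, "primary": 300.0, "aeration": 100.0, "clarifier": 40.0}
flows = [
    ("influent", "primary"),
    ("primary", "aeration"),
    ("aeration", "clarifier"),
    ("clarifier", "aeration"),  # return sludge recycle, a feedback connection
]
print(aggregate_step(cod, flows))
```

In a trained GNN the fixed averaging is replaced by learned transformations, and stacking several such steps lets information from distant units reach each node.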

5. Results

AI models like artificial neural networks, support vector machines, and adaptive neuro-fuzzy inference systems have been used to predict the key parameters of wastewater that are quite difficult to predict.

Hybrid frameworks that pair ANNs with supplementary methodologies, such as fuzzy logic, have been constructed to reinforce the two key aspects of any predictive model: reliability (it behaves the same way under similar conditions) and interpretability (humans can understand the model). These characteristics are especially important when assessing treated wastewater for reuse in agriculture [27]. The potential of these models to optimize the processes of treating wastewater, cut costs, and comply with environmental regulations has been shown. They provide real-time monitoring and even predictive analyses [53].

AI techniques have been embedded within digital twins and IoT-based platforms to create even smarter and more responsive wastewater management systems that allow real-time data analysis and decision-making [7,53].

Even with these advancements, difficulties persist, such as the necessity for extensive datasets to train models successfully and the sophistication involved in melding AI systems with current wastewater infrastructures [7,27].

Current applications are much simpler and not as reliable as would be needed for an AI model that could fully and accurately do the job of controlling a modern plant in real time [27,80].

5.1. Comparative Analysis of ML Methods

For the following performance comparisons of these papers, the authors qualitatively summarize the reported results instead of quantitatively normalizing them, as the studies employed different evaluation metrics (R², RMSE, accuracy), varied dataset scales, target variables (COD, BOD, TSS), and validation protocols. Numerical equivalences could not be compared directly because of inconsistent reporting criteria. The following are estimated trends rather than performance rankings.

In the following, a comprehensive analysis of both supervised and unsupervised ML methods used in wastewater treatment is made to generate the main results of the paper (Figure 9).

5.1.1. Applications of Supervised ML Techniques in the Optimization of Treatment Processes

Facilities that treat wastewater have increasingly turned to artificial intelligence to run more efficiently and to try to predict how clean their treated water will be. These plants use models (like artificial neural networks, support vector machines, and random forests) to understand the complex relationships between all the different factors that affect their treatment process [89].

Support vector machines offer another powerful way of dealing with large amounts of data and intricate relationships. They have seen success in both forecasting the quality of effluent and in adjusting the treatment process to achieve much better results [76]. Meanwhile, random forests have become quite popular because they are durable and can work with all sorts of information, whether numerical data or categorical data.

Wastewater treatment plants are creating soft sensors, intelligent monitoring systems that employ machine learning to gauge vital measurements without the more expensive gear usually required. These digital sensors are placed inside the plant’s control system (SCADA), letting operators use the new prediction capabilities to better tune the plant in real-time [12].

Intelligent systems learn from historical plant data, so they forecast the treatment process’s behavior. RNNs and LSTM networks are particularly good at spotting immediate patterns and long-term trends in data. They are suitable for predicting a treatment system’s critical water quality measures [12,13,80,90,91].

Intelligent treatment plants assess water quality and determine whether it is clean enough to reuse for purposes such as irrigation or industrial processes [27]. They not only predict how well the plant will work but also enable real-time monitoring, allowing for quick decisions that lead to stable operations and less environmental damage [12,92]. When an intelligent treatment plant draws on historical data, it also performs very well at predicting when equipment is going to break down.

These advanced prediction tools allow treatment plants to manage resources, use energy more efficiently, and meet environmental regulations more easily. Systems that learn from the years of historical data accumulated by a plant can forecast if and when equipment might break down with an accuracy that tools lacking such data cannot achieve [34,63,93].

The ubiquitous pumps, motors, valves, and other pieces of plant equipment are just as likely as any other to fail in ways harmful to the treatment process [59,94].

The plants produce large amounts of data through equipment, process control sensors, and smart meters. This vast quantity of data is akin to the kinds of information characteristic of the IoT and Industry 4.0 [10].

Table 5, Table 6, Table 7, Table 8 and Table 9 concern methods of supervised machine learning that require labeled training data (input–output pairs) for predictive modeling in wastewater treatment. This forecasting technique is primarily used for predicting the effluent quality parameter metrics, estimating process performance metrics, and optimizing operational control strategies in WWTPs. Of course, this is among the simplest applications of machine learning within the WWT sector. A range of techniques from traditional neural networks and support vector machines to advanced Deep Learning Architectures (DLAs), hybrid models, and optimization-enhanced modeling techniques are used.

The algorithms of machine learning listed in Table 5 are well established, having demonstrated successful outcomes in engineering applications. These methods require less computational power and resources, thus enjoying a good deal of interpretability.

Table 6 displays some advanced neural network architectures that can learn from large datasets. These models perform well on high-dimensional data and can automatically extract relevant features from such data. However, they require a large amount of training data and computational resources.

Intelligent systems that merge neural networks with fuzzy logic or other AI paradigms provide both learning capability and interpretability (Table 7). These models bridge the gap between accuracy and explainability in applications such as wastewater treatment.

Table 8 presents optimization algorithms, which are augmented machine learning methods that improve model performance, parameter tuning, and solution quality. They perform an automated search for best configurations and can handle multi-objective scenarios.

In Table 9, there are machine learning models that are tailored toward the difficulties faced in the wastewater treatment sector and its industrial integration. These models have been selected with the domain-specific requirements of the wastewater treatment sector in mind: necessities such as interpretability, real-time implementation, and integrable infrastructure.

Figure 10 evaluates five distinct supervised learning models (ANN, SVM, RF, LSTM, and ANFIS) along five critical dimensions relevant to their deployment in wastewater management.

Random forest leads the interpretability dimension, followed closely by hybrid models. Clear decision pathways are formed by these tree-based and rule-based systems. Engineers follow those pathways and validate their use, which is what makes these systems so valuable for regulatory compliance and process troubleshooting. Moderately interpretable, support vector machines enable decision boundaries to be understood in terms of support vectors. In contrast, artificial neural networks and long short-term memory networks work as black boxes with internal representations that are increasingly more complex (and thus, increasingly less understandable) when compared to support vector machines. The random forest and hybrid models used in wastewater treatment give operators a clear idea of the reasons behind certain decisions.

LSTM models are the best at forecasting the results of wastewater treatment plants because they are the best at understanding how things change over time.

Curiously, other methods, such as random forest or hybrid models, may also be able to perform very well, showing that there is no single best way to solve these forecasting problems. This gives plant operators a suite of reliable tools from which to choose, based on their individual needs.

The computational efficiency of SVM is the highest, with ANN in second place. Simpler models, like SVM, run faster and use less energy, which is essential for the limited-resource context of treatment plants. While it is probably the most accurate model, LSTM requires a significant amount of computing power, which creates bottlenecks that limit its real-world usage.

For real-time control, the best performer is ANN, followed closely by SVM. Both make decisions within milliseconds, which is essential for maintaining effective treatment control. Random forest has a moderate speed, which places it in a middle ground between speed and accuracy. Depending on their configuration, hybrid models may not be fast enough for real-time control. LSTM is too slow for real-time applications.

Limited data is not an issue for SVM and random forest because these two machine learning methods work well with small datasets, which is excellent because we know that collecting good-quality wastewater data is both expensive and difficult. LSTM and ANN are expensive in terms of the size of the dataset they require to achieve acceptable performance. Thus, LSTM and ANN are much less practical for new wastewater treatment facilities if we want to apply machine learning.

As shown in Figure 11, no single model excels in every respect. For resource-constrained environments, SVM provides the best balance of efficiency, speed, and data demands while still achieving acceptable accuracy. Random forest gives the best balance between accuracy and interpretability for regulatory compliance. LSTM is the highest performer in terms of accuracy when the available computational resources and data volume are extensive. ANN is the best in terms of speed, and the best for accuracy under time constraints. Hybrid models offer compromise solutions for moderately demanding applications.

5.1.2. Applications of Unsupervised ML Techniques in the Optimization of Treatment Processes

Clustering groups data into categories; it answers the question of ‘what’ by showing the different kinds of data in a dataset. To understand ‘why’ the data are grouped in these ways, we need to look at the structure of the data.

Models based on deep learning, like convolutional neural networks and recurrent neural networks, can discern patterns and detect anomalous behavior in very large quantities of unlabeled data. This capability makes them potentially valuable for predicting water quality. In many situations, the data cannot be pre-classified [39].

Methods such as principal component analysis help clarify intricate datasets by identifying the most significant underlying patterns. They simplify complex data, reducing it to the most meaningful core elements [9].

In the work by Chen et al. [85], three clustering algorithms were tested to automatically classify inlet water quality without any pre-labeled examples. The algorithms applied to the water samples were K-means, DBSCAN, and AGNES.

Every algorithm possesses unique strengths. K-means performed the best; it achieved an accuracy of 94.7% in grouping similar water types and accurately identified the ideal number of clusters. The density-based algorithm DBSCAN was most effective at identifying outliers and other unusual data patterns; it often achieved perfect results but struggled when the data were particularly noisy or complex. The hierarchical algorithm AGNES used a tree-like strategy that let the engineers visually discern the natural grouping patterns in the data, achieving an accuracy rate of 95.5% [103].

The study validated that these unsupervised techniques can efficiently categorize water quality information without human direction, providing automated and valuable tools for engineers’ decisions about water treatment [103].

This is especially relevant when we have very little labeled data or are mining the sorts of new environmental patterns that would justify using unsupervised learning.

Clustering and dimensionality reduction are essential in unsupervised learning for making prediction models work with the kinds of data found in wastewater treatment. The prediction models must often deal with not just a large amount of data (high volume) but also a large number of variables (high dimensionality). Even so, patterns and trends in wastewater data can be identified using clustering techniques, such as fuzzy clustering.

Dimensionality reduction techniques, such as PCA and t-SNE, make the intricate data from wastewater easier to comprehend and manipulate. They turn complex, multi-dimensional data into a lower-dimensional form that is easier to work with, without losing significant relationships in the data [104]. t-SNE serves operators very well for one specific purpose: helping them create visual maps of the data that reveal patterns and unusual behavior.

PCA plays a different, though equally valuable, role. It tells us which factors we ought to focus on if we want to make accurate predictions. For instance, it can identify whether the operators should be most concerned with membrane performance, the amount of ammonia, or the concentration of nitrogen if they are trying to predict how well the treatment is going to work [43,105].
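As an illustration of how PCA indicates which variables carry most of the variation, the following is a minimal sketch using NumPy. The three variables (ammonia, total nitrogen, transmembrane pressure) and the synthetic data are assumptions made for this example and are not taken from the cited studies.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA via SVD on standardized data.

    Returns (scores, loadings, explained_variance_ratio).
    Assumes no column of X is constant.
    """
    X = np.asarray(X, dtype=float)
    # Standardize so that variables with large units do not dominate.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S ** 2 / np.sum(S ** 2)
    scores = Z @ Vt[:n_components].T
    return scores, Vt[:n_components], explained[:n_components]

# Hypothetical sensor data: ammonia, total nitrogen, transmembrane pressure.
rng = np.random.default_rng(0)
ammonia = rng.normal(30.0, 5.0, 200)
total_n = 1.5 * ammonia + rng.normal(0.0, 1.0, 200)  # tracks ammonia closely
tmp = rng.normal(20.0, 2.0, 200)                     # unrelated to the others
X = np.column_stack([ammonia, total_n, tmp])

scores, loadings, ratio = pca(X, n_components=2)
print("explained variance ratio:", ratio)
print("first component loadings:", loadings[0])
```

In this synthetic case, the first component is dominated by the two correlated nitrogen variables, which is the kind of signal an operator could use to decide which measurements deserve the most attention.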

Both methods essentially simplify the complex nature of wastewater treatment data, making it more manageable and meaningful for informed decision-making.

Table 10 presents ways of using unsupervised machine learning methods that do not require any kind of labeled training data. Because these methods focus on uncovering hidden structures and relationships in datasets, they are well-suited for exploratory data analysis, process monitoring, and anomaly detection in the context of wastewater treatment. These methods are essential when the kinds of labels that enable supervised learning are either unavailable or prohibitively expensive.

Figure 12 compares the performance of six unsupervised learning algorithms used in wastewater treatment applications. Each algorithm is indicated by a colored line. The chart portrays a clear image of which algorithms excel in specific areas and gives an overall comparison of all six algorithms across all performance measures.

The interpretability dimension is led by AGNES, which obtains a perfect score. AGNES provides maximum interpretability through the hierarchical dendrogram structure it produces. This clearly shows how the data points are progressively grouped into clusters. The tree-like visualization allows users to understand exactly how clustering decisions are being made at each level. Fuzzy clustering uses a key property that allows for strong interpretability: it provides membership degrees that indicate how strongly each data point belongs to different clusters. This property allows for the formulation of linguistic explanations through fuzzy rules. Good interpretability is achieved by t-SNE, primarily because of its visualization capabilities, which allow intricate high-dimensional relationships to be rendered visible in two or three dimensions. K-means, DBSCAN, and PCA all achieve a moderate level of performance. K-means provides easily understandable cluster centroids, DBSCAN presents a clear way of understanding point classifications, and PCA provides principal components that are easy to interpret. However, none of them performs as well as hierarchical methods when it comes to providing explicit decision pathways.
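The membership degrees mentioned above can be made concrete with a short sketch. It computes fuzzy c-means memberships of a single sample for fixed cluster centers, using the standard formula u_j = 1 / sum_k (d_j / d_k)^(2 / (m - 1)). The centers, the sample, and the variable meanings (COD and ammonia in mg/L) are hypothetical values chosen for illustration.

```python
import math

def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership degrees of one point for fixed centers.

    m > 1 is the fuzziness parameter; memberships sum to 1.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that cluster.
    if any(d == 0.0 for d in dists):
        zero_count = sum(d == 0.0 for d in dists)
        return [1.0 / zero_count if d == 0.0 else 0.0 for d in dists]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** power for dk in dists) for dj in dists]

# Hypothetical cluster centers: (COD, ammonia) for "low load" and "high load".
centers = [(100.0, 20.0), (300.0, 60.0)]
sample = (150.0, 30.0)
u = fuzzy_memberships(sample, centers)
print(u)  # approximately [0.9, 0.1]
```

A result such as 0.9 for the first cluster and 0.1 for the second can be read as "mostly low load, slightly high load", which is the type of linguistic statement that fuzzy rules build on.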

Scalability is the primary strength of the K-means algorithm. The cost of each iteration is linear in the number of data points and clusters. PCA also scales well, efficiently processing high-dimensional sensor data from wastewater treatment systems.

K-means is the best clustering algorithm for big data. The reason for K-means’ success is its simplicity and speed. K-means can be efficiently run on large amounts of data and is easily parallelizable. DBSCAN is able to cluster data of arbitrary shape and is robust to outliers. But when compared to K-means, DBSCAN is slower and not as good for big data.
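To show why K-means is simple and cheap per iteration, here is a minimal pure-Python sketch of Lloyd's algorithm. The six (COD, ammonia) samples are invented for the example, and a production system would more likely rely on an established library implementation.

```python
import math
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: n * k distance evaluations per iteration.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels

# Hypothetical inlet samples: (COD, ammonia) in mg/L.
points = [(180, 18), (200, 20), (220, 22), (580, 58), (600, 60), (620, 62)]
centroids, labels = kmeans(points, k=2)
print(sorted(centroids))
print(labels)
```

The assignment step is the dominant cost and touches each point once per cluster, which is why the method parallelizes easily across large sensor datasets.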

Noisy or messy data are not a problem for DBSCAN, which identifies and isolates outliers well, allowing the clustering process to take place without interference from these problematic observations. Fuzzy clustering and PCA also handle messy data and noise far better than AGNES or K-means, which are both sensitive to messy data and outliers. t-SNE falls in the middle on this dimension. The other methods require more complex parameter tuning and setup decisions.

The parameter-free nature of PCA contributes to its high ease-of-use score as a basic method for dimensionality reduction. However, choosing the optimal number of components may require some domain expertise. DBSCAN, fuzzy clustering, and t-SNE are moderately easy to use, but achieving good results with them requires care. Each method requires the user to specify parameters that can dramatically affect the outcome. DBSCAN requires both the epsilon value and the minimum number of points to be carefully determined; fuzzy clustering requires the selection of a fuzziness parameter that is appropriate for the dataset at hand. t-SNE requires a careful selection of the perplexity value and is also somewhat sensitive to the choice of other hyperparameters. AGNES also scores moderately on this metric. It requires a decision about linkage criteria and distance metrics, as well as the determination of the optimal number of clusters from the dendrogram, which can be hard to do and is often domain-dependent.

Dimensionality reduction is what PCA and t-SNE are meant to do, and they do it well. PCA is suited to linear reductions that keep the important information, while t-SNE complements it by taking complex, nonlinear patterns and showing them in a way that humans can understand easily. The other algorithms are not built for this purpose.

In terms of accuracy, K-means, DBSCAN, and fuzzy clustering all do a good job, but in different ways. K-means is good at finding clear, round clusters. DBSCAN can identify separated, oddly shaped clusters and can handle varying densities with ease. Fuzzy clustering produces nuanced results that capture overlapping operational conditions.
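The role of the two DBSCAN parameters can be seen in a compact sketch. The implementation below is a simplified, quadratic-time version written for readability; the sample coordinates and the values eps=0.5 and min_pts=3 are illustrative assumptions.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN; returns one label per point (-1 marks noise)."""
    labels = [None] * len(points)

    def neighbors(i):
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nbrs)
    return labels

# Two dense groups of hypothetical readings plus one isolated outlier.
points = [
    (1.0, 1.0), (1.2, 1.1), (0.9, 1.0), (1.1, 0.9),
    (5.0, 5.0), (5.1, 5.2), (4.9, 5.1), (5.0, 4.9),
    (9.0, 1.0),
]
labels = dbscan(points, eps=0.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Making eps much larger would merge the two groups and absorb the outlier, while raising min_pts above 4 would label every point as noise, which illustrates how strongly the outcome depends on these two settings.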

AGNES achieves moderate performance on reduced datasets; it does not handle noisy data well. t-SNE and PCA are not good for clustering since they are not designed for that task. t-SNE is designed for visualization, and PCA is for data reduction. They are both better at serving those purposes than they are at clustering.

Large-scale wastewater treatment operations benefit the most from K-means. When handling noisy data in which outliers may be important signals, DBSCAN is a better choice, as it is particularly good at finding unusual patterns. For smaller datasets where it is necessary to explain the results to stakeholders, AGNES is well suited. PCA and t-SNE are not truly clustering tools; they are better utilized for data preparation and for producing visualizations.

Figure 13 presents a comprehensive overview of machine learning techniques and models commonly applied in wastewater treatment and water quality analysis.

5.1.3. Efficiency of Ensemble Methods in Optimizing Wastewater Treatment Processes

Ensemble methods can increase the efficiency of the treatment of wastewater by using several different models to arrive at a combination that performs better than any single model. This is common in machine learning. The ensemble models often used in deep learning, particularly those that use recurrent neural networks, are very powerful. These models can predict water quality in wastewater treatment plants with an accuracy of 97.81%. What makes these joint models so powerful is their ability to deal with different facets of the data at the same time. They can monitor how things change over time while also zeroing in on the big, important drivers that affect treatment performance. This dual capability lets them better tackle the inherent messiness of environmental data than the traditional single-model approach [73].

The performance of models that use random forest and XGBoost has been well established. Between the two, there is little question about which performs better, since XGBoost plainly dominates in terms of predictive accuracy (as reflected in much lower RMSE values) [69].

Ensemble models have shown their ability in predicting the production of sludge. They are particularly good at fitting highly nonlinear data. Using XGBoost and random forest can make the prediction much better than simple models, almost always winning head-to-head comparison tests. They manage complex patterns and variability in the data much better than simple or even slightly complex models [74].

When predicting the generation of hazardous waste, it is much better to use ensemble models that couple classification and regression. These perform in a superior way over direct regression models. Direct regression models have a tough time handling imbalanced datasets [77].

The operational gain comes primarily from the ensemble methods simply being much more accurate in their predictive capabilities. These models can adapt to various environmental conditions and can also integrate data from different sources, making them a good choice in terms of an optimization tool for real-world wastewater treatment scenarios [69,73].

The radar chart in Figure 14 shows very clearly that ensemble methods outperform single models in each of the wastewater treatment applications. The performance gap is not slight either; ensemble methods achieve an overall accuracy of 97.81% in predicting water quality, while single models only manage 75%.

In energy consumption forecasting, ensemble methods excel, with 85% effectiveness compared to 65% for individual models like XGBoost or random forest. Ensemble methods again lead the way when predicting sludge production, with 88% versus 70% for complex data pattern handling. Thus, approaches using the ensemble paradigm provide a clear advantage for these environmental predictions. When predicting hazardous waste, ensemble methods lead at 82%, while individual models are significantly lower at 68%. This is especially helpful when dealing with the imbalanced datasets that are the norm in wastewater treatment situations.

In Table 11, the water quality forecasting row describes how ensemble deep learning blends recurrent neural networks with random forest algorithms to achieve 97.81% accuracy. RNNs handle the time-dependent structure of the water quality data. In contrast, random forest provides stable feature importance rankings and makes solid decisions based on the features it selects. Thus, the two methods complement each other on the water quality data.

For the prediction of energy consumption, tree-based ensemble methods combine random forest and XGBoost. In comparative studies, XGBoost has yielded the best RMSE performance.
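Since RMSE is the metric used to compare these models, a brief sketch of its calculation may be helpful. The observed values and the two sets of predictions are invented numbers and do not correspond to any model in the cited studies.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between observations and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical daily energy consumption (kWh) and two candidate forecasts.
observed = [100, 120, 90, 110]
model_a = [98, 123, 92, 107]
model_b = [90, 130, 100, 100]

print(rmse(observed, model_a))
print(rmse(observed, model_b))  # 10.0
```

Because the errors are squared before averaging, a model with a few large misses is penalized more heavily than one with many small ones, which is why lower RMSE is usually read as better overall predictive accuracy.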

The prediction of sludge production demonstrates how ensemble methods outperform simpler models. These methods effectively manage the intricate patterns and variability that make predicting biological treatment processes especially hard. There are endless biological, chemical, and operational factors at play that influence production rates. Hence, it is especially valuable that the ensembles integrate numerous prediction strategies.

The ensemble framework combines classification and regression models to predict the generation of hazardous waste. This is a particularly valuable approach when working with imbalanced datasets, which are common in environmental monitoring applications. Such datasets occur when one class is much smaller in number than the other classes.
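One simple way to couple classification and regression is a two-stage scheme, sketched below: a classifier first decides whether any hazardous waste is expected, and a regressor trained only on the nonzero cases estimates the amount. This is a deliberately simplified illustration and not the framework of the cited study; the one-dimensional feature, the threshold classifier, the linear regressor, and the data are all assumptions.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def fit_two_stage(x, y):
    """Stage 1: threshold classifier. Stage 2: regression on positives only.

    Assumes nonzero outputs occur at higher feature values.
    """
    x_pos = [xi for xi, yi in zip(x, y) if yi > 0]
    y_pos = [yi for yi in y if yi > 0]
    x_neg = [xi for xi, yi in zip(x, y) if yi == 0]
    # Classifier: split at the midpoint between the two class means.
    threshold = (mean(x_pos) + mean(x_neg)) / 2.0
    a, b = fit_line(x_pos, y_pos)

    def predict(xi):
        return a + b * xi if xi > threshold else 0.0

    return predict

# Hypothetical imbalanced data: most periods generate no hazardous waste.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 0, 0, 0, 0, 0, 18, 20]

predict = fit_two_stage(x, y)
a_direct, b_direct = fit_line(x, y)  # direct regression on all samples

print(predict(3), predict(9.5))   # 0.0 19.0
print(a_direct + b_direct * 3)    # negative, which is physically meaningless
```

In this toy case, the direct regression is pulled toward the many zero observations and produces a negative estimate at low feature values, while the two-stage model returns zero there and a sensible estimate for the rare positive cases.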

5.2. Limitations and Challenges of Implementing ML Models in Applications Dedicated to Optimizing Wastewater Treatment Processes

Machine learning faces many limitations and challenges across all domains when it is applied in the real world.

Machine learning faces a predicament when applied to predicting groundwater quality. It needs three things in particular: high-quality data, specialized expertise, and algorithms that can adapt to the complexity of the real world [52]. Catalysis is an area where datasets are limited in size, and the complexity of ML models can cause some problems, particularly overfitting and a lack of generalization [18]. The integration of artificial intelligence with membrane bioreactors is limited by data quality problems, like sensor noise and missing values, that affect the accuracy and dependability of the models [43].

ML models encounter multiple significant problems that restrict their genuine implementation in the real world, especially in crucial tasks like water treatment. The foremost issue is their black box nature; it is quite often impossible to fathom how they achieve their outcomes, which makes it hard to get decision-makers to trust them with processes that really matter [4,18,43].

Another big problem is data. ML models require a significant amount of high-quality data to work properly, but such data are both expensive and difficult to obtain in water treatment. Because poor data lead to unreliable models, the scarcity and inaccessibility of good data become even more critical [49,52,100]. These models also demand a significant amount of computing power and continuous maintenance. Treatment plants that integrate ML into their systems require considerable retrofitting. The plants themselves must be retooled to accommodate the ML component, and smart systems must be integrated with types of hardware and software that are sometimes decades old [52,101].

There are also extra difficulties that come with teaching employees to use these emerging technologies, plus the ongoing need to maintain data integrity and security. There is also the issue of all the secret algorithms: they are inescapable if you are going to use ML, and they cannot be shared across the industry in any standardized way because companies guard them as trade secrets [4].

The radar chart in Figure 15 provides an overview of severity assessment across eight challenge categories, with two datasets overlaid to show both how hard each challenge is and how often it shows up in the research literature. The most serious challenge, at 90% severity, turns out to be Data Quality, reflecting just how critical it is that all the parts of the sensor system, communication system, and computational system work correctly in the service of achieving environmental goals.

Model interpretability is rated at 85% severity, representing the tension between model performance and transparency. This tension particularly affects safety-critical environments, such as water treatment, where operators must understand and trust AI-driven decisions. Advanced machine learning models have a black-box nature that creates significant adoption barriers in environments where explainability is essential for regulatory compliance and operational confidence.

System adaptability challenges have a severity rating of 88%, reflecting the persistent gap between the training environments and the real world. This challenge is particularly severe in an application area like wastewater treatment, where environmental variability consistently outpaces controlled model training scenarios. The result is a need for continual recalibration that, in some cases, creates instability in the system’s operation and undermines the efficiency gains.

Resource requirements score 70% in severity, reflecting computational requirements and very large dataset needs that smaller facilities with limited technical capacities, in particular, find almost insurmountable. Integration complexity carries a severity rating of 65%, which is still substantial. It takes a significant amount of hardware and software, and especially a large and skilled workforce, to make ML systems work well in the real world.

Model generalization problems are rated at 82% severity. They most frequently manifest as insufficient performance when the model is deployed in an environment that is ‘out of sample’, that is, outside of the training set. This directly limits ML scalability and applicability in practice, as models that perform well in pilot phases often fail to solve complex problems in real-world deployments.

Table 12 provides an analysis and serves as a detailed reference document that categorizes and quantifies each major challenge type. It does so with specific details about manifestation, affected applications, severity levels, and root causes.

The data quality category emerges as the most critical challenge, affecting seven of the eight studies that we reviewed. It manifests in the forms of sensor noise, missing values, and data scarcity across the multiple application domains that we looked at, including membrane bioreactors, BSM models, and groundwater prediction systems. The high severity rating we gave here reflects how fundamental data quality is to machine learning success.

Model interpretability challenges affect five of the eight studies with high severity, especially those dealing with deep neural networks and critical water treatment systems. These applications require clear operator understanding and a trustworthy model to facilitate adoption. Our analysis identifies the opaque decision-making processes of modern algorithms, particularly those with multiple hidden layers, as the key obstacle to interpretability. This lack of clear explanations creates barriers in environments that require an understandable model either to comply with regulations or to give operators enough ‘operational confidence’ to trust the model’s outputs.

Six of the eight studies report severe system adaptability issues that affect WWTP operations and real-world deployment scenarios wherever changing conditions challenge model assumptions. Continuous recalibration is necessary, but a system that requires that much attention to keep it from drifting raises questions about its operational stability.

Resource requirement challenges impact four of the eight studies. They primarily affect catalysis research and the prediction of membrane behavior, placing significant computational and dataset demands on the organization doing the research. These demands weigh heavily because such organizations are often limited in their technical infrastructure. They either need to invest in that infrastructure or find cloud computing solutions that can meet these demands.
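Two of the data quality problems named above, missing values and sensor noise, are often handled with simple preprocessing before any model is trained. The sketch below shows one possible combination, forward filling followed by a rolling median; the dissolved oxygen readings and the window size are hypothetical, and other imputation or filtering choices may suit a given plant better.

```python
from statistics import median

def fill_missing(values):
    """Replace None with the last valid reading (forward fill).

    Leading gaps are filled with the first valid reading.
    """
    last = next((v for v in values if v is not None), None)
    filled = []
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

def median_filter(values, window=3):
    """Centered rolling median; the window shrinks at the edges."""
    half = window // 2
    return [
        median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]

# Hypothetical dissolved oxygen readings (mg/L) with a gap and a spike.
raw = [2.1, 2.2, None, 2.3, 9.9, 2.4, 2.5]
filled = fill_missing(raw)
clean = median_filter(filled, window=3)
print(filled)
print(clean)
```

The rolling median removes the isolated 9.9 spike without smearing it into neighboring samples, which a rolling mean would do.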
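Rather than recalibrating continuously, a plant could recalibrate only when recent prediction errors clearly exceed the level seen at commissioning. The following sketch shows one such trigger based on a rolling mean absolute error; the class name, the tolerance factor, and the sample values are assumptions for illustration only.

```python
from collections import deque

class DriftMonitor:
    """Flags possible model drift from a rolling mean absolute error."""

    def __init__(self, baseline_mae, window=5, tolerance=1.5):
        # Recalibration is suggested when the rolling MAE exceeds this limit.
        self.limit = baseline_mae * tolerance
        self.errors = deque(maxlen=window)

    def update(self, observed, predicted):
        """Record one residual; return True when recalibration is suggested."""
        self.errors.append(abs(observed - predicted))
        window_full = len(self.errors) == self.errors.maxlen
        return window_full and sum(self.errors) / len(self.errors) > self.limit

# Hypothetical effluent COD values (observed, predicted) in mg/L.
monitor = DriftMonitor(baseline_mae=2.0, window=3, tolerance=1.5)
pairs = [(50, 49), (52, 50), (51, 52), (60, 52)]
flags = [monitor.update(obs, pred) for obs, pred in pairs]
print(flags)  # [False, False, False, True]
```

Waiting until the window is full avoids reacting to a single bad reading, at the cost of a short delay before drift is reported.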

Three out of the eight studies experienced medium-level severity effects on integration complexity, with the chief influence seen in digital water systems and existing infrastructures. This creates a requirement for not just technical but also substantial organizational changes to effect the needed modifications in hardware and software and to introduce requisite training across a diverse workforce.

Problems with model generalization affect half of the studies that we analyzed and do so with high severity (Table 12). This most adversely affects the development of catalysts, as well as other types of complex systems. The performance of ML solutions applied to these systems suffers because the models tend to overfit the training data.

Table 13 provides a useful guideline of where to start with machine learning training in WWTPs by job function and technical difficulty. It addresses the fact that different roles require different levels of ML sophistication: engineers need advanced skills to develop models and remain compliant, operators need tools that anyone can use easily while on shift, and maintenance staff need a predictive system that can immediately start to reduce downtime and costs.

5.3. Integration of IoT and ML for Real-Time Predictions

Significant progress has been made in environmental management using AI, especially deep neural networks that model the complex, time-varying relationships in water treatment systems [96].

Real-time monitoring and optimization of treatment plants is made possible by digital twins, which are another breakthrough. A successful instance was developed for a facility in Eindhoven, Netherlands [7].

Combining IoT sensors with machine learning allows for real-time predictions of water quality. In aquaculture, some of the latest deep learning models (transformer models) use IoT data to rapidly respond to environmental changes. Meanwhile, wastewater treatment plants use IoT along with LSTM models to better manage their facilities, conserving energy and cutting costs by predicting the chemical oxygen demand in real-time [98]. The combination of AI with self-powered IoT sensor systems is a leap forward for continuous monitoring in wastewater treatment plants, making autonomous and no-power data capture a reality [106].

The incorporation of the Internet of Things and machine learning into plant control systems vastly enhances real-time monitoring and the kind of predictive maintenance that lets operators make adjustments before things go wrong, thereby reducing the amount of manual intervention required [12].

Digital twin platforms in water treatment plants show how IoT and machine learning are coming together using real-time data and predictive analytics to enhance operational efficiency and reduce maintenance costs. At some water utilities, where the digital twin is more developed, it is used for predicting maintenance needs [34].

Real-time predictions in a resource-constrained IoT environment are fundamentally an edge computing problem. These environments require solutions that minimize latency, save energy, and guarantee that predictions happen in real time, even when the network is down and all that is available are ultralow-power, ultralow-performance edge devices [34]. The choice between cloud and local infrastructure has strong implications for the performance of IoT-supported systems, with a recent study showing trade-offs between computational power and real-time responsiveness in automated environmental monitoring [85].

Edge computing boosts the efficiency of data processing, allowing machine learning algorithms to make more timely and often more accurate predictions. This matters a great deal in applications such as water quality monitoring and wastewater treatment, where these algorithms need to work efficiently, respond to the environment, and often operate in real time or near real time. Edge computing also boosts IoT security, which is a considerable benefit for the sensitive data involved in water treatment. Timely access to the data and to the alarms that systems generate, such as clarifier alarms, is critical for maintaining the efficiency of the process. Edge computing integrates well with IoT and offers the promise of even better operational efficiency [4,34,43,65].

Water management is undergoing a transformation thanks to digital twins. These virtual copies of real-world systems update in real time and are starting to be used in the water supply sector. Knowing both the predicted problem location and the time at which it will happen allows network managers to solve problems before they become deep crises. At the same time, model use is allowing those in the water supply sector to trim costs and extend the lives of the networks and the facilities.

Digital twins have real power because they can continuously monitor everything and predict when equipment might fail. This allows maintenance teams to fix things before they shut down or cause problems with water quality [53,107].

5.4. Limitations of This Systematic Review

Although this systematic review is comprehensive, the breadth of the field means that some limitations affect the interpretation and generalization of our findings. These limitations stem from the methodology and from the current state of research on the application of machine learning in the optimization of wastewater treatment. Even though the authors used many academic databases, some papers may have been missed. First, there is a language barrier, and some relevant papers published in other languages could not be found. Additionally, the time restriction may exclude some foundational earlier work.

Also, the systematic review may be subject to publication and selection bias. Papers with positive results are overrepresented, and during the screening process papers were eliminated that did not follow PRISMA guidelines.

The included studies are heterogeneous (experimental designs, laboratory, pilot, full-scale); this diversity limits the ability to conduct quantitative analysis and to establish performance hierarchies. Another limitation is regarding data quality and validation constraints; many publications lack details about data, model validation, and long-term performance.

Even though these limitations exist, this systematic review offers a good look at the present-day situation and the possible future of machine learning applications in optimizing wastewater treatment. The limitations we identified indicate that the methods of several studies need to be better standardized and reported, which may, in turn, help them be better understood and more accurately judged. The insights gained in this way should be of use to both researchers and practitioners and even to the private sector when it considers taking on particular projects.

6. Conclusions

This comprehensive review demonstrates that machine learning technologies represent a rare paradigm shift in the optimization of wastewater treatment. They are powerful new tools for monitoring, predicting, and controlling these very complex treatment systems. The analysis shows that supervised learning models have become key tools for predicting important wastewater quality parameters, such as BOD and COD. Among these strategies, hybrid models are the most promising solution. They combine the computational strength of neural networks with the comprehensibility of fuzzy logic systems. This combination tackles one of the field’s most persistent challenges: ensuring that both accurate predictions and transparent, trustworthy decision-making processes are available for operators.

Integrating digital twin technology and IoT systems within machine learning frameworks is a breakthrough for real-time monitoring capabilities. By bringing together these technologies, the smart wastewater treatment facilities of the future will be able to develop responsive and adaptive operational frameworks.

The widespread implementation of these methods faces several critical challenges that must be addressed. Data quality and availability are fundamental limitations. High-quality training datasets are a necessity for the performance of machine learning models. Yet many treatment facilities either do not have such datasets or have datasets that are not suitable for this purpose.

Integrating systems is a significant challenge because current wastewater treatment plants usually must be altered substantially to accommodate new, advanced, and machine-learning-driven technologies. This includes not only compatibility factors but also economic ones. Upgrading a plant involves huge financial investments, and the justification for these investments usually must be based on either performance improvements or favorable cost-benefit ratios.

High implementation cost is the primary barrier to ML adoption. Upgrades to existing WWTPs are very costly in terms of new sensors, computing infrastructure, software licensing, and system integration. Unlike the ecological benefits that can be proven in pilot studies, economic sustainability requires long-term investments whose returns are not yet known. While this study shows that ML systems could lower energy usage and chemical consumption, the economic justification is complex due to the highly variable operational conditions, and plant managers may not be able to estimate savings accurately. Payback period uncertainty introduces a financial risk that many decision-makers are unwilling to accept. The environmental benefits can be calculated once they are translated into economic value, in the form of avoided penalties. Ultimately, while environmental performance is likely to be the key long-term objective, economic viability represents the primary barrier to the widespread adoption of ML in wastewater treatment and is therefore the most important and challenging to overcome.

For data quality, we advocate implementing standardized sensor networks with automated quality checks, together with industry-level databases covering a minimum of 2 years of monitoring. Regarding transparency hurdles, explainable AI models built with SHAP and LIME and tailored to wastewater operators can address this problem. Economic implementation barriers can be overcome with phased deployment strategies that deliver incremental value rather than requiring full up-front investment.
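As an illustration of what an automated quality check on a sensor stream might look like, the sketch below flags missing, out-of-range, and flatlined readings. The function name `quality_flags`, the flag labels, the pH limits, and the flatline threshold are assumptions made for this example; real plants would set them per sensor and per standard.

```python
import math


def quality_flags(readings, valid_min, valid_max, max_flat_run=3):
    """Label each reading as 'ok', 'missing', 'out_of_range', or 'flatline'.

    A reading is 'flatline' when it extends a run of identical values
    beyond max_flat_run (a common symptom of a stuck sensor).
    """
    flags = []
    previous = None
    run_length = 1
    for value in readings:
        if value is None or (isinstance(value, float) and math.isnan(value)):
            flags.append("missing")
            previous = None  # a gap resets the flatline counter
            run_length = 1
            continue
        run_length = run_length + 1 if value == previous else 1
        previous = value
        if not (valid_min <= value <= valid_max):
            flags.append("out_of_range")
        elif run_length > max_flat_run:
            flags.append("flatline")
        else:
            flags.append("ok")
    return flags


# Hypothetical pH readings; the physically valid range is assumed to be 0 to 14.
ph = [7.1, 7.2, None, 15.3, 6.9, 6.9, 6.9, 6.9, 7.0]
flags = quality_flags(ph, valid_min=0.0, valid_max=14.0, max_flat_run=3)
print(list(zip(ph, flags)))
```

Only the readings that exceed the allowed run length are flagged as flatlined, so the first three identical values still pass. Whether flagged values are dropped, imputed, or reviewed by an operator is a separate design decision.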

Future research should prioritize the following:

(A): Standardization: developing a standardized evaluation framework with well-defined performance metrics and cross-validation procedures.
(B): Deployment: software for developing and deploying ML models on edge computing devices, with specific hardware recommendations.
(C): Evaluation: multi-site validation across 10+ geographically diverse WWTPs using the same set of hardware across sites.
(D): Documentation: model-to-deployment documentation for selecting ML models and integrating them into the existing SCADA systems that manage WWTPs, accounting for vendor compatibility matrices.
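A standardized evaluation framework of the kind proposed in item (A) could start from a fixed set of metrics and a fixed splitting rule. The sketch below computes MAE, MSE, RMSE, and R² and applies them over contiguous (unshuffled) folds. The helper names, the mean-value baseline model, and the effluent COD values are hypothetical, and contiguous folds are only one possible choice for time-ordered plant data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R2 for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot if ss_tot > 0 else float("nan")
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "R2": r2}


def blocked_k_fold(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds, without shuffling."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop


# Hypothetical effluent COD values (mg/L), evaluated with a mean-value baseline.
cod = [42.0, 45.0, 44.0, 50.0, 48.0, 47.0, 52.0, 55.0, 53.0, 51.0]
for fold, (train_idx, test_idx) in enumerate(blocked_k_fold(len(cod), k=3), start=1):
    train_mean = sum(cod[i] for i in train_idx) / len(train_idx)
    y_true = [cod[i] for i in test_idx]
    y_pred = [train_mean] * len(test_idx)
    print(fold, regression_metrics(y_true, y_pred))
```

Reporting every candidate model against the same baseline, metrics, and folds is what makes results from different plants comparable. Any real ML model would replace the mean-value baseline used here.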

Future research also needs to establish common benchmark standards and datasets so that the performance of different ML applications in WWTP optimization can be compared more comprehensively.

Despite these difficulties, the evidence supports the transformative potential of machine learning in wastewater treatment. Its predictive power, combined with lower operational costs and improved reliability, makes it essential for contemporary environmental management. Future research must concentrate on building models that are robust, interpretable, and adaptable to real-world conditions. We believe that emphasis should be placed on hybrid modeling approaches that combine several machine learning techniques, so that the resulting models perform better, are more stable, and remain usable when data are limited, without sacrificing transparency in their decision-making.

For developing countries, the authors believe that the most viable ML technologies for WWTP optimization are SVMs and RF because of their multiple advantages: they require minimal computational infrastructure (they can run on basic computers with free software), give good results even with small datasets (between one hundred and five hundred samples), predict WWTP effluent quality very well with limited training data (SVMs in particular give very good COD and BOD predictions), and provide the plant operator with an easy-to-interpret optimization of chemical reagent dosing (RF). Thus, despite data quality problems, SVMs and RF ensure robust performance with minimal maintenance. On the other hand, ANNs, deep learning methods (such as LSTM and CNNs), and advanced IoT integration are better suited to highly developed countries because of their high computational requirements, need for very large datasets, and complex maintenance demands.

This research establishes a complete roadmap for harnessing artificial intelligence to improve the efficiency, reliability, and environmental performance of wastewater treatment, an essential step for the water sector toward the next generation of smart, adaptive treatment systems.

Author Contributions

Conceptualization, F.-S.Z. and M.C.; methodology, F.-S.Z. and M.C.; software, F.-S.Z. and M.C.; validation, F.-S.Z. and M.C.; formal analysis, F.-S.Z. and M.C.; investigation, F.-S.Z. and M.C.; resources, F.-S.Z. and M.C.; data curation, F.-S.Z.; writing—original draft preparation, F.-S.Z. and M.C.; writing—review and editing, F.-S.Z., M.C. and S.F.M.; visualization, S.F.M.; supervision, S.F.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by internal funding from Petroleum-Gas University of Ploiesti, Romania.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AGNES	Agglomerative Hierarchical Clustering
AI	Artificial Intelligence
ANFIS	Adaptive Neuro-Fuzzy Inference System
ANNs	Artificial Neural Networks
AOPs	Advanced Oxidation Processes
BF	Belief Functions
BiLSTM	Bidirectional LSTM
BOD	Biochemical Oxygen Demand
CNNE-LOST-GPRE	Convolutional Neural Networks—Long Short-Term Memory Neural Networks—Gaussian Process Regression
CNN-LSTM	CNN combined with LSTM
CNNs	Convolutional Neural Networks
COD	Chemical Oxygen Demand
DBNs	Deep Belief Networks
DBSCAN	Density-based spatial clustering of applications with noise
DL	Deep Learning
DLA	Deep Learning Architectures
DNNs	Deep Neural Networks
DTs	Decision Trees
EDL	Ensemble Deep Learning
EMs	Ensemble Models
FFDs	Fractional Factorial Designs
FL	Fuzzy Logic
GAs	Genetic Algorithms
GB	Gradient Boosting
GNN	Graph Neural Networks
GOSS	Gradient-based One-Side Sampling
GPR	Gaussian Process Regression
Grad-CAM	Gradient-weighted Class Activation Mapping
GRUs	Gated Recurrent Unit networks
HDL	Hybrid Deep Learning
IoT	Internet of Things
KNNs	K-Nearest Neighbors
KPIs	Key Performance Indicators
LightGBM	Light Gradient Boosting Machine
LSTM	Long Short-Term Memory
MAE	Mean Absolute Error
MBRs	Membrane Bioreactors
MKRBFNN	Multi-Kernel Radial Basis Function Neural Network
ML	Machine Learning
MLP	Multilayer Perceptron
MSE	Mean Square Error
NLCGNN	Node-Level Capsule Graph Neural Networks
PCA	Principal Components Analysis
PLC	Programmable Logic Controller
PLSR	Partial Least Squares Regression
PRISMA	Preferred Reporting Items for Systematic Reviews and Meta-Analyses
PSO	Particle Swarm Optimization
R²	Coefficient of determination
RBF	Radial Basis Function
ReLu	Rectified Linear Unit
RF	Random Forest
RMSE	Root Mean Square Error
RNNs	Recurrent Neural Networks
RQ	Research Questions
RTs	Regression Trees
SCADA	Supervisory Control and Data Acquisition
SHAP	SHapley Additive exPlanations
SNNs	Shallow Neural Networks
SSA	Salp Swarm Algorithm
SVMs	Support Vector Machines
SVR	Support Vector Regression
TDS	Total Dissolved Solids
t-SNE	t-Distributed Stochastic Neighbor Embedding
TSS	Total Suspended Solids
UASB	Upflow Anaerobic Sludge Blanket reactor
WWTP	Wastewater Treatment Plant
XGBoost	eXtreme Gradient Boosting

References

European Union. Directive 2000/60/EC of the European Parliament and of the Council Establishing a Framework for Community Action in the Field of Water Policy (Water Framework Directive). 2000. Available online: https://eur-lex.europa.eu/eli/dir/2000/60/oj/eng (accessed on 17 July 2025).
European Environment Agency (EEA). Wastewater Treatment in Europe. Available online: https://www.eea.europa.eu/en (accessed on 18 July 2025).
Romanian Ministry of Environment. National Reports on Wastewater Treatment Infrastructure. Available online: https://mmediu.ro/ (accessed on 18 July 2025).
Lowe, M.; Qin, R.; Mao, X. A Review on Machine Learning, Artificial Intelligence, and Smart Technology in Water Treatment and Monitoring. Water 2022, 14, 1384.
Alprol, A.E.; Mansour, A.T.; Ibrahim, M.E.E.-D.; Ashour, M. Artificial Intelligence Technologies Revolutionizing Wastewater Treatment: Current Trends and Future Prospective. Water 2024, 16, 314.
Kato, S.; Kansha, Y. Comprehensive Review of Industrial Wastewater Treatment Techniques. Environ. Sci. Pollut. Res. 2024, 31, 51064–51097.
Capodaglio, A.G.; Callegari, A. Use, Potential, Needs, and Limits of AI in Wastewater Treatment Applications. Water 2025, 17, 170.
Xu, A.; Zou, X.; Wang, C. Prediction of the Wastewater’s pH Based on Deep Learning Incorporating Sliding Windows. Comput. Syst. Sci. Eng. 2023, 47, 1043–1059.
Meng, X.; Zhang, Y. Quantitative Modeling and Predictive Analysis of Chemical Oxygen Demand in Wastewater Treatment Systems Utilizing Long Short-Term Memory Neural Network. Sustainability 2024, 16, 10359.
Alzahrani, A.I.A.; Chauhdary, S.H.; Alshdadi, A.A. Internet of Things (IoT)-Based Wastewater Management in Smart Cities. Electronics 2023, 12, 2590.
Sheikh Khozani, Z.; Ehteram, M.; Mohtar, W.H.M.W.; Achite, M.; Chau, K. Convolutional Neural Network–Multi-Kernel Radial Basis Function Neural Network–Salp Swarm Algorithm: A New Machine Learning Model for Predicting Effluent Quality Parameters. Environ. Sci. Pollut. Res. 2023, 30, 99362–99379.
Gulshin, I.; Kuzina, O. Optimization of Wastewater Treatment Through Machine Learning-Enhanced Supervisory Control and Data Acquisition: A Case Study of Granular Sludge Process Stability and Predictive Control. Automation 2024, 6, 2.
Lu, X.; Huang, S.; Liu, H.; Yang, F.; Zhang, T.; Wan, X. Research on Intelligent Chemical Dosing System for Phosphorus Removal in Wastewater Treatment Plants. Water 2024, 16, 1623.
Begmatov, S.; Beletsky, A.V.; Mardanov, A.V.; Ravin, N.V. Genomic Analysis of the Uncultured AKYH767 Lineage from a Wastewater Treatment Plant Predicts a Facultatively Anaerobic Heterotrophic Lifestyle and the Ability to Degrade Aromatic Compounds. Water 2025, 17, 1061.
Nnaji, C.C. A Review of the Upflow Anaerobic Sludge Blanket Reactor. Desalination Water Treat. 2014, 52, 4122–4143.
Liu, Z.; Ding, H.; Huang, X. Anaerobic Membrane Bioreactor Treating Municipal Wastewater: Mainstream or Sidestream? Environ. Sci. Technol. 2024, 58, 20329–20332.
Kim, J.; Kim, K.; Ye, H.; Lee, E.; Shin, C.; McCarty, P.L.; Bae, J. Anaerobic Fluidized Bed Membrane Bioreactor for Wastewater Treatment. Environ. Sci. Technol. 2011, 45, 576–581.
Yuan, Q.; Wang, X.; Xu, D.; Liu, H.; Zhang, H.; Yu, Q.; Bi, Y.; Li, L. Machine Learning-Assisted Catalysts for Advanced Oxidation Processes: Progress, Challenges, and Prospects. Catalysts 2025, 15, 282.
Zhang, M.; Dong, H.; Zhao, L.; Wang, D.; Meng, D. A Review on Fenton Process for Organic Wastewater Treatment Based on Optimization Perspective. Sci. Total Environ. 2019, 670, 110–121.
Carbureanu, M. The Development of a Neuro-Fuzzy Expert System for Wastewater pH Control. J. Control Eng. Appl. Inform. 2014, 16, 30–41.
Li, N.; Zhu, F.; Wang, Z.; Wu, J.; Gao, Y.; Li, K.; Zhao, C.; Wang, X. Harnessing Corn Straw Biochar: A Breakthrough in Eco-Friendly Cu(II) Wastewater Treatment. Waste Manag. 2025, 197, 25–34.
Abdykadyrov, A.; Kalandarov, P.; Taissariyeva, K.; Marxuly, S.; Abdykadyrkyzy, R.; Kassimov, A.; Yermekbayev, M.; Yerzhan, A. Study of the Process of Neutralizing and Oxidizing Harmful Phenol Compounds in Wastewater Using Ozone Technology. Water Conserv. Manag. 2024, 8, 420–429.
Lu, D.; Ou, J.; Qian, J.; Xu, C.; Wang, H. Prediction of Non-Equilibrium Transport of Nitrate Nitrogen from Unsaturated Soil to Saturated Aquifer in a Watershed: Insights for Groundwater Quality and Pollution Risk Assessment. J. Contam. Hydrol. 2025, 274, 104649.
Li, H.; Liu, C.; Guo, X.; Sun, H.; Li, X.; Jiang, H.; Miao, S. Applying Machine Learning Approach to Design Operational Control Strategies for a Wastewater Treatment Plant in Typical Scenarios. Water 2025, 17, 310.
Yan, X.; Zhang, T.; Du, W.; Meng, Q.; Xu, X.; Zhao, X. A Comprehensive Review of Machine Learning for Water Quality Prediction over the Past Five Years. J. Mar. Sci. Eng. 2024, 12, 159.
Yang, X.; Li, H.; Yang, Y. Study on the Construction and Application of a Prediction Model for Carbon Emissions in Wastewater Biological Treatment Processes. Processes 2025, 13, 999.
Köksal, D.D.; Ahi, Y.; Todorovic, M. Assessing Agricultural Reuse Potential of Treated Wastewater: A Hybrid Machine Learning Approach. Agronomy 2025, 15, 703.
Khosravi, K.; Farooque, A.A.; Karbasi, M.; Ali, M.; Heddam, S.; Faghfouri, A.; Abolfathi, S. Enhanced Water Quality Prediction Model Using Advanced Hybridized Resampling Alternating Tree-Based and Deep Learning Algorithms. Environ. Sci. Pollut. Res. 2025, 32, 6405–6424.
Barzegar, R.; Aalami, M.T.; Adamowski, J. Short-Term Water Quality Variable Prediction Using a Hybrid CNN–LSTM Deep Learning Model. Stoch. Environ. Res. Risk Assess. 2020, 34, 415–433.
Ekinci, E.; Özbay, B.; Omurca, S.İ.; Sayın, F.E.; Özbay, İ. Application of Machine Learning Algorithms and Feature Selection Methods for Better Prediction of Sludge Production in a Real Advanced Biological Wastewater Treatment Plant. J. Environ. Manag. 2023, 348, 119448.
Da Costa, M.F.P.; Araújo, R.D.S.; Silva, A.R.; Pereira, L.; Silva, G.M.M. Predictive Artificial Neural Networks as Applied Tools in the Remediation of Dyes by Adsorption—A Review. Appl. Sci. 2025, 15, 2310.
Zaghloul, M.S.; Achari, G. Application of Machine Learning Techniques to Model a Full-Scale Wastewater Treatment Plant with Biological Nutrient Removal. J. Environ. Chem. Eng. 2022, 10, 107430.
Xu, X.; Wei, A.; Tang, S.; Liu, Q.; Shi, H.; Sun, W. Prediction of Nitrous Oxide Emission of a Municipal Wastewater Treatment Plant Using LSTM-Based Deep Learning Models. Environ. Sci. Pollut. Res. 2023, 31, 2167–2186.
Rodríguez-Alonso, C.; Pena-Regueiro, I.; García, Ó. Digital Twin Platform for Water Treatment Plants Using Microservices Architecture. Sensors 2024, 24, 1568.
Mehrpour, P.; Mirbagheri, S.A.; Kavianimalayeri, M.; Sayyahzadeh, A.H.; Ehteshami, M. Experimental pH Adjustment for Different Concentrations of Industrial Wastewater and Modeling by Artificial Neural Network. Environ. Technol. Innov. 2023, 31, 103212.
Vadivel, T.; Barathi, K.; Arulkumaran, G.; Anand, M.B.; Cherubini, C. Wastewater Management Using a Neural Network-Assisted Novel Paradigm for Waste Prediction from Vermicomposting. Water 2024, 16, 3450.
Xu, W.-L.; Wang, Y.-J.; Wang, Y.-T.; Li, J.-G.; Zeng, Y.-N.; Guo, H.-W.; Liu, H.; Dong, K.-L.; Zhang, L.-Y. Application and Innovation of Artificial Intelligence Models in Wastewater Treatment. J. Contam. Hydrol. 2024, 267, 104426.
Ikram, M.; Liu, H.; Al-Janabi, A.M.S.; Kisi, O.; Mo, W.; Ali, M.; Adnan, R.M. Enhancing the Prediction of Influent Total Nitrogen in Wastewater Treatment Plant Using Adaptive Neuro-Fuzzy Inference System–Gradient-Based Optimization Algorithm. Water 2024, 16, 3038.
Jafar, R.; Awad, A.; Jafar, K.; Shahrour, I. Predicting Effluent Quality in Full-Scale Wastewater Treatment Plants Using Shallow and Deep Artificial Neural Networks. Sustainability 2022, 14, 15598.
He, M.; Qian, Q.; Liu, X.; Zhang, J.; Curry, J. Recent Progress on Surface Water Quality Models Utilizing Machine Learning Techniques. Water 2024, 16, 3616.
Huang, Z.; Bai, Y.; Liu, H. Symmetry-Inspired Prediction of Nitrous Oxide Emissions in Wastewater Treatment Using Deep Learning and Explainable Analysis. Symmetry 2025, 17, 297.
Jamshidzadeh, Z.; Latif, S.D.; Ehteram, M.; Sheikh Khozani, Z.; Ahmed, A.N.; Sherif, M.; El-Shafie, A. An Advanced Hybrid Deep Learning Model for Predicting Total Dissolved Solids and Electrical Conductivity (EC) in Coastal Aquifers. Environ. Sci. Eur. 2024, 36, 20.
Frontistis, Z.; Lykogiannis, G.; Sarmpanis, A. Machine Learning Implementation in Membrane Bioreactor Systems: Progress, Challenges, and Future Perspectives: A Review. Environments 2023, 10, 127.
Li, Y.; Mao, S.; Yuan, Y.; Wang, Z.; Kang, Y.; Yao, Y. Beyond Tides and Time: Machine Learning Triumph in Water Quality. Am. J. Appl. Math. Stat. 2023, 11, 89–97.
Radović, S.; Pap, S. Machine Learning as a Support Tool in Wastewater Treatment Systems—A Short Review. In Proceedings of the Eleventh International Symposium GRID 2022, University of Novi Sad, Faculty of Technical Sciences, Department of Graphic Engineering and Design, Novi Sad, Serbia, 3–5 November 2022; pp. 799–807.
Nasir, F.B.; Li, J. Understanding Machine Learning Predictions of Wastewater Treatment Plant Sludge with Explainable Artificial Intelligence. Water Environ. Res. 2024, 96, e11136.
Besseris, G. Non-Linear Saturated Multi-Objective Pseudo-Screening Using Support Vector Machine Learning, Pareto Front, and Belief Functions: Improving Wastewater Recycling Quality. Appl. Sci. 2024, 14, 9971.
Nagpal, M.; Siddique, M.A.; Sharma, K.; Sharma, N.; Mittal, A. Optimizing Wastewater Treatment through Artificial Intelligence: Recent Advances and Future Prospects. Water Sci. Technol. 2024, 90, 731–757.
Shahouni, R.; Abbasi, M.; Dibaj, M.; Akrami, M. Utilising Artificial Intelligence to Predict Membrane Behaviour in Water Purification and Desalination. Water 2024, 16, 2940.
Baskar, G.; Parameswaran, A.N.; Sathyanathan, R. Optimizing Papermaking Wastewater Treatment by Predicting Effluent Quality with Node-Level Capsule Graph Neural Networks. Environ. Monit. Assess. 2025, 197, 176.
Yu, J.; Li, G. Evaluating Artificial Intelligence-Based Industrial Wastewater Anaerobic Ammonium Oxidation Treatment Optimization and Its Environmental, Economic, and Social Benefits Using a Life Cycle Assessment–System Dynamics Model. Processes 2024, 13, 59.
Pandya, H.; Jaiswal, K.; Shah, M. A Comprehensive Review of Machine Learning Algorithms and Its Application in Groundwater Quality Prediction. Arch. Comput. Methods Eng. 2024, 31, 4633–4654.
Shtepa, V.; Junakova, N.; Zaiets, N.; Lutska, N.; Chernysh, Y.; Balintova, M. Conceptual Model of Digitization of the Municipal Wastewater Disposal Systems. Water 2024, 16, 3483.
Awwad, A.; Husseini, G.A.; Albasha, L. AI-Aided Robotic Wide-Range Water Quality Monitoring System. Sustainability 2024, 16, 9499.
Zhou, L.; Sun, Q.; Ding, S.; Han, S.; Wang, A. A Machine-Learning-Based Method for Ship Propulsion Power Prediction in Ice. J. Mar. Sci. Eng. 2023, 11, 1381.
Otálora, P.; Guzmán, J.L.; Berenguel, M.; Acién, F.G. Data-Driven pH Model in Raceway Reactors for Freshwater and Wastewater Cultures. Mathematics 2023, 11, 1614.
Manav-Demir, N.; Gelgor, H.B.; Oz, E.; Ilhan, F.; Ulucan-Altuntas, K.; Tiwary, A.; Debik, E. Effluent Parameters Prediction of a Biological Nutrient Removal (BNR) Process Using Different Machine Learning Methods: A Case Study. J. Environ. Manag. 2024, 351, 119899.
Raheli, B.; Talebbeydokhti, N.; Saadat, S.; Nourani, V. Uncertainty Assessment of Surface Water Salinity Using Standalone, Ensemble, and Deep Machine Learning Methods: A Case Study of Lake Urmia. Iran. J. Sci. Technol. Trans. Civ. Eng. 2024, 48, 1029–1047.
Chang, T.-Y.; Cho, W.-T.; Tseng, S.-Y.; Ouyang, Y.; Lai, C.-F. Predictive Maintenance of Water Purification Unit for Smart Factories. In Cognitive Cities; Shen, J., Chang, Y.-C., Su, Y.-S., Ogata, H., Eds.; Communications in Computer and Information Science; Springer: Singapore, 2020; Volume 1227, pp. 62–70. ISBN 978-981-15-6112-2.
Yu, A.; Xiao, Q. A Water Quality Prediction Model Based on Long Short-Term Memory Networks and Optimization Algorithms. IEEE Access 2024, 12, 175607–175615.
Makumbura, R.K.; Mampitiya, L.; Rathnayake, N.; Meddage, D.P.P.; Henna, S.; Dang, T.L.; Hoshino, Y.; Rathnayake, U. Advancing Water Quality Assessment and Prediction Using Machine Learning Models, Coupled with Explainable Artificial Intelligence (XAI) Techniques like Shapley Additive Explanations (SHAP) for Interpreting the Black-Box Nature. Results Eng. 2024, 23, 102831.
Recio-Colmenares, C.L.; Flores-Gómez, J.; Morales Rivera, J.P.; Palacios Hinestroza, H.; Sulbarán-Rangel, B. Green Materials for Water and Wastewater Treatment: Mechanisms and Artificial Intelligence. Processes 2025, 13, 566.
Inca-Balseca, C.-L.; Salazar, C.; Rodríguez, J.; Barrera, M.; Kurbatova, A.I.; Inca, E.; Padilla-Padilla, N.-M.; Moreno-Yepez, I.-N.; Toapanta-Dacto, J.-V.; Ávila-Gaibor, G.-J.; et al. Stochastic State-Space Modeling for Sludge Concentration Height at the Ucubamba Guangarcucho Wastewater Treatment Plant. Water 2025, 17, 793.
Haddaway, N.R.; Page, M.J.; Pritchard, C.C.; McGuinness, L.A. PRISMA2020: An R Package and Shiny App for Producing PRISMA 2020-compliant Flow Diagrams, with Interactivity for Optimised Digital Transparency and Open Synthesis. Campbell Syst. Rev. 2022, 18, e1230.
De La Hoz-M, J.; Ariza-Echeverri, E.A.; Vergara, D. Exploring the Role of Artificial Intelligence in Wastewater Treatment: A Dynamic Analysis of Emerging Research Trends. Resources 2024, 13, 171.
Niyongabo, A.; Zhang, D.; Guan, Y.; Wang, Z.; Imran, M.; Nicayenzi, B.; Guyasa, A.K.; Hatungimana, P. Predicting Urban Water Consumption and Health Using Artificial Intelligence Techniques in Tanganyika Lake, East Africa. Water 2024, 16, 1793.
Guo, H.; Chen, Z.; Teo, F.Y. Intelligent Water Quality Prediction System with a Hybrid CNN–LSTM Model. Water Pract. Technol. 2024, 19, 4538–4555.
Cechinel, M.A.P.; Neves, J.; Fuck, J.V.R.; De Andrade, R.C.; Spogis, N.; Riella, H.G.; Padoin, N.; Soares, C. Enhancing Wastewater Treatment Efficiency through Machine Learning-Driven Effluent Quality Prediction: A Plant-Level Analysis. J. Water Process Eng. 2024, 58, 104758.
Alali, Y.; Harrou, F.; Sun, Y. Unlocking the Potential of Wastewater Treatment: Machine Learning Based Energy Consumption Prediction. Water 2023, 15, 2349.
Rogers, O.N., III; Ambili, P.S. Water Quality Prediction with Machine Learning Algorithms. EPRA Int. J. Multidiscip. Res. IJMR 2024, 10, 82–86.
More, K.S.; Wolkersdorfer, C. Predicting and Forecasting Mine Water Parameters Using a Hybrid Intelligent System. Water Resour. Manag. 2022, 36, 2813–2826.
Taşan, M.; Taşan, S.; Demir, Y. Estimation and Uncertainty Analysis of Groundwater Quality Parameters in a Coastal Aquifer under Seawater Intrusion: A Comparative Study of Deep Learning and Classic Machine Learning Methods. Environ. Sci. Pollut. Res. 2023, 30, 2866–2890.
Anandan, P.; Sundaram, A. Unveiling Agricultural Soil Runoff: Remote Sensing and Ensemble Deep Learning Models to Evaluate Impact of Climate on Water Quality and Human Health. Remote Sens. Earth Syst. Sci. 2024, 7, 722–737.
Shao, S.; Fu, D.; Yang, T.; Mu, H.; Gao, Q.; Zhang, Y. Analysis of Machine Learning Models for Wastewater Treatment Plant Sludge Output Prediction. Sustainability 2023, 15, 13380.
Kramar, V.; Alchakov, V. Time-Series Forecasting of Seasonal Data Using Machine Learning Methods. Algorithms 2023, 16, 248.
Lai, Y.; Xiao, K.; He, Y.; Liu, X.; Tan, J.; Xue, W.; Zhang, A.; Huang, X. Machine Learning for Membrane Bioreactor Research: Principles, Methods, Applications, and a Tutorial. Front. Environ. Sci. Eng. 2025, 19, 34.
Xie, W.; Yu, Q.; Fang, W.; Zhang, X.; Geng, J.; Tang, J.; Jing, W.; Liu, M.; Ma, Z.; Yang, J.; et al. Data-Driven Approaches Linking Wastewater and Source Estimation Hazardous Waste for Environmental Management. Nat. Commun. 2024, 15, 5432.
Dritsas, E.; Trigka, M. Efficient Data-Driven Machine Learning Models for Water Quality Prediction. Computation 2023, 11, 16.
Minh, T.N.; Thanh Truyen, N.; Hong Loan, D.T. Artificial Neural Networks for Modeling Pollutant Removal in Wastewater Treatment: A Review. Galore Int. J. Appl. Sci. Humanit. 2024, 8, 88–98.
Wongburi, P.; Park, J.K. Prediction of Wastewater Treatment Plant Effluent Water Quality Using Recurrent Neural Network (RNN) Models. Water 2023, 15, 3325.
Jongjaraunsuk, R.; Taparhudee, W.; Suwannasing, P. Comparison of Water Quality Prediction for Red Tilapia Aquaculture in an Outdoor Recirculation System Using Deep Learning and a Hybrid Model. Water 2024, 16, 907.
Urbanovičs, V.; Parshutin, S.; Rubulis, J.; Bonders, M.; Dambeniece, K.; Ozols, R.; Štēbelis, D.; Dejus, S. Performance Evaluation of Machine Learning Methods for Drinking Water Contamination Detection. In Proceedings of the 3rd International Joint Conference on Water Distribution Systems Analysis & Computing and Control for the Water Industry (WDSA/CCWI 2024), Ferrara, Italy, 1–4 July 2024; p. 110.
Rosca, C.-M.; Stancu, A.; Tănase, M.R. A Comparative Study of Azure Custom Vision Versus Google Vision API Integrated into AI Custom Models Using Object Classification for Residential Waste. Appl. Sci. 2025, 15, 3869.
Hussein, Z.; Almngoshi; Balaji, V.; Ramesh, R.; Arokia Jesu Prabhu, L.; Venubabu, R.; Eswaramoorthy, V. Enhancing Predictive Maintenance in Water Treatment Plants through Sparse Autoencoder Based Anomaly Detection. J. Mach. Comput. 2024, 4, 279–289.
Rosca, C.-M.; Stancu, A.; Popescu, M. The Impact of Cloud Versus Local Infrastructure on Automatic IoT-Driven Hydroponic Systems. Appl. Sci. 2025, 15, 4016.
Cao, J.; Xue, A.; Yang, Y.; Lu, R.; Hu, X.; Zhang, L.; Cao, W.; Cao, G.; Geng, X.; Wang, L. A Hybrid Deep Learning Framework for Predicting Industrial Wastewater Influent Quality Based on Graph Optimisation. J. Water Process Eng. 2024, 65, 105831.
Hernández-del-Olmo, F.; Gaudioso, E.; Duro, N.; Dormido, R.; Gorrotxategi, M. Advanced Control by Reinforcement Learning for Wastewater Treatment Plants: A Comparison with Traditional Approaches. Appl. Sci. 2023, 13, 4752.
Geng, Y.; Zhang, F.; Liu, H. Multi-Scale Temporal Convolutional Networks for Effluent COD Prediction in Industrial Wastewater. Appl. Sci. 2024, 14, 5824.
Rutland, H.; You, J.; Liu, H.; Bowman, K. Application of Machine Learning for FOS/TAC Soft Sensing in Bio-Electrochemical Anaerobic Digestion. Molecules 2025, 30, 1092.
Moghadam, S.V.; Sharafati, A.; Feizi, H.; Marjaie, S.M.S.; Asadollah, S.B.H.S.; Motta, D. An Efficient Strategy for Predicting River Dissolved Oxygen Concentration: Application of Deep Recurrent Neural Network Model. Environ. Monit. Assess. 2021, 193, 798.
Khan, M.S.J.; Sidek, L.M.; Kumar, P.; Alkhadher, S.A.A.; Basri, H.; Zawawi, M.H.; El-Shafie, A.; Ahmed, A.N. Machine Learning Based-Model to Predict Catalytic Performance on Removal of Hazardous Nitrophenols and Azo Dyes Pollutants from Wastewater. Int. J. Biol. Macromol. 2024, 278, 134701.
Gulshin, I.; Makisha, N. Predicting Wastewater Characteristics Using Artificial Neural Network and Machine Learning Methods for Enhanced Operation of Oxidation Ditch. Appl. Sci. 2025, 15, 1351.
Karadimos, P.; Anthopoulos, L. Machine Learning-Based Energy Consumption Estimation of Wastewater Treatment Plants in Greece. Energies 2023, 16, 7408.
Amer, S.; Mohamed, H.K.; Badr Monir Mansour, M. Predictive Maintenance by Machine Learning Methods. In Proceedings of the 2023 Eleventh International Conference on Intelligent Computing and Information Systems (ICICIS), Cairo, Egypt, 21–23 November 2023; pp. 58–66.
Kalabarige, L.R.; Krishna, D.; Potnuru, U.K.; Mishra, M.; Alharthi, S.S.; Koutavarapu, R. Tree-Based Machine Learning and Nelder–Mead Optimization for Optimized Cr(VI) Removal with Indian Gooseberry Seed Powder. Water 2024, 16, 2175.
Namadi, P.; Deng, Z. Deep Learning-Based Ensemble Modeling of Vibrio Parahaemolyticus Concentration in Marine Environment. Environ. Monit. Assess. 2023, 195, 229.
Bagherzadeh, F.; Nouri, A.S.; Mehrani, M.-J.; Thennadil, S. Prediction of Energy Consumption and Evaluation of Affecting Factors in a Full-Scale WWTP Using a Machine Learning Approach. Process Saf. Environ. Prot. 2021, 154, 458–466.
Arepalli, P.G.; K, J.N.; Rout, J.K. Aquaculture Water Quality Classification with Sparse Attention Transformers: Leveraging Water and Environmental Parameters. In Proceedings of the 2024 13th International Conference on Software and Computer Applications, Bali Island, Indonesia, 1–3 February 2024; pp. 318–325.
Wei, C.; Zhao, T.; Cao, J.; Li, P. Water Quality Prediction Model Based on Interval Type-2 Fuzzy Neural Network with Adaptive Membership Function. Int. J. Fuzzy Syst. 2025.
Gulshin, I.; Kuzina, O. Machine Learning Methods for the Prediction of Wastewater Treatment Efficiency and Anomaly Classification with Lack of Historical Data. Appl. Sci. 2024, 14, 10689.
Voipan, D.; Voipan, A.E.; Barbu, M. Evaluating Machine Learning-Based Soft Sensors for Effluent Quality Prediction in Wastewater Treatment Under Variable Weather Conditions. Sensors 2025, 25, 1692.
Ruiz, D.; Casas, A.; Escobar, C.A.; Perez, A.; Gonzalez, V. Advanced Machine Learning Techniques for Corrosion Rate Estimation and Prediction in Industrial Cooling Water Pipelines. Sensors 2024, 24, 3564.
Chen, K.; Shi, X.; Zhang, Z.; Chen, S.; Ma, J.; Zheng, T.; Alfonso, L. Using Unsupervised Learning to Classify Inlet Water for More Stable Design of Water Reuse in Industrial Parks. Water Sci. Technol. 2024, 89, 1757–1770.
Nouraki, A.; Alavi, M.; Golabi, M.; Albaji, M. Prediction of Water Quality Parameters Using Machine Learning Models: A Case Study of the Karun River, Iran. Environ. Sci. Pollut. Res. 2021, 28, 57060–57072.
Aparna, K.G.; Swarnalatha, R.; Murchana, C. Optimizing Wastewater Treatment Plant Operational Efficiency through Integrating Machine Learning Predictive Models and Advanced Control Strategies. Process Saf. Environ. Prot. 2024, 188, 995–1008.
Rosca, C.-M.; Stancu, A. Integration of AI in Self-Powered IoT Sensor Systems. Appl. Sci. 2025, 15, 7008.
Quaranta, E.; Ramos, H.M.; Stein, U. Digitalisation of the European Water Sector to Foster the Green and Digital Transitions. Water 2023, 15, 2785.

Figure 1. WWTP block diagram.

Figure 2. Search criteria used for article selection in the Litmaps platform. (a) Search interface showing the ‘wastewater + deep learning’ keyword combination. (b) Search interface showing the ‘wastewater + machine learning’ keyword combination. (c) Search interface showing the ‘wastewater + AI’ keyword combination.

Figure 3. Graph of citations associated with 300 articles according to the year of publication.

Figure 4. Graph of citations associated with 300 articles according to age-adjusted citations.

Figure 5. Distribution of published articles on wastewater treatment in an ML context by year (2011–2025).

Figure 6. PRISMA 2020 flow chart for a systematic literature review on machine learning and deep learning models for wastewater process prediction [64].

Figure 7. Citation relationships of academic articles that were selected using the method described in Figure 5.

Figure 8. Taxonomic organization of supervised and unsupervised learning methods for wastewater treatment applications.

Figure 9. Supervised and unsupervised ML methods used in wastewater treatment.

Figure 10. Comparative analysis of supervised learning methods used in wastewater treatment.

Figure 11. Performance metrics and characteristics of supervised learning methods. Colors indicate performance levels: red (undesirable), orange (moderate), green (desirable).

Figure 12. Comparing unsupervised learning techniques used in wastewater prediction across six key criteria.

Figure 13. Performance metrics and characteristics of unsupervised learning methods. Colors indicate performance levels: red (undesirable), orange (moderate), green (desirable).

Figure 14. Model performance comparison between ensemble methods and single models.

Figure 15. Distribution and severity of challenges across research literature.

Table 1. Distribution of published articles on wastewater treatment in ML context by journal and year (2018–2025).

Journal	2018	2019	2020	2021	2022	2023	2024	2025	Total
Water			1		3	8	14	3	29
Journal of Water Process Engineering			2	3	4	1	4		14
Journal of Environmental Management		1		1	2	2	4		10
Sustainability				1	2	2	4		9
Chemical Engineering Research & Design		1	2	3	2				8
Environmental Monitoring and Assessment		1	1	1	1	1	1	2	8
Applied Sciences						1	3	3	7
Science of the Total Environment			1	1	4	1			7
Chemosphere	1			1	1		3		6
Environmental Science and Pollution Research				1	3	2			6
Journal of Cleaner Production	1			1	1	1	1		5
Journal of Environmental Chemical Engineering					1		3	1	5
Processes					1		2	2	5
Water Research		1		1		3			5
Chemical Engineering Journal	1			3					4
Engineering Applications of Artificial Intelligence					1		2	1	4
IEEE Access			1				3		4
Desalination					1	1	1		3
Elsevier eBooks					1			2	3

Table 2. Distribution of published articles on wastewater treatment in ML context by journal and year (2020–2025).

Journal	2020	2021	2022	2023	2024	2025	Total
Water	1		2	5	10	3	21
Sustainability		1	1	1	4		7
Applied Sciences					2	3	5
Environmental Monitoring and Assessment		1	1			2	4
Environmental Science and Pollution Research			1	2			3
Processes					1	2	3
Atmosphere					1	1	2
Electronics				2			2
Italian National Conference on Sensors					2		2

Table 3. Advantages, disadvantages, and applications of ML models in wastewater treatment optimization processes.

ML Technique	Advantages	Disadvantages	Applications
SVMs	High accuracy and robustness	Computationally intensive	Water quality prediction
	Works with nonlinear, non-separable data	Sensitive to kernel choice	Wastewater classification
	Efficient in high-dimensional spaces	Challenging hyperparameter tuning	MBR fouling modeling
	Strong theoretical foundation	Requires expert knowledge	Filtration and recycling optimization
	Robust to outliers	Poor with imbalanced data	Energy use reduction in WWTPs
DTs and EMs	Interpretable (DTs)	EMs are complex and less interpretable	Water/environmental monitoring
	Scalable, real-time, fast (LightGBM)	Time-consuming tuning	Real-time prediction in WWTPs
	Reduced overfitting (RFs)	DTs are prone to overfitting	Sludge and membrane performance prediction
	High predictive accuracy (XGBoost, LightGBM)	Needs expert tuning
KNNs and RTs	Non-parametric, flexible (KNNs)	Computationally expensive (KNNs)	Classification/regression
	High precision, outlier tolerance (KNNs)	Poor with high-dimensional data (KNNs)	Wastewater quality prediction
	Interpretable; models complex patterns (RTs)	Overfitting on small/noisy data (RTs)	Groundwater quality and energy forecasting
	Robust to overfitting and noise	Complex deep trees (RTs)	Real-time monitoring in WWTPs

Table 4. DL models’ advantages, disadvantages, and applications in the optimization of wastewater treatment processes.

DL Technique	Advantages	Disadvantages	Applications
ANNs	High predictive power	Complex training	Trends and pollution prediction
	Flexible architectures	Risk of overfitting	Process modeling & control
	Nonlinear modeling (e.g., ReLU)	Time-consuming tuning	Optimization in treatment processes
	Hybrid adaptability (e.g., GA-ANN)	Computationally intensive	Forecasting wastewater inflow dynamics
RNNs	Effective time-series modeling	Training complexity	Water quality prediction
	Real-time monitoring	Overfitting (esp. LSTM)	N₂O/COD prediction
	Handles complex temporal patterns	Data quality challenges	IoT-based real-time monitoring
	Suitable for IoT integration	High resource demands	Predictive maintenance
CNNs	Accurate feature extraction	Computationally intensive	Contamination detection
	Robust to noise	Needs large datasets	Water quality monitoring
	Dynamic system predictions	Overfitting risk	Dynamic wastewater system analysis
	Real-time anomaly detection	Limited interpretability in deep layers	Fault detection in water filtration systems
Autoencoders and DBNs	Dimensionality reduction	Training complexity	Sensor data compression
	Hierarchical feature learning	Overfitting challenges	Key feature extraction
	Effective on complex data	Data preparation challenges	Real-time quality monitoring
	Noise reduction	Limited interpretability	Environmental surveillance
HDL	High accuracy	High computational cost	WWTPs operational efficiency
	Adaptable to nonlinear data	Overfitting risk	Anomaly detection
	Strong optimization and fine-tuning	Complex integration	Sensor-based malfunction detection
	Robustness	Data quality dependent	Environmental and treatment process optimization

Table 5. Traditional ML models applied in wastewater treatment optimization.

Technique/Model	Application	Key Parameters/Output	References
Support Vector Methods
Support Vector Machines (SVMs)	Effluent quality forecasting, treatment optimization	TSS, nutrient levels, classification	[43,44,46,47,76,77]
Support Vector Regression (SVR)	Regression-based water quality prediction	Continuous water quality parameters	[24,29,43]
Tree-based and Ensemble Models
Random Forest (RF)	Feature selection, membrane fouling prediction	COD, BOD5, MBR performance, feature importance	[8,18,24,69,71,76,77,92]
XGBoost	Ensemble gradient boosting for effluent prediction	COD, energy consumption, ensemble predictions	[30,69,75,77,92]
Tree-based Machine Learning	Decision trees for pollutant removal optimization	Cr(VI) removal, sludge output prediction	[74,95]
Ensemble Learning Methods	Combined multiple models for enhanced accuracy	TDS, EC, salinity, improved robustness	[28,58,96]
Statistical and Regression Models
K-Nearest Neighbors (KNN)	Water quality prediction, effluent quality forecasting	BOD5, COD, TSS, pH prediction	[41,52,69,76,77]
Partial Least Squares Regression (PLSR)	Data-driven modeling with dimensionality reduction	COD, NH3-N, multi-collinear data	[9,43]
Gaussian Process Regression	Probabilistic regression with uncertainty	Hydropower production prediction	[18,42,49]

Table 6. Deep learning models applied in wastewater treatment optimization.

Technique/Model	Application	Key Parameters/Output	References
Neural Networks
Artificial Neural Networks (ANNs)	Predict WWTP performance, effluent quality, and odor concentration	COD, BOD5, TSS, NH3-N, TP, odor emissions	[9,27,39,56,76,79,80,89,97]
Recurrent Neural Networks (RNN)	Temporal pattern recognition in treatment data	NH3-N, BOD5, TP, time-series parameters	[9,33,39,49,60,80]
Long Short-Term Memory (LSTM)	Time-series prediction, COD modeling, N₂O emissions	COD, BOD5, TP, TN, N₂O emissions, temporal patterns	[9,29,33,60,75,80,92]
Bidirectional LSTM (BiLSTM)	Enhanced time-series modeling with past/future context	Groundwater levels, N₂O emissions	[28,41,68]
Deep Recurrent Neural Networks	Advanced RNN architectures	Dissolved oxygen concentration	[90]
Convolutional Neural Networks (CNN)	pH prediction, water quality classification, microplastic detection	pH values, water quality patterns, microplastic classification	[5,8,11,29,39,41,42,49]
Advanced Deep Learning Architectures
Graph Neural Networks (GNN)	Node-level capsule networks for effluent prediction	COD, BOD, TSS using graph structures	[50]
Attention Transformers	Advanced sequence modeling for water quality	Water and environmental parameters	[98]
Sparse Attention Transformers	Efficient attention mechanisms for large-scale data	Water and environmental parameters	[98]
Temporal Fusion Transformers	Treatment efficiency enhancement	BOD5, COD	[12]
Deep Belief Networks	Hierarchical feature learning	Sludge bulking monitoring, process dynamics	[47,50]
Hybrid Deep Learning Approaches
Hybrid CNN-LSTM Models	Combined spatial-temporal water quality prediction	DO, temperature, turbidity, multi-dimensional data	[29,67]
LSTM-GRU Hybrid	Operational control strategy optimization	Effluent COD, TN, BOD operational parameters	[24,97]
PLO-CNN-BiLSTM-Attention	Advanced N₂O emission prediction with optimization	N₂O emissions with symmetry considerations	[41]
CNN-MKRBFNN-SSA	Multi-kernel RBF with swarm optimization	COD, BOD, TSS effluent parameters	[11]

Table 7. Neuro-fuzzy and hybrid intelligent systems in wastewater treatment optimization.

Technique/Model	Application	Key Parameters/Output	References
Adaptive Neuro-Fuzzy (ANFIS)	Interpretable hybrid modeling	Quality classification, total nitrogen	[27,35,36,37,38,48,99]
ANFIS-GBO	ANFIS with Gradient-Based Optimization	Total nitrogen prediction, enhanced accuracy	[38]
Hybrid Intelligent Systems	Combined multiple AI approaches	Mine water parameter prediction	[48,71]

Table 8. Optimization-enhanced ML models in wastewater treatment optimization.

Technique/Model	Application	Key Parameters/Output	References
Bio-inspired Optimization
Genetic Algorithm (GA)	Optimization of neural network parameters	Treatment efficiency optimization	[11,38]
Particle Swarm Optimization (PSO)	Soft sensor parameter optimization	NH4-N prediction, parameter tuning	[5,76]
Salp Swarm Algorithm (SSA)	Bio-inspired optimization for ML parameters	CNN-RBF optimization, enhanced accuracy	[11,38,42,98]
Multi-objective Optimization
NSGA-II	Multi-objective evolutionary optimization	Energy-quality trade-off optimization	[41]
Belief Functions + Pareto Front.	Multi-objective optimization with uncertainty	Wastewater recycling quality parameters	[47]
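
As an illustration of the multi-objective entries in Table 8, the following Python sketch reduces a set of candidate operating points to its Pareto front for an energy-quality trade-off. It is a minimal sketch under stated assumptions, not the procedure of any cited study: the function name `pareto_front`, the choice of two minimized objectives, and all numeric values are hypothetical, and the code implements only a non-dominated filter, not the full NSGA-II algorithm.

```python
def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates (all objectives minimized)."""
    def dominates(a, b):
        # a dominates b if it is no worse in every objective and strictly better in one.
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    return [c for c in candidates if not any(dominates(other, c) for other in candidates)]


# Hypothetical operating points: (energy use in kWh/m3, effluent COD in mg/L).
# The numbers are invented for illustration only.
operating_points = [(0.30, 60.0), (0.35, 48.0), (0.45, 40.0), (0.40, 55.0), (0.50, 41.0)]

front = pareto_front(operating_points)
for energy, cod in front:
    print(f"energy={energy:.2f} kWh/m3, COD={cod:.1f} mg/L")
```

A full evolutionary method such as NSGA-II would also generate new candidates and rank them over successive fronts; the filter above corresponds only to identifying the first front.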

Table 9. Specialized and application-specific models in wastewater treatment optimization.

Technique/Model	Application	Key Parameters/Output	References
Interpretable and Explainable Models
Explainable AI—SHAP	Model interpretation and feature importance	Water quality parameters, model transparency	[41,47,61]
Data-driven pH Models	Specialized pH control systems	pH values in raceway reactors	[35]
Industrial Integration Models
Digital twin and SCADA Integration with ML	Real-time monitoring and control systems	System-wide parameters, process automation	[12,34,80]
Soft Sensors (ANN + Optimization)	Virtual sensing for real-time monitoring	NH4-N, FOS/TAC ratios, process parameters	[5,76,89,100,101]
Data Processing and Enhancement
Data Amplification	Data augmentation for small datasets	Enhanced dataset size, improved model training	[41]
Time-Series Models	Time-series data transformation, seasonal and temporal pattern prediction	pH prediction, temporal feature extraction	[8,47,75]
Remote Sensing + ML Integration	Spatial water quality assessment	Agricultural runoff, climate impact analysis	[73]
Hyperparameter Optimization	Automated parameter tuning	Model performance optimization	[102]
Machine Learning for Energy Prediction	Energy consumption optimization	Energy consumption in WWTPs	[69,93]
Graph Optimization in Deep Learning	Graph-based optimization for industrial applications	Industrial wastewater influent quality	[86]
Life Cycle Assessment	Environmental impact modeling	Environmental, economic, social benefits	[51]

Table 10. Unsupervised machine learning techniques for wastewater treatment processes.

Technique/Model	Application	Key Parameters/Output	References
Principal Component Analysis (PCA)	Dimensionality reduction and feature selection	MLSS, pressure, effluent quality parameters	[9,39,43]
t-distributed Stochastic Neighbor Embedding (t-SNE)	Nonlinear dimensionality reduction for data visualization	Process dynamics, pattern visualization, anomaly detection	[42,50]
K-means Clustering	Water quality classification and grouping	Water quality indicators, inlet water classification	[103]
DBSCAN	Density-based clustering with noise and outlier handling	Water quality indicators, outlier detection, noise isolation	[103]
AGNES (Agglomerative Nesting)	Hierarchical agglomerative clustering	Water quality indicators, dendrogram representation, hierarchical patterns	[103]
Fuzzy Clustering + PCA	Pattern identification in membrane bioreactor processes	Membrane fouling trends, process pattern recognition	[43]
Autoencoders	Anomaly detection and feature extraction	Process anomalies, predictive maintenance, and data compression	[84]
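
To make the K-means entry in Table 10 concrete, the sketch below groups inlet water samples with a plain implementation of Lloyd's algorithm. It is an illustrative sketch only: the function name `kmeans`, the two features (assumed to be already scaled to the 0–1 range), the hand-picked initial centroids, and the sample values are all assumptions and are not taken from the cited work.

```python
import math


def kmeans(points, centroids, iterations=20):
    """Lloyd's algorithm: assign points to the nearest centroid, then recompute centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its assigned points.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # an empty cluster keeps its previous centroid
        if updated == centroids:
            break  # converged: centroids no longer move
        centroids = updated
    return labels, centroids


# Hypothetical, pre-scaled inlet samples: (COD, conductivity), invented for illustration.
inlet = [(0.10, 0.20), (0.15, 0.25), (0.12, 0.18), (0.80, 0.90), (0.85, 0.82), (0.78, 0.88)]

labels, centers = kmeans(inlet, centroids=[inlet[0], inlet[3]])
print(labels)
```

In practice the result depends on feature scaling and on the initial centroids, which is one reason density-based (DBSCAN) and hierarchical (AGNES) methods are listed alongside K-means.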

Table 11. Ensemble learning methods in WWTP prediction.

Application Area	Ensemble Method	Key Models Combined	Performance Metric	Advantage
Water Quality Prediction	Ensemble Deep Learning (EDL)	RNN + Random Forest	97.81% Accuracy	Temporal dependencies + robust feature selection
Energy Consumption Prediction	Tree-based Ensemble	Random Forest + XGBoost	Best RMSE performance	Superior handling of complex datasets
Sludge Production Prediction	Ensemble Structure	XGBoost + Random Forest	Outperforms simple models	Manages complex patterns and variability
Hazardous Waste Generation	Classification + Regression	Coupled ensemble models	Superior to direct regression	Handles imbalanced datasets effectively
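
The benefit that Table 11 attributes to ensembles can be illustrated with the simplest combination rule, prediction averaging, scored by RMSE. This is a hedged sketch rather than a reproduction of any listed study: the helper names `rmse` and `average_ensemble` are hypothetical, the two prediction series stand in for the outputs of any two base models, and every number is invented. The studies summarized in the table may combine their models differently (for example, by stacking or boosting).

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def average_ensemble(*prediction_sets):
    """Combine several models by averaging their predictions sample by sample."""
    return [sum(values) / len(values) for values in zip(*prediction_sets)]


# Invented effluent COD values (mg/L) and the predictions of two hypothetical base models.
observed = [42.0, 38.5, 51.0, 47.2, 40.1]
model_a = [44.0, 37.0, 53.5, 45.0, 41.5]
model_b = [40.5, 40.0, 49.0, 48.8, 39.0]

ensemble = average_ensemble(model_a, model_b)
rmse_a = rmse(observed, model_a)
rmse_b = rmse(observed, model_b)
rmse_ensemble = rmse(observed, ensemble)
print(f"A: {rmse_a:.2f}  B: {rmse_b:.2f}  ensemble: {rmse_ensemble:.2f}")
```

Averaging helps most when the base models make errors in opposite directions, as in this constructed example; it can never be worse than the weaker of the two models, but it is not guaranteed to beat the stronger one.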

Table 12. Categorization and analysis of major challenge types.

Challenge Category	Specific Issues	Affected Applications	Root Causes
Data Quality	Sensor noise, missing values, data scarcity	MBRs, quality prediction	Equipment limitations, environmental variability
Model Interpretability	Black box nature, lack of transparency	Deep neural networks, critical water treatment	Complex algorithms, multiple hidden layers
System Adaptability	Unexpected variations, recalibration needs	WWTP operations, real-world deployment	Dynamic environmental conditions
Resource Requirements	Computational demands, extensive datasets	Catalysis, membrane behavior prediction	Complex model architectures, training demands
Integration Complexity	Hardware/software modifications, workforce training	Digital water systems, existing infrastructure	Legacy systems, organizational resistance
Model Generalization	Overfitting, poor performance on new data	Catalysis development, complex systems	Limited training data, model complexity
Expertise Requirements	Specialized knowledge, algorithm tuning	Groundwater quality, model deployment	Technical complexity, skills shortage
Regulatory & Ethical	Privacy, security, accountability, and public trust	Critical water treatment, public systems	Regulatory gaps, societal concerns

Table 13. WWTP ML training recommendations.

Role	Priority	Technology	Application & Metrics
Engineers	Primary	Random Forest, SVM	Performance/interpretability balance, regulatory compliance
Engineers	Secondary	ANNs	COD/BOD prediction, effluent quality assessment
Engineers	Advanced	LSTM Networks	Time-series modeling, equipment failure prediction
Operators	Primary	Decision Trees	Interpretable rules (“if pH < 6.5, then adjust aeration”)
Operators	Secondary	Statistical Models	SCADA integration, real-time NH4-N/DO prediction
Maintenance	Primary	Anomaly Detection	Predictive maintenance
Maintenance	Secondary	IoT Integration	Sensor data processing, edge computing
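
The interpretable rule quoted for operators in Table 13 (“if pH < 6.5, then adjust aeration”) is the kind of output a one-level decision tree (a decision stump) produces. The sketch below shows how such a threshold could be learned from labelled readings. It is a simplified illustration: the names `fit_stump` and `rule` and the sample data are assumptions, and a production system would more likely rely on an established decision-tree library than on this hand-written search.

```python
# Hypothetical readings: (pH, whether the operator adjusted aeration).
# The values are invented for illustration and are not data from the reviewed studies.
samples = [
    (6.1, True), (6.3, True), (6.4, True), (6.6, False),
    (6.9, False), (7.2, False), (7.4, False), (6.2, True),
]


def fit_stump(data):
    """Return the pH threshold t for which the rule 'pH < t' misclassifies the fewest samples."""
    values = sorted({ph for ph, _ in data})
    # Candidate thresholds are the midpoints between consecutive distinct pH values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(t):
        return sum((ph < t) != label for ph, label in data)

    return min(candidates, key=errors)


def rule(ph, threshold):
    """Apply the learned rule to a new pH reading."""
    return "adjust aeration" if ph < threshold else "no action"


threshold = fit_stump(samples)
print(f"if pH < {threshold:.2f}, then adjust aeration")
print(rule(6.0, threshold), "|", rule(7.0, threshold))
```

For these invented samples the learned threshold falls between the highest "adjust" reading (6.4) and the lowest "no action" reading (6.6), which is what makes the resulting rule easy for an operator to check against experience.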

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

