Machine Learning in Flocculant Research and Application: Toward Smart and Sustainable Water Treatment

Caichang Ding; Ling Shen; Qiyang Liang; Lixin Li

doi:10.3390/separations12080203

¹ School of Computer and Information Science, Hubei Engineering University, Xiaogan 432000, China

² School of Intelligent Engineering, Hubei Industrial Polytechnic, Shiyan 442000, China

³ School of Environment and Chemical Engineering, Heilongjiang University of Science and Technology, Harbin 150022, China

* Authors to whom correspondence should be addressed.

Separations 2025, 12(8), 203; https://doi.org/10.3390/separations12080203

This article belongs to the Section Environmental Separations

Abstract

Flocculants are indispensable in water and wastewater treatment, enabling the aggregation and removal of suspended particles, colloids, and emulsions. However, the conventional development and application of flocculants rely heavily on empirical methods, which are time-consuming, resource-intensive, and environmentally problematic due to issues such as sludge production and chemical residues. Recent advances in machine learning (ML) have opened transformative avenues for the design, optimization, and intelligent application of flocculants. This review systematically examines the integration of ML into flocculant research, covering algorithmic approaches, data-driven structure–property modeling, high-throughput formulation screening, and smart process control. ML models—including random forests, neural networks, and Gaussian processes—have successfully predicted flocculation performance, guided synthesis optimization, and enabled real-time dosing control. Applications extend to both synthetic and bioflocculants, with ML facilitating strain engineering, fermentation yield prediction, and polymer degradability assessments. Furthermore, the convergence of ML with IoT, digital twins, and life cycle assessment tools has accelerated the transition toward sustainable, adaptive, and low-impact treatment technologies. Despite its potential, challenges remain in data standardization, model interpretability, and real-world implementation. This review concludes by outlining strategic pathways for future research, including the development of open datasets, hybrid physics–ML frameworks, and interdisciplinary collaborations. By leveraging ML, the next generation of flocculant systems can be more effective, environmentally benign, and intelligently controlled, contributing to global water sustainability goals.

Keywords: flocculants; machine learning; structure–property modeling; process optimization; flocculation mechanisms

1. Introduction

Global and national commitments to dual carbon goals and energy conservation underscore the urgent need for innovative pollutant control technologies to drive emission reduction and sustainable environmental management [1,2,3,4,5]. Within this framework, flocculants are pivotal in wastewater treatment, enabling efficient aggregation and separation of pollutants to support sustainable water management objectives [6,7,8,9,10]. Their growing prominence in recent research stems from their cost-effective and versatile ability to remove diverse contaminants [11], such as suspended solids [12,13,14] and heavy metals [15,16,17,18,19], across various wastewater types, driven by mechanisms like charge neutralization, which destabilizes particle charges, and polymer bridging, which forms settleable flocs to enhance treatment efficacy.

Although flocculants are widely employed in wastewater treatment, critical research gaps in performance optimization, mechanistic insights, and adaptability to complex wastewaters limit their potential [20]. Current flocculants often lack tailored designs for specific pollutants and wastewater conditions, constraining their efficacy in treating high-organic or heavy metal-laden effluents and necessitating targeted improvements in chemical and physical properties [21,22]. Furthermore, the microscopic mechanisms of flocculation, such as charge neutralization and polymer bridging, remain poorly understood in complex wastewater matrices, where interactions with diverse pollutants and dynamic aggregation processes are underexplored. Compounding these challenges, flocculants struggle to address complex industrial wastewaters, such as dye or oily effluents, due to variability in pH and ionic strength, while effective formulations for emerging contaminants, like microplastics and pharmaceuticals, remain underdeveloped. These gaps underscore the urgent need for innovative strategies to develop versatile, high-performance flocculants for sustainable wastewater treatment [23,24].

Machine learning (ML), harnessing algorithms such as deep learning and random forests to process high-dimensional datasets [25,26,27], delivers precise predictive modeling, parameter optimization, and real-time adaptability [28], establishing it as a transformative tool for advancing flocculant research [29,30,31,32]. To address critical gaps in performance customization, mechanistic understanding, and applicability to complex wastewaters, this review proposes a novel framework that systematically synthesizes the literature across ML methodologies, flocculant synthesis processes, and application strategies. By leveraging ML to predict flocculant efficacy with high accuracy, elucidate microscopic mechanisms like charge neutralization and polymer bridging through dynamic data analysis [33,34], optimize treatment processes for challenging wastewaters containing heavy metals [35], microplastics, or organic pollutants, and enable automated, data-driven dosage control, this framework overcomes limitations in tailored design, mechanistic clarity, and process adaptability [36]. This integrative approach not only unifies fragmented research but also provides a robust theoretical and practical foundation for developing intelligent, sustainable wastewater treatment technologies aligned with global environmental goals.

This review is the first to systematically consolidate the progress of machine learning (ML) in enhancing flocculant research, providing a structured framework for advancing wastewater treatment innovation. It opens with an overview of flocculant types, synthesis methods, and flocculation mechanisms, including charge neutralization and bridging. Next, it explores ML-driven approaches, categorized into data processing and modeling techniques, which enable robust optimization. The review further examines ML’s optimization roles in two domains: (1) flocculant synthesis, through predictive analysis of material structures, chemical compositions, and synthesis parameters; (2) application processes, via improved flocculant selection, real-time process monitoring, dosage forecasting, and kinetics prediction. Concluding with an evaluation of challenges and future opportunities, this review charts a path toward intelligent, eco-friendly flocculant technologies.

This review was conducted by systematically screening the recent literature on the application of machine learning in flocculant research and water treatment. The main steps included (1) identifying core ML algorithms used for flocculant synthesis and application, (2) categorizing flocculant types and their mechanisms with respect to ML-driven improvements, (3) summarizing experimental and modeling advances, and (4) evaluating challenges, future prospects, and interdisciplinary opportunities. A hybrid approach combining bibliometric analysis with a targeted full-text review ensured comprehensive and critical coverage of the field.

2. Flocculant Classification and Conventional Preparation

Flocculants can be broadly categorized into inorganic coagulants, synthetic organic polymers, and bioflocculants [37]. The major types of flocculants and their features are summarized in Table 1.

Table 1. Application of flocculation in wastewater treatment.

Each class exhibits distinct mechanisms of particle destabilization and bridging, preparation methods, and environmental footprints. Understanding these traditional preparation strategies and their limitations is critical to appreciating the potential enhancements offered by machine learning (ML)-driven design. Table 2 demonstrates the excellent roles played by different flocculants in the treatment of various types of wastewater.

Table 2. Machine learning for flocculant application optimization.

Figure 1 presents the main types of flocculants. Inorganic metal salts (e.g., alum and ferric chloride) rapidly neutralize colloid charge but generate metal-laden sludge; synthetic polymers are highly effective at low doses yet non-biodegradable; and bio-based polymers (e.g., chitosan and starch derivatives) are renewable and eco-friendly but often require higher dosages [37].

Figure 1. Conceptual classification of flocculants by origin and typical applications [37].

2.1. Inorganic Coagulants

Inorganic coagulants—primarily aluminum and iron salts, such as aluminum sulfate (alum), ferric chloride, and polyaluminum chloride (PAC)—have long dominated full-scale water treatment due to their low cost and rapid charge neutralization of colloidal suspensions [67,68].

Commercial salts are dissolved in water. Careful pH adjustment (usually to 4–6) promotes hydrolysis to polymeric species (e.g., Al₁₃ clusters in PAC) that enhance sweep coagulation and adsorption [69,70].

Reaction time, temperature (20–40 °C), and initial metal concentration dictate the distribution of oligomeric species, which, in turn, influence floc size and density [71]. Despite effectiveness, inorganic coagulants often generate large sludge volumes with high metal content, necessitating disposal or recovery processes that increase operational costs and environmental burdens [72,73]. Moreover, residual metal ions may remain in treated effluent, raising concerns over secondary contamination [74].

2.2. Organic Synthetic Flocculants

Synthetic organic flocculants—mainly high-molecular-weight polyacrylamides (PAMs) and their derivatives—operate via polymer bridging, where long chains adsorb onto multiple particle surfaces to form larger aggregates [75,76]. Preparation typically follows free-radical polymerization. Acrylamide is often copolymerized with charged monomers or functional comonomers to tailor charge density and hydrophobicity [77].

In addition to cationic and anionic organic flocculants, non-ionic organic flocculants, such as polyacrylamide (PAM) and polyvinyl alcohol (PVA), are widely used due to their high molecular weight and ability to promote bridging between particles without altering solution charge. These non-ionic types are especially effective in the treatment of suspensions with low ionic strength or in processes where charge alteration is undesirable.

Redox systems initiate chain growth at 0–30 °C. Reaction time controls polymer chain length (molecular weight up to 10⁷ Da) [78,79]. Radical recombination or chain transfer to water terminates growth. Ultrafiltration or precipitation isolates the polymer, which is then dried and milled for distribution.

Challenges include residual monomer toxicity, limited biodegradability, and high energy consumption during synthesis and drying [80]. Additionally, precise control over molecular weight distribution and charge placement is difficult to achieve, often requiring laborious trial-and-error optimization [81,82].

2.3. Bioflocculants

Bioflocculants—derived from natural polymers such as polysaccharides, proteins, or microbial exopolysaccharides—offer renewable, biodegradable alternatives with lower toxicity [11,83].

Chitosan is obtained via deacetylation of chitin under alkaline conditions (e.g., 50% NaOH at 80–100 °C for 4–6 h), followed by acid dissolution and neutralization [84]. Starch derivatives require acid or enzymatic hydrolysis and grafting of functional groups [85]. Specific microbes, such as Bacillus subtilis and Paenibacillus sp., produce high-molecular-weight exopolysaccharides. Fermentation parameters (carbon source, pH, and temperature) critically influence yield and flocculation activity [86,87]. Carboxymethylation, quaternization, or graft copolymerization can introduce cationic sites to improve charge neutralization but require tight control of reaction stoichiometry and degree of substitution [88].

However, greener bioflocculant production faces scalability challenges. Biomass variability leads to inconsistent polymer composition, while downstream purification (e.g., solvent extraction, dialysis) is costly [88]. Moreover, limited mechanistic understanding of structure–activity relationships restricts the rational design of high-performance bioflocculants.

2.4. Limitations of Conventional Optimization

Traditional optimization of flocculant synthesis and application generally employs one-factor-at-a-time (OFAT) experiments and response surface methodology (RSM). While RSM (e.g., Box–Behnken design) reduces experimental burden, it struggles with high-dimensional parameter spaces and nonlinear interactions [89]. Key limitations include the following: (1) each new formulation or process condition demands separate experimental runs, making comprehensive exploration of synthesis parameters (monomer ratios, initiator dose, and temperature) impractical [90]; (2) models derived from small-scale reactors often fail to predict pilot or full-scale behavior due to changes in mixing regimes and mass transfer; and (3) empirical models may fit data well locally but provide limited understanding of underlying molecular or colloidal phenomena, impeding extrapolation to novel systems [91]. These constraints motivate the adoption of ML approaches capable of handling large, complex datasets and capturing nonlinear effects to accelerate flocculant development.
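To make the first limitation concrete, the short sketch below counts the runs an exhaustive full-factorial grid would need as the number of synthesis parameters grows. The choice of three levels per factor and the function name `full_factorial_runs` are illustrative assumptions, not values taken from the cited studies.

```python
def full_factorial_runs(n_factors, n_levels=3):
    """Number of experiments in a full-factorial grid: levels ** factors."""
    return n_levels ** n_factors


# Assumed scenario: 3, 5, or 8 synthesis parameters, each tested at 3 levels.
runs_by_factor_count = {k: full_factorial_runs(k) for k in (3, 5, 8)}

for k, runs in runs_by_factor_count.items():
    print(f"{k} factors at 3 levels -> {runs} runs")
```

Under these assumptions, the grid grows from 27 runs for three factors to 6561 runs for eight, which is why exhaustive exploration quickly becomes impractical.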

3. Machine Learning for Molecular Design, Process Simulation, and Performance Prediction of Flocculants

Machine learning (ML) has rapidly become a powerful tool in environmental science, enabling data-driven modeling, prediction, and optimization in complex water-treatment processes [92]. In water quality and treatment applications, ML models can capture nonlinear relationships and learn from diverse sensor and laboratory data, supporting tasks such as classification of water quality and regression prediction of pollutants [93]. The core types of ML include supervised learning (training on labeled input–output data), unsupervised learning (discovering patterns or clusters in unlabeled data), and reinforcement learning (learning through feedback in dynamic systems) [94,95,96]. Supervised methods such as random forests (RFs) and support vector machines (SVMs) are widely used to predict treatment outcomes (e.g., turbidity removal or contaminant concentrations) from water quality indicators. For instance, RFs and SVMs have been applied to forecast effluent quality and coagulant dosages in treatment plants [97,98,99,100]. SVMs, in particular, are valued for handling small, high-dimensional datasets and learning complex nonlinear boundaries [99,101]. In contrast, unsupervised methods, such as k-means clustering and self-organizing maps, are used to detect hidden patterns in water-quality data without predefined outputs. For example, clustering of raw water samples can group similar pollution profiles, aiding the selection of optimal flocculant treatment strategies. Figure 2 presents several traditional machine learning models [102].

Figure 2. Schematic diagrams of (A) support vector machine (SVM), (B) decision tree (DT), and (C) artificial neural network (ANN) [102].

3.1. Data Processing

ML enables computers to process data efficiently and to extract patterns that are difficult for humans to discern directly. As available datasets grow richer, the demand for ML continues to rise, and many enterprises now place great importance on its role in data processing. Fundamentally, the purpose of ML is to learn from data [103,104].

Data preprocessing is critical in applying ML to flocculation problems. Raw water treatment data are often noisy, incomplete, and heterogeneous, so steps such as outlier filtering, normalization, and feature engineering are essential. For example, sensor readings (pH, turbidity, and conductivity) may be normalized or denoised, and derived features (e.g., floc colorimetric indices from images) can be extracted. Techniques like principal component analysis (PCA) can reduce dimensionality, preserving key variance while simplifying model inputs. Proper feature selection and engineering have been shown to improve ML performance in flocculation contexts [105].
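The preprocessing steps described above can be sketched in a few lines. The example below is a minimal illustration, assuming a median-based outlier rule and min-max scaling. The readings, the cutoff `k = 3.0`, and the function names are hypothetical and do not come from the cited work.

```python
from statistics import median


def filter_outliers(values, k=3.0):
    """Drop readings more than k scaled MADs from the median (robust outlier rule)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return list(values)
    # 1.4826 rescales the MAD so it is comparable to a standard deviation.
    return [v for v in values if abs(v - med) / (1.4826 * mad) <= k]


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical turbidity readings (NTU); 250.0 mimics a sensor spike.
turbidity_ntu = [12.1, 11.8, 12.5, 13.0, 250.0, 12.2, 11.9]
cleaned = filter_outliers(turbidity_ntu)
scaled = min_max_normalize(cleaned)
```

A median-based rule is used here because a single large spike would inflate a mean and standard deviation and could mask itself. Other filters may suit a given sensor better.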

3.2. Modeling

Supervised learning algorithms form the backbone of predictive modeling. Regression models (linear or nonlinear) and ensemble classifiers (RF and gradient boosting) are routinely trained on historical jar-test or plant datasets to forecast treatment metrics like turbidity removal or effluent quality. For example, artificial neural networks (ANNs), RFs, and SVMs have been successfully used to predict flocculation performance and optimize coagulant dosages [97,106]. Random forests, in particular, handle nonlinear interactions robustly and output variable importance metrics, while SVMs offer robust generalization on limited data [107].

Having introduced the main supervised learning algorithms applied in flocculant research, we briefly summarize their core mathematical formulations below to clarify their theoretical underpinnings.

Support Vector Machine (SVM)

The SVM model seeks to determine the optimal separating hyperplane for classification tasks:

$$f(x) = \operatorname{sign}\left(\sum_{i=1}^{n} \alpha_{i} y_{i} K(x_{i}, x) + b\right)$$

where $K(x_{i}, x)$ denotes the kernel function, $\alpha_{i}$ are the learned weights, $y_{i} \in \{-1, +1\}$ are the class labels of the training samples $x_{i}$, and $b$ is the bias term.
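As a minimal sketch of how this decision function is evaluated, the example below sums kernel similarities to each support vector, weighted by its learned weight and its class label (+1 or -1), and then takes the sign. The RBF kernel, the support vectors, the weights, and the bias are all hypothetical values chosen for illustration. In practice they would come from training.

```python
import math


def rbf_kernel(u, v, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def svm_decide(x, support_vectors, labels, alphas, b, kernel=rbf_kernel):
    """Return +1 or -1 from the sign of the weighted kernel sum plus bias."""
    score = sum(
        alpha * label * kernel(sv, x)
        for sv, label, alpha in zip(support_vectors, labels, alphas)
    ) + b
    return 1 if score >= 0 else -1


# Hypothetical model: two support vectors in a normalized 2-feature space,
# e.g. -1 = "low-turbidity" regime and +1 = "high-turbidity" regime.
support_vectors = [(0.2, 0.1), (0.9, 0.8)]
labels = [-1, 1]
alphas = [1.0, 1.0]
bias = 0.0

high = svm_decide((0.85, 0.9), support_vectors, labels, alphas, bias)
low = svm_decide((0.1, 0.2), support_vectors, labels, alphas, bias)
```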

Random Forest (RF)

Random forest constructs an ensemble of decision trees, and the final output is typically obtained by averaging the predictions from all trees:

$$\hat{y} = \frac{1}{T} \sum_{t=1}^{T} h_{t}(x)$$

where $h_{t}(x)$ represents the prediction of the $t$-th tree, and $T$ is the total number of trees in the forest.
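The averaging step can be illustrated with a toy ensemble. In the sketch below, each "tree" is a hand-written one-split stump. A real random forest would instead learn deeper trees from bootstrap samples of the data, so the thresholds and outputs here are purely hypothetical.

```python
def make_stump(feature_index, threshold, low_value, high_value):
    """Build a one-split regression tree (a decision stump)."""
    def stump(x):
        return low_value if x[feature_index] <= threshold else high_value
    return stump


def forest_predict(trees, x):
    """Average the predictions of all trees in the ensemble."""
    return sum(tree(x) for tree in trees) / len(trees)


# Hypothetical stumps mapping (dose in mg/L, pH) to turbidity removal (%).
trees = [
    make_stump(0, 5.0, 60.0, 90.0),
    make_stump(0, 8.0, 70.0, 95.0),
    make_stump(1, 6.5, 80.0, 85.0),
]

# The three stumps return 90, 70, and 85 for this input, so the forest
# output is their mean.
prediction = forest_predict(trees, (6.0, 7.0))
```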

Artificial Neural Network (ANN)

A basic ANN computes the output as a nonlinear transformation of weighted input features:

$$y = f\left(\sum_{i=1}^{n} \omega_{i} x_{i} + b\right)$$

where $x_{i}$ are the input features, $\omega_{i}$ are the weights, $b$ is the bias, and $f$ is an activation function (such as sigmoid or ReLU).
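A single neuron of this form can be written directly from the formula. The sketch below is illustrative only. The weights and inputs are arbitrary, and a practical network would stack many such units and learn the weights by gradient descent.

```python
import math


def sigmoid(z):
    """Logistic activation, mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    """Rectified linear unit: max(0, z)."""
    return max(0.0, z)


def neuron(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of inputs plus bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


# Arbitrary example values: the weighted sum is 0.5 - 0.5 = 0.
out_sigmoid = neuron([1.0, 2.0], [0.5, -0.25], 0.0)
out_relu = neuron([1.0, 2.0], [1.0, 1.0], 0.5, activation=relu)
```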

Support Vector Regression (SVR)

SVR extends the SVM framework to regression problems, aiming to fit a function within a specified margin of tolerance:

$$f(x) = \sum_{i=1}^{n} (\alpha_{i} - \alpha_{i}^{*}) K(x_{i}, x) + b$$

where $\alpha_{i}$ and $\alpha_{i}^{*}$ are Lagrange multipliers, and $K(x_{i}, x)$ is the kernel function.
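The "margin of tolerance" is usually expressed through an epsilon-insensitive loss, which ignores errors smaller than a chosen epsilon. The sketch below evaluates the SVR prediction formula and that loss. The kernel choice, multipliers, and epsilon are assumed values for illustration rather than settings from the reviewed studies.

```python
import math


def rbf_kernel(u, v, gamma=1.0):
    """Gaussian (RBF) kernel: exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def svr_predict(x, support_vectors, alphas, alphas_star, b, kernel=rbf_kernel):
    """Kernel expansion weighted by (alpha_i - alpha_i*), plus the bias."""
    return sum(
        (a - a_star) * kernel(sv, x)
        for sv, a, a_star in zip(support_vectors, alphas, alphas_star)
    ) + b


def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Zero inside the tolerance tube, linear in the excess error outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


# With one support vector located at the query point, K = 1, so the
# prediction reduces to (alpha - alpha*) + b = 1.5 + 0.25.
pred = svr_predict((1.0,), [(1.0,)], [2.0], [0.5], 0.25)
inside_tube = epsilon_insensitive_loss(1.0, 1.05)
outside_tube = epsilon_insensitive_loss(1.0, 1.5)
```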

These mathematical models provide the theoretical basis for various machine learning approaches utilized in flocculant research and process optimization.

Deep learning expands on these by using multilayer neural networks that can automatically learn feature representations [108,109]. Convolutional neural networks (CNNs) excel at analyzing visual data, such as floc images [110]. Pan et al. [111] used a deep CNN to predict post-coagulation turbidity directly from microscope images of flocs generated in jar tests, bypassing chemical inputs. This work demonstrated that image-based deep models can accurately estimate key process outcomes, greatly easing real-time monitoring. Recurrent neural networks (RNNs), including LSTM variants, are well suited to temporal data; for instance, Bankole et al. [112] showed that an LSTM model could accurately predict the evolution of floc size and count during treatment (achieving R² ≈ 0.98–1.00), outperforming simpler time-series methods. In general, deep learning offers powerful pattern-recognition capabilities for both spatial and temporal flocculation data, though it requires careful tuning and sufficient data.

Unsupervised learning is also finding roles in flocculation research. Clustering algorithms can group operating conditions or influent compositions to inform treatment strategies, and dimensionality reduction can reveal the most influential factors in a complex dataset. Self-organizing maps (SOMs) have been used to identify nonlinear relationships between mixing parameters in flocculation processes [97]. In one hybrid framework, SOMs were combined with regression splines to detect crucial interactions among jet mixing speed, coagulant dose, and water characteristics. Such unsupervised analyses help interpret high-dimensional data, guiding the selection of features for supervised models.
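As an illustration of how clustering can group influent compositions, the sketch below implements plain k-means on a handful of made-up, already normalized samples. The data, the two-cluster choice, and the fixed initial centroids are assumptions chosen to keep the example deterministic.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, centroids, iterations=10):
    """Alternate nearest-centroid assignment and centroid update until stable."""
    centroids = [tuple(c) for c in centroids]
    assignments = []
    for _ in range(iterations):
        assignments = [
            min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assignments


# Hypothetical raw-water samples: (normalized turbidity, normalized organic load).
samples = [
    (0.10, 0.20), (0.15, 0.10), (0.20, 0.15),
    (0.80, 0.90), (0.90, 0.85), (0.85, 0.80),
]
centroids, assignments = k_means(samples, centroids=[samples[0], samples[3]])
```

Each resulting cluster could then be paired with a treatment strategy. That mapping is a design decision outside the scope of this sketch.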

In practical terms, the predictive and classification capabilities of ML have been validated on real water-treatment datasets. For example, neural networks and ensemble regressors have been trained on multi-year treatment plant logs to predict final turbidity or required polymer dose based on influent quality [98]. These models often achieve 90–98% accuracy in holdout tests, which is markedly higher than simple correlations. Classification models have also been used to categorize raw water into “high-turbidity” vs. “low-turbidity” regimes, enabling dynamic adjustment of treatment protocols. Overall, ML methods are enabling highly accurate, data-driven forecasting in water treatment that would be infeasible with traditional empirical approaches [113].
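Performance figures for regression models of this kind are commonly reported as the coefficient of determination (R²) or root-mean-square error on held-out data. The sketch below shows how these two metrics are computed. The example values are invented and do not reproduce any result from the cited studies.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root-mean-square error between observations and predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Invented holdout targets, e.g. final turbidity in NTU.
observed = [1.0, 2.0, 3.0, 4.0]
perfect_score = r_squared(observed, observed)
# Always predicting the mean gives R^2 = 0, the usual reference point.
baseline_score = r_squared(observed, [2.5, 2.5, 2.5, 2.5])
baseline_rmse = rmse(observed, [2.5, 2.5, 2.5, 2.5])
```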

4. Machine Learning in Flocculant Synthesis

Machine learning (ML)-enabled approaches have begun to transform flocculant development. These approaches enable data-driven prediction of structure–property relationships, high-throughput screening of candidate formulations, and optimization of synthesis parameters. In this section, we review seminal and recent studies that apply ML techniques to (i) predict flocculation performance from molecular or process descriptors, (ii) guide high-throughput discovery of novel flocculant chemistries, and (iii) optimize key synthesis parameters through surrogate modeling and active learning.

4.1. Structure-Oriented Design

Machine learning leverages structural descriptors to predict the chemical composition of materials in a process known as structure-guided design [114], enabling the customized synthesis of flocculants for improved flocculation performance [29,115,116,117,118]. Lu et al. [117] used various regression algorithms, including gradient boosting regression (GBR), kernel ridge regression (KRR), support vector regression (SVR), Gaussian process regression (GPR), DT regression, and multilayer perceptron regression, to screen 5158 unexplored hybrid organic–inorganic perovskites (HOIPs) for stable lead-free candidates and successfully identified six stable compounds (C₂H₅OInBr₃, C₂H₆NInBr₃, NH₃NH₂InBr₃, C₂H₅OSnBr₃, NH₄InBr₃, and C₂H₆NSnBr₃).

Recent work at Lawrence Livermore National Laboratory demonstrated a novel ML model that predicts multiple polymer properties nearly instantly from an encoded representation of the polymer’s repeat units [119]. By explicitly incorporating polymer periodicity into a graph-based model, the authors achieved state-of-the-art accuracy for ten properties. In practice, such models could enable flocculant chemists to screen candidate polymer backbones or monomer ratios for desirable viscosity or charge density before synthesis. Likewise, ML models have been developed to predict copolymerization behavior. For example, a graph-attention network was trained to estimate comonomer reactivity ratios from molecular fingerprints [120]. This approach allows rapid prediction of how changing monomer chemistry will affect copolymer composition and, by extension, flocculant performance. These data-driven design tools thus shorten the feedback loop between molecular conception and experimental testing.

Furthermore, regarding bioflocculants, particularly microbial flocculants, the research conducted by Dalal et al. [121] provides a machine learning-based optimization scheme for the genetic engineering of microbial flocculants. They employed ML to inform flocculant design. A clickable polymer library with varied length, composition, pKa, and hydrophobicity was analyzed using SHAP (SHapley Additive exPlanations), which revealed two key parameters: lower pKa enhanced plasmid DNA (pDNA) delivery, while polymer length improved ribonucleoprotein (RNP) performance. Bayesian optimization of 552 formulations achieved a 1.7-fold performance boost. This ML-driven structure-guided approach offers a scalable framework for tailoring flocculant properties to enhance wastewater treatment efficiency. Figure 3 presents the output results of SHAP.

Figure 3. The SHAP values for physicochemical features related to expression and cell viability when delivering (A) pDNA or (B) RNP. Higher SHAP values correlate with higher impact on the output variable. The feature value color bar corresponds to the normalized value of the feature of interest (where low = blue; moderate = white; and high = red). Each dot represents a polymer formulation. (A) An overlay spider plot showing the average impact of individual polymer variables on expression and viability when delivering (A) pDNA and (B) RNP. The spider web plot is constructed by taking the mean SHAP value for a given feature across all samples and normalizing to the maximum SHAP value for each output variable. (C) SHAP dependency plot values across two variables relating to expression [121].
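SHAP attributions are based on Shapley values, which average each feature's marginal contribution over all orderings in which features can be "switched on". The sketch below computes exact Shapley values for a tiny hypothetical model with three polymer descriptors. It is not the SHAP library itself, which uses faster approximations and a background dataset rather than the single baseline assumed here.

```python
from itertools import permutations


def shapley_values(model, instance, baseline):
    """Exact Shapley values by enumerating every feature ordering."""
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous_output = model(current)
        for i in order:
            current[i] = instance[i]  # switch feature i to its actual value
            output = model(current)
            totals[i] += output - previous_output
            previous_output = output
    return [t / len(orderings) for t in totals]


def toy_model(features):
    """Hypothetical response with one interaction term (not a fitted model)."""
    pka, length, hydrophobicity = features
    return -2.0 * pka + 0.5 * length + pka * hydrophobicity


instance = (1.0, 4.0, 2.0)
baseline = (0.0, 0.0, 0.0)
phi = shapley_values(toy_model, instance, baseline)
```

The attributions sum to the difference between the model output for the instance and for the baseline. The interaction term is split equally between the two features involved.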

4.2. Microstructure Image Data Representation

The microstructure of flocculant materials and flocs can be quantified by computer vision techniques [122]. Convolutional neural networks (CNNs) and vision transformers can extract quantitative descriptors from SEM/TEM images of polymer flocs or coagulant aggregates [123,124,125]. For instance, Baum et al. [126] used a CNN to classify flocculation process states based on microscope images. With global pooling, the CNN outperformed classical texture features, indicating it learned meaningful image descriptors automatically. Similarly, Yamamura et al. [127] captured video frames of floc formation during jar tests and trained a CNN to predict the resulting turbidity. The network learned “specific image characteristics” of flocs and achieved near-perfect accuracy in training (100%) and very high accuracy (96–99%) on test images. This confirms that CNNs can encode morphology (floc size, shape, and density) into features predictive of settleability. Al-Ani et al. [128] tackled the challenge of monitoring bacterial dynamics affecting flocculation in wastewater treatment by developing a deep learning framework for real-time analysis of floc-forming and filamentous bacteria in activated sludge. Using a rule-based segmentation algorithm and a deep learning model trained on 68 microscopic images, the study achieved 97.8% accuracy in classifying bacteria critical to floc stability. This ML-driven approach enhances flocculant application by enabling precise, automated monitoring, addressing process adaptability gaps, and supporting efficient, sustainable wastewater treatment. Figure 4 displays the original microscopic images, the ground truth, and the images generated by the deep learning model, illustrating that the deep learning simulation largely aligns with the results of the segmentation algorithm.

Figure 4. Validation results: raw microscopic images (A1–A3), ground truth (B1–B3), and deep learning model output (C1–C3). Mask colors indicate floc-forming bacteria (red), filamentous bacteria (blue), and background (white), with a yellow scale bar of 100 μm [128].
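To give a sense of how a CNN turns an image into a numeric descriptor, the sketch below applies one filter to a toy 4 × 4 "floc image", followed by a ReLU and global average pooling. The image and the edge-detecting kernel are hand-picked for illustration. A trained network learns many such kernels from data and stacks several layers.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding (cross-correlation, as in CNNs)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu_map(feature_map):
    """Apply ReLU element-wise."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def global_average_pool(feature_map):
    """Collapse a feature map to one number by averaging all positions."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


# Toy grayscale image: a bright 2 x 2 blob on a dark background.
image = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
# Hand-picked kernel that responds to dark-to-bright transitions, left to right.
kernel = [[-1.0, 1.0]]

feature_map = relu_map(conv2d_valid(image, kernel))
descriptor = global_average_pool(feature_map)
```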

4.3. Reaction Condition Optimization

ML also accelerates the optimization of synthesis and polymerization parameters. Active learning and Bayesian optimization (BO) are used to intelligently select reaction experiments for target properties. For example, Zhao et al. [129] applied an active learning loop with a Gaussian process surrogate to optimize the aqueous electrochemical ATRP (seATRP) of a poly(ethylene glycol) acrylate. Their BO algorithm treated the reaction condition space (applied voltage, monomer/initiator ratio, and concentrations) as a database and sequentially proposed new experiments. Starting from biased data, the method converged to conditions yielding high monomer conversion and low dispersity (Đ ≈ 1.2) significantly faster than human-led trials. This demonstrates how ML can minimize the number of experiments: the model learns from each run and suggests the next conditions to test (temperature, catalyst loadings, time, etc.) in order to approach a desired molecular weight or yield.
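The loop described here can be sketched with a small Gaussian process surrogate and an upper-confidence-bound rule for choosing the next experiment. Everything specific in the example is an assumption made for illustration: the one-dimensional condition, the `hidden_yield` function standing in for a real experiment, the kernel length scale, and the acquisition rule (the source does not state which one was used).

```python
import math


def rbf(a, b, length_scale=0.2):
    """Squared-exponential kernel for scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2.0 * length_scale ** 2))


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    """Posterior mean and variance of a zero-mean GP at one query point."""
    gram = [
        [rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(x_train)]
        for i, a in enumerate(x_train)
    ]
    k_star = [rbf(a, x_query) for a in x_train]
    mean = sum(k * w for k, w in zip(k_star, solve(gram, y_train)))
    variance = 1.0 - sum(k * v for k, v in zip(k_star, solve(gram, k_star)))
    return mean, max(variance, 0.0)


def hidden_yield(x):
    """Hypothetical stand-in for running one experiment at condition x."""
    return math.exp(-((x - 0.7) ** 2) / 0.02)


def optimize(candidates, x_init, n_rounds=8, kappa=2.0):
    """Repeatedly test the untested candidate with the highest mean + kappa * std."""
    x_train = list(x_init)
    y_train = [hidden_yield(x) for x in x_train]
    for _ in range(n_rounds):
        untested = [c for c in candidates if c not in x_train]

        def ucb(c):
            mean, variance = gp_posterior(x_train, y_train, c)
            return mean + kappa * math.sqrt(variance)

        chosen = max(untested, key=ucb)
        x_train.append(chosen)
        y_train.append(hidden_yield(chosen))
    best = max(range(len(x_train)), key=lambda i: y_train[i])
    return x_train[best], y_train[best], x_train


# Normalized condition grid; the two starting points sit at the edges on purpose.
candidates = [i / 20 for i in range(21)]
best_x, best_y, tested = optimize(candidates, x_init=[0.0, 1.0])
```

The `kappa` parameter trades exploration of uncertain conditions against exploitation of conditions already predicted to perform well. Larger values favor exploration.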

In polymer synthesis, additional approaches include multi-objective optimization and reinforcement learning [130]. For instance, multi-fidelity models may combine coarse predictors with ML to guide fine-tuning of feed ratios. In situ sensors and ML can monitor reaction progress, updating predictions of molecular weight growth. The general idea is to build a surrogate (often a neural net or decision-tree ensemble) that maps reaction parameters to outcomes, and then, techniques like D-Optimal design or uncertainty sampling pick experiments that improve the model. The result is an automated experimental design for polymerization. The chemical space of conditions is explored intelligently rather than by exhaustive grid search. This approach has been employed for step-growth polymerizations, ring-opening polymerizations, and copolymer formulations, where models learn how reaction temperature, time, or feed ratio affect polymer chain length or dispersity. Overall, ML-driven reaction planning—through BO, active learning, or even simple regression—enables rapid tuning of synthesis parameters to reach target flocculant properties (high molecular weight and narrow polydispersity) with fewer trials than traditional methods.

5. Machine Learning for Flocculant Application Optimization

Machine learning (ML) transforms flocculant application in wastewater treatment by enabling data-driven optimization across selection, monitoring, and dynamic process control. Table 2 presents several cases of machine learning methods applied to optimize the use of flocculants.

5.1. Flocculant Selection

Selecting the appropriate flocculant for a given wastewater is a multi-parameter task ideally suited to data-driven models. Machine learning strategies often employ classification or regression to map influent characteristics (pH, turbidity, suspended solids, organic content, ionic strength, heavy metal concentrations, etc.) to flocculant performance metrics or a recommended flocculant type. For example, Lu et al. [131] compiled data on chitosan-based flocculants across varying pH, concentration, and metal species and trained a random forest model to predict heavy metal removal efficiency. The RF achieved R² ≈ 0.94, indicating high predictive accuracy from features, including flocculant dose and solution parameters. This model effectively guides which flocculant properties and dosages will best remove specific metals. Figure 5 demonstrates the predictive performance of the RF model.

Figure 5. A scatter plot of the predicted heavy metal removal efficiency and experimental data using the RF model [131].

Similarly, algorithms like support vector machines or gradient boosting can classify wastewater scenarios. For instance, CatBoost (a gradient boosting method based on decision trees) was applied in a “hybrid” ML framework that included categorical variables for coagulant (and by extension, flocculant) type [87]. CatBoost naturally handles categorical inputs while learning nonlinear interactions among mixing parameters, dose, and effluent quality. In practice, one might use sensor measurements and lab jar-test data to train a model that, given a water sample profile, predicts the most effective flocculant class (organic polymer, inorganic coagulant, etc.) and its dose.

UV–Vis spectral sensors have also been combined with ML. Shi et al. [132] recorded full-spectrum raw-water data and found that simple linear models (multiple linear regression, MLR, and partial least squares, PLS) could virtually replicate expert coagulant dosing decisions. In their study, PLS achieved high R² and low error in predicting alum dose from turbidity and dissolved organic carbon measurements, outperforming a neural network.

In summary, ML-based flocculant selection systems ingest wastewater parameters and output either continuous performance metrics or categorical choices. Techniques range from decision-tree ensembles (random forests, XGBoost, and CatBoost) to neural nets [133]. The goal is to rapidly screen available flocculants and identify the one likely to achieve target pollutant removal, minimizing costly trial-and-error. Successful cases include models that recommend chitosan derivatives for heavy-metal-laden effluents or that suggest a particular polymeric flocculant blend based on water hardness and organics (via trained regression models). These ML classifiers/regressors thus form a decision-support layer in treatment design.
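A decision-support layer of this kind can be sketched with a very simple classifier. The example below uses k-nearest neighbors rather than the tree ensembles discussed above, purely to keep the code short. The jar-test records, feature scaling, and class labels are hypothetical and carry no experimental meaning.

```python
from collections import Counter


def recommend_flocculant(sample, records, k=3):
    """Majority vote among the k historical records closest to the sample."""
    ranked = sorted(
        records,
        key=lambda rec: sum((a - b) ** 2 for a, b in zip(sample, rec[0])),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical records: (normalized pH, turbidity, heavy-metal level) -> best class.
records = [
    ((0.20, 0.90, 0.10), "inorganic coagulant"),
    ((0.30, 0.80, 0.20), "inorganic coagulant"),
    ((0.25, 0.85, 0.15), "inorganic coagulant"),
    ((0.70, 0.30, 0.90), "chitosan derivative"),
    ((0.80, 0.20, 0.80), "chitosan derivative"),
    ((0.75, 0.25, 0.85), "chitosan derivative"),
]

metal_rich = recommend_flocculant((0.72, 0.28, 0.88), records)
turbid = recommend_flocculant((0.22, 0.88, 0.12), records)
```

In a real system, the records would come from jar tests or plant logs, and the classifier would be validated on held-out samples before its recommendations were trusted.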

5.2. Flocculation Process Monitoring and Dosing Prediction

Modern treatment plants increasingly use sensor networks and IoT to monitor process variables in real time. Machine learning can integrate these data streams (turbidity meters, particle counters, pH sensors, and flow rates) to dynamically adjust flocculant dosing. Deep learning-based “soft sensors” have been proposed to predict treatment outcomes or dose requirements on the fly. For example, CNNs analyzing live images of flocs can infer process state. Yamamura et al. [127] applied a jar-test CNN (mentioned above), which essentially functioned as a virtual sensor, instantly predicting turbidity from floc images. In full-scale systems, cameras or microscope probes could similarly feed images into a trained CNN that outputs a turbidity estimate or flocculation index. Likewise, textural metrics from images, such as the floc texture index, can be computed in real time and fed into an ML predictor of effluent clarity. Beyond vision, ML time-series models can use turbidity and flow data to forecast near-term dynamics. For instance, Sharafi et al. [134] developed an LSTM with attention to correlate current sensor readings with historical trends, effectively learning flocculation dynamics for dose prediction. Such a model could predict the turbidity outcome before it occurs, allowing automatic adjustment of the coagulant feed. Reinforcement learning has also been explored. Randive et al. [87] trained a policy (DDPG/SAC) that learned to modulate mixing speed and flocculant dose in a simulated plant, improving efficiency by ~20–25%. Their CatBoost + RL framework achieved 95–97% predictive accuracy on flocculation outcomes, illustrating how ML can perform closed-loop control.

Yokoyama et al. [135] developed a deep learning-based flocculation sensor for automated polymer flocculant control. Initially, laboratory tests with sludge samples generated floc images, which were analyzed by convolutional neural networks (CNNs) and MLP-Mixer models, achieving R² values above 0.9 in predicting the degree of flocculation. Subsequently, a low-cost EdgeAI sensor with an inference speed of 12.8 FPS enabled real-time dose adjustments, stabilizing flocculant performance. This data-driven approach optimizes flocculant application, enhancing automation and efficiency in wastewater treatment. Zhu et al. [136] introduced a flocculation tensor framework integrated with deep learning to enhance water quality monitoring in wastewater treatment. Initially, flocculation images under varying conditions were generated to construct tensor diagrams capturing dynamic floc characteristics. Subsequently, a CNN analyzed these images, achieving 98% accuracy in classifying pollution levels and significantly reducing monitoring delays. In addition, they proposed a Mod-Dos model to investigate the impact of coagulant dosage on the accuracy of deep learning models. Figure 6 illustrates the effect of deep learning on turbidity signal prediction, indicating that the tensor-based deep learning model is highly sensitive to predicted turbidity signals.

Figure 6. Deep learning effect on prediction of turbidity signal with (a) Mod-Dos for training accuracy, (b) Mod-Dos for training loss, (c) Mod-pH for training accuracy, and (d) Mod-pH for training loss [136].

In practical terms, one can envision a control system in which turbidity, particle size distribution, and conductivity are continuously monitored. An ML model (possibly combining a CNN for imaging with an LSTM for time series) then predicts the optimum flocculant dose or pump speed to maintain target effluent quality. Field trials have demonstrated this concept. For example, coagulant dosage in a drinking-water plant was successfully predicted by feeding UV–Vis and pH data into a PLS model, as mentioned above, and polymer flocculant dosing has been adjusted using CNN models that analyze sludge floc images. Such integrated sensing and ML prediction systems enable adaptive dosing: they track fluctuations in raw-water quality and adjust chemical feed rates in real time to keep removal efficiency stable.
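The adaptive dosing loop described above can be sketched as a predictor followed by safety limits. In this minimal example the predictor is a hand-written placeholder standing in for a trained model, and its coefficients, the dose bounds, and the maximum step size are all invented for illustration.

```python
def predict_dose_mg_per_l(turbidity_ntu, ph):
    """Placeholder for a trained ML model; coefficients are illustrative only."""
    return 5.0 + 0.08 * turbidity_ntu + 1.5 * abs(ph - 7.0)


def next_setpoint(current_dose, turbidity_ntu, ph,
                  min_dose=2.0, max_dose=40.0, max_step=2.0):
    """Move the pump setpoint toward the predicted dose, within safe limits."""
    target = predict_dose_mg_per_l(turbidity_ntu, ph)
    # Clamp the prediction to the range the dosing pump is allowed to deliver.
    target = max(min_dose, min(max_dose, target))
    # Rate-limit the change so one noisy reading cannot cause a large jump.
    step = max(-max_step, min(max_step, target - current_dose))
    return current_dose + step


dose = 10.0
history = []
# Hypothetical (turbidity, pH) readings as raw-water quality deteriorates.
for turbidity, ph in [(50.0, 7.2), (120.0, 7.1), (300.0, 6.8), (300.0, 6.8)]:
    dose = next_setpoint(dose, turbidity, ph)
    history.append(round(dose, 2))
```

Clamping and rate limiting are one simple way to keep a data-driven predictor from issuing abrupt or out-of-range commands. Plants may prefer other safeguards, such as operator confirmation above a threshold.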

5.3. Flocculation Dynamics

Flocculation is a dynamic process involving floc formation, growth, and sedimentation. Machine learning can model these time-dependent phenomena using sequence models or physics-informed approaches. Recurrent neural networks, particularly LSTMs, have been applied to forecast floc formation over time.

Moreover, physics-informed ML is emerging for sedimentation modeling. Physics-informed neural networks (PINNs) have been applied to sedimentation flows, successfully recovering the dimensionless settling velocity of particulate flows by embedding the Navier–Stokes and transport equations into the loss function. This approach could be adapted to floc settling: by constraining a neural model with conservation laws and flocculation kinetics, one can obtain interpretable and accurate predictions of settling rates or final turbidity under varied conditions, which are critical for adaptive coagulant dosing and regulatory compliance in real-time flocculation systems. In aquatic science, integrating first-principle coagulation models with ML (hybrid modeling) has also shown promise for capturing complex floc-growth kinetics.
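The core idea of a physics-informed loss can be illustrated without a neural network. The sketch below scores a candidate particle-concentration trajectory by combining a data misfit with the residual of a governing equation. As an assumption made only for this example, flocculation kinetics are reduced to the second-order aggregation law dn/dt = -k n², and a plain list stands in for the network output. The names `physics_informed_loss`, `k`, and `weight` are hypothetical.

```python
def physics_informed_loss(times, n_pred, observations, k, weight=1.0):
    """Data misfit plus mean squared residual of dn/dt + k * n**2 = 0.

    times, n_pred : candidate trajectory on a time grid
    observations  : dict mapping grid index -> measured concentration
    """
    data_term = sum((n_pred[i] - n_obs) ** 2 for i, n_obs in observations.items())
    data_term /= len(observations)

    residuals = []
    for i in range(1, len(times) - 1):
        # Central finite difference approximates dn/dt at interior grid points.
        dn_dt = (n_pred[i + 1] - n_pred[i - 1]) / (times[i + 1] - times[i - 1])
        residuals.append(dn_dt + k * n_pred[i] ** 2)
    physics_term = sum(r * r for r in residuals) / len(residuals)
    return data_term + weight * physics_term


k, n0 = 0.05, 10.0
times = [0.1 * i for i in range(51)]
# Analytic solution of the assumed kinetics: n(t) = n0 / (1 + k * n0 * t).
consistent = [n0 / (1.0 + k * n0 * t) for t in times]
# A linear decay that passes near the same endpoints but violates the kinetics.
inconsistent = [n0 - 1.5 * t for t in times]
observations = {0: 10.0, 50: consistent[50]}

loss_good = physics_informed_loss(times, consistent, observations, k)
loss_bad = physics_informed_loss(times, inconsistent, observations, k)
```

With only two observations, both trajectories fit the data reasonably well, and it is the physics term that separates them. This is the mechanism by which a PINN can remain well behaved when measurements are sparse.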

In summary, dynamic ML models (LSTM, RNN, and PINN) can simulate the time evolution of flocs. They can predict how floc size and structure evolve during mixing and settling, enabling forecasts of when turbidity will drop or how quickly the water will clarify. These models benefit from both data and physical insight, and they facilitate predictive control. For example, an LSTM trained on sensor logs could alert operators to add coagulant early if a heavy-load storm event is imminent. In combination with real-time monitoring, dynamic ML models complete the toolkit for smart flocculation management, bridging the gap from static dose optimization to predictive process control.

6. Challenges and Prospects

The integration of machine learning (ML) into flocculant research promises transformative advancements in wastewater treatment but faces significant challenges in economic feasibility, data integration, and model generalization.

6.1. Economic Cost of ML in Flocculant Research

The deployment of ML in flocculant research often entails a significant upfront investment. High-speed cameras, real-time sensors, and computational infrastructure are typically required to acquire and process the large, multimodal datasets on which ML models are trained. For example, advanced image-based floc analysis demands non-intrusive imaging systems and data acquisition hardware that can drive up initial equipment costs [87,102]. In parallel, building and tuning ML models incurs nontrivial labor and computational expenses (GPU time, software licensing, etc.). Such costs can be a barrier for many water treatment facilities or research groups with limited budgets. At the same time, conventional flocculation optimization has traditionally relied on trial-and-error jar tests, which themselves consume large amounts of operator time, chemicals, and energy. Machine learning-based monitoring could replace some of these costly manual procedures; for instance, low-cost sensor data (e.g., turbidity, flow) coupled with ML can predict chemical dosage needs without extensive testing [137]. In one study, ML models trained on inexpensive monitoring data accurately predicted free chlorine residuals in drinking water treatment, suggesting that ML can substitute for some more expensive tests.

On the other hand, ML can reduce operational costs by optimizing chemical and energy usage. Unlike conventional empirical models, data-driven models can learn complex dependencies and thus reduce excess chemical dosing. One study proposing such a framework notes that the inefficiencies of conventional empirical models led to higher operational costs, whereas the proposed ML framework “brings a solution to these challenges: innovative techniques for prediction and optimization” that would improve efficiency and sustainability. Similarly, broader economic analyses show that economies of scale and centralization can lower unit treatment costs [138]. This suggests that an ML-enabled centralized water treatment system could achieve greater savings than distributed, manually run plants. Overall, while the economic cost of adopting ML (sensors, computing, and expertise) is non-negligible, these initial investments can be offset by longer-term savings in chemical inputs, energy, and labor. Critically, cost–benefit assessments must account for both capital and operational expenditures. In summary, the economic cost challenge involves balancing the high upfront data-infrastructure expenses against the potential for reduced operational costs.
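A first-pass cost–benefit check of this kind can be expressed as a simple payback period and a net present value. The sketch below uses standard formulas; every monetary figure, the discount rate, and the function names are hypothetical placeholders and do not come from any cited study.

```python
def simple_payback_years(capital_cost, annual_savings, annual_running_cost):
    """Years needed for net annual savings to repay the upfront investment."""
    net = annual_savings - annual_running_cost
    if net <= 0:
        return None  # the system never pays for itself
    return capital_cost / net


def net_present_value(capital_cost, annual_net_saving, years, discount_rate):
    """Discounted sum of net savings minus the upfront capital cost."""
    discounted = sum(
        annual_net_saving / (1.0 + discount_rate) ** year
        for year in range(1, years + 1)
    )
    return discounted - capital_cost


# Hypothetical plant: sensors and computing cost 120,000; chemical, energy, and
# labor savings are 45,000 per year; model upkeep costs 15,000 per year.
payback = simple_payback_years(120_000.0, 45_000.0, 15_000.0)
npv_10y = net_present_value(120_000.0, 30_000.0, years=10, discount_rate=0.05)
```

Under these assumed figures the investment is repaid in four years, and the ten-year net present value is positive. A fuller assessment would also include sensor replacement, retraining effort, and uncertainty in the savings estimate.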

6.2. Data Integration

Effective ML in flocculant research requires large, well-curated datasets from diverse sources. In practice, however, data integration in water treatment is fragmented and difficult. Experimental flocculation studies generate heterogeneous data (sensor readings, chemical analyses, and high-resolution images of particle agglomerates), but these data are often siloed or recorded in incompatible formats. Integrating lab-scale and field-scale data, for example, requires careful alignment of time stamps, units, and metadata, which is not a trivial undertaking. Furthermore, water treatment plants generate continuous telemetry (flow rates, pH, temperature, and turbidity) that is often stored separately from lab results. Combining these spatially and temporally disparate datasets into a single ML-ready database is a significant challenge. ML models are “restricted by the amount and quality of training data” [139]. In floc research, this limitation is especially acute: lab studies typically produce only dozens of experimental points, while real-world plants have complex dynamics.
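The alignment of time stamps and units mentioned above can be sketched as a nearest-timestamp join with explicit unit conversion. The record layout, the ten-minute tolerance, and the function names are assumptions chosen for illustration. Samples with no telemetry reading close enough are dropped rather than filled in.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

# Conversion factors to a common dose unit (mg/L); 1 g/m3 equals 1 mg/L.
TO_MG_PER_L = {"mg/L": 1.0, "g/L": 1000.0, "g/m3": 1.0}


def nearest_reading(telemetry, when, tolerance=timedelta(minutes=10)):
    """telemetry: list of (datetime, value) sorted by time. Returns value or None."""
    times = [t for t, _ in telemetry]
    i = bisect_left(times, when)
    # Only the readings just before and just after `when` can be the nearest.
    candidates = telemetry[max(0, i - 1): i + 1]
    if not candidates:
        return None
    t, value = min(candidates, key=lambda tv: abs(tv[0] - when))
    return value if abs(t - when) <= tolerance else None


def merge_lab_with_telemetry(lab_samples, telemetry):
    rows = []
    for sample in lab_samples:
        turbidity = nearest_reading(telemetry, sample["time"])
        if turbidity is None:
            continue  # no reading within tolerance; do not guess
        rows.append({
            "time": sample["time"],
            "dose_mg_per_l": sample["dose"] * TO_MG_PER_L[sample["unit"]],
            "turbidity_ntu": turbidity,
        })
    return rows


telemetry = [
    (datetime(2024, 5, 1, 8, 0), 12.0),
    (datetime(2024, 5, 1, 8, 15), 14.5),
    (datetime(2024, 5, 1, 8, 30), 13.0),
]
lab_samples = [
    {"time": datetime(2024, 5, 1, 8, 14), "dose": 0.012, "unit": "g/L"},
    {"time": datetime(2024, 5, 1, 9, 30), "dose": 15.0, "unit": "mg/L"},
]
rows = merge_lab_with_telemetry(lab_samples, telemetry)
```

Here the second lab sample is discarded because the closest telemetry reading is an hour away. Whether to drop, interpolate, or flag such records is a design decision that should be documented alongside the dataset.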

Advancing ML will, therefore, require robust data integration strategies. For instance, one recent study demonstrated that combining climate data with operational water quality records significantly improved ML prediction accuracy [140]. Similarly, data assimilation techniques from hydrology may mitigate uncertainty by fusing model outputs with sensor data [141]. However, developing such integrated pipelines is not trivial. It demands standardization of protocols, open data formats, and often real-time data feeds from IoT sensors. Many current datasets are proprietary or too small to train general models. Moreover, inconsistencies in sensor calibration and data gaps (due to faulty equipment or maintenance) can introduce biases. As Martin and White [139] note, “uncertain data generate risks for AI–ML because they increase overfitting and limit generalization ability”. In practice, creating a unified dataset for flocculation modeling may involve pooling data across multiple labs and treatment plants, which raises issues of confidentiality and interoperability.

In summary, the main obstacle to data integration is the complexity of merging multimodal, multi-scale data. Without substantial effort in building common data platforms and conducting rigorous data cleaning, ML models will suffer from incomplete or inconsistent inputs. Improved sensor networks, standardized data schemas, and collaborative data-sharing frameworks are needed. Progress in related fields suggests that federated learning or digital twin architectures could eventually enable near-real-time data fusion. For now, however, data scarcity and fragmentation remain a key barrier to deploying ML-based flocculant control in practice.

6.3. Modeling Generalization

Even with ample data, ML models must generalize well to unseen conditions, which is a major challenge in water treatment [142]. Flocculation and coagulation processes are highly nonlinear and context-dependent (affected by water chemistry, temperature, equipment geometry, etc.). Models trained on one dataset often fail to perform on another. In practice, a model optimized for one plant or water source may generate erroneous predictions elsewhere due to shifts in input distributions (pH, organic content, etc.). Overfitting is a common manifestation of poor generalization. When the training data are limited or noisy, ML models may memorize spurious correlations. Martin and White [139] emphasize that overfit models can produce “specious confidence” when deployed, leading to misguided decisions. In flocculant research, where lab experiments may involve as few as 10–20 runs, the overfitting risk is high. The problem is exacerbated by measurement noise in water quality data. Insufficient or error-prone data severely constrain predictive reliability.

Another aspect of the generalization challenge is the lack of transferability across systems. A model trained on one coagulant type (e.g., alum) may not work well for a different flocculant (e.g., a plant-based polymer) because the underlying physics differ. Incorporating domain knowledge (physics-informed ML) can help, but this hybrid approach is still nascent in practice. Furthermore, real-time dynamics (such as changing influent characteristics) mean that a static trained model may quickly become outdated. Continuous learning or online adaptation is rarely implemented in current flocculation models.

In summary, model generalization obstacles arise from overfitting, data noise, and domain shifts. Sophisticated techniques can mitigate these issues—for example, ensemble learning or neural tangent kernels can improve robustness on small datasets—but they add complexity. Without larger, more representative training sets and thorough cross-validation, ML predictions remain uncertain. Future research will need to emphasize model validation on independent datasets and the development of general-purpose models capable of adapting to new water conditions. Until then, practitioners must treat ML predictions cautiously and combine them with first-principle understanding.
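One way to approximate validation on independent datasets is to hold out all records from one plant at a time, so that a model is always scored on a water source it has not seen. The sketch below shows only the splitting logic. The record fields and the function name are hypothetical, and any model could be trained and evaluated inside the loop.

```python
from collections import defaultdict


def leave_one_group_out(records, group_key):
    """Yield (held_out_group, train_records, test_records) for each group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record)
    for held_out in sorted(groups):
        # Training data comes only from the other groups, so nothing from the
        # held-out plant can leak into model fitting.
        train = [r for g, rows in groups.items() if g != held_out for r in rows]
        yield held_out, train, groups[held_out]


# Hypothetical dosing records from three plants.
records = [
    {"plant": "A", "turbidity": 40.0, "dose": 8.0},
    {"plant": "A", "turbidity": 90.0, "dose": 12.5},
    {"plant": "B", "turbidity": 15.0, "dose": 5.0},
    {"plant": "C", "turbidity": 200.0, "dose": 22.0},
    {"plant": "C", "turbidity": 150.0, "dose": 18.0},
]
splits = list(leave_one_group_out(records, "plant"))
```

Compared with a random split, this grouping tends to give a more pessimistic but more realistic estimate of how a model transfers to a new plant.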

Beyond flocculant research, machine learning is rapidly expanding into related domains such as membrane fouling prediction, advanced oxidation process optimization, resource recovery from wastewater, and smart monitoring of decentralized water systems. These advances point toward a future where AI-driven decision-making underpins the entire water treatment value chain.

7. Conclusions

Machine learning is accelerating flocculant research by enabling data-driven design, predictive control, and optimization throughout the water treatment process. By integrating ML with traditional knowledge and advanced sensing, new pathways for intelligent, adaptive, and sustainable water treatment systems are emerging. Future work should focus on expanding open datasets, hybrid modeling, and addressing practical challenges in deployment. Ultimately, ML-powered solutions are expected to contribute significantly to achieving global water sustainability goals.

Author Contributions

Writing—original draft preparation, C.D.; writing—review, L.S., Q.L. and L.L.; review and editing, L.S. and L.L.; funding acquisition, L.S. and L.L.; supervision, L.S. and L.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Hubei Provincial Natural Science Foundation (Grant No. 2024AFC066), the China University Industry University Research Innovation Fund of the Ministry of Education (Grant No. 2023YC075), the Natural Science Foundation of Heilongjiang Province (LH2023E125), the Science and Technology Research Projects of Hubei Provincial Department of Education (Grant No. Q20162706), and the Xiaogan City Natural Science Program Project (Grant No. XGKJ2023010060).

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Li, L.; Xu, H.; Zhang, Q.; Zhan, Z.; Liang, X.; Xing, J. Estimation methods of wetland carbon sink and factors influencing wetland carbon cycle: A review. Carbon Res. 2024, 3, 50. [Google Scholar] [CrossRef]
Li, L.; Liang, T.; Zhao, M.; Lv, Y.; Song, Z.; Sheng, T.; Ma, F. A review on mycelial pellets as biological carriers: Wastewater treatment and recovery for resource and energy. Bioresour. Technol. 2022, 355, 127200. [Google Scholar] [CrossRef]
Li, L.; Liu, S.; Ke, X.; Dong, Z.; Huang, L. Review on Anaerobic Ammonium Oxidation Process for Treating Coking Wastewater from Coal Chemical Industry. J. Min. Sci. Technol. 2025, 10, 351–362. [Google Scholar]
Li, Y.; He, L.; Pan, Y.; Chen, K.; Zhuo, T.; Yu, K.; Zhang, J.; Zhou, S.; Lei, X.; Chai, B. Elevated hydrostatic pressure enhances the potential for microbially mediated carbon sequestration at the sediment–water interface in a deep-water reservoir by modulating functional genes and metabolic pathways. Carbon Res. 2024, 3, 19. [Google Scholar] [CrossRef]
Lv, L.; Wang, X.; Zhang, D.; Liu, X.; Liang, J.; Liu, X.; Gao, W.; Sun, L.; Ren, Z.; Zhang, G. Strategies and applications of enhancing extracellular electron transfer in anaerobic digestion for wastewater resource recovery: A critical review. Environ. Funct. Mater. 2025, in press. [Google Scholar] [CrossRef]
Gregory, J. Flocculation fundamentals. In Encyclopedia of Colloid and Interface Science; Springer: Berlin/Heidelberg, Germany, 2013; pp. 459–491. [Google Scholar] [CrossRef]
Yang, Y.; Jiang, C.; Wang, X.; Fan, L.; Xie, Y.; Wang, D.; Yang, T.; Peng, J.; Zhang, X.; Zhuang, X. Unraveling the potential of microbial flocculants: Preparation, performance, and applications in wastewater treatment. Water 2024, 16, 1995. [Google Scholar] [CrossRef]
Li, L.; Han, J.; Huang, L.; Liu, L.; Qiu, S.; Ding, J.; Liu, X.; Zhang, J. Activation of PMS by MIL-53 (Fe)@ AC composites contributes to tetracycline degradation: Properties and mechanisms. Surf. Interfaces 2024, 51, 104521. [Google Scholar] [CrossRef]
Zhai, J.; Mao, H.; He, B.; Jia, T.; Zhou, S.; Chen, R.; Zhao, Y. A review of recent development in the enhancement mechanism of catalytic membranes for wastewater treatment. Environ. Funct. Mater. 2025, in press. [Google Scholar] [CrossRef]
Zahoor, A.; Liu, X.; Liu, Y.; Liu, S.; Yi, W.; Sajnani, S.; Tai, L.; Tahir, N.; Abdoulaye, B.; Mahaveer; et al. Agricultural lignocellulose biochar material in wastewater treatment: A critical review and sustainability assessment. Environ. Funct. Mater. 2025, in press. [Google Scholar] [CrossRef]
Li, S.; Xie, J.; Gu, J.; Zhou, M. Hybrid peroxi-coagulation/ozonation process for highly efficient removal of organic contaminants. Chin. Chem. Lett. 2023, 34, 108204. [Google Scholar] [CrossRef]
Yang, J.; Zhang, T.; Ma, S.; Shang, J.; Li, L.; Ning, Y.; Zhao, X. Enhancing microplastic removal and nitrogen mitigation in constructed wetlands: An earthworm-centric perspective. J. Hazard. Mater. 2025, 489, 137540. [Google Scholar] [CrossRef]
Zhao, X.; Meng, X.; Li, Q.; Ho, S.-H. Nitrogen metabolic responses of non-rhizosphere and rhizosphere microbial communities in constructed wetlands under nanoplastics disturbance. J. Hazard. Mater. 2025, 484, 136777. [Google Scholar] [CrossRef]
Pradhan, S.; Parthasarathy, P.; Mackey, H.R.; Al-Ansari, T.; McKay, G. Food waste biochar: A sustainable solution for agriculture application and soil–water remediation. Carbon Res. 2024, 3, 41. [Google Scholar] [CrossRef]
Wang, G.; Wang, L.; Ma, F.; Yang, D.; You, Y. Earthworm and arbuscular mycorrhiza interactions: Strategies to motivate antioxidant responses and improve soil functionality. Environ. Pollut. 2021, 272, 115980. [Google Scholar] [CrossRef] [PubMed]
You, Y.; Ju, C.; Wang, L.; Wang, X.; Ma, F.; Wang, G.; Wang, Y. The mechanism of arbuscular mycorrhizal enhancing cadmium uptake in Phragmites australis depends on the phosphorus concentration. J. Hazard. Mater. 2022, 440, 129800. [Google Scholar] [CrossRef] [PubMed]
Wang, L.; Yang, D.; Chen, R.; Ma, F.; Wang, G. How a functional soil animal-earthworm affect arbuscular mycorrhizae-assisted phytoremediation in metals contaminated soil? J. Hazard. Mater. 2022, 435, 128991. [Google Scholar] [CrossRef]
Yang, D.; Wang, L.; Bai, S. Enhancement of alfalfa growth resistance by arbuscular mycorrhiza and earthworm in molybdenum-contaminated soils: From the perspective of soil nutrient turnover. Environ. Res. 2025, 267, 120714. [Google Scholar] [CrossRef]
Zhang, C.; Zhou, M.; Du, H.; Li, D.; Lv, D.; Hou, N. Influence of microbial agents-loaded biochar on bacterial community assembly and heavy metals morphology in sewage sludge compost: Insights from community stability and complexity. Bioresour. Technol. 2025, 419, 132070. [Google Scholar] [CrossRef]
Zhang, S.; Yi, X.; He, D.; Tang, X.; Chen, Y.; Zheng, H. Recent progress and perspectives of typical renewable bio-based flocculants: Characteristics and application in wastewater treatment. Environ. Sci. Pollut. Res. 2024, 31, 46877–46897. [Google Scholar] [CrossRef]
Bolto, B.; Gregory, J. Organic polyelectrolytes in water treatment. Water Res. 2007, 41, 2301–2324. [Google Scholar] [CrossRef]
Ahmad, M.; Ahmed, S.; Swami, B.L.; Ikram, S. Adsorption of heavy metal ions: Role of chitosan and cellulose for water treatment. Langmuir 2015, 79, 109–155. [Google Scholar] [CrossRef]
Ghiringhelli, L.M.; Vybiral, J.; Levchenko, S.V.; Draxl, C.; Scheffler, M. Big data of materials science: Critical role of the descriptor. Phys. Rev. Lett. 2015, 114, 105503. [Google Scholar] [CrossRef]
Schleder, G.R.; Padilha, A.C.; Acosta, C.M.; Costa, M.; Fazzio, A. From DFT to machine learning: Recent approaches to materials science—A review. J. Phys. Mater. 2019, 2, 032001. [Google Scholar] [CrossRef]
Tian, Y.; Li, X.; Ma, H.; Zhang, X.; Tan, K.C.; Jin, Y. Deep reinforcement learning based adaptive operator selection for evolutionary multi-objective optimization. IEEE Trans. Emerg. Top. Comput. Intell. 2022, 7, 1051–1064. [Google Scholar] [CrossRef]
Wang, W.; Zhang, K.; Lü, Z.; Gu, Z.; Qian, H.; Zhang, Q. Machine vision detection of foreign objects in coal using deep learning. J. Min. Sci. Technol. 2021, 6, 115–123. [Google Scholar] [CrossRef]
Wang, Z.; Song, Z.; Zhang, J.; Chi, M. The temporal-spatialotemporal differentiation characteristics and self-repairing law patterns of soil nutrients in a mining area in western China. J. Min. Sci. Technol. 2024, 9, 631–640. [Google Scholar]
Talukder, M.J.; Alshami, A.S.; Tayyebi, A.; Ismail, N.; Yu, X. Membrane science meets machine learning: Future and potential use in assisting membrane material design and fabrication. Sep. Purif. Rev. 2024, 53, 216–229. [Google Scholar] [CrossRef]
Butler, K.T.; Davies, D.W.; Cartwright, H.; Isayev, O.; Walsh, A. Machine learning for molecular and materials science. Nature 2018, 559, 547–555. [Google Scholar] [CrossRef]
Sun, A.Y.; Scanlon, B.R. How can Big Data and machine learning benefit environment and water management: A survey of methods, applications, and future directions. Environ. Res. Lett. 2019, 14, 073001. [Google Scholar] [CrossRef]
Kim, J.; Hua, C.; Kim, K.; Lin, S.; Oh, G.; Park, M.-H.; Kang, S. Optimizing coagulant dosage using deep learning models with large-scale data. Chemosphere 2024, 350, 140989. [Google Scholar] [CrossRef] [PubMed]
Liu, C.; Balasubramanian, P.; Nguyen, X.C.; An, J.; Praneeth, S.; Zhang, P.; Huang, H. Enhanced machine learning prediction of biochar adsorption for dyes: Parameter optimization and experimental validation. Carbon Res. 2025, 4, 46. [Google Scholar] [CrossRef]
Zhu, L.-T.; Chen, X.-Z.; Ouyang, B.; Yan, W.-C.; Lei, H.; Chen, Z.; Luo, Z.-H. Review of machine learning for hydrodynamics, transport, and reactions in multiphase flows and reactors. Ind. Eng. Chem. Res. 2022, 61, 9901–9949. [Google Scholar] [CrossRef]
Wang, Z.; Wu, M.; Liang, X.; Huang, N.; Li, X. A novel vortex flocculation reactor for efficient water treatment: Kinetic modeling and experimental verification. Chem. Eng. Process.-Process Intensif. 2023, 183, 109245. [Google Scholar] [CrossRef]
Lu, X.; Su, P. Design and application of metal-organic frameworks derivatives as 3-electron ORR electrocatalysts for •OH generation in wastewater treatment: A review. Chin. Chem. Lett. 2025, 110909, in press. [Google Scholar] [CrossRef]
Kazemi-Khasragh, E.; Blázquez, J.P.F.; Gómez, D.G.; González, C.; Haranczyk, M. Facilitating polymer property prediction with machine learning and group interaction modelling methods. Int. J. Solids Struct. 2024, 286, 112547. [Google Scholar] [CrossRef]
Salehizadeh, H.; Yan, N.; Farnood, R. Recent advances in polysaccharide bio-based flocculants. Biotechnol. Adv. 2018, 36, 92–119. [Google Scholar] [CrossRef]
Rodrigues, A.C.; Boroski, M.; Shimada, N.S.; Garcia, J.C.; Nozaki, J.; Hioka, N. Treatment of paper pulp and paper mill wastewater by coagulation–flocculation followed by heterogeneous photocatalysis. J. Photochem. Photobiol. A Chem. 2008, 194, 1–10. [Google Scholar] [CrossRef]
Renault, F.; Sancey, B.; Badot, P.-M.; Crini, G. Chitosan for coagulation/flocculation processes—An eco-friendly approach. Eur. Polym. J. 2009, 45, 1337–1348. [Google Scholar] [CrossRef]
Renault, F.; Sancey, B.; Charles, J.; Morin-Crini, N.; Badot, P.-M.; Winterton, P.; Crini, G. Chitosan flocculation of cardboard-mill secondary biological wastewater. Chem. Eng. J. 2009, 155, 775–783. [Google Scholar] [CrossRef]
Özacar, M.; Şengil, İ.A. Evaluation of tannin biopolymer as a coagulant aid for coagulation of colloidal particles. Colloids Surf. A Physicochem. Eng. Asp. 2003, 229, 85–96. [Google Scholar] [CrossRef]
Roussy, J.; Chastellan, P.; Van Vooren, M.; Guibal, E. Treatment of ink-containing wastewater by coagulation/flocculation using biopolymers. Water SA 2005, 31, 369–376. [Google Scholar] [CrossRef]
Heredia, J.B.; Martín, J.S. Removing heavy metals from polluted surface water with a tannin-based flocculant agent. J. Hazard. Mater. 2009, 165, 1215–1218. [Google Scholar] [CrossRef]
Beltrán-Heredia, J.; Sánchez-Martín, J. Municipal wastewater treatment by modified tannin flocculant agent. Desalination 2009, 249, 353–358. [Google Scholar] [CrossRef]
Mishra, A.; Agarwal, M.; Bajpai, M.; Rajani, S.; Mishra, R. Plantago psyllium mucilage for sewage and tannery effluent treatment. Iran. Polym. J. 2002, 11, 381–386. [Google Scholar]
Mishra, A.; Yadav, A.; Agarwal, M.; Bajpai, M. Fenugreek mucilage for solid removal from tannery effluent. React. Funct. Polym. 2004, 59, 99–104. [Google Scholar] [CrossRef]
Mishra, A.; Bajpai, M. The flocculation performance of Tamarindus mucilage in relation to removal of vat and direct dyes. Bioresour. Technol. 2006, 97, 1055–1059. [Google Scholar] [CrossRef] [PubMed]
Anastasakis, K.; Kalderis, D.; Diamadopoulos, E. Flocculation behavior of mallow and okra mucilage in treating wastewater. Desalination 2009, 249, 786–791. [Google Scholar] [CrossRef]
Al-Hamadani, Y.A.; Yusoff, M.S.; Umar, M.; Bashir, M.J.; Adlan, M.N. Application of psyllium husk as coagulant and coagulant aid in semi-aerobic landfill leachate treatment. J. Hazard. Mater. 2011, 190, 582–587. [Google Scholar] [CrossRef]
Wu, C.; Wang, Y.; Gao, B.; Zhao, Y.; Yue, Q. Coagulation performance and floc characteristics of aluminum sulfate using sodium alginate as coagulant aid for synthetic dying wastewater treatment. Sep. Purif. Technol. 2012, 95, 180–187. [Google Scholar] [CrossRef]
Khiari, R.; Dridi-Dhaouadi, S.; Aguir, C.; Mhenni, M.F. Experimental evaluation of eco-friendly flocculants prepared from date palm rachis. J. Environ. Sci. 2010, 22, 1539–1543. [Google Scholar] [CrossRef]
Suopajärvi, T.; Liimatainen, H.; Hormi, O.; Niinimäki, J. Coagulation–flocculation treatment of municipal wastewater based on anionized nanocelluloses. Chem. Eng. J. 2013, 231, 59–67. [Google Scholar] [CrossRef]
Zhong, J.; Sun, X.; Wang, C. Treatment of oily wastewater produced from refinery processes using flocculation and ceramic membrane filtration. Sep. Purif. Technol. 2003, 32, 93–98. [Google Scholar] [CrossRef]
Sarika, R.; Kalogerakis, N.; Mantzavinos, D. Treatment of olive mill effluents: Part II. Complete removal of solids by direct flocculation with poly-electrolytes. Environ. Int. 2005, 31, 297–304. [Google Scholar] [CrossRef] [PubMed]
Ebeling, J.M.; Rishel, K.L.; Sibrell, P.L. Screening and evaluation of polymers as flocculation aids for the treatment of aquacultural effluents. Aquac. Eng. 2005, 33, 235–249. [Google Scholar] [CrossRef]
Ginos, A.; Manios, T.; Mantzavinos, D. Treatment of olive mill effluents by coagulation–flocculation–hydrogen peroxide oxidation and effect on phytotoxicity. J. Hazard. Mater. 2006, 133, 135–142. [Google Scholar] [CrossRef]
Pang, J.; Luo, W.; Yao, Z.; Chen, J.; Dong, C.; Lin, K. Water quality prediction in urban waterways based on wavelet packet Denoising and LSTM. Water Resour. Manag. 2024, 38, 2399–2420. [Google Scholar] [CrossRef]
Wang, Q.; Wang, Z.; Gao, D.; Gao, Z.; Jia, J.; Zhu, J.; Gao, J. Seismic attribute analysis with a combination of convolutional autoencoder and random forest in a turbidite reservoir. Geophysics 2024, 89, WA207–WA217. [Google Scholar] [CrossRef]
Tang, Q.; Liang, J.; Zhu, F. A comparative review on multi-modal sensors fusion based on deep learning. Signal Process. 2023, 213, 109165. [Google Scholar] [CrossRef]
Ma, S.; Ding, W.; Zheng, Y.; Zhou, L.; Yan, Z.; Xu, J. Edge-cloud collaboration-driven predictive planning based on LSTM-attention for wastewater treatment. Comput. Ind. Eng. 2024, 195, 110425. [Google Scholar] [CrossRef]
Boumezbeur, H.; Laouacheria, F.; Heddam, S.; Djemili, L. Modelling coagulant dosage in drinking water treatment plant using advance machine learning model: Hybrid extreme learning machine optimized by Bat algorithm. Environ. Sci. Pollut. Res. 2023, 30, 72463–72483. [Google Scholar] [CrossRef]
Coppola, S. Optimization of Water Treatment Processes Using Computational Fluid Dynamics. Ph.D. Thesis, University of Salerno, Fisciano, Italy, 2021. [Google Scholar] [CrossRef]
Huang, Y.; Chen, J.; Wang, C. Algorithm for Predicting Flocculation Rate of Particulate Minerals in Water under Different Influencing Factors. J. Coast. Res. 2019, 93, 61–69.
Guo, X.; Meng, M.; Ning, Y.; Chen, C.; Xiao, L. Data-Driven Model Predictive Control Strategy for Coagulant Dosing in Water Treatment Plants. In Proceedings of the 2024 Second International Conference on Cyber-Energy Systems and Intelligent Energy (ICCSIE), Shenyang, China, 17–19 May 2024; pp. 1–6.
Álvarez Díez, A.; Pena Rois, R.; Mocanu, I.; Orzan, C.; Brebenel, C.; Stere, J.; Muíños Landín, S.; Fernández Montenegro, J.M. Reinforcement learning-based DSS for coagulant and disinfectant dosage selection on drinking water treatment plants. Water Supply 2024, 24, 86–102.
Wang, A.-J.; Li, H.; He, Z.; Tao, Y.; Wang, H.; Yang, M.; Savic, D.; Daigger, G.T.; Ren, N. Digital twins for wastewater treatment: A technical review. Engineering 2024, 36, 21–35.
Tang, W.; Li, H.; Fei, L.; Wei, B.; Zhou, T.; Zhang, H. The removal of microplastics from water by coagulation: A comprehensive review. Sci. Total Environ. 2022, 851, 158224.
Tang, X.; Zheng, H.; Teng, H.; Sun, Y.; Guo, J.; Xie, W.; Yang, Q.; Chen, W. Chemical coagulation process for the removal of heavy metals from water: A review. Desalination Water Treat. 2016, 57, 1733–1748.
Exall, K.N. Examination of the Behaviour of Aluminum-Based Coagulants During Organic Matter Removal in Drinking Water Treatment; National Library of Canada = Bibliothèque Nationale du Canada: Ottawa, ON, Canada, 2002.
Exley, C. The chemistry of human exposure to aluminum. In Neurotoxicity of Aluminum; Springer: Berlin/Heidelberg, Germany, 2023; pp. 33–37.
Chen, H.; Wang, X.; Liang, H.; Yan, Z.; Ma, Z.; Wang, Z.; Du, C. Preparation, oil removal and flocculation efficiency evaluation of a PAC-P (AM-BA) hybrid polymer flocculant based on response surface method. J. Environ. Chem. Eng. 2024, 12, 114503.
Fu, Q.; Liu, X.; Wu, Y.; Wang, D.; Xu, Q.; Yang, J. The fate and impact of coagulants/flocculants in sludge treatment systems. Environ. Sci. Water Res. Technol. 2021, 7, 1387–1401.
Wei, H.; Gao, B.; Ren, J.; Li, A.; Yang, H. Coagulation/flocculation in dewatering of sludge: A review. Water Res. 2018, 143, 608–631.
Yu, J.; Cheng, Y.; Cai, A.; Huang, X.; Zhang, Q. Synthetic Cu(III) from copper plating wastewater for onsite decomplexation of Cu(II)- and Ni(II)-organic complexes. Chin. Chem. Lett. 2025, 36, 110549.
Zhao, C.; Zhou, J.; Yan, Y.; Yang, L.; Xing, G.; Li, H.; Wu, P.; Wang, M.; Zheng, H. Application of coagulation/flocculation in oily wastewater treatment: A review. Sci. Total Environ. 2021, 765, 142795.
Jiang, X.; Li, Y.; Tang, X.; Jiang, J.; He, Q.; Xiong, Z.; Zheng, H. Biopolymer-based flocculants: A review of recent technologies. Environ. Sci. Pollut. Res. 2021, 28, 46934–46963.
Liu, Z.; Wei, H.; Li, A.; Yang, H. Evaluation of structural effects on the flocculation performance of a co-graft starch-based flocculant. Water Res. 2017, 118, 160–166.
Qiu, J.; Charleux, B.; Matyjaszewski, K. Controlled/living radical polymerization in aqueous media: Homogeneous and heterogeneous systems. Prog. Polym. Sci. 2001, 26, 2083–2134.
Save, M.; Guillaneuf, Y.; Gilbert, R.G. Controlled radical polymerization in aqueous dispersed media. Aust. J. Chem. 2006, 59, 693–711.
Lee, C.S.; Robinson, J.; Chong, M.F. A review on application of flocculants in wastewater treatment. Process Saf. Environ. Prot. 2014, 92, 489–508.
Vajihinejad, V.; Gumfekar, S.P.; Bazoubandi, B.; Rostami Najafabadi, Z.; Soares, J.B. Water soluble polymer flocculants: Synthesis, characterization, and performance assessment. Macromol. Mater. Eng. 2019, 304, 1800526.
Wang, J.-P.; Yuan, S.-J.; Wang, Y.; Yu, H.-Q. Synthesis, characterization and application of a novel starch-based flocculant with high flocculation and dewatering properties. Water Res. 2013, 47, 2643–2648.
Salehizadeh, H.; Yan, N. Recent advances in extracellular biopolymer flocculants. Biotechnol. Adv. 2014, 32, 1506–1522.
Rinaudo, M. Chitin and chitosan: Properties and applications. Prog. Polym. Sci. 2006, 31, 603–632.
Bangar, S.P.; Ashogbon, A.O.; Singh, A.; Chaudhary, V.; Whiteside, W.S. Enzymatic modification of starch: A green approach for starch applications. Carbohydr. Polym. 2022, 287, 119265.
Marimuthu, S.; Rajendran, K. Artificial neural network modeling and statistical optimization of medium components to enhance production of exopolysaccharide by Bacillus sp. EPS003. Prep. Biochem. Biotechnol. 2023, 53, 136–147.
Wang, Z.; Chen, S.; Yang, L.; Wang, Q.; Hou, N.; Zhang, J.; Tong, Y.; Li, X. Remediation strategies of biochar and microbial inoculum for PAHs-contaminated soil: Quorum sensing-mediated PAHs degradation and element cycling. J. Hazard. Mater. 2025, 490, 137854.
Agbovi, H.K. Biopolymer Flocculant Systems and Their Chemically Modified Forms for Aqueous Phosphate and Kaolinite Removal; University of Saskatchewan: Saskatoon, SK, Canada, 2020.
Montgomery, D.C. Design and Analysis of Experiments; John Wiley & Sons: Hoboken, NJ, USA, 2017.
Bezerra, M.A.; Santelli, R.E.; Oliveira, E.P.; Villar, L.S.; Escaleira, L.A. Response surface methodology (RSM) as a tool for optimization in analytical chemistry. Talanta 2008, 76, 965–977.
Lin, Z.; Li, C.; Zhang, X.; Zhang, H. Study on the characteristics and mechanism of the flocculation behaviour in a novel fluidized bed flocculator. Sep. Purif. Technol. 2023, 307, 122724.
Zhu, M.; Wang, J.; Yang, X.; Zhang, Y.; Zhang, L.; Ren, H.; Wu, B.; Ye, L. A review of the application of machine learning in water quality evaluation. Eco-Environ. Health 2022, 1, 107–116.
Tung, T.M.; Yaseen, Z.M. A survey on river water quality modelling using artificial intelligence models: 2000–2020. J. Hydrol. 2020, 585, 124670.
El Bouchefry, K.; de Souza, R.S. Learning in big data: Introduction to machine learning. In Knowledge Discovery in Big Data from Astronomy and Earth Observation; Elsevier: Amsterdam, The Netherlands, 2020; pp. 225–249.
Han, D.; Liu, Q.; Fan, W. A new image classification method using CNN transfer learning and web data augmentation. Expert Syst. Appl. 2018, 95, 43–56.
Liu, J.; Long, Y.; Zhu, G.; Hursthouse, A.S. Application of Artificial Intelligence in the Management of Coagulation Treatment Engineering System. Processes 2024, 12, 1824.
Randive, P.; Bhagat, M.S.; Bhorkar, M.P.; Bhagat, R.M.; Vinchurkar, S.M.; Shelare, S.; Sharma, S.; Beemkumar, N.; Hemalatha, S.; Kumar, P.; et al. Adaptive optimization of natural coagulants using hybrid machine learning approach for sustainable water treatment. Sci. Rep. 2025, 15, 16096.
Krishnan, A.G.; Krishnamoorthy Lakshmi, P.; Chellappan, S. Artificial neural network modelling approach for the prediction of turbidity removal efficiency of PACl and Moringa Oleifera in water treatment plants. Model. Earth Syst. Environ. 2023, 9, 2893–2903.
Abobakr Yahya, A.S.; Ahmed, A.N.; Binti Othman, F.; Ibrahim, R.K.; Afan, H.A.; El-Shafie, A.; Fai, C.M.; Hossain, M.S.; Ehteram, M.; Elshafie, A. Water Quality Prediction Model Based Support Vector Machine Model for Ungauged River Catchment under Dual Scenarios. Water 2019, 11, 1231.
Lu, Z.; Fan, Y.; Sun, Z.; He, X.; Yang, C.; Yin, H.; Zhang, J.; Song, G.; Zheng, Y.; Bai, Y. A fast composition-stability machine learning model for screening MAX phases and guiding discovery of Ti₂SnN. J. Adv. Ceram. 2025, 14, 9221050.
Hipni, A.; El-shafie, A.; Najah, A.; Karim, O.A.; Hussain, A.; Mukhlisin, M. Daily forecasting of dam water levels: Comparing a support vector machine (SVM) model with adaptive neuro fuzzy inference system (ANFIS). Water Resour. Manag. 2013, 27, 3803–3823.
Nantasenamat, C.; Isarankura-Na-Ayudhya, C.; Prachayasittikul, V. Advances in computational methods to predict the biological activity of compounds. Expert Opin. Drug Discov. 2010, 5, 633–654.
Mahesh, B. Machine learning algorithms—A review. Int. J. Sci. Res. IJSR. 2020, 9, 381–386.
Mahmoodzadeh, A.; Fakhri, D.; Mohammed, A.H.; Mohammed, A.S.; Ibrahim, H.H.; Rashidi, S. Estimating the effective fracture toughness of a variety of materials using several machine learning models. Eng. Fract. Mech. 2023, 286, 109321.
El Naqa, I.; Murphy, M.J. What is machine learning. In Machine Learning in Radiation Oncology: Theory and Applications; Springer: Berlin/Heidelberg, Germany, 2015; pp. 3–11.
Li, Y.; Zhang, Z.; Zhao, Y.; Han, Y.; Ren, L.; Sun, Y. A comparison of micro-flocculation and ozonation as pretreatments for ultrafiltration: Organic removal and membrane fouling. Environ. Sci. Pollut. Res. 2023, 30, 112267–112276.
Rigatti, S.J. Random forest. J. Insur. Med. 2017, 47, 31–39.
LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
Yang, H.; Gao, C.; Jiang, D.; Zhong, D.; Ma, Y.; Li, Y.; Xing, L.; Zhao, H.; Yang, L.; Li, Z.; et al. Machine learning assisted prediction for the coefficient of thermal expansion of binary crystals. J. Adv. Ceram. 2025, 14.
Wu, J. Introduction to Convolutional Neural Networks; National Key Lab for Novel Software Technology, Nanjing University: Nanjing, China, 2017; Volume 5, p. 495.
Pan, F.; Zhu, S.; Shang, L.; Wang, P.; Liu, L.; Liu, J. Assessment of drinking water quality and health risk using water quality index and multiple computational models: A case study of Yangtze River in suburban areas of Wuhan, central China, from 2016 to 2021. Environ. Sci. Pollut. Res. 2024, 31, 22736–22758.
Bankole, A.O.; Moruzzi, R.; Negri, R.G.; Bressane, A.; Reis, A.G.; Sharifi, S.; James, A.O.; Bankole, A.R. Machine learning framework for modeling flocculation kinetics using non-intrusive dynamic image analysis. Sci. Total Environ. 2024, 908, 168452.
Ban, Y.; Liu, L.; Du, J.; Ma, C. Investigation of the treatment efficiency and mechanism of microporous flocculation magnetic fluidized bed (MFMFB) reactor for Pb (II)-containing wastewater. Sep. Purif. Technol. 2024, 334, 125963.
Wei, J.; Chu, X.; Sun, X.Y.; Xu, K.; Deng, H.X.; Chen, J.; Wei, Z.; Lei, M. Machine learning in materials science. InfoMat 2019, 1, 338–358.
Jha, D.; Ward, L.; Paul, A.; Liao, W.-K.; Choudhary, A.; Wolverton, C.; Agrawal, A. Elemnet: Deep learning the chemistry of materials from only elemental composition. Sci. Rep. 2018, 8, 17593.
Oliynyk, A.O.; Antono, E.; Sparks, T.D.; Ghadbeigi, L.; Gaultois, M.W.; Meredig, B.; Mar, A. High-throughput machine-learning-driven synthesis of full-Heusler compounds. Chem. Mater. 2016, 28, 7324–7331.
Lu, S.; Zhou, Q.; Ouyang, Y.; Guo, Y.; Li, Q.; Wang, J. Accelerated discovery of stable lead-free hybrid organic-inorganic perovskites via machine learning. Nat. Commun. 2018, 9, 3405.
Legrain, F.; Carrete, J.; Van Roekeghem, A.; Madsen, G.K.; Mingo, N. Materials screening for the discovery of new half-Heuslers: Machine learning versus ab initio methods. J. Phys. Chem. B 2018, 122, 625–632.
Antoniuk, E.R.; Li, P.; Kailkhura, B.; Hiszpanski, A.M. Representing polymers as periodic graphs with learned descriptors for accurate polymer property predictions. J. Chem. Inf. Model. 2022, 62, 5435–5445.
Nguyen, T.; Bavarian, M. Machine Learning Approach to Polymerization Reaction Engineering: Determining Monomers Reactivity Ratios. arXiv 2023, arXiv:2301.01231.
Dalal, R.J.; Oviedo, F.; Leyden, M.C.; Reineke, T.M. Polymer design via SHAP and Bayesian machine learning optimizes pDNA and CRISPR ribonucleoprotein delivery. Chem. Sci. 2024, 15, 7219–7228.
Li, S.; Liu, Y.; Wang, Z.; Dou, C.; Zhao, W. Constructing a visual detection model for floc settling velocity using machine learning. J. Environ. Manag. 2024, 370, 122805.
Li, L.; Chai, W.; Kang, J.; Liu, J.; Xing, J.; Li, G.; Zhan, Z. Utilization of graphite tailings and coal gangue in the preparation of foamed ceramics. Int. J. Appl. Ceram. Technol. 2025, 22, e15012.
Wang, K.; Li, K.; Du, F.; Zhang, X.; Wang, Y.; Zhou, J. Prediction of coal-gas compound dynamic disaster based on convolutional neural network. J. Min. Sci. Technol. 2023, 8, 613–622.
Zhong, Y.; Liu, Y.; Jiang, Q.; Jin, N.; Lin, Z.; Ye, J. Improvement of mechanical properties and investigation of strengthening mechanisms on the Ti₃AlC₂ ceramic with nanosized WC addition. J. Adv. Ceram. 2024, 13, 861–876.
Baum, A.; Moiseyenko, R.; Glanville, S.; Martini Jørgensen, T. Image-based characterization of flocculation processes through PLS inspired representation learning in convolutional neural networks. J. Chemom. 2024, 38, e3534.
Yamamura, H.; Putri, E.U.; Kawakami, T.; Suzuki, A.; Ariesyady, H.D.; Ishii, T. Dosage optimization of polyaluminum chloride by the application of convolutional neural network to the floc images captured in jar tests. Sep. Purif. Technol. 2020, 237, 116467.
Al-Ani, S.; Guo, H.; Fyfe, S.; Long, Z.; Donnaz, S.; Kim, Y. Deep learning-based image analysis for filamentous and floc-forming bacteria in wastewater treatment. J. Water Process Eng. 2024, 65, 105772.
Zhao, B.; Cheng, J.; Gao, J.; Haddleton, D.M.; Wilson, P. Active Learning as a Tool for Optimizing “Plug-n-Play” Electrochemical Atom Transfer Radical Polymerization. Macromol. Chem. Phys. 2023, 224, 2300039.
Whitman, S.E.; Latypov, M.I. Machine learning of microstructure–property relationships in materials with robust features from foundational vision transformers. arXiv 2025, arXiv:2501.18637.
Lu, C.; Xu, Z.; Dong, B.; Zhang, Y.; Wang, M.; Zeng, Y.; Zhang, C. Machine learning for the prediction of heavy metal removal by chitosan-based flocculants. Carbohydr. Polym. 2022, 285, 119240.
Shi, Z.; Chow, C.W.; Fabris, R.; Liu, J.; Sawade, E.; Jin, B. Determination of coagulant dosages for process control using online UV-Vis spectra of raw water. J. Water Process Eng. 2022, 45, 102526.
Lin, P.; Sun, C.; Ren, K.; Liu, Y.; Li, Y. Research on intelligent fault identification method of coalfield based on the PSO-XGBoost algorithm. J. Min. Sci. Technol. 2025, 10, 57–69.
Sharafi, M.; Rezaverdinejad, V.; Behmanesh, J.; Samadianfard, S. Development of long short-term memory along with differential optimization and neural networks for coagulant dosage prediction in water treatment plant. J. Water Process Eng. 2024, 65, 105784.
Yokoyama, H.; Yamashita, T.; Kojima, Y.; Nakamura, K. Deep learning-based flocculation sensor for automatic control of flocculant dose in sludge dewatering processes during wastewater treatment. Water Res. 2024, 260, 121890.
Zhu, G.; Lin, J.; Fang, H.; Yuan, F.; Li, X.; Yuan, C.; Hursthouse, A.S. A flocculation tensor to monitor water quality using a deep learning model. Environ. Chem. Lett. 2022, 20, 3405–3414.
Helm, W.; Zhong, S.; Reid, E.; Igou, T.; Chen, Y. Development of gradient boosting-assisted machine learning data-driven model for free chlorine residual prediction. Front. Environ. Sci. Eng. 2024, 18, 17.
Hernández-Chover, V.; Bellver-Domingo, Á.; Castellet-Viciano, L.; Hernández-Sancho, F. Economies of Scale and Efficiency in the Wastewater Treatment Sector: A Decision Tree Approach. Appl. Sci. 2025, 15, 3423.
Martin, N.; White, J. Water Resources’ AI–ML Data Uncertainty Risk and Mitigation Using Data Assimilation. Water 2024, 16, 2758.
Peerzade, S.; Kamat, P. Enhancing water quality prediction: A machine learning approach across diverse water environments. Water Qual. Res. J. 2025, 60, 298–317.
Zhang, T.; Shi, Y.; Liu, Y.; Yang, J.; Guo, M.; Bai, S.; Hou, N.; Zhao, X. A study on microbial mechanism in response to different nano-plastics concentrations in constructed wetland and its carbon footprints analysis. Chem. Eng. J. 2024, 480, 148023.
Barbiero, P.; Squillero, G.; Tonda, A. Modeling Generalization in Machine Learning: A Methodological and Computational Study. arXiv 2020, arXiv:2006.15680.

Figure 1. Conceptual classification of flocculants by origin and typical applications [37].

Figure 2. Schematic diagrams of (A) support vector machine (SVM), (B) decision tree (DT), and (C) artificial neural network (ANN) [102].

Figure 3. The SHAP values for physicochemical features related to expression and cell viability when delivering (A) pDNA or (B) RNP. Higher SHAP values correlate with higher impact on the output variable. The feature value color bar corresponds to the normalized value of the feature of interest (where low = blue; moderate = white; and high = red). Each dot represents a polymer formulation. An overlay spider plot shows the average impact of individual polymer variables on expression and viability when delivering (A) pDNA and (B) RNP. The spider plot is constructed by taking the mean SHAP value for a given feature across all samples and normalizing to the maximum SHAP value for each output variable. (C) SHAP dependency plot values across two variables relating to expression [121].

Figure 4. Validation results: raw microscopic images (A1–A3), ground truth (B1–B3), and deep learning model output (C1–C3). Mask colors indicate floc-forming bacteria (red), filamentous bacteria (blue), and background (white), with a yellow scale bar of 100 μm [128].

Figure 5. A scatter plot of the predicted heavy metal removal efficiency and experimental data using the RF model [131].

Figure 6. Deep learning effect on prediction of turbidity signal with (a) Mod-Dos for training accuracy, (b) Mod-Dos for training loss, (c) Mod-pH for training accuracy, and (d) Mod-pH for training loss [136].

Table 1. Application of flocculation in wastewater treatment.

Flocculant(s)	Types of Wastewater	Optimum Results	References
Chitosan	Pulp and paper mill wastewater	Turbidity: 10 to 1.1 NTU	[38,39,40]
Chitosan	Cardboard industry wastewater	COD: 1303 to 516 mg/L; 80% removal	[38,39,40]
Chitosan	Dye-containing solutions	Turbidity: 85% removal; dye: 99% removal	[38,39,40]
Anionic tannin	Drinking water	Turbidity: 300 to 2 FTU	[41,42]
Anionic tannin	Ink-containing effluent from cardboard box-making factory	Color > 99% removal	[41,42]
Modified tannin (cationic Tanfloc)	Polluted surface water	COD: 84% removal; Cu²⁺, Zn²⁺, and Ni²⁺: 90%, 75%, and 70% removal, respectively	[43,44]
Modified tannin (cationic Tanfloc)	Municipal wastewater	Turbidity: almost 100% removal	[43,44]
Anionic Psyllium mucilage (Plantago psyllium)	Sewage effluent	COD: around 50% removal; BOD₅: around 50% removal; TSS: 95% removal	[45]
Neutral Fenugreek mucilage (Trigonella foenum-graecum)	Tannery effluent	TSS: 87% removal	[46]
Tamarind mucilage (Tamarindus indica)	Golden yellow dye and direct fast scarlet dye	TDS: 40% removal; dye: 60% and 25% removal	[47]
Mallow mucilage (Malva sylvestris)	Biologically treated effluent	Turbidity: 67% removal	[48]
Anionic Isabgol mucilage (Plantago ovata)	Semi-aerobic landfill leachate	COD: 64% removal	[49]
Anionic sodium alginate	Synthetic and actual textile wastewater	Color: 90–93.4% removal; TSS: 96%	[50]
Anionic sodium carboxymethylcellulose (CMCNa)	Drinking water	COD: 80.1%; turbidity: 93%	[51]
Anionic dicarboxylic acid nanocellulose (DCC)	Municipal wastewater	Turbidity: 40–80%; COD: 40–60%	[52]
Derivative of polyacrylamide	Oily wastewater from refinery plant	Oil: 6 g/L to 220 mg/L; COD: 3 g/L to 668 mg/L	[53]
Four cationic (FO-4700-SH, FO-4490-SH, FO-4350-SHU, and FO-4190-SH) and two anionic (FLOCAN 23 and AN 934-SH) polyelectrolytes	Olive mill effluent	TSS: nearly 100% removal; COD: 55% removal; BOD₅: 23% removal	[54]
Cationic polyamine (Magnofloc LT 7991), cationic organic polyelectrolytes (Magnofloc LT 7992 and 7995), cationic polyacrylamide (Hyperfloc CE 854 and CE 1950), and copolymer of quaternary acrylate salt and acrylamide (Magnofloc 22S)	Aquaculture wastewater	TSS: 99% removal; RP: 92–95% removal	[55]
Cationic (FO-4700-SH and FO-4490-SH) polyelectrolytes	Olive mill effluent	TSS: 97–99% removal; TP: 50–56% removal; COD: 17–35% removal	[56]

Table 2. Machine learning for flocculant application optimization.

ML Application	ML Algorithms/Models	Results	References
Sensor data preprocessing and feature extraction	1. Wavelet denoising and adaptive baseline correction. 2. Convolutional autoencoders on turbidity time series. 3. Multimodal neural network fusing optical and zeta potential sensors.	1. Improved turbidity prediction by 1.7×. 2. Latent features correlated with suspended solids (R² = 0.92). 3. Predicted optimal dosing points with 95% accuracy, which is 20% better than single-sensor models.	[57,58,59]
Dosage prediction from real-time water quality	1. De-model with turbidity, pH, and temperature inputs. 2. LSTM for turbidity and organic matter forecasting. 3. Hybrid model (first principles + ML kernel ridge regression).	1. MAE of 0.12 mg/L, reduced chemical usage by 8% over 6 months. 2. Reduced turbidity spikes by 65% in pilot trials. 3. Dose predictions within 5% of optimal across varying chemistries.	[60,61]
Modeling floc structure and sedimentation	1. Random forest surrogate for CFD simulation outputs. 2. SVM trained on laser diffraction measurements. 3. CNN on microscope images.	1. Enabled fast what-if analyses for settling dynamics. 2. R² = 0.88 for floc size prediction. 3. 97% morphology classification accuracy, used for membrane bioreactor tuning.	[62,63]
Integration of ML into control frameworks	1. XGBoost + MPC framework. 2. Reinforcement learning in a drinking water plant. 3. Digital twin combining ML surrogates and real-time data.	1. 12% lower polymer usage than rule-based systems. 2. 14% reduction in chemical costs, improved effluent quality. 3. 30% reduction in dosing errors through virtual testing.	[64,65,66]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
