Automated Machine Learning-Based Prediction of the Effects of Physicochemical Properties and External Experimental Conditions on Cadmium Adsorption by Biochar

Wang, Shuoyang; Song, Xiangyu; Duan, Jicheng; Li, Shuo; Gao, Dangdang; Liu, Jia; Meng, Fanjing; Yang, Wen; Yu, Shixin; Wang, Fangshu; Xu, Jie; Luo, Siyi; Zhao, Fangchao; Chen, Dong

doi:10.3390/w17152266


by Shuoyang Wang ¹, Xiangyu Song ¹, Jicheng Duan ², Shuo Li ¹, Dangdang Gao ¹, Jia Liu ¹, Fanjing Meng ³, Wen Yang ¹, Shixin Yu ¹, Fangshu Wang ¹, Jie Xu ¹, Siyi Luo ¹, Fangchao Zhao ¹ and Dong Chen ¹,*

¹ School of Environmental and Municipal Engineering, Qingdao University of Technology, Qingdao 266520, China
² Qingdao Licun River North Bank Water Affairs Co., Ltd., Qingdao 266520, China
³ Faculty of Science and Technology, Norwegian University of Life Sciences, 1433 Ås, Norway
* Author to whom correspondence should be addressed.

Water 2025, 17(15), 2266; https://doi.org/10.3390/w17152266

Submission received: 20 June 2025 / Revised: 17 July 2025 / Accepted: 27 July 2025 / Published: 30 July 2025

(This article belongs to the Special Issue Advanced Adsorbent-Based Technologies for Efficient Wastewater Treatment)


Abstract

Biochar serves as an effective adsorbent for the heavy metal cadmium, with its performance significantly influenced by its physicochemical properties and various environmental features. Traditional machine learning models, though adept at managing complex multi-feature relationships, rely heavily on expertise in feature engineering and hyperparameter optimization. To address these issues, this study employs an automated machine learning (AutoML) approach, automating feature selection and model optimization, coupled with an intuitive online graphical user interface, enhancing accessibility and generalizability. Comparative analysis of four AutoML frameworks (TPOT, FLAML, AutoGluon, H₂O AutoML) demonstrated that H₂O AutoML achieved the highest prediction accuracy (R² = 0.918). Key features influencing adsorption performance were identified as initial cadmium concentration (23%), stirring rate (14.7%), and the biochar H/C ratio (9.7%). Additionally, the maximum adsorption capacity of the biochar was determined to be 105 mg/g. Optimal production conditions for biochar were determined to be a pyrolysis temperature of 570–800 °C, a residence time of ≥2 h, and a heating rate of 3–10 °C/min to achieve an H/C ratio of <0.2. An online graphical user interface was developed to facilitate user interaction with the model. This study not only provides practical guidelines for optimizing biochar but also introduces a novel approach to modeling using AutoML.

Keywords:

automated machine learning; cadmium removal; environment; isotherm; kinetics; sustainability; thermodynamics

1. Introduction

Heavy metal pollution in aquatic environments has become a significant concern in the fields of global environmental science and public health [1]. Cadmium (Cd), as a representative heavy metal, poses a substantial threat to both aquatic ecosystems and human health due to its tendency to accumulate in natural water bodies and its resistance to natural degradation [2]. Consequently, the removal of Cd from water is crucial for both human health and environmental protection. Previous studies have employed various methods to remove cadmium ions from water, including electrocoagulation, ion exchange, flotation, packed bed filtration, phytoremediation, and reverse osmosis. However, these methods have certain limitations in practical applications, such as high costs, complex operations, high energy consumption, and limited treatment efficiency [2,3,4]. Consequently, there is an urgent need to identify alternative solutions that are more economical, sustainable, and efficient. Adsorption has gradually become an important method for removing Cd from water due to its outstanding treatment efficiency, ease of operation, low cost, and strong adaptability [5].

In adsorption methods, the selection of adsorbents is a critical factor that directly determines the effectiveness of removal [6]. In recent years, biochar has emerged as a novel adsorbent material and has become a focal point in Cd removal research due to its excellent specific surface area, rich pore structure, strong surface affinity, and relatively low production costs [7]. However, the adsorption performance of biochar towards Cd is influenced by various factors, including its physicochemical properties (such as surface area, functional groups, and pore structure) as well as experimental conditions (such as contact time and temperature) [8]. Moreover, due to the complex interactions between the physicochemical properties of biochar, its preparation conditions, and experimental conditions, traditional experimental methods are insufficient to fully reveal the adsorption mechanisms and their relationships with these factors.

In recent years, machine learning (ML), as a data-driven mathematical approach, has demonstrated remarkable performance in handling high-dimensional and complex relationships, enabling the identification of hidden patterns and correlations in environmental issues [9]. These characteristics have led to its widespread application in the fields of environmental pollution assessment and wastewater treatment, providing researchers with novel analytical approaches and efficient technical solutions [10]. For example, Kandpal et al. employed a Random Forest (RF) model to predict biochar yield, achieving a coefficient of determination (R²) of 0.73 [11], while Zeeshan et al. used the Tab-Transformer model to predict adsorption capacity with higher precision (R² = 0.89) [12]. ML can therefore reduce experimental time and costs, improve research efficiency, and overcome many limitations of traditional material preparation processes. However, traditional ML methods often require manual completion of steps such as model selection and hyperparameter tuning, which are time-consuming and heavily reliant on expert experience [13]. An automated solution is therefore needed to optimize these stages and improve efficiency and accuracy. To address these challenges, Automated Machine Learning (AutoML) has emerged as a solution. By integrating techniques such as genetic algorithms and Bayesian optimization, AutoML can autonomously perform complex tasks including data preprocessing, feature engineering, model selection, and hyperparameter tuning. This capability significantly reduces reliance on expert knowledge and minimizes human intervention, leading to substantial savings in time and labor costs.
Furthermore, through its efficient automated processes, AutoML enhances model performance and boosts overall productivity, thereby providing robust support for the widespread adoption and innovation of machine learning applications [14]. For example, Xu et al. successfully constructed a predictive model for the key variables (such as concentration, type, and size) of microplastics and their methane yield using AutoML [15]. Bao et al. utilized AutoML to efficiently predict and optimize the removal efficiency of antibiotics in constructed wetlands [16]. However, in the critical field of biochar adsorption of Cd, despite the significant potential that AutoML demonstrates in both theoretical and practical applications, existing research reveals notable gaps, lacking systematic and comprehensive analyses.

This study employs AutoML to systematically construct a prediction model for biochar adsorption of Cd. First, multi-source literature data, including biochar preparation parameters, physicochemical properties, and external experimental conditions, were integrated and preprocessed (for example, by handling missing values). Four frameworks (TPOT, FLAML, AutoGluon, and H₂O AutoML) were then employed in conjunction with model interpretability methods, such as SHapley Additive exPlanations (SHAP) and partial dependence analysis (PDA), to identify the key features significantly affecting adsorption performance and to reveal their nonlinear interactions. Subsequently, reverse feature engineering was applied to determine the optimal range of preparation parameters (such as pyrolysis temperature and pyrolysis time) that influence the most important physicochemical properties. Finally, a user-friendly online graphical user interface (GUI) prediction system was developed to predict adsorption performance, providing an efficient decision support tool for Cd pollution management.

2. Method

The research process comprises six components (Figure 1): (1) data collection; (2) data preprocessing; (3) model performance comparison; (4) feature importance analysis; (5) partial dependence analysis; and (6) graphical user interface.

2.1. Data Collection and Preprocessing

Previous studies have reported that the physicochemical properties of biochar significantly influence its adsorption capacity [7]. For example, properties such as specific surface area and ash content of biochar have a notable impact on its adsorption capacity for Cd [17]. Additionally, external experimental conditions, such as stirring speed and Cd concentration, also significantly affect the adsorption capacity of biochar [18]. Therefore, literature searches were conducted in the Web of Science and Google Scholar databases using the keywords “biochar” and “Cadmium” or “adsorption.” To ensure the reliability of model predictions and interpretations, only studies using pure biochar were included. The physicochemical properties of biochar recorded in the literature include C (wt%), H (wt%), O (wt%), N (wt%), S (wt%), (O + N)/C, O/C, H/C, ash content (%), pH of biochar (pH_pzc), particle size (mm), specific surface area (SSA, m²/g), pore volume (cm³/g), average pore diameter (nm), and cation exchange capacity (CEC, mol/kg). External experimental conditions include the initial Cd concentration in the solution (HMC, mg/L), stirring speed (rpm), solution volume (L), biochar concentration (g/L), pH of the dissolved water, adsorption temperature (°C), and adsorption time (min). The output variable is the adsorption capacity (Qe, mg/g). Preparation conditions include pyrolysis temperature (°C), heating rate (°C/min), and pyrolysis time (min). Through a rigorous manual screening process, 18 peer-reviewed research papers published between 2019 and 2022 were selected, and a total of 598 samples were collected (Table S1). The preparation and experimental treatment process of biochar are illustrated in Figure S1.

Figure 2 displays the frequency of missing values and their cumulative percentage for each feature in the dataset. Among all features with missing data, the highest missing rate was observed for particle size, which reached 100%, followed by cation exchange capacity (missing rate of 68.1%), sulfur (S) content (missing rate of 55.2%), and pH in water (missing rate of 40%). Therefore, features with more than 20% missing values were excluded, while missing values for the remaining features were imputed using the RF model.
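The screening rule described above can be sketched as follows. This is a minimal illustration, not the authors' code: the toy column names and values are hypothetical, and scikit-learn's IterativeImputer with a RandomForestRegressor is an assumed stand-in for the RF-based imputation, whose exact procedure is not specified in the text.

```python
import math

MISSING_THRESHOLD = 0.20  # features above this missing rate are dropped (from the text)

def missing_rate(column):
    """Fraction of NaN entries in one feature column."""
    return sum(math.isnan(v) for v in column) / len(column)

def drop_sparse_features(table, threshold=MISSING_THRESHOLD):
    """Keep only the columns whose missing rate does not exceed the threshold."""
    return {name: col for name, col in table.items()
            if missing_rate(col) <= threshold}

def impute_with_random_forest(table, seed=0):
    """Fill remaining gaps with a random-forest-based iterative imputer.

    Assumption: IterativeImputer + RandomForestRegressor approximates the
    paper's unspecified RF imputation step.
    """
    # Imported lazily so the filtering helpers work without scikit-learn.
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer
    from sklearn.ensemble import RandomForestRegressor

    names = list(table)
    rows = [list(r) for r in zip(*(table[n] for n in names))]
    imputer = IterativeImputer(
        estimator=RandomForestRegressor(n_estimators=100, random_state=seed),
        max_iter=10,
        random_state=seed,
    )
    filled = imputer.fit_transform(rows)
    return {n: [float(row[j]) for row in filled] for j, n in enumerate(names)}

nan = float("nan")
# Hypothetical toy table: column name -> values, NaN marks a missing entry.
toy = {
    "ash_pct": [12.0, nan, 15.5, 9.8, 20.1],        # 20% missing, kept
    "cec_mol_kg": [nan, nan, nan, 0.4, nan],        # 80% missing, dropped
    "ssa_m2_g": [210.0, 35.2, 88.0, 140.3, 12.7],   # complete, kept
}
kept = drop_sparse_features(toy)
print(sorted(kept))  # ['ash_pct', 'ssa_m2_g']
```

Calling `impute_with_random_forest(kept)` would then fill the remaining gap in `ash_pct`; on a table this small the result is only illustrative.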

2.2. Correlation Coefficient Analysis

Data visualization played a crucial role in the development of ML models, as it provided a clear representation of the data distribution of input and output features [19]. A key characteristic of the dataset is the distribution of data across feature ranges. To effectively display the data trends of various features, histograms offered a simple and efficient way to represent data distributions. Combined with statistical analysis, this approach helped to explore the quality and diversity of the data.

In statistical analysis, Spearman’s rank correlation coefficient (SCC) was used to measure the monotonic relationship between two continuous variables, helping to identify redundancy among input features. The formula for its calculation is provided in Equation (1). A heatmap was used to display the SCC between features, aiming to explore the relationship between physicochemical properties, external experimental conditions, and adsorption capacity. Furthermore, the statistical significance of the correlation between input and output features was assessed using p-values [11].

ρ_{x y} = 1 - \frac{6 \sum_{i = 1}^{n} d_{i}^{2}}{n (n^{2} - 1)}

(1)

where ρ_{x y} is the SCC for feature to target or target to target, d_{i} is the rank difference between each pair of observations, and n is the number of observations. ρ_{x y} ranged from −1 to 1, where 0 indicated no monotonic correlation, and a highly negative or positive value indicated a strong negative or positive correlation, respectively.
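As a worked illustration of the rank-difference form of the SCC, 1 − 6Σd²/(n(n² − 1)), the following sketch computes it in plain Python. The concentration and capacity values are invented for the example, and the simple ranking helper assumes there are no tied values (with ties, a tie-corrected implementation such as `scipy.stats.spearmanr` would be the safer choice).

```python
def ranks(values):
    """Rank values from 1 to n (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman(x, y):
    """Spearman's rank correlation via the squared rank differences."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical data: initial Cd concentration (mg/L) vs. adsorption capacity (mg/g).
cd_conc = [10, 20, 50, 100, 200]
q_e = [4.1, 9.0, 8.5, 30.2, 41.0]
rho = spearman(cd_conc, q_e)
print(round(rho, 3))  # 0.9

# A monotonic but nonlinear relationship still gives a coefficient of 1.
cubic = [v ** 3 for v in cd_conc]
print(spearman(cd_conc, cubic))  # 1.0
```

The second call shows why the coefficient captures monotonic rather than strictly linear association: only the ordering of the values matters.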

2.3. Implementation of AutoML

To simulate the complex effects of physicochemical properties and external experimental conditions on adsorption capacity, this study selected four excellent AutoML frameworks: TPOT v0.12.0 (Epistasis Lab, University of Pennsylvania, Philadelphia, PA, USA), FLAML v2.2.1 (Microsoft Research, Redmond, WA, USA), AutoGluon v0.8.2 (Amazon Web Services, Seattle, WA, USA), and H₂O AutoML v3.44.0.5 (H₂O.ai, Mountain View, CA, USA). These frameworks have demonstrated exceptional performance in predictive tasks across multiple fields, making them suitable for this study. The four AutoML tools are capable of automatically selecting ML models based on different algorithms and training strategies, and optimizing model performance through stacked ensemble methods. This helps design optimized machine learning models within a limited timeframe, as illustrated in Figure S2.

2.3.1. TPOT

TPOT (Tree-based Pipeline Optimization Tool) is an AutoML tool that utilizes genetic algorithms to simulate the evolutionary process, automating the design and optimization of data preprocessing, feature engineering, and model selection pipelines. It combines ML operations from Scikit-learn in a tree-like structure and optimizes hyperparameters for specific tasks. TPOT’s user manual can be found at https://github.com/EpistasisLab/tpot (accessed on 26 July 2025).

2.3.2. FLAML

FLAML (Fast and Lightweight Automated Machine Learning) is an efficient and lightweight AutoML library developed by Microsoft, designed to quickly build efficient and high-performance ML models through automated processes [14]. It employs intelligent hyperparameter optimization and model selection methods, supporting a variety of tasks such as classification, regression, and time series forecasting. FLAML integrates popular ML algorithms, including XGBoost, LightGBM, and CatBoost. Its key features include high computational efficiency and low resource consumption, making it particularly suitable for medium and small datasets. A detailed description of FLAML can be found at https://github.com/microsoft/FLAML (accessed on 26 July 2025).

2.3.3. AutoGluon

AutoGluon is a high-performance AutoML framework developed by Amazon, designed to simplify the training, optimization, and deployment processes of both deep learning and conventional ML models. It supports a wide range of tasks, including classification, regression, object detection, and analysis of text and tabular data. AutoGluon automatically integrates advanced models, such as neural networks, XGBoost, and CatBoost, and enhances predictive accuracy through intelligent stacking (model ensembling) and multi-model integration. With minimal code, AutoGluon can automatically handle data preprocessing, hyperparameter tuning, and model selection, significantly streamlining the machine learning workflow. It is particularly well-suited for handling large-scale data and complex tasks, enabling developers to quickly build high-performance AI solutions without the need for extensive hyperparameter tuning. Detailed implementation information can be found at https://auto.gluon.ai/ (accessed on 26 July 2025).

2.3.4. H₂O AutoML

H₂O AutoML is an AutoML framework developed by H₂O.ai, designed to simplify the entire ML workflow, from data preprocessing to model deployment [15]. It supports common tasks such as classification and regression and automatically trains and optimizes various models, including Gradient Boosting Machines (GBM), RF, Deep Learning, XGBoost, and Generalized Linear Models (GLM). Additionally, it offers powerful ensemble learning capabilities, such as stacked ensembles, which improve predictive accuracy by integrating multiple models. H₂O AutoML is renowned for its high scalability, enabling efficient handling of large-scale datasets. It provides interfaces for R, Python v3.9, and a web interface, allowing users to quickly build high-performance models without the need for extensive hyperparameter tuning or manual algorithm selection. Detailed documentation can be accessed at https://docs.h2o.ai/h2o/latest-stable/h2o-docs/index.html (accessed on 26 July 2025).
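A minimal sketch of how a time-budgeted H₂O AutoML regression run might look in Python is given below. It is an assumption-laden illustration rather than the study's script: the CSV paths, the target column name `Qe`, the seed, and the intermediate time budgets are hypothetical (only the 60 s and 2400 s endpoints and five-fold cross-validation are stated in the text).

```python
# Time budgets in seconds. Only 60 and 2400 come from the text;
# the intermediate values are hypothetical placeholders.
TIME_BUDGETS_S = [60, 300, 600, 1200, 2400]

def run_h2o_automl(train_csv, test_csv, target="Qe", budget_s=60, seed=1):
    """Train H2O AutoML under a time budget and score the leader on the test set."""
    # Imported lazily so this module can be loaded without h2o installed.
    import h2o
    from h2o.automl import H2OAutoML

    h2o.init()  # starts or connects to a local H2O cluster
    train = h2o.import_file(train_csv)
    test = h2o.import_file(test_csv)
    features = [c for c in train.columns if c != target]

    aml = H2OAutoML(
        max_runtime_secs=budget_s,  # wall-clock budget for the whole search
        nfolds=5,                   # five-fold cross-validation on the training frame
        seed=seed,
        sort_metric="RMSE",
    )
    aml.train(x=features, y=target, training_frame=train)

    perf = aml.leader.model_performance(test)
    return {
        "model_id": aml.leader.model_id,
        "r2": perf.r2(),
        "rmse": perf.rmse(),
        "mae": perf.mae(),
    }

def compare_budgets(train_csv, test_csv, budgets=TIME_BUDGETS_S):
    """Repeat the run for each budget to see how accuracy changes with time."""
    return {b: run_h2o_automl(train_csv, test_csv, budget_s=b) for b in budgets}
```

A call such as `compare_budgets("train.csv", "test.csv")` (hypothetical file names) would return one metrics dictionary per budget, which is the kind of table needed to compare accuracy against training time.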

2.4. Evaluation of the AutoML

The raw dataset collected from the literature was randomly split into a training set (90%) and a test set (10%) to ensure the representativeness and independence of both the training and testing data. Five-fold cross-validation was performed on the training set to further ensure the robustness and generalization ability of the model. Subsequently, these AutoML frameworks underwent model training and optimization within a training time range of 60 to 2400 s. To comprehensively evaluate and compare the regression performance of the models, this study utilized common regression performance metrics such as R², Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE) [20]. The formulas for these metrics are as follows:

\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} (y_{i} - \hat{y}_{i})^{2}}

(2)

\mathrm{MAE} = \frac{1}{N} \sum_{i = 1}^{N} | y_{i} - \hat{y}_{i} |

(3)

R^{2} = 1 - \frac{\sum_{i = 1}^{N} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i = 1}^{N} (y_{i} - \bar{y})^{2}}

(4)

where \hat{y}_{i} is the predicted output, y_{i} is the reported true value of the output, \bar{y} is the mean of the output values, and N is the number of data samples in the training or testing datasets.
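The evaluation protocol can be sketched in a few lines of plain Python using the standard definitions of RMSE, MAE, and R². The seed, the striding scheme used to form the folds, and the toy prediction values are assumptions for illustration; only the 598-sample size, the 90/10 split, and the five folds come from the text.

```python
import math
import random

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def split_indices(n, test_fraction=0.10, n_folds=5, seed=42):
    """Random 90/10 split, then partition the training indices into folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    test, train = idx[:n_test], idx[n_test:]
    folds = [train[k::n_folds] for k in range(n_folds)]  # near-equal fold sizes
    return train, test, folds

train, test, folds = split_indices(598)
print(len(train), len(test), [len(f) for f in folds])

# Hypothetical observed and predicted adsorption capacities (mg/g).
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 39.0]
print(round(rmse(y_true, y_pred), 3), mae(y_true, y_pred), round(r2(y_true, y_pred), 3))
```

Each fold serves once as the validation set while the other four are used for fitting; the held-out 10% is touched only for the final test metrics.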

2.5. Interpretability Analysis

SHAP analysis is a marginal contribution analysis method based on Shapley values [21]. It calculates the marginal contribution of each feature to the model output, constructing an additive explanation model that quantifies and explains the influence of each feature on the prediction results. This approach provides transparent explanations of the model’s decision-making mechanism at both the global and local levels. Residual analysis is a commonly used method to evaluate the quality of model fitting. By plotting the residuals (i.e., the differences between actual and predicted values), the distribution of residuals can be examined [15]. If the residual distribution appears random and without systematic patterns, it indicates that the model fits the data well. Conversely, if there are obvious patterns or trends in the residual distribution, it may suggest potential issues with model fitting. PDA is also a technique used to interpret ML models. By calculating and plotting the average impact of feature changes on the model’s prediction results, PDA demonstrates the marginal effect of a feature on the target variable, thereby helping to understand how the model makes decisions based on changes in features [22].
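To make the idea of a marginal contribution concrete, the sketch below computes exact Shapley values for a small toy function by enumerating all feature subsets. It is a simplified illustration: the toy model and its coefficients are invented, and "absent" features are set to a single baseline point, whereas SHAP implementations typically average over a background dataset and use faster approximations.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline)."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their real value; the rest stay at baseline.
        return predict([x[j] if j in subset else baseline[j] for j in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = set(combo)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_model(z):
    """Hypothetical response: Cd concentration, stirring speed, H/C ratio."""
    c0, rpm, hc = z
    return 0.2 * c0 + 0.05 * rpm - 10.0 * hc + 0.001 * c0 * rpm

x = [100.0, 200.0, 0.1]        # sample to explain
baseline = [50.0, 100.0, 0.5]  # reference point
phi = shapley_values(toy_model, x, baseline)
print([round(p, 3) for p in phi])
print(round(sum(phi), 3), round(toy_model(x) - toy_model(baseline), 3))
```

The two numbers on the last line agree, which is the additivity property that lets SHAP decompose a single prediction into per-feature contributions.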

2.6. Graphical User Interface

After the development of the ML model, it is deployed into website application scenarios. The Graphical User Interface (GUI), as an intuitive and user-friendly interaction method, effectively presents complex machine learning models to practitioners in the environmental field. Through the GUI, users can easily input various physicochemical properties and external experimental conditions, thereby estimating the adsorption capacity of biochar for the heavy metal Cd under real environmental conditions. This interface not only simplifies the model usage process but also enables non-technical personnel to efficiently utilize the model for analysis and decision-making, thus promoting the widespread application of machine learning technologies in environmental protection and pollution management [23].

3. Result

3.1. Data Visualization and Statistical Analysis

This study collected data from two categories and 16 input features, which include the physicochemical properties of biochar and external experimental conditions. To better visualize the data features, the dataset was analyzed using histograms (Figure 3). The histograms clearly display the frequency distribution of the data. For instance, the carbon content (C, wt%) of biochar predominantly falls within the 60–80% range, while the oxygen content (O, wt%) is concentrated in the 10–22% range. The O/C atomic ratio has a mean value of 0.251 ± 0.142 (range: 0.004–0.77), with 75% of the samples falling within the 0–0.35 interval (Supplementary Table S2). The visualization results indicated that the majority of features exhibit a non-normal distribution. Given the limitations of traditional statistical methods in handling non-normal distributions and complex datasets, this study employed ML techniques for analysis. ML has distinct advantages in processing nonlinear relationships and non-normal distribution datasets and has been widely applied and validated across various research fields [19].

Correlation coefficient analysis is a fundamental tool in data science used to reveal the strength of relationships between features [24]. Here, its main purpose was to identify and quantify monotonic relationships between features via the SCC, providing a basis for subsequent model construction. As shown in Figure 4, there was a significant positive correlation between the concentration of Cd in the solution and adsorption capacity (SCC = 0.66, p < 0.001), while the correlations between other features and adsorption capacity were relatively weak. This result confirmed the significant impact of Cd concentration in the solution on the adsorption process and suggested that the other physicochemical properties and external experimental conditions affect adsorption efficiency in nonlinear ways [20]. It also highlighted the limitations of traditional mathematical models in capturing these complex relationships.

3.2. Model Performance Comparison

This study systematically evaluated the performance of four advanced AutoML frameworks (TPOT, FLAML, AutoGluon, and H₂O AutoML) in predicting biochar adsorption capacity. In the experimental design, the same time range (60 s to 2400 s) was set for each framework to examine the balance between model optimization efficiency and predictive accuracy under varying training time constraints.

As shown in Figure 5a, as training time increased, the R² of the model generated by TPOT improved from 0.812 to 0.903. This indicated that the models generated by TPOT could effectively capture the nonlinear relationships between features, thereby improving both fit and predictive performance. Furthermore, among the models generated by TPOT, the GradientBoosting model outperformed the XGBoost model and performed best on the test set (RMSE = 7.244).

Similarly to TPOT, FLAML also showed improved performance on the test set as training time increased, with R² rising from 0.897 to 0.913 (Figure 5b). This indicated that FLAML could continuously enhance the fitting accuracy during the automated model optimization process [25]. The cost-frugal hyperparameter optimization algorithm employed by FLAML, which dynamically adjusts the search direction, significantly increased the likelihood of finding the optimal model and parameter combination. This optimization process enabled FLAML to identify XGBoost as the best model for handling this dataset, with an RMSE of 6.861 on the test set.

Compared to TPOT and FLAML, AutoGluon demonstrated highly stable performance after 300 s of training time (Figure 5c). As training time increased, the R² of the models trained by AutoGluon remained consistently around 0.891. Table 1 shows that once training time exceeded 300 s, the set of model types generated by AutoGluon remained unchanged, which explains this stability [26]. The best model selected by AutoGluon was the WeightedEnsemble, which achieved an RMSE of 7.691 on the test set. Unlike TPOT and FLAML, AutoGluon incorporated neural network models (such as NeuralNetTorch in Table 1) in model evaluation. In performance comparisons, gradient-boosting models (such as CatBoost, LightGBM, and XGBoost) outperformed bagging-style tree ensembles (such as RF and ExtraTrees). These results indicated that AutoGluon was capable of identifying and selecting suitable ML models in a relatively short time, demonstrating its model optimization capability and performance stability [27].

Figure 5d shows the performance of H₂O AutoML under the same training time. Compared to the other AutoML frameworks, H₂O AutoML demonstrated higher predictive accuracy. Additionally, within the same training time, H₂O AutoML was able to evaluate more models (Table 2). Figure 6a illustrates the number of model types generated by H₂O AutoML at different training times. Whether the training time was 60 s or 2400 s, the models generated by H₂O AutoML exhibited strong predictive capabilities [28]. Figure 6b shows the predictive performance of the optimal model on both the training and test sets after 2400 s of training. The optimal model in H₂O AutoML was the StackedEnsemble model, with an R² of 0.918, RMSE of 6.529, and MAE of 3.002. Figure 6c shows the performance of the optimal model on the test set. The mean and standard deviation of the predicted values for the test set data are denoted μ and σ, respectively. The shaded areas represent three intervals: μ ± σ, μ ± 2σ, and μ ± 3σ. All data points in the test set fell within the μ ± 3σ range, which corresponds to 99.7% coverage under a normal distribution. This indicated that the optimal model accurately predicted the adsorption amount, supporting the reliability and robustness of the model.

In summary, after training times ranging from 60 to 2400 s, all four AutoML frameworks achieved accurate predictions of adsorption capacity on the same dataset (R² > 0.85). These findings further validate the reliability and generalizability of AutoML in biochar adsorption prediction. Moreover, the results indicated that the adsorption capacity of biochar has nonlinear relationships with multiple input features, including physicochemical properties and external experimental conditions. ML models were able to effectively capture and predict these complex relationships, highlighting their potential in analyzing intricate datasets—especially in real-world scenarios involving multiple interacting features [29].

Additionally, the experiments revealed that ensemble-based models, such as StackedEnsemble and WeightedEnsemble_L2, outperformed other models in predicting adsorption capacity. In contrast, models based on GBMs and neural networks showed slightly lower performance [30]. This underscores the advantages of ensemble learning, which combines the strengths of multiple models to reduce individual bias and enhance the robustness and accuracy of predictions [31]. The high-accuracy predictions provided by ensemble learning also lay a solid foundation for subsequent interpretability analysis, facilitating a deeper understanding of the decision-making processes behind the models.

3.3. Feature Importance Analysis

Although the ML models built by the H₂O AutoML framework exhibited the best predictive performance (R² = 0.918), model interpretability analysis remained indispensable. Consequently, this study comprehensively applied various interpretability methods, including residual analysis, feature importance heatmaps, SHAP value analysis, and PDA, to systematically analyze the impact mechanisms of each input feature on the adsorption amount.

Residuals are the differences between the observed values y_{i} and the predicted values \hat{y}_{i} (i.e., y_{i} - \hat{y}_{i}). Residual analysis helps assess the performance of the Stacked Ensemble model. The residual analysis plot illustrated the distribution of residuals for the test set. Figure 7a shows the frequency distribution of the residuals, with most residuals concentrated around zero, particularly in the low-adsorption-capacity range (0–20 mg/g). The results of the residual analysis provided crucial insights for model optimization, indicating that the model performs better in the lower concentration range [32]. However, prediction errors in the high concentration range could potentially be improved through sample augmentation or feature transformation.

The adsorption behavior of Cd on the surface of biochar was co-regulated by multiple feature parameters, with significant differences in the degree of influence of each feature on the adsorption process. Figure 7b shows the results of the feature importance analysis, in which the concentration of Cd in the solution was identified as the most critical factor (importance 23%). This result indicated that the adsorption process was primarily controlled by mass transfer driving forces, consistent with the concentration gradient-driven mechanism of the Freundlich adsorption isotherm model and aligned with previous studies [33]. The stirring speed was identified as the second-most important feature (importance 14.7%). Increasing the stirring speed could reduce the boundary layer thickness, enhance external mass transfer efficiency, and simultaneously increase the surface renewal frequency, improving the accessibility of active sites and thus increasing adsorption capacity [34]. Among the physicochemical properties of biochar, the most critical feature was the H/C ratio (importance 9.7%). A lower H/C ratio indicated higher aromaticity and carbonization of biochar, containing more aromatic ring structures capable of effectively adsorbing aromatic-ring-containing pollutants [35]. The H/C ratio also influenced the pore structure of biochar, with lower H/C ratios generally associated with the formation of more micropores and mesopores, increasing the specific surface area and enhancing physical adsorption capacity [36]. The second-most important feature in physicochemical properties was ash content (importance 8.6%). Ash contains a large amount of inorganic minerals (such as CaCO₃, SiO₂, and metal oxides), which can alter the surface properties of biochar. Ash content could increase surface roughness and pore structure, enhancing adsorption capacity [37]. 
Furthermore, the alkalinity of the ash could affect the surface charge properties of biochar, enhancing its adsorption of Cd [38].

The feature importance heatmap further validated the importance ranking of various features across multiple models, supporting the influence of different input features on biochar adsorption capacity. Figure 7c displays the heatmap, clearly showing the relative importance of each feature in different models and their contribution to the prediction results. The heatmap analysis indicated that the concentration of Cd in the solution remained the most critical feature, consistent with previous analyses. Additionally, the heatmap revealed the importance distribution of other features (such as specific surface area and biochar concentration) across different models. Although these features had lower importance, they still influenced the adsorption capacity to some extent. The heatmap further indicated that the adsorption capacity of biochar was a complex nonlinear process driven by the synergistic interaction of multiple features, which requires collaborative optimization for improvement.

The SHAP values generated from the optimal model further elucidated the impact of different features on adsorption capacity across their value ranges. SHAP, a method based on game theory for interpreting ML model outputs, quantifies the marginal contribution of each input feature to the output (i.e., adsorption capacity). In the SHAP plot, the x-axis represents the SHAP value, indicating the degree of influence that a predictor variable exerts on the model’s output, while the color gradient (ranging from blue to red) signifies the magnitude of the input variable value. Each point corresponds to an individual sample. Quantitative analysis based on SHAP values is employed to evaluate the overall influence of multiple features on the output results [39]. As illustrated in Figure 7d, the SHAP plot displays the distribution of feature importance. The wide spread of SHAP values for Cd concentration confirmed its critical role in predicting adsorption capacity. Additionally, for stirring speed and adsorption time, samples with low feature values were predominantly concentrated on the left side of the plot (lower SHAP values), whereas samples with high feature values were distributed towards the right side (higher SHAP values). Together, these patterns indicated that higher concentrations of Cd in the solution and increased stirring speeds contributed to enhanced adsorption capacity. In contrast, the H/C ratio exhibited the opposite pattern: medium-to-low H/C values positively affected adsorption capacity, whereas excessively high H/C ratios inhibited the adsorption capability of biochar. The influences of ash content and specific surface area on adsorption capacity were relatively gradual, with no pronounced effect across their numerical ranges.
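To make the notion of a marginal contribution concrete, the following minimal sketch computes exact Shapley values for a toy three-feature function by averaging each feature's marginal contribution over all feature orderings. The toy model, the feature names (`cd_conc`, `stir_rpm`, `h_c`), and the baseline values are illustrative assumptions; they are not the H₂O AutoML model or the dataset of this study, and practical SHAP implementations rely on faster approximations rather than full enumeration.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x relative to a baseline sample.

    Features that have not yet joined the coalition keep their baseline value.
    """
    names = list(x)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = x[name]             # feature joins the coalition
            value = predict(current)
            totals[name] += value - previous    # its marginal contribution
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_model(f):
    """Hypothetical response surface, used only for illustration."""
    return (0.2 * f["cd_conc"] + 0.05 * f["stir_rpm"]
            - 10.0 * f["h_c"] + 0.001 * f["cd_conc"] * f["stir_rpm"])


sample = {"cd_conc": 100.0, "stir_rpm": 200.0, "h_c": 0.5}
baseline = {"cd_conc": 50.0, "stir_rpm": 150.0, "h_c": 0.3}
phi = shapley_values(toy_model, sample, baseline)
```

In this toy case the contributions sum to the difference between the prediction for the sample and the prediction for the baseline, which is the additivity property that allows SHAP values to be read as a decomposition of a single prediction.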

3.4. Partial Dependency Analysis of Input Features

Feature importance analysis provided an initial understanding of the influence of physicochemical properties and external experimental conditions on adsorption performance. However, a more thorough investigation of how each feature relates to adsorption capacity required further analysis. Partial dependence analysis (PDA) elucidates these associations and reveals the mechanisms by which input features affect adsorption capacity. Partial dependence plots (PDPs), the central tool of PDA, illustrate the relationship between a target feature and adsorption capacity while the effects of the other features are averaged out. The visualization of PDPs allows for an intuitive observation of the nonlinear interactions and influence dynamics among different features [24]. As depicted in Figure 8, the light-green regions represent the range of partial dependence values within one standard deviation of the mean, which helps identify the sensitivity of adsorption capacity to variations in input features and their potential trends.
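The calculation behind a one-dimensional PDP can be sketched as follows. For each grid value, the target feature is overwritten with that value in every sample, the other features are left at their observed values, and the predictions are averaged; the standard deviation of the same predictions gives a band of the kind shaded in Figure 8. The toy model, the two sample rows, and the use of the population standard deviation are assumptions made for illustration and do not reproduce the fitted model or data of this study.

```python
from statistics import mean, pstdev


def partial_dependence(predict, rows, feature, grid):
    """Return (grid value, mean prediction, standard deviation) triples."""
    curve = []
    for value in grid:
        # Overwrite only the target feature; keep the other observed values.
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append((value, mean(preds), pstdev(preds)))
    return curve


def toy_model(f):
    """Hypothetical model: capacity rises with Cd concentration, falls with H/C."""
    return 0.3 * f["cd_conc"] + 20.0 * (1.0 - f["h_c"])


rows = [
    {"cd_conc": 10.0, "h_c": 0.1},
    {"cd_conc": 50.0, "h_c": 0.5},
]
curve = partial_dependence(toy_model, rows, "cd_conc", [0.0, 100.0])
```

Each triple in `curve` corresponds to one point of the PDP line together with the half-width of its one-standard-deviation band.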

In this study, for each feature, the height of the bar represented the response value (adsorption capacity). As the Cd concentration increased from 0 mg/L to 137 mg/L, the adsorption capacity of the biochar gradually increased to 34 mg/g (Figure 8a). This phenomenon was attributed to the adsorption process being reliant on the diffusion of Cd ions from the solution to the surface of the biochar. As the concentration of Cd in the solution continued to increase, the concentration gradient intensified, thereby enhancing the driving force for ion migration toward the adsorption sites [40]. Under low concentration conditions, the adsorption sites were not fully occupied, and the adsorption capacity exhibited a linear increase. However, as the concentration was further elevated, the rate of increase in adsorption capacity tended to plateau, possibly due to the near saturation of adsorption sites on the surface of the biochar, which limited the adsorption of additional ions. Stirring speed was the second most influential factor affecting adsorption capacity. With an increase in stirring speed, the adsorption capacity showed a gradually rising trend (as illustrated in Figure 8b). At 173 rpm, a significant increase in adsorption capacity was observed, which was explained by the fact that the stirring speed caused the fluid flow to transition from laminar to turbulent flow beyond the critical Reynolds number, thereby markedly enhancing mass transfer efficiency and, consequently, the adsorption capacity [41].

The H/C ratio was the most important characteristic among the physicochemical properties of biochar. As shown in Figure 8c, the adsorption capacity of biochar generally decreased as the H/C ratio increased. At low H/C ratios, the biochar was derived from high-temperature pyrolysis, primarily composed of condensed polycyclic aromatic hydrocarbon structures, which conferred strong hydrophobicity, fewer surface functional groups (such as -COOH and -OH), as well as a high specific surface area and microporous structure [42]. A higher H/C ratio indicated that the biochar contained more hydrogen and components that were not fully carbonized, thereby retaining more organic functional groups on its surface, such as hydroxyl (-OH) and methyl (-CH₃) groups, which might have been unfavorable for the effective utilization of adsorption sites [43]. Ash content was the second-most important characteristic among the physicochemical properties of biochar. As illustrated in Figure 8d, when the ash content ranged between 0% and 20%, the adsorption capacity gradually increased with increasing ash content. However, as the ash content continued to rise, the adsorption capacity gradually decreased to 10 mg/g. The high ash content likely filled or blocked the pores of the biochar, reducing the specific surface area and the number of effective adsorption sites, which in turn diminished the adsorption capacity [44].

Adsorption time was also an important feature among the external experimental conditions. As illustrated in Figure 8e, during the initial stage of adsorption (0–300 min), the adsorption sites on the biochar were unoccupied, allowing Cd to rapidly diffuse and bind, which resulted in a quick increase in adsorption capacity [45]. This period was characterized by the highest rate of adsorption. As time increased (300–1000 min), the adsorption sites gradually became occupied, leading to a reduction in the adsorption rate. Thereafter, with further extension of the contact time (1000–4800 min), the adsorption capacity exhibited a declining trend, possibly due to the prolonged exposure causing a portion of the adsorbed Cd ions to be released back into the solution.

Biochar concentration was also a crucial feature influencing its adsorption capacity (Figure 8f). In the low concentration range (0–2.5 g/L), the adsorption capacity increased rapidly with the rise in biochar concentration. This was due to the increase in the number of adsorption sites and the good dispersion of biochar particles at lower concentrations, allowing surface sites to fully interact with Cd ions, thereby enhancing the adsorption effect [46]. Within the medium concentration range (2.5–10 g/L), the adsorption capacity tended to stabilize, likely because the adsorption sites became progressively saturated, with most effective sites occupied by Cd ions, resulting in a decreased adsorption rate. When the biochar concentration was further increased (>10 g/L), the adsorption capacity declined. This was possibly attributed to the aggregation of biochar particles at high concentrations, leading to a reduction in specific surface area and a decrease in the utilization efficiency of effective adsorption sites, thereby diminishing the adsorption capacity [47]. Additionally, the trend in specific surface area mirrored that of biochar concentration. Within the specific surface area range of 0–50 m²/g, an increase in specific surface area significantly enhanced adsorption capacity, consistent with previous studies. A larger specific surface area provided more available surface sites for biochar, facilitating the adsorption of Cd ions [48]. Furthermore, under a relatively optimal combination of features, the adsorption capacity of the biochar can reach up to 105 mg/g.
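The dosage trend can also be read against the batch mass balance that is commonly used to compute adsorption capacity, q_e = (C₀ − C_e)/dosage, with concentrations in mg/L and dosage in g/L. The sketch below assumes this definition (this section does not state which formula the source studies used), and the function names and numbers are hypothetical. It shows that, for a fixed initial Cd concentration, the highest attainable capacity falls as the biochar concentration rises, which may contribute to the decline at high dosages alongside the particle aggregation discussed above.

```python
def adsorption_capacity(c0_mg_l, ce_mg_l, dosage_g_l):
    """Adsorption capacity q_e (mg/g) from the batch mass balance.

    q_e = (C0 - Ce) / dosage, with concentrations in mg/L and dosage in g/L.
    """
    if dosage_g_l <= 0:
        raise ValueError("dosage must be positive")
    return (c0_mg_l - ce_mg_l) / dosage_g_l


def capacity_ceiling(c0_mg_l, dosage_g_l):
    """Upper bound on q_e, reached only if all Cd is removed (Ce = 0)."""
    return adsorption_capacity(c0_mg_l, 0.0, dosage_g_l)


# Hypothetical example: the same 100 mg/L solution at three biochar dosages.
ceilings = {d: capacity_ceiling(100.0, d) for d in (2.5, 10.0, 20.0)}
```

Under this definition, the ceiling is inversely proportional to dosage, so a lower capacity at high dosage does not by itself imply poorer removal of Cd from the solution.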

3.5. Optimization Design of Preparation Conditions

As indicated above, the H/C atomic ratio of biochar is a key feature characterizing its degree of carbonization and surface chemical properties, and controlling it is therefore critical for optimizing Cd adsorption performance. By precisely controlling the preparation process parameters, the H/C ratio can be regulated in a targeted manner, thereby optimizing the adsorption performance. The preparation conditions of the biochar, including pyrolysis temperature, pyrolysis time, and heating rate, were collected as described in Section 2. Using these three features as inputs and the H/C ratio as the output, partial dependence plots from the model revealed the influence of the preparation conditions on the H/C ratio, providing a basis for controlling it. The modeling procedure was similar to that described in Section 3.2, with detailed parameters provided in the Supplementary Materials (Figure S3).

Based on the results presented in Figure 9a, the H/C atomic ratio of the biochar exhibited a non-monotonic variation within the low-temperature pyrolysis range (<570 °C), which might have been attributed to differences in the release of volatile components during the various stages of pyrolysis. Notably, when the pyrolysis temperature was raised to the mid-to-high temperature range (570–800 °C), the H/C ratio showed a distinct plateau effect, remaining stably below 0.2 with significantly reduced fluctuations. This phenomenon indicated that high-temperature pyrolysis promoted more thorough aromatic condensation and dehydrogenative polymerization, which in turn substantially reduced the number of protonatable sites within the biochar structure and resulted in the formation of a carbonaceous framework with enhanced thermal stability [49]. In contrast to the changes observed with pyrolysis temperature, the influence of pyrolysis time on the H/C ratio exhibited a bistable characteristic (Figure 9b). The experimental results indicated that under both short (≤1 h) and long (≥2.5 h) pyrolysis conditions, the H/C ratio consistently remained at approximately 0.09 ± 0.01.

The heating rate exhibited behavior distinct from the previous feature (Figure 9c). In the lower heating rate range (3–10 °C/min), the H/C ratio was maintained at a relatively low level (<0.10). This observation was likely attributed to the fact that a slower heating process favored the complete decomposition of pyrolysis intermediates and the occurrence of secondary reactions, thereby promoting more exhaustive dehydrogenation and aromatization reactions [50]. However, when the heating rate exceeded 10 °C/min, the H/C ratio stabilized at a higher level (0.12–0.15), indicating that under rapid heating conditions, the kinetics of the pyrolysis reaction dominated the process. The rapid escape of volatile components suppressed deep dehydrogenation reactions, resulting in the retention of a greater number of hydrogen-containing functional groups within the biochar. This phenomenon may have been associated with heat and mass transfer limitations inherent in fast pyrolysis, as the higher heating rate shortened the residence time during pyrolysis and restricted the extent of polymerization reactions, thus maintaining the H/C ratio at an elevated level.
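The preparation windows reported in this section can be collected into a simple screening rule. The sketch below encodes the ranges read from Figure 9 (pyrolysis temperature of 570–800 °C, pyrolysis time of ≤1 h or ≥2.5 h, heating rate of 3–10 °C/min). The function names and the treatment of the interval endpoints as inclusive are assumptions, and the rule is only a reading aid for the PDP trends, not a substitute for the fitted model.

```python
def failed_conditions(temp_c, time_h, heating_rate_c_min):
    """List the preparation conditions that fall outside the low-H/C ranges."""
    failed = []
    if not 570.0 <= temp_c <= 800.0:
        failed.append("pyrolysis temperature")
    if not (time_h <= 1.0 or time_h >= 2.5):
        failed.append("pyrolysis time")
    if not 3.0 <= heating_rate_c_min <= 10.0:
        failed.append("heating rate")
    return failed


def in_low_hc_window(temp_c, time_h, heating_rate_c_min):
    """True when all three conditions lie in the ranges reported above."""
    return not failed_conditions(temp_c, time_h, heating_rate_c_min)


# Hypothetical candidate recipe: 600 °C, 3 h, 5 °C/min.
candidate_ok = in_low_hc_window(600.0, 3.0, 5.0)
```

Returning the list of failed conditions, rather than only a Boolean, indicates which process parameter would need adjusting for a given recipe.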

3.6. Graphical User Interface and Data Validation

This study constructed a high-performance predictive model based on the AutoML framework and developed an online graphical user interface for predicting Cd adsorption by biochar (https://heavymetal-cfajqoh4hu5say8vza6c9k.streamlit.app/ (accessed on 26 July 2025)). The interface was developed using the Streamlit framework, whose lightweight and highly interactive characteristics suit the rapid prototyping needs of research settings. The interface design adhered to human–computer interaction principles, allowing users to input the physicochemical properties of biochar and external experimental conditions, with the system providing real-time predictions of adsorption capacity (Figure 10a).

The presentation of prediction results was achieved through the use of visualization techniques, which included (1) numerical prediction, displaying the specific predicted value of the adsorption capacity, and (2) a feature importance force plot employing a red–blue dual-color coding system, where red features indicated variables that positively influenced adsorption capacity (SHAP value > 0), and blue features represented inhibitory variables (SHAP value < 0).
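The red–blue coding described above can be illustrated with a short sketch that sorts per-feature SHAP contributions by magnitude and assigns the display color from the sign. The contribution values, the base value, and the "neutral" label for a contribution of exactly zero are hypothetical; the code is not taken from the deployed interface.

```python
def force_plot_entries(shap_values):
    """Sort features by absolute SHAP value and attach a display color."""
    entries = []
    for name, value in shap_values.items():
        if value > 0:
            color = "red"       # raises the predicted adsorption capacity
        elif value < 0:
            color = "blue"      # lowers the predicted adsorption capacity
        else:
            color = "neutral"   # assumption: zero contributions are not colored
        entries.append((name, value, color))
    return sorted(entries, key=lambda entry: abs(entry[1]), reverse=True)


def predicted_value(base_value, shap_values):
    """SHAP is additive: prediction = base value + sum of contributions."""
    return base_value + sum(shap_values.values())


# Hypothetical contributions (mg/g) for a single input sample.
contributions = {"cd_conc": 12.0, "stir_rpm": 3.5, "h_c": -4.0}
entries = force_plot_entries(contributions)
prediction = predicted_value(20.0, contributions)
```

Here the numerical prediction shown to the user would be the base value plus the sum of the red and blue contributions.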

The development of this interface was aimed at overcoming the technical barriers associated with applying artificial intelligence techniques in the field of biochar-based Cd adsorption, thereby facilitating the low-threshold deployment and application of machine learning models. At present, the interface has operated stably and supported cross-platform access through mainstream web browsers, providing researchers in relevant fields with a convenient intelligent prediction tool [23].

Finally, the stability of the model was validated using experimental data. The data selected for this experiment were outside of the machine learning dataset, with detailed information available in Supplementary Materials (Table S3). As shown in Figure 10b, the maximum error between the model’s predicted values and the actual values was 9.49%, indicating that the model performed consistently in most cases without significant deviations. This finding demonstrated that the model exhibited a certain degree of robustness, maintaining uniform and consistent predictive performance even when confronted with different input data.
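The maximum error quoted above can be computed as in the following sketch, which assumes that "error" means the absolute relative error of each prediction expressed as a percentage of the measured value; this section does not give the exact formula. The measured and predicted capacities below are placeholders, not the values in Table S3.

```python
def percent_errors(actual, predicted):
    """Absolute relative error of each prediction, in percent.

    Measured values must be nonzero.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return [abs(p - a) / abs(a) * 100.0 for a, p in zip(actual, predicted)]


def max_percent_error(actual, predicted):
    """Largest percentage error over the validation samples."""
    return max(percent_errors(actual, predicted))


# Placeholder measured and predicted capacities (mg/g).
measured = [10.0, 20.0, 40.0]
modelled = [10.5, 19.0, 40.0]
worst = max_percent_error(measured, modelled)
```

Reporting the full list from `percent_errors` alongside the maximum would show whether the largest deviation is typical or an isolated case.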

4. Discussion

Future research should focus on refining the entire data–model–application workflow within the AutoML framework. Specifically, future efforts could include (1) establishing a standardized database that encompasses diverse characteristics of biochar (e.g., key features such as cation exchange capacity and pH) to ensure the quality and traceability of the data; (2) integrating the competitive adsorption effects of heavy metals such as Cd, Pb, and Cu into feature engineering by incorporating parameters like ionic strength and complexation constants, thereby enhancing the accuracy of simulations in complex aquatic environments; (3) combining characterization data from techniques such as X-ray absorption spectroscopy to quantify the contributions of individual adsorption mechanisms (e.g., surface complexation and ion exchange) and, based on these insights, guiding feature selection and adjustments to the model architecture; (4) developing an AutoML framework capable of incremental training that continuously incorporates new experimental data and industrial case studies, an iterative approach that aims to enhance the robustness of the predictions and ultimately yield an intelligent decision-making platform offering both theoretical interpretability and practical engineering utility; and (5) addressing the barriers to industrial application: despite promising laboratory results, scaling biochar for industrial wastewater treatment is hindered by its cost-competitiveness versus conventional adsorbents, the need to validate long-term performance and stability under complex effluent conditions, the challenge of safely managing and disposing of cadmium-laden spent biochar, and the lack of standardized production protocols to ensure consistent quality and adsorption efficacy.

5. Conclusions

This study addressed two major technical challenges in the field of biochar adsorption of the heavy metal Cd: (1) the difficulty of systematically analyzing the synergistic effects of multiple parameters with traditional experimental methods, and (2) the excessive reliance on expert knowledge in conventional machine learning modeling. To overcome these issues, an intelligent predictive solution based on AutoML was proposed. The predictive performance of four AutoML frameworks was compared under training durations ranging from 60 to 2400 s. The results demonstrated that all four AutoML frameworks were capable of accurately capturing the relationship between the input features and the output. Among them, the H₂O AutoML framework outperformed the others; under a training time of 2400 s, the StackedEnsemble model achieved an R² of 0.918, an RMSE of 6.529, and an MAE of 3.002, indicating high predictive accuracy and low error. Feature importance analysis revealed that the concentration of Cd in the solution and the stirring rate were the two most important features overall, while the H/C ratio was the most important feature among the physicochemical properties of biochar. A low H/C ratio contributed to an enhanced adsorption capacity. Therefore, optimizing the preparation process of biochar to adjust the H/C ratio could further improve its adsorption performance. The PDPs indicated that when the pyrolysis temperature is controlled at 570–800 °C, with a pyrolysis time shorter than 1 h or exceeding 2.5 h and a heating rate of 3–10 °C/min, biochar can achieve a lower H/C ratio, allowing the degree of carbonization to be controlled during pyrolysis and yielding biochar with favorable adsorption properties. Finally, the trained model was integrated into an online GUI that supports real-time data input and prediction. This integration supports the optimization and control of the biochar adsorption process for Cd and lays a foundation for industrial application. The developed graphical user interface not only facilitates the optimization of the adsorption performance of biochar for Cd but also offers guidance for intelligent optimization applications in related fields. The next critical task involves the preparation of the optimal biochar predicted in this study, followed by comprehensive laboratory adsorption experiments to validate the accuracy and reliability of the model. The new data obtained from these experiments will be fed back into the AutoML system for iterative model optimization, thereby further enhancing its predictive accuracy.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/w17152266/s1, Figure S1: Schematics of the treatment processes; Figure S2: Automated Machine Learning Framework and Characteristics; Figure S3: Performance of H₂O AutoML on H/C at different training times; Table S1: Literature List; Table S2: Data distribution; Table S3: Validation dataset.

Author Contributions

Writing—original draft, S.W.; Data curation, X.S., S.L. (Shuo Li), D.G. and J.L.; Formal analysis, J.D. and F.M.; Methodology, W.Y.; Software, F.W. and S.Y.; Validation, J.X.; Visualization, S.L. (Siyi Luo) and F.Z.; Funding acquisition, D.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was financially supported by the Special Project for Technology Benefiting People of Qingdao (23–2-7-zdfn-2-nsh), the Luyu Science and Technology Cooperation Project of Shandong Province (2023LYXZ016), and the Theory and Technology Innovation Project of Qingdao University of Technology (CLZ2023–025).

Data Availability Statement

Data will be made available on request.

Conflicts of Interest

Author Jicheng Duan was employed by the company Qingdao Licun River North Bank Water Affairs Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

Abbreviation	Full Name
AutoML	Automated Machine Learning
C	Carbon
Cd	Cadmium
CEC	Cation Exchange Capacity
ETMSE	ExtraTreesMSE
FLAML	Fast and Lightweight Automated Machine Learning
GBM	Gradient Boosting Machines
GLM	Generalized Linear Models
GUI	Graphical User Interface
H	Hydrogen
H/C	Hydrogen/Carbon Ratio
HMC	Initial Cd Concentration in the Solution
KNDist	KNeighborsDist
KNUnif	KNeighborsUnif
MAE	Mean Absolute Error
ML	Machine Learning
N	Nitrogen
NNTorch	NeuralNetTorch
O	Oxygen
(O+N)/C	(Oxygen + Nitrogen)/Carbon ratio
O/C	Oxygen/Carbon ratio
PDA	Partial Dependence Analysis
PDP	Partial Dependence Plot
pHpzc	Biochar pH Value
Qe	Adsorption Capacity
R²	Coefficient of Determination
RF	Random Forest
RFMSE	RandomForestMSE
RMSE	Root Mean Squared Error
S	Sulfur
SCC	Spearman’s Rank Correlation Coefficient
SHAP	SHapley Additive exPlanation
SSA	Specific Surface Area
TPOT	Tree-based Pipeline Optimization Tool
WE_L2	WeightedEnsemble_L2

References

1. Xie, K.; Ou, J.; He, M.; Peng, W.; Yuan, Y. Predicting the Bioaccessibility of Soil Cd, Pb, and As with Advanced Machine Learning for Continental-Scale Soil Environmental Criteria Determination in China. Environ. Health 2024, 2, 631–641.
2. Liu, C.; Zhang, H.-X. Modified-Biochar Adsorbents (MBAs) for Heavy-Metal Ions Adsorption: A Critical Review. J. Environ. Chem. Eng. 2022, 10, 107393.
3. Du, Z.; Sun, X.; Zheng, S.; Wang, S.; Wu, L.; An, Y.; Luo, Y. Optimal Biochar Selection for Cadmium Pollution Remediation in Chinese Agricultural Soils via Optimized Machine Learning. J. Hazard. Mater. 2024, 476, 135065.
4. Wang, R.; Chen, H.; He, Z.; Zhang, S.; Wang, K.; Ren, N.; Ho, S.-H. Discovery of an End-to-End Pattern for Contaminant-Oriented Advanced Oxidation Processes Catalyzed by Biochar with Explainable Machine Learning. Environ. Sci. 2024, 58, 16867–16876.
5. Kwiatkowski, M.; Kalderis, D. A Complementary Analysis of the Porous Structure of Biochars Obtained from Biomass. Carbon Lett. 2020, 30, 325–329.
6. Qiu, B.; Tao, X.; Wang, H.; Li, W.; Ding, X.; Chu, H. Biochar as a Low-Cost Adsorbent for Aqueous Heavy Metal Removal: A Review. J. Anal. Appl. Pyrolysis 2021, 155, 105081.
7. Singh, V.; Pant, N.; Sharma, R.K.; Padalia, D.; Rawat, P.S.; Goswami, R.; Singh, P.; Kumar, A.; Bhandari, P.; Tabish, A.; et al. Adsorption Studies of Pb(II) and Cd(II) Heavy Metal Ions from Aqueous Solutions Using a Magnetic Biochar Composite Material. Separations 2023, 10, 389.
8. Moon, S.; Lee, Y.-J.; Choi, M.-Y.; Lee, C.-G.; Park, S.-J. Adsorption of Heavy Metals and Bisphenol A from Wastewater Using Spirulina Sp.-Based Biochar as an Effective Adsorbent: A Preliminary Study. J. Appl. Phycol. 2023, 35, 2257–2269.
9. Nguyen, X.C.; Nguyen, T.T.H.; Le, Q.V.; Le, P.C.; Srivastav, A.L.; Pham, Q.B.; Nguyen, P.M.; La, D.D.; Rene, E.R.; Ngo, H.H.; et al. Developing a New Approach for Design Support of Subsurface Constructed Wetland Using Machine Learning Algorithms. J. Environ. Manag. 2022, 301, 113868.
10. Hu, A.; Liu, Y.; Wang, X.; Xia, S.; Van Der Bruggen, B. A Machine Learning Based Framework to Tailor Properties of Nanofiltration and Reverse Osmosis Membranes for Targeted Removal of Organic Micropollutants. Water Res. 2025, 268, 122677.
11. Kandpal, S.; Tagade, A.; Sawarkar, A.N. Critical Insights into Ensemble Learning with Decision Trees for the Prediction of Biochar Yield and Higher Heating Value from Pyrolysis of Biomass. Bioresour. Technol. 2024, 411, 131321.
12. Jaffari, Z.H.; Abbas, A.; Kim, C.-M.; Shin, J.; Kwak, J.; Son, C.; Lee, Y.-G.; Kim, S.; Chon, K.; Cho, K.H. Transformer-Based Deep Learning Models for Adsorption Capacity Prediction of Heavy Metal Ions toward Biochar-Based Adsorbents. J. Hazard. Mater. 2024, 462, 132773.
13. Bennett, N.D.; Croke, B.F.W.; Guariso, G.; Guillaume, J.H.A.; Hamilton, S.H.; Jakeman, A.J.; Marsili-Libelli, S.; Newham, L.T.H.; Norton, J.P.; Perrin, C.; et al. Characterising Performance of Environmental Models. Environ. Model. Softw. 2013, 40, 1–20.
14. Zhang, C.; Tian, X.; Zhao, Y.; Lu, J. Automated Machine Learning-Based Building Energy Load Prediction Method. J. Build. Eng. 2023, 80, 108071.
15. Xu, R.-Z.; Cao, J.-S.; Ye, T.; Wang, S.-N.; Luo, J.-Y.; Ni, B.-J.; Fang, F. Automated Machine Learning-Based Prediction of Microplastics Induced Impacts on Methane Production in Anaerobic Digestion. Water Res. 2022, 223, 118975.
16. Bao, H.; Yin, W.; Wang, H.; Lu, Y.; Jiang, S.; Ajibade, F.O.; Ouyang, Q.; Wang, Y.; Nie, S.; Bai, Y.; et al. Automated Machine Learning-Based Models for Predicting and Evaluating Antibiotic Removal in Constructed Wetlands. Bioresour. Technol. 2023, 385, 129436.
17. Mészároš, L.; Šuránek, M.; Melichová, Z.; Frišták, V.; Ďuriška, L.; Kaňuchová, M.; Soja, G.; Pipíška, M. Green Biochar-Based Adsorbent for Radiocesium and Cu, Ni, and Pb Removal. J. Radioanal. Nucl. Chem. 2023, 332, 4141–4155.
18. Ren, X.; Xu, Y.; Yang, S.; Chen, Q.; Wei, T. Phosphorous-Functionalized Wheat Straw Biochar for the Efficient Removal of Cadmium and Lead in Aqueous Solution. Water. Air. Soil Pollut. 2023, 234, 555.
19. Jiang, J.; Xiang, X.; Zhou, Q.; Zhou, L.; Bi, X.; Khanal, S.K.; Wang, Z.; Chen, G.; Guo, G. Optimization of a Novel Engineered Ecosystem Integrating Carbon, Nitrogen, Phosphorus, and Sulfur Biotransformation for Saline Wastewater Treatment Using an Interpretable Machine Learning Approach. Environ. Sci. Technol. 2024, 58, 12989–12999.
20. Liu, B.; Xi, F.; Zhang, H.; Peng, J.; Sun, L.; Zhu, X. Coupling Machine Learning and Theoretical Models to Compare Key Properties of Biochar in Adsorption Kinetics Rate and Maximum Adsorption Capacity for Emerging Contaminants. Bioresour. Technol. 2024, 402, 130776.
21. Tian, Y.; Yang, X.; Chen, N.; Li, C.; Yang, W. Data-Driven Interpretable Analysis for Polysaccharide Yield Prediction. Environ. Sci. Ecotechnol. 2024, 19, 100321.
22. Wang, X.; Gao, Y.; Hou, J.; Yang, J.; Smits, K.; He, H. Machine Learning Facilitates Connections between Soil Thermal Conductivity, Soil Water Content, and Soil Matric Potential. J. Hydrol. 2024, 633, 130950.
23. Hu, J.; Xu, J.; Li, M.; Jiang, Z.; Mao, J.; Feng, L.; Miao, K.; Li, H.; Chen, J.; Bai, Z.; et al. Identification and Validation of an Explainable Prediction Model of Acute Kidney Injury with Prognostic Implications in Critically Ill Children: A Prospective Multicenter Cohort Study. eClinicalMedicine 2024, 68, 102409.
24. Fu, W.; Feng, M.; Guo, C.; Zhou, J.; Zhang, X.; Lv, S.; Huo, Y.; Wang, F. Machine Learning-Driven Prediction of Phosphorus Removal Performance of Metal-Modified Biochar and Optimization of Preparation Processes Considering Water Quality Management Objectives. Bioresour. Technol. 2024, 403, 130861.
25. Jas, K.; Mangalathu, S.; Dodagoudar, G.R. Evaluation and Analysis of Liquefaction Potential of Gravelly Soils Using Explainable Probabilistic Machine Learning Model. Comput. Geotech. 2024, 167, 106051.
26. Erickson, N.; Mueller, J.; Shirkov, A.; Zhang, H.; Larroy, P.; Li, M.; Smola, A. AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data. arXiv 2020, arXiv:2003.06505.
27. Angarita-Zapata, J.S.; Maestre-Gongora, G.; Calderín, J.F. A Bibliometric Analysis and Benchmark of Machine Learning and AutoML in Crash Severity Prediction: The Case Study of Three Colombian Cities. Sensors 2021, 21, 8401.
28. Ferreira, L.; Pilastri, A.; Martins, C.M.; Pires, P.M.; Cortez, P. A Comparison of AutoML Tools for Machine Learning, Deep Learning and XGBoost. In Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN), Virtual, 18–22 July 2021; pp. 1–8.
29. Chen, H.; Cao, X.; Wang, J.; Huang, J.; Wang, Z.; Wei, J.; Zhao, D.; Guo, Y.; Peng, Y.; Miao, L. Machine Learning Accelerating the Condition Screening of Ceftriaxone Sodium Anaerobic Co-Metabolic Degradation. ACS EST Eng. 2024, 4, 947–955.
30. Rudin, C. Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead. Nat. Mach. Intell. 2019, 1, 206–215.
31. Lundberg, S.M.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J.M.; Nair, B.; Katz, R.; Himmelfarb, J.; Bansal, N.; Lee, S.-I. From Local Explanations to Global Understanding with Explainable AI for Trees. Nat. Mach. Intell. 2020, 2, 56–67.
32. Cui, Z.; Ke, R.; Pu, Z.; Ma, X.; Wang, Y. Learning Traffic as a Graph: A Gated Graph Wavelet Recurrent Neural Network for Network-Scale Traffic Prediction. Transp. Res. Part C Emerg. Technol. 2020, 115, 102620.
33. Tan, X.; Liu, Y.; Zeng, G.; Wang, X.; Hu, X.; Gu, Y.; Yang, Z. Application of Biochar for the Removal of Pollutants from Aqueous Solutions. Chemosphere 2015, 125, 70–85.
34. Ucun Özel, H.; Gemici, B.; Özel, H.; Berberler, E. Evaluating Forest Waste on Adsorption of Cd(II) from Aqueous Solution: Equilibrium and Thermodynamic Studies. Pol. J. Environ. Stud. 2019, 28, 3829–3836.
35. Keiluweit, M.; Nico, P.S.; Johnson, M.G.; Kleber, M. Dynamic Molecular Structure of Plant Biomass-Derived Black Carbon (Biochar). Environ. Sci. Technol. 2010, 44, 1247–1253.
36. Tong, X.; Li, J.; Yuan, J.; Xu, R. Adsorption of Cu(II) by Biochars Generated from Three Crop Straws. Chem. Eng. J. 2011, 172, 828–834.
37. Wang, B.; Gao, B.; Fang, J. Recent Advances in Engineered Biochar Productions and Applications. Crit. Rev. Environ. Sci. Technol. 2017, 47, 2158–2207.
38. Inyang, M.I.; Gao, B.; Yao, Y.; Xue, Y.; Zimmerman, A.; Mosa, A.; Pullammanappallil, P.; Ok, Y.S.; Cao, X. A Review of Biochar as a Low-Cost Adsorbent for Aqueous Heavy Metal Removal. Crit. Rev. Environ. Sci. Technol. 2016, 46, 406–433.
39. Hong, S.M.; Morgan, B.J.; Stocker, M.D.; Smith, J.E.; Kim, M.S.; Cho, K.H.; Pachepsky, Y.A. Using Machine Learning Models to Estimate Escherichia Coli Concentration in an Irrigation Pond from Water Quality and Drone-Based RGB Imagery Data. Water Res. 2024, 260, 121861.
40. Xiang, J.; Lin, Q.; Cheng, S.; Guo, J.; Yao, X.; Liu, Q.; Yin, G.; Liu, D. Enhanced Adsorption of Cd(II) from Aqueous Solution by a Magnesium Oxide–Rice Husk Biochar Composite. Environ. Sci. Pollut. Res. 2018, 25, 14032–14042.
41. Zhang, J.; Liu, H.; Wu, J.; Chen, C.; Ding, Y.; Liu, H.; Zhou, Y. Rethinking the Biochar Impact on the Anaerobic Digestion of Food Waste in Bench-Scale Digester: Spatial Distribution and Biogas Production. Bioresour. Technol. 2025, 420, 132115.
42. Wang, X.; Zhou, W.; Liang, G.; Song, D.; Zhang, X. Characteristics of Maize Biochar with Different Pyrolysis Temperatures and Its Effects on Organic Carbon, Nitrogen and Enzymatic Activities after Addition to Fluvo-Aquic Soil. Sci. Total Environ. 2015, 538, 137–144.
43. Rombolà, A.G.; Fabbri, D.; Meredith, W.; Snape, C.E.; Dieguez-Alonso, A. Molecular Characterization of the Thermally Labile Fraction of Biochar by Hydropyrolysis and Pyrolysis-GC/MS. J. Anal. Appl. Pyrolysis 2016, 121, 230–239.
44. Fan, J.; Li, Y.; Yu, H.; Li, Y.; Yuan, Q.; Xiao, H.; Li, F.; Pan, B. Using Sewage Sludge with High Ash Content for Biochar Production and Cu(II) Sorption. Sci. Total Environ. 2020, 713, 136663.
45. Qu, J.; Meng, Q.; Peng, W.; Shi, J.; Dong, Z.; Li, Z.; Hu, Q.; Zhang, G.; Wang, L.; Ma, S.; et al. Application of Functionalized Biochar for Adsorption of Organic Pollutants from Environmental Media: Synthesis Strategies, Removal Mechanisms and Outlook. J. Clean. Prod. 2023, 423, 138690.
46. Berslin, D.; Reshmi, A.; Sivaprakash, B.; Rajamohan, N.; Kumar, P.S. Remediation of Emerging Metal Pollutants Using Environment Friendly Biochar-Review on Applications and Mechanism. Chemosphere 2022, 290, 133384.
47. Tan, X.-F.; Zhu, S.-S.; Wang, R.-P.; Chen, Y.-D.; Show, P.-L.; Zhang, F.-F.; Ho, S.-H. Role of Biochar Surface Characteristics in the Adsorption of Aromatic Compounds: Pore Structure and Functional Groups. Chin. Chem. Lett. 2021, 32, 2939–2946.
48. Zhang, A.; Li, X.; Xing, J.; Xu, G. Adsorption of Potentially Toxic Elements in Water by Modified Biochar: A Review. J. Environ. Chem. Eng. 2020, 8, 104196.
49. Tomczyk, A.; Sokołowska, Z.; Boguta, P. Biochar Physicochemical Properties: Pyrolysis Temperature and Feedstock Kind Effects. Rev. Environ. Sci. Biotechnol. 2020, 19, 191–215.
50. Kan, T.; Strezov, V.; Evans, T.J. Lignocellulosic Biomass Pyrolysis: A Review of Product Properties and Effects of Pyrolysis Parameters. Renew. Sustain. Energy Rev. 2016, 57, 1126–1140.

Figure 1. Automated machine learning flowchart.

Figure 2. Frequency and percentage of missing data in the dataset.

Figure 3. Input feature data distribution histograms.

Figure 4. Spearman’s rank correlation coefficient and significance.

Figure 5. Performance of each automated machine learning framework at different training times: (a) TPOT; (b) FLAML; (c) AutoGluon; (d) H₂O AutoML.

Figure 6. The performance of H₂O AutoML on the dataset. (a) A comparison of the number of model types generated by H₂O AutoML at different training times. (b) The performance of the best model generated by H₂O AutoML on the training and test sets. (c) The predictive performance of the best model generated by H₂O AutoML on the test set. μ is the mean prediction. The filled colors represent three confidence intervals, i.e., μ ± σ, μ ± 2σ, and μ ± 3σ.

Figure 7. Explainability analysis of the features. (a) Residual analysis; (b) Feature importance; (c) Feature importance heatmap; (d) SHAP summary plot.

Figure 8. Partial dependence plots (PDPs) for the features. (a) Heavy metal Cd concentration; (b) Stirring speed; (c) H/C ratio; (d) Ash content; (e) Adsorption time; (f) Biochar concentration; (g) Specific surface area; (h) Adsorption temperature.

Figure 9. Partial dependence plots (PDPs) for the H/C ratio. (a) Pyrolysis temperature; (b) Pyrolysis time; (c) Heating rate.

Figure 10. Graphical user interface and data validation. (a) Online prediction interface. (b) The difference between the actual value and the predicted value.

Table 1. The types of models generated by AutoGluon at different training times *.

Time (s)	60	300	900	1200	1800	2400
Model Name	WE_L2	WE_L2	WE_L2	WE_L2	WE_L2	WE_L2
	LGBMXT	LGBMXT	LGBMXT	LGBMXT	LGBMXT	LGBMXT
	CatBoost	CatBoost	CatBoost	CatBoost	CatBoost	CatBoost
	XGBoost	XGBoost	XGBoost	XGBoost	XGBoost	XGBoost
	LGBM	LGBM	LGBM	LGBM	LGBM	LGBM
	LGBML	LGBML	LGBML	LGBML	LGBML	LGBML
	ETMSE	ETMSE	ETMSE	ETMSE	ETMSE	ETMSE
	RFMSE	RFMSE	RFMSE	RFMSE	RFMSE	RFMSE
	NNTorch	NNTorch	NNTorch	NNTorch	NNTorch	NNTorch
	/	NNFAI	NNFAI	NNFAI	NNFAI	NNFAI
	/	KNDist	KNDist	KNDist	KNDist	KNDist
	/	KNUnif	KNUnif	KNUnif	KNUnif	KNUnif

* WE_L2 denotes WeightedEnsemble_L2; NNTorch, NeuralNetTorch; ETMSE, ExtraTreesMSE; RFMSE, RandomForestMSE; KNUnif, KNeighborsUnif; KNDist, KNeighborsDist. A slash (/) indicates that the model was not generated at that training time.

Table 2. The 20 optimized models generated by H₂O AutoML at a training time of 2400 s and their test performance.

Model ID	R²	RMSE	MAE
StackedEnsemble_BestOfFamily_7	0.918	6.529	3.002
GBM_grid_1_AutoML_8_51	/	6.646	3.060
StackedEnsemble_AllModels_6	0.906	7.157	3.624
GBM_grid_1_AutoML_8_167	/	7.061	3.241
GBM_grid_1_AutoML_8_122	/	7.523	3.952
GBM_grid_1_AutoML_8_65	/	8.275	4.033
GBM_grid_1_AutoML_8_151	/	6.223	2.746
GBM_grid_1_AutoML_8_70	/	7.356	3.317
GBM_grid_1_AutoML_8_81	/	6.998	3.004
StackedEnsemble_BestOfFamily_4	0.886	7.851	4.163
GBM_grid_1_AutoML_8_43	/	7.664	3.514
StackedEnsemble_BestOfFamily_6	0.883	7.974	4.102
GBM_grid_1_AutoML_8_139	/	6.691	3.306
GBM_grid_1_AutoML_8_71	/	7.874	3.851
GBM_grid_1_AutoML_8_80	/	7.863	3.538
DeepLearning_grid_1_AutoML_8_7	/	8.515	4.836
GBM_grid_1_AutoML_8_72	/	7.162	3.446
GBM_grid_1_AutoML_8_179	/	7.068	3.185
GBM_grid_1_AutoML_8_120	/	7.206	3.053
GBM_grid_1_AutoML_8_1	/	7.592	3.608
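
The three metrics reported in Table 2 can be illustrated with a minimal sketch. It assumes the standard definitions of R², RMSE, and MAE, which this section does not restate; the function name `regression_metrics` and the sample values are hypothetical and are not taken from the study's dataset or code.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R^2, RMSE, MAE) for paired observed and predicted values.

    Standard textbook definitions are assumed here; they are not quoted
    from the article.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_true = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    return r2, rmse, mae


# Hypothetical observed and predicted values, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
r2, rmse, mae = regression_metrics(observed, predicted)
print(f"R2={r2:.3f}  RMSE={rmse:.3f}  MAE={mae:.3f}")
```

Under these definitions, RMSE weights large residuals more heavily than MAE, so two models can rank differently depending on which error metric is used.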

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

