DFT-Assisted Machine Learning for Global Optimization of Fe–Carbon Catalyst: Persulfate Activation and Targeted Removal of Emerging Contaminants

Yan, Changchun; Xu, Zhiqiang; Xue, Dingming; Wang, Jiaqing; Lin, Xiaochen; Zhou, Hao; Ma, Bing; Zhang, Houhu

doi:10.3390/catal16050444


by Changchun Yan ^1,2,†, Zhiqiang Xu ^1,2,†, Dingming Xue ^1,2, Jiaqing Wang ^3, Xiaochen Lin ^1,2, Hao Zhou ^1,2,*, Bing Ma ^1,2,* and Houhu Zhang ^1,2,*

^1 Nanjing Institute of Environmental Sciences, Ministry of Ecology and Environment of the People’s Republic of China, Nanjing 210042, China

^2 East China Regional Technology Center of Hazardous Waste Environmental Risk Prevention and Control, Nanjing 210042, China

^3 College of Civil Engineering, Nanjing Forestry University, Nanjing 210037, China

^* Authors to whom correspondence should be addressed.

^† These authors contributed equally to this work.

Catalysts 2026, 16(5), 444; https://doi.org/10.3390/catal16050444

Submission received: 26 March 2026 / Revised: 4 May 2026 / Accepted: 8 May 2026 / Published: 11 May 2026

(This article belongs to the Special Issue Synthesis and Catalytic Applications of Advanced Porous Materials, 2nd Edition)


Abstract

Fe–carbon catalysts (FCCs) are extensively used for persulfate activation in advanced oxidation processes (PS-AOPs), an approach regarded as an efficient and cost-effective strategy for removing emerging contaminants (ECs). However, the quantitative structure–activity relationship between the degradation efficiency of ECs with diverse molecular characteristics and the microstructure of FCCs has not been clearly elucidated. This hinders the widespread practical implementation of FCCs. Herein, density functional theory (DFT)-derived molecular-descriptor-assisted machine learning models were employed to accurately predict the reaction rate constants for EC degradation in FCC-PS AOPs, mainly focusing on three aspects: performance prediction, operating condition optimization and mechanism interpretation. Additionally, DFT-derived descriptors are integrated with fabrication and operational parameters to facilitate the generative design of FCCs. The excellent fitting performance of the overall XGB model in predicting the reaction constants for EC degradation (Test R² = 0.813) highlights the notable advantages of customized hyperparameter tuning for improving predictive accuracy. Subsequently, the submodels trained using different EC clusters (derived from t-SNE and K-means clustering methods) can offer specific strategies for selecting optimal parameters of FCC-PS AOPs that target ECs with distinct properties. The interpretability of the model was improved by using SHAP values and partial-dependence plots to clarify the internal relationships of the ML “black box”. Overall, a feasible and generalizable ML model is proposed to facilitate a paradigm shift in the inverse design of FCCs for the degradation of specific ECs.

Keywords:

machine learning; Fe–carbon catalyst; density functional theory; persulfate; emerging contaminants

1. Introduction

Emerging contaminants (ECs) originate from natural or synthetic sources and reach the environment via diverse pathways, posing serious threats to both human and wildlife endocrine systems. The ECs are divided into several categories, including pharmaceuticals, personal-care products, industrial chemicals and pesticides, which exhibit varying stability and redox properties in the environment, creating significant challenges in their treatment. Up to now, various approaches have been used for the treatment of ECs, such as adsorption [1,2], membrane separation [3,4], bioremediation [5,6] and advanced oxidation processes (AOPs) [7,8]. Among these, AOPs have attracted considerable attention due to their great potential for the degradation of ECs, particularly for persulfate (PS)-based AOPs (PS-AOPs) [9]. Notably, the degradation efficiency of PS-AOPs in removing ECs is strongly governed by the molecular structure. The basic parent structure not only determines the effective reaction time between the pollutant and reactive oxygen species (ROS), but also reshapes the relative contributions of ROS in the degradation pathway by modulating the second-order reaction rate constants, thereby collectively influencing the apparent degradation rate [10]. Given the wide variety of ECs, it is urgent to establish clear and rational PS activation strategies to precisely control the degradation efficiency, thereby achieving efficient and rapid degradation of diverse ECs.

PS-AOPs (peroxymonosulfate (PMS) and peroxydisulfate (PDS)) primarily focus on introducing additional energy sources (e.g., heat, ultraviolet, and ultrasound) and exogenous catalysts including both homogeneous (e.g., Fe²⁺, Cu²⁺ and Co²⁺) and heterogeneous catalysts (e.g., metal oxide and carbon-based material), to activate PS and generate ROS for the degradation of ECs [11,12]. Among them, heterogeneous catalytic PS systems, particularly carbon-based catalytic systems, exhibit outstanding removal efficiency for ECs and are regarded as the most promising for further development due to their high specific surface area, acid–base adaptability, lower cost and environmental friendliness [13]. Recently, Fe–carbon catalysts (FCCs) for PS activation have advanced considerably, especially in catalyst design, mechanistic insights, and practical applications. For instance, Xi et al. prepared an iron and nitrogen co-doped biochar material (Fe@N co-doped biochar) and applied it to activate PS for norfloxacin degradation, achieving 95% degradation within 20 min under optimized conditions [14]. Another study reported that the FeNC-CA catalyst facilitated the rapid degradation of 4-chlorophenol via PMS activation, achieving nearly 100% removal efficiency and a 66.8% mineralization rate within 18 min [15]. However, the removal efficiency of ECs by FCC/PS systems is strongly influenced by their physicochemical properties and substituents. Chen et al. reported that the FeNC-CA-500/PMS system achieved only 18.8% removal of nitrobenzene, a nitro-substituted aromatic, which is markedly lower than the removal efficiencies observed for the aromatic compound with hydroxyl groups [15]. In the D-FeN₄-C/PMS system, the electron-donating compounds could be efficiently removed, while the contaminants with electron-withdrawing groups were difficult to eliminate [16]. 
Notably, tailoring FCCs to the types of ECs remains a multidimensional challenge due to the complicated nonlinear relationships between the FCC preparation process and degradation performance of ECs. Therefore, establishing a quantitative structure–activity relationship between the degradation efficiency of ECs with different molecular characteristics and the microstructure of FCCs is essential for the rational design of FCCs and the targeted removal of specific ECs.

With the advancement of computing, machine learning (ML) has become a powerful tool for optimizing catalytic materials. It can uncover intricate relationships and patterns within datasets and is now widely adopted in the field of PS-AOP research, including biochar–PS AOPs, metal-free carbon–PS AOPs and UV-PS AOPs [17,18,19]. Although Zhang et al. have constructed an ML model to predict the degradation efficiency of ECs in FCC/PMS systems, a systematic elucidation of how contaminant properties shape this efficiency remains lacking [20]. This could be attributed to the difficulty in ensuring that the molecular descriptors used adequately capture comprehensive information about ECs, which largely determine the predictive accuracy of the ML model. Molecular descriptors are mathematical representations of chemical compounds. They can be derived from density functional theory (DFT) to offer more detailed insights. Integrating these insights with machine learning enables rapid and precise optimization of material designs, significantly shortening development cycles and reducing research costs [21]. For instance, Liu et al. used a DFT-derived molecular-descriptor-assisted ML model to fabricate membranes with high recovery efficiency for textile wastewater [22]. Therefore, the construction of such a model is expected to promote FCC design and reaction optimization tailored to target ECs. Nevertheless, a thorough investigation into its application in the preparation of FCCs is still lacking.

Given these challenges, a comprehensive dataset was created from the published literature, comprising EC characteristics, reaction conditions, and FCC properties correlated with the reaction rate constant (k), to mine and map contaminant degradation. Subsequently, ML models were employed to effectively correlate EC characteristics, reaction conditions, and FCC properties with k, enabling precise prediction of k for diverse ECs. Additionally, K-means clustering and t-distributed stochastic neighbor embedding (t-SNE) methods, along with interpretable ML tools, were applied to guide the rational design of FCCs for efficient PS activation and targeted removal of ECs. This work systematically integrates DFT and ML approaches to establish a novel paradigm for optimizing the entire process of FCC-PS AOPs to remove various ECs, thereby facilitating broader application against EC threats.

2. Materials and Methods

2.1. Data Collection and Preprocessing

Figure 1 presents the workflow of the model-building process. For data collection, a comprehensive literature review was performed in Web of Science and Google Scholar with the keywords carbon, Fe, degradation and PS. A total of 48 relevant studies published from 2018 to 2025 were selected and analyzed, with detailed information presented in Table S1. Following this systematic review, a total of 25 input features in three categories of datasets were collected as possible factors influencing the reaction rate constant (k). These included the pyrolysis temperature (T), Fe–carbon catalyst properties (total pore volume (TPV), specific surface area (SSA) and Id/Ig), chemical composition (Fe, C, N, O, pyridinic N (Pyd-N), pyrrolic N (Pyr-N), graphitic N (Gph-N), Fe-Nx), atomic ratios (N/C and O/C), reaction conditions (temperature (Tr), pH, PS concentration and catalyst concentration (Ccatl)), and contaminant fingerprints (highest occupied molecular orbital (HOMO), lowest unoccupied molecular orbital (LUMO), the energy gap between HOMO and LUMO (ΔG), vertical ionization potential (VIP), and molecular weight (MW)). In addition, to mitigate the skewed distribution of the target variable, all k values were subjected to a log transformation (lg k). Detailed information about the input features is summarized in Table S2.

To acquire a consistent and comprehensive dataset, data collection and missing value imputation methods were mainly adopted from previous studies [23,24]. The diverse properties of Fe–carbon catalysts were directly captured in tables or extracted from figures in the literature using Web Plot Digitizer 4.6 (https://apps.automeris.io/wpd/ (accessed on 12 March 2026)). The reaction conditions, which play a crucial role in determining the k value, were gathered from the Materials and Methods Sections of the literature. Information on the molecular orbitals and electron structures of pollutants was calculated using the Gaussian 16 package based on the conceptual DFT [25]. Additionally, features with missing values below 10% were imputed with the median, whereas those above the threshold underwent imputation using a random forest model trained on relevant features with the output label excluded to prevent data leakage (Table S3) [17].
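The threshold rule for missing values can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `impute_by_missing_fraction`, the toy values and the use of `None` to mark gaps are hypothetical, and features at or above the 10% threshold are only flagged here because the random forest imputer itself is not reproduced.

```python
from statistics import median


def impute_by_missing_fraction(columns, threshold=0.10):
    """Median-impute features whose missing fraction is below `threshold`.

    `columns` maps a feature name to a list of values, with None marking a gap.
    Returns (imputed_columns, names_needing_model_based_imputation).
    """
    imputed, needs_model = {}, []
    for name, values in columns.items():
        missing = sum(v is None for v in values)
        fraction = missing / len(values)
        if missing == 0:
            imputed[name] = list(values)
        elif fraction < threshold:
            fill = median(v for v in values if v is not None)
            imputed[name] = [fill if v is None else v for v in values]
        else:
            # Left as-is: a model-based imputer (random forest in the text)
            # would fill these gaps, without using the target as a predictor.
            imputed[name] = list(values)
            needs_model.append(name)
    return imputed, needs_model


# Hypothetical toy data, not taken from the study's dataset.
toy = {
    "pH": [3.0, 5.0, 7.0, None, 9.0, 7.0, 6.0, 7.0, 5.0, 7.0, 7.0],
    "SSA": [120.0, None, None, 300.0, 450.0, None, 210.0, 90.0, 500.0, 330.0, 150.0],
}
filled, flagged = impute_by_missing_fraction(toy)
```

In this toy case, `pH` has 1 of 11 values missing and is filled with its median, whereas `SSA` has 3 of 11 missing and is returned in `flagged` for model-based imputation.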

Following the imputation of missing values, violin plots were employed for a descriptive analysis of all input features and target variables, offering an initial overview of the raw dataset and highlighting the features with skewed distributions (Figure 2). Notably, the dataset was collected from published studies and therefore inevitably contains inter-study heterogeneity. Differences in catalyst synthesis protocols, reactor configurations, water matrices, analytical methods, and kinetic fitting procedures may introduce uncertainty into the collected variables and target values. To ensure consistency of units and mitigate interference caused by heterogeneity in the selected variable values, both the input and output parameters were standardized using the StandardScaler from Scikit-Learn (Version 1.3.2) to achieve a uniform scale (zero mean and unit variance), as illustrated in Equation (1):

Z_{i} = \frac{x_{i} - \mu}{\sigma}

(1)

where Z_{i} represents the normalized value of the input variable, x_{i} represents the original value of that variable, \mu represents the mean of x_{i}, and \sigma represents the standard deviation of the variable [26].
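Equation (1) can be illustrated with a short standard-library sketch. The helper name `standardize` and the pH values are hypothetical. The population standard deviation is used because that is what Scikit-Learn's StandardScaler applies. Note that this rescaling shifts and scales a feature but does not change the shape of its distribution.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Return z-scores (x - mean) / std using the population standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("a constant feature cannot be standardized")
    return [(x - mu) / sigma for x in values]


ph = [3.0, 5.0, 7.0, 9.0]  # hypothetical pH values
z = standardize(ph)        # mean 0 and unit variance by construction
```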

2.2. Model Construction and Evaluation

Following normalization, 70% of the preprocessed dataset was used as the training set to construct the model, with the remaining 30% being reserved as test data to evaluate the performance of the ML models [17,27]. During the training process, the k-fold cross-validation (Stratified k-fold CV) method and grid search were employed to adjust the hyperparameters and evaluate the potential overfitting risk associated with the limited dataset size, thereby enhancing the performance and generalization capability of the models. Six ML models, including k-nearest neighbor (KNN), random forest regression (RF), extreme gradient boosting (XGB), light gradient boosting machine (LGBM), categorical boosting (CAB) and multi-layer perceptron (MLP) were trained separately after hyperparameter tuning. A detailed description of each model is provided in Text S1. To evaluate the predictive performance of the model, the coefficient of determination (R²) and the root mean square error (RMSE) were calculated using Equation (2) and Equation (3), respectively [28,29].

R^{2} = 1 - \frac{\sum_{i = 1}^{N} {(y_{p}^{i} - y_{t}^{i})}^{2}}{\sum_{i = 1}^{N} {(y_{t}^{i} - y_{m})}^{2}}

(2)

\mathrm{RMSE} = \sqrt{\frac{\sum_{i = 1}^{N} {(y_{t}^{i} - y_{p}^{i})}^{2}}{N}}

(3)

In these equations, y_{p}^{i} and y_{t}^{i} represent the predicted and actual values, respectively, y_{m} represents the mean value of the output variable, and N represents the total number of data points.
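As a concrete reading of Equations (2) and (3), the two metrics can be computed as below. The function names and the lg k values are illustrative assumptions. In practice the equivalent Scikit-Learn metrics would normally be used.

```python
from math import sqrt


def r2_score(y_true, y_pred):
    """Coefficient of determination, Equation (2)."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error, Equation (3)."""
    n = len(y_true)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


# Hypothetical lg k values, used only to exercise the formulas.
y_true = [-2.0, -1.5, -1.0, -0.5]
y_pred = [-1.9, -1.6, -0.8, -0.6]
r2 = r2_score(y_true, y_pred)
err = rmse(y_true, y_pred)
```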

To systematically elucidate the relationship between contaminant properties, reaction conditions, and FCC characteristics, and to achieve contaminant-oriented FCC-PS AOPs, a hierarchical ML analysis was employed in this study [30]. Initially, an overall model was developed in the first layer using the EC-AOP dataset to accurately predict lg k. However, since ECs encompass various organic pollutants with different properties, it is difficult to clarify their intrinsic relationships individually. To address this, t-SNE and K-means clustering methods were applied to identify potential groupings of ECs [31,32]. Following this, separate modeling and explanatory analysis were conducted to optimize strategies for each EC category. Specifically, ECs were clustered based on their properties, with the number of clusters being determined using silhouette coefficients. The EC-AOP dataset was then divided into K (the number of clusters) sub-datasets based on these classifications. Regression models were subsequently trained for each sub-dataset to uncover degradation patterns specific to each EC category, which were hidden within the individual clusters.
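The cluster-number selection step can be sketched with a small pure-Python K-means and silhouette score. This is a simplified illustration under stated assumptions: the two-dimensional points stand in for t-SNE-embedded contaminant descriptors (t-SNE itself is not reproduced), and the function names, seed and candidate cluster numbers are hypothetical.

```python
import random
from math import dist


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters score 0 by convention."""
    scores = []
    for i, p in enumerate(points):
        same = [dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within the point's own cluster
        b = min(                   # mean distance to the nearest other cluster
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical embedded coordinates forming two well-separated groups.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
          (10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0)]
scores = {k: silhouette(points, kmeans(points, k)) for k in (2, 3, 4)}
best_k = max(scores, key=scores.get)  # cluster number with the highest silhouette
two_cluster_labels = kmeans(points, 2)
```

Once a cluster number is selected, the labels can be used to split the dataset into sub-datasets, each of which is then fitted with its own regression model.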

2.3. Model Interpretability Method

To evaluate the linear correlation between any two variables, the Pearson correlation coefficient (PCC) was calculated using Equation (4):

r = \frac{\sum (x_{i} - \bar{x}) (y_{i} - \bar{y})}{\sqrt{\sum {(x_{i} - \bar{x})}^{2} \sum {(y_{i} - \bar{y})}^{2}}}

(4)

where r is the PCC value reflecting the strength of the linear relationship between two variables, and \bar{x} and \bar{y} are the means of the input variable x_{i} and output variable y_{i}, respectively [33].
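Equation (4) translates directly into code. The sketch below uses a hypothetical function name and toy vectors to show the limiting cases r = 1 and r = −1.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient, Equation (4)."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    var_x = sum((a - x_bar) ** 2 for a in x)
    var_y = sum((b - y_bar) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
r_pos = pearson_r(x, [2.0, 4.0, 6.0, 8.0, 10.0])   # perfectly linear, increasing
r_neg = pearson_r(x, [10.0, 8.0, 6.0, 4.0, 2.0])   # perfectly linear, decreasing
```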

To assess the impact of input features on a specific output variable, feature importance analysis was conducted using the SHapley Additive exPlanations (SHAP) methodology. The SHAP values were derived to quantify the contribution of each feature to the model output, thereby elucidating both the inter-dependencies among predictors and the relative importance of individual variables in driving the final prediction. Additionally, partial-dependence analysis was conducted on the optimized model through one-dimensional and two-dimensional partial-dependence plots (PDPs), the latter capturing feature interactions. This approach provided a systematic representation of the relationships between input and output variables while averaging out the influence of other features, allowing for an assessment of the impact of individual features on the prediction. All modeling and visualizations were implemented in Python (version 3.8), utilizing the Scikit-learn, Matplotlib, and Seaborn libraries.
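The idea behind SHAP values can be illustrated by computing exact Shapley values for a tiny model. This is a simplified sketch, not the SHAP library's algorithm: it uses a single reference (baseline) sample instead of a background dataset, enumerates all feature orderings (feasible only for a handful of features), and the model, feature names and numbers are hypothetical.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x` relative to one baseline sample."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = model(current)
        for i in order:
            current[i] = x[i]              # switch feature i from baseline to its actual value
            updated = model(current)
            phi[i] += updated - previous   # marginal contribution in this ordering
            previous = updated
    return [total / len(orderings) for total in phi]


def toy_model(features):
    """Hypothetical response with one interaction term (N content x catalyst dose)."""
    fe, n, dose = features
    return 0.5 * n - 0.3 * fe + 0.2 * n * dose


x = [2.0, 6.0, 1.0]          # sample to explain: Fe, N, dose
baseline = [1.0, 4.0, 0.5]   # reference sample
phi = shapley_values(toy_model, x, baseline)
```

The contributions sum to the difference between the prediction at `x` and the prediction at the baseline, which is the additivity property that SHAP plots rely on.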

2.4. Experimental Validation

Detailed descriptions of the chemicals used, catalyst synthesis and characterization, and pollutant detection methods are provided in Texts S2 and S3 of the Supporting Information.

3. Results and Discussion

3.1. Evaluation of Model Performance

A matrix diagram displays the outcomes of the PCC analysis and the statistical significance of the correlations between variables: the magnitude of each coefficient reflects the strength of the linear relationship between two parameters, while the stars indicate statistical significance (p < 0.01). No variable exhibited a strong linear relationship with the target lg k, suggesting the need to construct a comprehensive model (Figure 3). Given the complex dependencies between input features and the target variable, six ML models (KNN, RF, XGB, CAB, LGBM and MLP) with varying complexities were constructed to predict the lg k value of ECs in FCC-PS AOPs. The optimal hyperparameters for each model were determined via 5-fold cross-validation during training to enhance model robustness and limit overfitting. A summary of the optimized hyperparameters is provided in Table S4, while the corresponding predictive performance metrics are listed in Table S5 and visualized in Figure 4. A higher testing R² value coupled with a lower RMSE indicates superior predictive performance of the model [28]. The testing R² of the XGB model was 0.813 (higher than those of the other models), and the corresponding RMSE was 0.441 (lower than those of the other models). This superior accuracy supported the selection of the XGB model as the optimal candidate for the subsequent exploratory data analysis.
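The combination of grid search and 5-fold cross-validation can be sketched as follows. Everything here is a stand-in chosen for brevity: a one-dimensional k-nearest-neighbor regressor replaces the six models in the study, the folds are interleaved rather than stratified, and the data, grid and function names are hypothetical.

```python
from math import sqrt


def kfold_indices(n, n_splits):
    """Yield (train, validation) index lists; sample i goes to fold i % n_splits."""
    for fold in range(n_splits):
        val = [i for i in range(n) if i % n_splits == fold]
        train = [i for i in range(n) if i % n_splits != fold]
        yield train, val


def knn_predict(x_train, y_train, x_new, k):
    """Average the targets of the k training points closest to x_new."""
    nearest = sorted(range(len(x_train)), key=lambda i: abs(x_train[i] - x_new))[:k]
    return sum(y_train[i] for i in nearest) / k


def cv_rmse(x, y, k, n_splits=5):
    """Mean validation RMSE of the toy KNN regressor over all folds."""
    fold_rmse = []
    for train, val in kfold_indices(len(x), n_splits):
        x_tr = [x[i] for i in train]
        y_tr = [y[i] for i in train]
        sq_err = [(knn_predict(x_tr, y_tr, x[i], k) - y[i]) ** 2 for i in val]
        fold_rmse.append(sqrt(sum(sq_err) / len(sq_err)))
    return sum(fold_rmse) / len(fold_rmse)


# Hypothetical one-feature dataset and hyperparameter grid.
x = [0.5 * i for i in range(20)]
y = [0.1 * v ** 2 - v for v in x]
grid = [1, 3, 5, 7]
cv_scores = {k: cv_rmse(x, y, k) for k in grid}
best_k = min(cv_scores, key=cv_scores.get)  # setting with the lowest mean CV error
```

The selected setting would then be refitted on the full training set and evaluated once on the held-out test set.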

To enable easy access for scientists and practitioners, a graphical user interface (GUI) was constructed using the Python library Tkinter, as illustrated in Figure S1. To confirm the ability of the constructed model to connect the ECs to FCC design and reaction optimization, the GUI was further validated with four unseen data points covering three ECs with different degradation difficulties. These experimental data points were independent of the initial dataset. As shown in Figure S2, the deviation between the predicted and actual values was below 17%. This demonstrates that the GUI could predict k for different ECs with reasonable accuracy. Consequently, the GUI can reduce the time and cost of relevant research and quickly provide contaminant-oriented optimal reaction conditions and FCC properties for optimizing the operation of real-world FCC-PS AOPs.
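A deviation check of this kind can be sketched as below. The text does not define the deviation formula, so the relative absolute deviation on the back-transformed k is an assumption, as are the function names and the example numbers.

```python
def k_from_lg(lg_k):
    """Back-transform a predicted lg k (base-10 logarithm) to k."""
    return 10.0 ** lg_k


def relative_deviation(predicted, actual):
    """Absolute deviation of the prediction, as a percentage of the actual value."""
    return abs(predicted - actual) / abs(actual) * 100.0


# Hypothetical prediction and measurement (k in min^-1).
k_pred = k_from_lg(-1.0)   # 0.1
k_meas = 0.11
deviation = relative_deviation(k_pred, k_meas)
```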

3.2. Contaminant-Oriented Hierarchical Framework of Models

Although the constructed model predicts k with satisfactory accuracy, its black-box nature prevents the extraction of effective information. Consequently, the correlation between pollutant properties and the microstructure of FCCs cannot be elucidated, thus hindering the targeted design of the catalyst for EC removal. To this end, a contaminant-oriented hierarchical analysis was employed. This methodology aims to decipher the opaque decision-making processes within the trained model, thereby extracting the necessary insights for catalyst optimization. Figure S3 provides the overall importance of each input feature to k, wherein the absolute SHAP values corresponding to each feature are presented. Higher SHAP values signify a more significant influence of the respective input feature on the predictive outcomes of the model. Based on the SHAP value ranking, the top six features were C, VIP, N/C, Fe-Nx, Tr and SSA, listed in descending order of importance. It was observed that most of these features were associated with FCC properties, indicating that the surface characteristics of catalysts may inherently control their engagement with superoxide radicals in PS. This further influences the degradation of ECs via radical or nonradical pathways, ultimately affecting the overall degradation efficiency [18]. Notably, the EC quantum chemical descriptor VIP, which is a standard metric for evaluating molecular stability, is also identified as a significant predictor of k. Furthermore, the prediction accuracy of the model was significantly compromised by the exclusion of key molecular properties of ECs (e.g., HOMO, LUMO, VIP, ΔG and MW) (Figure S4). Specifically, the R² value of the testing set decreased substantially from 0.813 to 0.686, highlighting the indispensable role of these molecular descriptors in maintaining the predictive performance of the model. 
Previous studies also confirmed that contaminant properties are important quantum-chemical parameters for predicting the k of EC degradation in persulfate and Fenton oxidation processes [34,35]. Given the diversity of ECs in chemical classes and characteristics, it is difficult to rely solely on overall feature importance to clarify the relationships among EC properties, reaction conditions, and FCC characteristics, as well as their influence on k. Therefore, further investigation focused on contaminant features is essential for accurately interpreting model predictions and revealing the fundamental mechanisms by which material properties affect the k of different types of ECs.

To address the aforementioned issue, t-SNE and K-means clustering were employed to classify the 27 common ECs into distinct categories. Subsequent modeling and interpretive analysis for each category established contaminant feature-oriented secondary systems. Specifically, the dimensionality of contaminant characteristics was reduced using t-SNE, and the ECs were divided into two categories through K-means clustering, with the results presented in Figure 5a. More detailed information, including the chemical names and molecular structures in the submodels Model I and Model II, can be found in Table S6. Based on distinct electronic and structural characteristics, the ECs were categorized into two groups for modeling. Model I comprises ECs with low HOMO, LUMO, VIP, and ΔG values but a high MW. Conversely, Model II is defined by higher HOMO, LUMO, VIP and ΔG values, coupled with a lower MW (Figure 5b). The HOMO energy reflects the electron-donating ability of ECs, with a higher value indicating stronger electron donation, whereas a lower LUMO energy implies better electron acceptance [34,35,36]. The ΔG and VIP parameters describe molecular stability, where higher values denote greater stability and oxidative resistance, and lower values suggest higher reactivity and degradation tendency [30]. Moreover, a larger MW may suggest the presence of a more complex chemical structure and multiple functional groups, providing more reaction sites and possibilities, thereby enabling the ECs to be degraded more readily.

Therefore, most ECs in Model I are characterized by relatively easy degradability, high molecular weight, and susceptibility to nucleophilic attack, whereas those in Model II exhibit greater resistance to degradation, lower molecular weight, and a higher susceptibility to electrophilic attack. Both groups were fitted using the basic XGB algorithm, and two submodels were trained for further in-depth analysis. Details regarding the datasets of Model I and Model II are presented in Table S7. As shown in Figure 5c,d, the test-set R² values for both models surpassed 0.81, with Model I and Model II achieving training-set R² values up to 0.99, demonstrating their satisfactory performance in predicting k values. The model hyperparameters and model stability evaluation are listed in Tables S8 and S9. Briefly, the findings indicate that these two submodels, with their optimized hyperparameters, effectively capture and simulate the degradation kinetics of various ECs, thereby promoting the rational design of FCCs and the targeted removal of specific ECs.

3.3. Critical Roles of Specific Features in Different Models

SHAP, a widely used interpretable ML technique, was employed to quantify the influence of key features on the rate constant. In this framework, a larger absolute SHAP value indicates a stronger influence of a feature on the model predictions. For Model I, the factors affecting the prediction of k are ranked by their contribution as follows: FCC characteristics (66.8%) > experimental conditions (18.0%) > EC properties (15.2%). As shown in Figure 6a, Fe content is the most significant predictor, with an average absolute SHAP value of 0.37, followed by C content at 0.30. Higher Fe contents are associated with negative SHAP values, while lower Fe contents show positive contributions, suggesting an overall negative relationship between Fe content and k (Figure 6b). PDP analysis shows that Fe content and k exhibit a positive correlation below 0.54%. Conversely, within the range of 0.54% to 5.6%, an inverse correlation is observed. Thereafter, with Fe content surpassing 5.6%, the k value stabilizes at a minimum level (Figure 6c). Figure 6d shows that k exhibits an increasing trend within the C content range of 56.07–74.08%, followed by a gradual plateau as the C content further increases, demonstrating a nonlinear influence. The optimal Fe and C contents for achieving maximum k are determined via the 2D PDP analysis [37], which can offer guidance on screening the reaction conditions of an actual water environment (Figure 6e). Notably, when the carbon content is at the optimal level of approximately 74% or above, the degradation constant k decreases markedly with increasing Fe content. This observation can be ascribed to the superior dispersibility of an appropriate amount of Fe on the carbon support, which favors the formation of abundant active sites. In contrast, excess Fe tends to agglomerate during the synthesis process, leading to the formation of larger Fe particles. 
These larger particles not only block the pore structure of the carbon support and impede the mass transfer of reactants and products, but also may cover the intrinsic active sites on the carbon material or at the Fe–carbon interfaces, ultimately diminishing the total effective reaction area and decreasing the degradation kinetics [38].
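A one-dimensional partial-dependence curve of the kind discussed above can be computed as in the following sketch. The response function is a hypothetical stand-in for the trained XGB submodel, with an arbitrary peak at 0.5 that is not the value reported in the text, and the rows and grid are toy values.

```python
def partial_dependence(model, rows, feature_index, grid):
    """Average model prediction over all rows with one feature fixed at each grid value."""
    curve = []
    for value in grid:
        predictions = []
        for row in rows:
            modified = list(row)
            modified[feature_index] = value  # overwrite the feature of interest
            predictions.append(model(modified))
        curve.append(sum(predictions) / len(predictions))
    return curve


def toy_model(features):
    """Hypothetical response: rises with Fe up to 0.5, falls beyond it; linear in C."""
    fe, c = features
    return -(fe - 0.5) ** 2 + 0.1 * c


rows = [(0.2, 60.0), (1.0, 70.0), (3.0, 80.0)]  # hypothetical (Fe %, C %) samples
fe_grid = [0.0, 0.5, 1.0, 2.0]
curve = partial_dependence(toy_model, rows, 0, fe_grid)
fe_at_peak = fe_grid[curve.index(max(curve))]
```

A two-dimensional PDP follows the same pattern with two features fixed at once over a grid of value pairs.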

Feature importance analysis for Model II indicates that Ccatl, N, CPMS and Fe are the four most significant features. The importance of those related to reaction conditions is significantly increased compared to Model I, accounting for 31.6% of the total contribution (Figure 7a). Additionally, higher dosages of catalyst and PMS are associated with positive SHAP values, confirming their positive influence on the predicted k (Figure 7b). Analysis of the PDP in Figure S5a,b indicates a continuous rise in the k value with increasing dosages of FCCs and PMS, confirming their critical role in the degradation of ECs in Model II. The 2D PDP diagram of Ccatl and CPMS provides a visual representation of the complex relationship between FCC dosage, PMS concentration and k (Figure S5c). Increasing both PMS concentration and catalyst dosage enhances degradation efficiency, though at a higher cost. Additionally, with a fixed catalyst dosage, degradation efficiency shows limited improvement beyond a certain PMS concentration due to the saturation of active sites [39]. It is noteworthy that the impact of FCC characteristics is more significant, accounting for 48.2% of the total contribution. Specifically, among these characteristics, the N content of FCCs is found to be the most important feature for predicting k, followed by the Fe content. The PDP of N content is presented in Figure 7c, where the curve exhibits a significant initial increase followed by a flattening trend once the N content exceeds a certain value (>6.89%). This confirms that N doping can effectively enhance the degradation constant. It has been found that the modification of the sp²-carbon framework with N could alter its local charge distribution, which in turn facilitates the redox cycle of the iron species and enhances the overall activation efficacy [40]. 
In contrast to Pyr-N, the content of Pyd-N showed a positive correlation with the k value, suggesting that it likely serves as the primary active site among the nitrogen species. Therefore, the enhancement effect of increased N content on pollutant degradation may stem from its ability to elevate the content of Pyd-N, which in turn drives persulfate activation to generate more reactive species and ultimately accelerates the degradation kinetics [41,42]. However, excessive N doping may compromise the charge balance and disrupt the charge redistribution within the catalyst, which in turn impairs PMS activation [43,44,45]. Therefore, beyond this level, further increases in nitrogen content yield only a modest enhancement in the EC degradation rate. Consistent with Model I, increased Fe content is also found to inhibit the removal of ECs (Figure 7d). As shown in Figure 7e, when the Fe content ranges from 0.12% to 0.966%, the degradation constant of ECs is significantly enhanced as N levels increase from 4.48%. This finding offers critical insights into the design of such materials.

For the pollutants targeted in Model I, optimization efforts should focus on regulating the Fe and C contents of the FCC. In contrast, for those in Model II, the degradation constant can be improved by co-optimizing the dosages of FCCs and PMS, along with the N content of the FCC. This pollutant-oriented optimization strategy can serve as a guide for the preliminary design of FCCs in practical applications. However, the actual situation must also be considered, and the GUI should be utilized for more detailed predictive analysis to verify the results. In short, this approach promotes the seamless integration of ML model development and application for targeted EC degradation and material design.

3.4. FCC Design Based on Machine Learning for EC Removal

To validate the constructed ML tools, two representative ECs were selected as targets: methylene blue (MB) from Model I and p-nitrophenol (p-NP) from Model II. As shown in Figure 8a, the degradation efficiencies of MB and p-NP were evaluated via PMS activation using the low-N-doped FCC-1, whose preparation parameters were primarily based on the predictions from Model I, with an emphasis on regulating the Fe and C contents. MB was effectively degraded in this system under conventional conditions, whereas p-NP exhibited limited degradation efficiency in the FCC-1-mediated oxidation process. To enhance the catalytic performance of the FCC for the pollutants represented in Model II, subsequent experiments followed the predictions of Model II by increasing the nitrogen doping level. Specifically, the nitrogen content was elevated from 2.36% (FCC-1) to 8.78% (FCC-2) to systematically investigate the role of N doping in the catalytic system (Table S10). XRD analysis revealed that both materials exhibited a broad diffraction peak at around 26.7°, corresponding to the (002) plane of graphitic carbon, confirming the formation of a partially graphitized structure (Figure 8b). In the case of FCC-2, the observed diffraction peaks mainly originated from crystalline Fe₃O₄, Fe₃C, and Fe₂C species. After 30 min of treatment, the removal efficiency of p-NP increased from 40.5% to 85.2%, and the corresponding k rose from 0.0155 to 0.0693 min⁻¹, demonstrating the significant role of N doping (Figure S9). XPS analysis of FCC-2 before and after the catalytic reaction revealed that the relative content of Fe(II) decreased from 84.29% to 52.42%, indicating the consumption of Fe species during PMS activation to generate reactive oxygen species (Figure 8c). Concurrently, the content of Pyd-N (397.9 eV) declined from 62.63% to 57.77%, and that of Fe-Nx (399.8 eV) decreased from 19.36% to 16.14%. In contrast, the contents of Pyr-N (400.3 eV) and Oxi-N (402.6 eV) increased from 13.59% to 17.82% and from 4.42% to 8.28%, respectively (Figure 8d) [46]. Chen et al. reported that, owing to radical attack on Pyd-N and Gph-N, N dopants may undergo restructuring, transforming Pyd-N or Gph-N into the more stable Pyr-N [47]. These results indicate that Pyd-N and Fe-Nx participated in the activation of PMS as active species, with Pyd-N playing a particularly prominent role, which is consistent with the model prediction. Pyd-N, with its lone-pair electrons, enables the transfer of delocalized π-electrons from sp²-hybridized carbons to activate PMS, generating •OH and SO₄^•−, which ultimately enhance the removal efficiency of p-NP [48].
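The reported rate constants and removal efficiencies can be cross-checked with a short calculation. The sketch below assumes pseudo-first-order kinetics, ln(C₀/C_t) = k·t, which is a common convention for such systems but is an assumption here, since this section does not restate the kinetic model. The function names are illustrative.

```python
import math


def removal_from_k(k_per_min: float, t_min: float) -> float:
    """Fractional removal after t_min, assuming C_t = C_0 * exp(-k * t)."""
    return 1.0 - math.exp(-k_per_min * t_min)


def fit_k(times_min, c_over_c0):
    """Least-squares slope through the origin of ln(C0/Ct) versus t."""
    num = sum(t * -math.log(c) for t, c in zip(times_min, c_over_c0))
    den = sum(t * t for t in times_min)
    return num / den


if __name__ == "__main__":
    # k values for p-NP taken from the text; 30 min is the stated treatment time.
    for label, k in (("FCC-1", 0.0155), ("FCC-2", 0.0693)):
        pct = 100 * removal_from_k(k, 30)
        print(f"{label}: k = {k} 1/min -> {pct:.1f}% removal at 30 min")
```

Under this assumption the two k values imply removals of roughly 37% and 87% at 30 min, in the same range as the measured 40.5% and 85.2%. Small differences are expected because a fitted k reflects the whole concentration-time series rather than the final point alone.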

Furthermore, the developed Model II was employed to predict the k for p-NP degradation with the two FCCs. As shown in Figure S10, the predicted k value for the FCC-1-PMS system was slightly lower than the experimentally determined value, whereas that for FCC-2-PMS was marginally higher. The overall prediction errors of the model remained below 12%. The contribution of individual features to the predicted output for a given sample is visualized using force plots, in which positive and negative contributions are represented by red and blue bars, respectively. For FCC-1, most of the N-related parameters (e.g., total N content, Pyr-N content, and N/C) contributed negatively to the predicted k value, suggesting that stronger N doping could raise the k predicted by Model II (Figure 8e). For FCC-2, whose performance was further optimized, the significant increase in the k value was associated with the increase in total N content. The substantial increase in the N content of FCC-2 generated abundant Pyd-N, thereby accelerating the degradation of p-NP (Figure 8f). Notably, the degradation efficiency in natural surface water decreased significantly owing to interference from coexisting substances, but the degradation trend remained consistent with that in pure water, indicating that the model remains stable in practical applications (Figure S11).
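Two quantities in this paragraph lend themselves to a small worked example: the relative prediction error and the additive decomposition that a SHAP force plot displays (prediction = base value + sum of per-feature contributions). The sketch below uses plain Python rather than the SHAP library, and every number in it is hypothetical and chosen only for illustration; none is taken from the paper.

```python
def relative_error(predicted: float, measured: float) -> float:
    """Absolute relative error of a prediction against a measurement."""
    return abs(predicted - measured) / abs(measured)


def force_plot_summary(base_value: float, contributions: dict):
    """Split additive per-feature contributions the way a force plot does.

    Positive contributions push the prediction up (drawn in red),
    negative ones push it down (drawn in blue).
    """
    positive = {n: v for n, v in contributions.items() if v > 0}
    negative = {n: v for n, v in contributions.items() if v < 0}
    prediction = base_value + sum(contributions.values())
    return prediction, positive, negative


if __name__ == "__main__":
    # Hypothetical lg k contributions for a low-N catalyst (illustrative only).
    contribs = {"total N content": -0.12, "Pyr-N content": -0.05, "PMS dosage": 0.08}
    pred, pos, neg = force_plot_summary(-1.60, contribs)
    print(f"predicted lg k = {pred:.2f}")
    print("raises prediction:", sorted(pos))
    print("lowers prediction:", sorted(neg))
    # Hypothetical predicted vs. measured k (1/min).
    print(f"relative error = {100 * relative_error(0.011, 0.010):.0f}%")
```

In this reading, N-related features appearing on the negative (blue) side for a low-N sample means that the low N values pull the predicted lg k down, which is the pattern described for FCC-1.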

4. Conclusions

In summary, DFT-derived molecular-descriptor-assisted machine learning models were proposed to guide FCC design for EC degradation, integrating input features from three categories: EC properties, FCC characteristics and reaction conditions. A precise and robust XGB model was developed to uncover the nonlinear relationships inherent in the dataset and, with the assistance of the GUI, to accurately predict the performance of FCC materials. The feature importance analysis of the overall model showed that the EC properties had a significant impact on k. Therefore, targeting the molecular fingerprints of ECs, a hierarchical analysis based on t-SNE and K-means clustering was developed to categorize the ECs into Model I and Model II and thereby facilitate the rational design of FCCs. Through SHAP analysis, both local and global contributions of the input features were identified, and the key fabrication factors affecting the degradation of different types of ECs were highlighted. Finally, validation experiments showed that the hierarchical ML analysis was effective in guiding the rational design of FCCs for efficient PS activation and targeted removal of ECs. While machine learning cannot entirely replicate the intricacies of experimental procedures, the framework introduced in this study establishes a technical foundation for developing a scalable, interpretable, and highly accurate platform for optimizing water treatment processes. Furthermore, the synergistic system developed in this study is expected to shorten the development cycle of iron–carbon catalysts and reduce material and energy consumption during research and development. This strategy not only provides a quantifiable pathway for applying green chemistry principles in water treatment but also directly supports the sustainable development goals of low carbon usage and low resource consumption, laying a foundation for the future intelligent and green transformation of environmental remediation technologies.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/catal16050444/s1.

Author Contributions

Conceptualization, C.Y. and Z.X.; methodology, D.X.; software, C.Y.; validation, D.X., J.W. and X.L.; formal analysis, C.Y.; investigation, Z.X.; resources, C.Y.; data curation, C.Y.; writing—original draft preparation, C.Y.; writing—review and editing, C.Y.; visualization, H.Z. (Hao Zhou); supervision, B.M.; project administration, H.Z. (Hao Zhou); funding acquisition, H.Z. (Houhu Zhang). All authors have read and agreed to the published version of the manuscript.

Funding

This study was financially supported by the National Science and Technology Major Project (2025ZD1208204), the Special Fund of the Carbon Peak and Carbon Neutrality Research Institute that is supported by the Nanjing Institute of Environmental Sciences, Ministry of Ecology and Environment (ZX2023SZY060) and the National Key Research and Development Program of China (2022YFC3901405).

Data Availability Statement

The datasets generated during and/or analyzed during the current study are available from the corresponding authors upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Yan, C.C.; Li, J.; Sun, Z.H.; Wang, X.J.; Xia, S.Q. Mechanistic insights into removal of pollutants in adsorption and advanced oxidation processes by livestock manure derived biochar: A review. Sep. Purif. Technol. 2024, 346, 127457.
2. Jia, Y.Y.; Ou, Y.Y.; Khanal, S.K.; Sun, L.P.; Shu, W.S.; Lu, H. Biochar-based strategies for antibiotics removal: Mechanisms, factors, and application. ACS ES&T Eng. 2024, 4, 1256–1274.
3. Guo, Y.L.; Jiang, L.; Zhao, S.C.; Ghazouani, S.; Liu, Y.Y.; Yuan, S.S.; Wei, S.C.; Xue, S.M.; Hou, Y.F.; Wang, Z.Y.; et al. Multistage nanofiltration membranes with adsorptive substrate for enhanced removal of emerging organic pollutants. J. Membr. Sci. 2025, 727, 124095.
4. Yang, H.L.; Zhang, W.T.; Zhang, H.G.; Tan, Q.; Zhang, X.R.; Yang, X.; Meng, F.G.; Zhao, S.S. Electric field responsive carbon nanotube-based polyamide membrane for enhanced antibiotics separation: From charge regulation to performance enhancement. J. Membr. Sci. 2026, 738, 124761.
5. Ibrahim, F.G.G.; López-Serna, R.; Torre, R.M.; Pastor, A.C.; Crespo, I.D. Evaluating emerging pollutant removal in a scale-down high rate algae pond. J. Environ. Chem. Eng. 2025, 13, 116010.
6. Nawaz, M.Z.; Khalid, H.R.; Mirza, M.U.; Xu, L.X.; Haider, S.Z.; Al-Ghanim, K.A.; Barceló, D.; Zhu, D.C. Elucidating the bioremediation potential of laccase and peroxidase enzymes from Bacillus ligniniphilus L1 in antibiotic degradation: A computationally guided study. Bioresour. Technol. 2024, 413, 131520.
7. Nie, C.Y.; Wang, J.L.; Cai, B.H.; Lai, B.; Wang, S.B.; Ao, Z.M. Multifunctional roles of MoS₂ in persulfate-based advanced oxidation processes for eliminating aqueous organic pollutants: A review. Appl. Catal. B-Environ. Energy 2024, 340, 123173.
8. Yan, C.C.; Sun, Z.H.; Li, J.; Sun, X.; Wang, X.J.; Xia, S.Q. Enhanced activation of peroxymonosulfate by oxygen-doped FeS₂/Co₃S₄ anchored on activated carbon fibers for sulfamethoxazole degradation and sulfonamide-resistant bacteria inactivation: The key role of surface and interface sulfur vacancies. Chem. Eng. J. 2025, 505, 159028.
9. Chen, F.T.; Fan, Y.L.; Lin, Z.H.; Li, Z.X.; Qiu, X.Q.; Lou, H.M.; Pang, Y.X. Sulfur-doped FeP enables a pH-induced switch from •OH to Fe(IV)=O for efficient sulfamethoxazole degradation via Fenton-like catalysis. Appl. Catal. B-Environ. Energy 2025, 379, 125712.
10. Xie, Z.H.; He, C.S.; Zhou, H.Y.; Li, L.L.; Liu, Y.; Du, Y.; Liu, W.; Mu, Y.; Lai, B. Effects of molecular structure on organic contaminants’ degradation efficiency and dominant ROS in the advanced oxidation process with multiple ROS. Environ. Sci. Technol. 2022, 56, 8784–8795.
11. Wang, J.L.; Wang, S.Z. Activation of persulfate (PS) and peroxymonosulfate (PMS) and application for the degradation of emerging contaminants. Chem. Eng. J. 2018, 334, 1502–1517.
12. Guo, Y.W.; Cheng, M.; Wang, Z.W.; Zhang, G.X.; Wang, G.F.; Liu, H.D.; Chen, A.; Shi, Q.K.; Kang, H.Y.; Yuan, Z.Y. Carbon-based single-atom catalysts from biomass for persulfate-advanced oxidation processes: Perspectives on synthesis, application, modulation, and sustainability. Chem. Eng. J. 2025, 520, 166119.
13. Oyekunle, D.T.; Gendy, E.A.; Ifthikar, J.; Chen, Z.Q. Heterogeneous activation of persulfate by metal and non-metal catalyst for the degradation of sulfamethoxazole: A review. Chem. Eng. J. 2022, 437, 135277.
14. Xi, M.F.; Cui, K.P.; Cui, M.S.; Ding, Y.; Guo, Z.; Chen, Y.H.; Li, C.X.; Li, X.Y. Enhanced norfloxacin degradation by iron and nitrogen co-doped biochar: Revealing the radical and nonradical co-dominant mechanism of persulfate activation. Chem. Eng. J. 2021, 420, 129902.
15. Chen, L.K.; Huang, Y.F.; Zhou, M.L.; Xing, K.W.; Rao, L.J.; Lv, W.Y.; Yao, Y.Y. Enhanced peroxymonosulfate activation process based on homogenously dispersed iron and nitrogen active sites on a three-dimensional porous carbon framework. Chem. Eng. J. 2021, 404, 126537.
16. Wu, Z.L.; Xiong, Z.K.; Huang, B.K.; Yao, G.; Zhan, S.H.; Lai, B. Long-range interactions driving neighboring Fe-N sites in Fenton-like reactions for sustainable water decontamination. Nat. Commun. 2024, 15, 7775.
17. Wang, R.P.; He, Z.X.; Chen, H.L.; Wang, K.; Zhang, S.Y.; Ren, N.Q.; Ho, S.H. Deciphering N-doped biochar design for non-radical pathways through hierarchical machine learning. ACS ES&T Eng. 2024, 4, 1738–1747.
18. Li, Z.J.; Zhou, J.C.; Huang, W.J.; Qin, J.H.; Li, A.J.; Yin, R.L.; Zhu, M.S. Integrated machine learning and density functional theory to reveal the relationship between water contaminants and singlet oxygen persulfate activation. Appl. Catal. B-Environ. Energy 2025, 372, 125322.
19. Liang, J.L.; Wang, D.D.; Zhen, P.; Wu, J.K.; Li, Y.Y.; Liu, F.Y.; Shen, Y.; Tong, M.P. Combination of density functional theory and machine learning provides deeper insight of the underlying mechanism in the ultraviolet/persulfate system. Environ. Sci. Technol. 2025, 59, 6891–6899.
20. Zhang, S.Z.; Chen, S.; Jiang, H. A new tool to predict the advanced oxidation process efficiency: Using machine learning methods to predict the degradation of organic pollutants with Fe-carbon catalyst as a sample. Chem. Eng. J. 2023, 280, 119069.
21. Aldossary, A.; Campos-Gonzalez-Angulo, J.A.; Pablo-García, S.; Leong, S.X.; Rajaonson, E.M.; Thiede, L.; Tom, G.; Wang, A.; Avagliano, D.; Aspuru-Guzik, A. In silico chemical experiments in the age of AI: From quantum chemistry to machine learning and back. Adv. Mater. 2024, 36, e2402369.
22. Liu, P.; Xu, H.B.; Jin, P.R.; Zhu, X.W.; Zheng, J.F.; Liu, Y.L.; Yang, J.X.; Xu, D.L.; Liang, H. DFT-assisted machine learning for polyester membrane design in textile wastewater recovery applications. Water Res. 2025, 279, 123438.
23. Palansooriya, K.N.; Li, J.; Dissanayake, P.D.; Suvarna, M.; Li, L.Y.; Yuan, X.Z.; Sarkar, B.; Tsang, D.C.W.; Rinklebe, J.; Wang, X.N.; et al. Prediction of soil heavy metal immobilization by biochar using machine learning. Environ. Sci. Technol. 2022, 56, 4187–4198.
24. Wang, R.P.; Zhang, S.Y.; Chen, H.L.; He, Z.X.; Cao, G.L.; Wang, K.; Li, F.H.; Ren, N.Q.; Xing, D.F.; Ho, S.H. Enhancing biochar-based nonradical persulfate activation using data-driven techniques. Environ. Sci. Technol. 2023, 57, 4050–4059.
25. Guo, F.S.; Sun, S.Y.; Son, Y.; Kim, J.; Han, S.Y.; Zhang, Z.S.; Cui, M.C.; Khim, J. Machine learning predictive models, interpretations, and applications of metal-based catalysts, UV, and thermal activation persulfate-based advanced oxidation processes. ACS ES&T Eng. 2025, 5, 2910–2924.
26. Qiao, J.Z.; Li, B.B.; Li, Y.Q.; Deng, D.F.; Wang, Y. Prediction of Biochar’s capacitive deionization performance through machine learning. Sep. Purif. Technol. 2025, 379, 134848.
27. Deng, H.; Luo, Z.Y.; Imbrogno, J.; Swenson, T.M.; Jiang, Z.Y.; Wang, X.N.; Zhang, S. Machine learning guided polyamide membrane with exceptional solute-solute selectivity and permeance. Environ. Sci. Technol. 2022, 57, 17841–17850.
28. Du, Z.L.; Sun, X.; Zheng, S.N.; Wang, S.Y.; Wu, L.N.; An, Y.; Luo, Y.M. Optimal biochar selection for cadmium pollution remediation in Chinese agricultural soils via optimized machine learning. J. Hazard. Mater. 2024, 476, 135065.
29. Zhu, X.Z.; He, M.J.; Sun, Y.Q.; Xu, Z.B.; Wan, Z.H.; Hou, D.Y.; Alessi, D.S.; Tsang, D.C.W. Insights into the adsorption of pharmaceuticals and personal care products (PPCPs) on biochar and activated carbon with the aid of machine learning. J. Hazard. Mater. 2022, 423, 127060.
30. Wang, R.P.; Chen, H.L.; He, Z.X.; Zhang, S.Y.; Wang, K.; Ren, N.Q.; Ho, S.H. Discovery of an end-to-end pattern for contaminant-oriented advanced oxidation processes catalyzed by biochar with explainable machine learning. Environ. Sci. Technol. 2024, 58, 16867–16876.
31. Ikotun, A.M.; Ezugwu, A.E.; Abualigah, L.; Abuhaija, B.; Heming, J. K-means clustering algorithms: A comprehensive review, variants analysis, and advances in the era of big data. Inf. Sci. 2023, 622, 178–210.
32. Akhavan Anvari, M.; Rahmati, D.; Kumar, S. t-Distributed stochastic neighbor embedding. In Dimensionality Reduction in Machine Learning; Elsevier: Amsterdam, The Netherlands, 2025; pp. 187–207.
33. Li, J.; Zhu, X.Z.; Li, Y.A.; Tong, Y.W.; Ok, Y.S.; Wang, X.N. Multi-task prediction and optimization of hydrochar properties from high-moisture municipal solid waste: Application of machine learning on waste-to-resource. J. Clean. Prod. 2021, 278, 123928.
34. Gao, W.; Xu, Y.; Chang, X.; Xu, X.; Li, N.; Yan, B.; Chen, G.; Duan, X. Machine learning-driven global optimization of single-atom catalyst-mediated advanced oxidation processes. Environ. Sci. Technol. 2025, 59, 24044–24054.
35. Cheng, Z.W.; Yang, B.W.; Chen, Q.C.; Ji, W.C.; Shen, Z.M. Characteristics and difference of oxidation and coagulation mechanisms for the removal of organic compounds by quantum parameter analysis. Chem. Eng. J. 2018, 332, 351–360.
36. Xiao, R.Y.; Ye, T.T.; Wei, Z.S.; Luo, S.; Yang, Z.H.; Spinney, R. Quantitative structure-activity relationship (QSAR) for the oxidation of trace organic contaminants by sulfate radical. Environ. Sci. Technol. 2015, 49, 13394–13402.
37. Sun, Y.X.; Sun, P.H.; Jia, J.X.; Liu, Z.Y.; Huo, L.L.; Zhao, L.X.; Zhao, Y.N.; Niu, W.J.; Yao, Z.L. Machine learning in clarifying complex relationships: Biochar preparation procedures and capacitance characteristics. Chem. Eng. J. 2024, 485, 149975.
38. Yang, M.; Li, S.; Deng, Y.M.; Baeyens, J.; Zhang, H.L. Effect of Fe-loading in iron-based catalysts for the CH₄ decomposition to H₂ and nanocarbons. J. Environ. Manag. 2023, 346, 118999.
39. Lei, Y.X.; Guo, X.; Jiang, M.J.; Sun, W.; He, H.; Chen, Y.; Thummavichai, K.; Ola, O.; Zhu, Y.Q.; Wang, N.N. Co-ZIF reinforced cow manure biochar (CMB) as an effective peroxymonosulfate activator for degradation of carbamazepine. Appl. Catal. B-Environ. Energy 2022, 319, 121932.
40. Liu, L.; Zhao, X.Y.; Ding, G.F.; Han, C.J.; Liu, J. Fe₃N sites anchored reduced graphene oxide activate peroxymonosulfate via singlet oxygen dominated process: Performance and mechanisms. Chem. Eng. J. 2023, 470, 143820.
41. Li, C.Y.; Kang, L.; Wang, S.; Gao, B.; Gao, X.Q.; Shi, W.J.; Ai, S.Y.; Wu, J.C.; Sun, Y.Y. Enhanced PFOA degradation via non-radical PMS pathway through synergistic Fe single-atoms and Fe nanoparticles in Fe-N-C catalysts. Chem. Eng. J. 2025, 512, 162424.
42. Shen, Y.; Shen, Q.C.; Zhu, C.; Lin, Y.J.; Fang, Q.L.; Liu, R.L.; Wang, J.; Lu, L.; Song, S. Pyridinic nitrogen-modulated carbon nanotubes facilitate dual-electron transfer for polymerization-enhanced pollutant removal. Appl. Catal. B-Environ. Energy 2025, 378, 125579.
43. Duan, X.G.; O’Donnell, K.; Sun, H.Q.; Wang, Y.X.; Wang, S.B. Sulfur and nitrogen co-doped graphene for metal-free catalytic oxidation reactions. Small 2015, 11, 3036–3044.
44. Wen, Q.; Wang, J.L. Degradation of sulfamethoxazole using peroxymonosulfate activated by nitrogen-doped biochar: Insights into the active sites and activation mechanism. Chem. Eng. J. 2025, 508, 161020.
45. Wang, X.; Mao, K.Y.; Ma, J.Q.; Zhang, H.N.; Qian, Y.X.; Jin, H.X.; Zhang, K.F. Balanced adsorption and catalysis in nitrogen-doped biochar for efficient peroxymonosulfate activation and 4-chlorophenol removal via electron transfer pathways. Appl. Surf. Sci. 2026, 716, 164640.
46. Wu, M.L.; Tao, Y.; Liu, Y.W.; Hou, S.Y.; Xiao, T.Y.; Fu, Q.; Peng, G.W.; Shi, L. Fe-N-coordinated graphene-like honeycomb porous carbon as an extremely effective catalyst for catalytic oxidation. Sep. Purif. Technol. 2025, 354, 129225.
47. Chen, X.; Oh, W.D.; Hu, Z.T.; Sun, Y.M.; Webster, R.D.; Li, S.Z.; Lim, T.T. Enhancing sulfacetamide degradation by peroxymonosulfate activation with N-doped graphene produced through delicately-controlled nitrogen functionalization via tweaking thermal annealing processes. Appl. Catal. B-Environ. Energy 2018, 225, 243–257.
48. Xu, L.; Fu, B.R.; Sun, Y.; Jin, P.K.; Bai, X.; Jin, X.; Shi, X.; Wang, Y.; Nie, S.T. Degradation of organic pollutants by Fe/N co-doped biochar via peroxymonosulfate activation: Synthesis, performance, mechanism and its potential for practical application. Chem. Eng. J. 2020, 400, 125870.

Figure 1. Flowchart detailing the strategies of the machine learning framework to predict the reaction rate constants of the EC degradation process in the persulfate environment with an Fe–carbon catalyst.

Figure 2. Violin plots summarizing the statistical properties of all features derived from the literature: material properties (a), experimental conditions (b) and contaminant fingerprints (c).

Figure 3. Pearson correlation matrix between any two variables in the master dataset. Asterisks and red font represent p < 0.05.

Figure 4. Predictive performance of (a) KNN, (b) RF, (c) XGB, (d) LGBM, (e) CAB and (f) MLP models, trained on FCC characteristics, EC properties and reaction conditions as input features.

Figure 5. t-SNE and K-means clustering of contaminant molecules (a); boxplots of EC properties (b); predictive performance of the XGB model for lg k based on the Model I dataset (c) and the Model II dataset (d).

Figure 6. SHAP-based feature importance for k prediction in Model I, with a pie chart showing the proportion contributed by each empirical feature category (a); combined SHAP summary values (b); PDP plots for Fe content (c) and C content (d); 2D PDP plot of Fe content and C content (e). (Note: all k values are converted to lg k.)

Figure 7. SHAP-based feature importance for k prediction in Model II, with a pie chart showing the proportion contributed by each empirical feature category (a); combined SHAP summary values (b); PDP plot for N content (c); 2D PDP plot of catalyst concentration and PMS concentration (d); 2D PDP plot of N content and N/C (e). (Note: all k values are converted to lg k.)

Figure 8. Degradation performance of FCC-1 and FCC-2 for MB and p-NP; conditions: catalyst = 0.2 g/L, PMS = 0.5 mM, MB = 10 mg/L, p-NP = 10 mg/L, Tr = 25 °C, pH = 6.8 (a). XRD patterns of FCC-1 and FCC-2 (b). Fe 2p (c) and N 1s (d) XPS spectra of FCC-2 before and after the catalytic reaction. Force plots for FCC-1 (e) and FCC-2 (f).

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
