Performance Prediction of Perovskite-Catalyzed CO2 Decomposition Based on Machine-Learning Method

Chen, Jiayi; Wang, Kun; Xie, Huaqing; Ma, Kerong; Li, Kunlun

doi:10.3390/en19061388

by Jiayi Chen ¹,², Kun Wang ¹,²,³,*, Huaqing Xie ²,³, Kerong Ma ¹,³ and Kunlun Li ¹,³

¹ National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Northeastern University, Shenyang 110819, China

² State Environmental Protection Key Laboratory of Eco-Industry, School of Metallurgy, Northeastern University, Shenyang 110819, China

³ Key Laboratory of Data Analytics and Optimization for Smart Industry (Northeastern University), Ministry of Education, Shenyang 110819, China

* Author to whom correspondence should be addressed.

Energies 2026, 19(6), 1388; https://doi.org/10.3390/en19061388

Submission received: 20 January 2026 / Revised: 23 February 2026 / Accepted: 7 March 2026 / Published: 10 March 2026

(This article belongs to the Special Issue Innovative Catalytic Approaches for Energy Conversion and Storage)

Abstract

Perovskite oxides show excellent catalytic performance for thermochemical CO₂ splitting, and A/B-site cation substitution further enhances their redox activity. Because traditional first-principles methods are computationally expensive, machine learning (ML) offers an efficient route to perovskite optimization. In this paper, ML is used to investigate and predict the performance of perovskite catalysts in CO₂ decomposition reactions. Based on 227 perovskite compositions (A₁A₂)(B₁B₂)O₃ curated from the experimental literature, five ML models are compared: Decision Tree, Bagging, Random Forest, Extra Trees, and Gradient Boosting Regression (GBR). The Random Forest model performed best. After hyperparameter optimization, it achieved an R² of 0.910 and an MAE of 41.528 μmol/g on an independent test set. SHAP analysis indicated that the thermal reduction temperature (T₁) and the B₁-site stoichiometric fraction (C_b₁) are the most influential features governing the predicted CO yield. A higher CO yield is predicted when C_b₁ ranges from 0.6 to 0.8 and T₁ exceeds 1300 °C. This behavior can be attributed to the enhanced formation of oxygen vacancies at elevated temperatures and the optimized electronic structure induced by appropriate B-site stoichiometry.

Keywords:

machine learning; material design; perovskites; CO yield; SHAP

1. Introduction

In recent years, greenhouse gas emissions, especially carbon dioxide (CO₂) produced by the combustion of fossil fuels, have become a major driver of global climate change [1,2]. To cope with the increasingly severe climate crisis, researchers have proposed a variety of CO₂ decomposition technologies, including photocatalysis [3,4], biocatalysis [5,6], electrocatalysis [7], and the solar thermochemical cycle [8,9]. To effectively reduce CO₂ emissions and regulate CO₂ concentration in the atmosphere, two-step thermochemical cycling reactions based on metal oxides, known as the chemical looping process, have received extensive attention in recent years. This is due to their harmless products, facile gas separation process, and relatively simple operating conditions [10,11]. The two-step thermochemical cycle divides the CO₂ decomposition process into two stages, both carried out at high temperatures. First, at a high temperature (T₁), the metal oxide undergoes thermal reduction and releases oxygen, as shown in Equation (1). Then, at a lower temperature (T₂), the reduced metal oxide is re-oxidized by CO₂, producing CO and regenerating the original oxide phase, as shown in Equation (2). This process not only effectively decomposes CO₂ but also offers advantages such as flexible operation and controllable reaction conditions.

Thermal Reduction (TR) Step:

\mathrm{MO}_{x} \to \mathrm{MO}_{x-\delta} + \frac{\delta}{2}\,\mathrm{O}_{2} \qquad (1)

Carbon Dioxide Splitting (CDS) Step:

\mathrm{MO}_{x-\delta} + \delta\,\mathrm{CO}_{2} \to \mathrm{MO}_{x} + \delta\,\mathrm{CO} \qquad (2)

where MO_x represents the metal oxide and δ represents the non-stoichiometric oxygen content within the metal oxide.
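
As a rough worked example of Equation (2): if re-oxidation is complete, each mole of MO_x releases δ moles of CO, so the ideal yield per gram of oxide is δ divided by the molar mass. The sketch below uses LaMnO₃ with standard atomic masses and a hypothetical δ = 0.05; the composition and the δ value are illustrative assumptions, not entries from the dataset.

```python
# Ideal CO yield implied by Equation (2), assuming the reduced oxide is
# fully re-oxidized (measured yields are lower when re-oxidation is incomplete).

ATOMIC_MASS = {"La": 138.905, "Mn": 54.938, "O": 15.999}  # g/mol


def molar_mass(formula: dict[str, float]) -> float:
    """Molar mass (g/mol) of a composition given as {element: count}."""
    return sum(ATOMIC_MASS[el] * n for el, n in formula.items())


def ideal_co_yield_umol_per_g(delta: float, formula: dict[str, float]) -> float:
    """delta mol CO per mol oxide, converted to umol CO per gram of oxide."""
    return delta / molar_mass(formula) * 1e6


lamno3 = {"La": 1, "Mn": 1, "O": 3}
yield_umol_g = ideal_co_yield_umol_per_g(0.05, lamno3)  # hypothetical delta
print(f"{yield_umol_g:.1f} umol/g")
```

Under these assumptions the ideal yield is about 207 μmol/g, which shows how a small change in δ maps onto the μmol/g scale used for the CO yields in this paper.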

Current studies have mainly focused on developing and evaluating redox materials (oxygen carriers) that exhibit high performance in thermochemical cycles to enhance CO production [12]. In this reaction system, non-stoichiometric metal oxides, such as ceria-based materials (CeO₂/CeO_{2−δ}, doped CeO₂) and perovskite-structured oxides (ABO₃/ABO_{3−δ}) [13], have attracted significant research attention, the latter in particular because of their excellent redox properties. In perovskite oxides, the A site is occupied by cations with larger ionic radii, such as alkali and alkaline earth metals and lanthanide elements. The B site is usually a cation with a smaller ionic radius, generally including transition metal elements, Al, Sn, and others [14,15]. Generally, A-site cations contribute to structural stability, while B-site cations control material reactivity [16]. Both A- and B-site cations can be replaced. This substitution allows the perovskite structure to adopt diverse combinations, which helps regulate the non-stoichiometric oxygen content and the kinetic and thermodynamic properties of the material. In addition, perovskite oxides can remain solid in a two-step thermochemical process and exhibit good structural stability at high temperatures, making them ideal redox materials.

The catalytic activity of perovskite oxides mainly depends on the B-site element, while the A-site element regulates the valence state and dispersion state of the B-site element [17]. Studies have shown that partial replacement of the A- and B-site cations significantly improves the CO yield of perovskite oxides in the thermochemical CO₂ decomposition reaction [18,19,20,21,22]. To further optimize the redox properties of perovskite oxides, researchers worldwide have conducted extensive redox activity evaluations on a variety of perovskite materials [23,24,25,26,27,28,29,30]. McDaniel et al. [25] demonstrated that perovskites with partial substitution of Al for Mn at B sites can efficiently decompose CO₂, achieving a high yield and a fast reaction rate. Perovskite materials with different combinations of B-site cations, such as La(B₁, B₂)O₃ (where B₁ and B₂ are Mn, Co, Ni), or (A₁A₂)(B₁B₂)O₃ (where A₁ and A₂ are La, Sr, Ca, and B₁ and B₂ are Mn, Fe, Co), showed a significant increase in CO yield through the partial substitution of A- and B-site cations. Therefore, studying the effect of partial replacement of the A–B cation combination on the redox performance of perovskites remains an important approach to improving the redox activity of materials. However, systematic studies are limited, and most rely on first-principles calculations and experimental methods. Although these methods reveal the potential properties of perovskite materials, they face challenges such as high computational cost, limited data, and low experimental efficiency. Therefore, efficiently screening for perovskite materials with excellent properties remains a key issue in current research.

With the continuous progress of science and technology, the application of materials science in many fields is becoming increasingly important. Traditional materials research and development rely on trial-and-error, which consumes substantial experimental time and resources and limits the speed of discovery and design of new materials. To improve R&D efficiency, scientists have introduced the concept of material design. This approach predicts and designs materials with specific properties by integrating theoretical calculations from computer simulation, materials science, chemistry, and physics. As a result, it reduces the trial-and-error process and enhances R&D efficiency [31]. The advantages of material design include improving R&D efficiency, reducing costs and time, avoiding invalid experiments, and aligning experimental results with expectations.

Through computer simulations, researchers can simulate various experimental conditions, such as temperature, pressure, chemical environment, etc., to test the properties and behavior of materials in a virtual environment. This method not only quickly screens out potential high-performance materials but also optimizes experimental parameters and reduces uncertainties in actual experiments. The information age, particularly the advancement of big data technology, enables the rapid collection and sharing of data. This provides new opportunities for material researchers, accelerates material design research and development, and enhances the accuracy and reliability of material performance prediction. In recent years, machine-learning (ML) technology has brought breakthroughs to material design. As a powerful data mining tool, ML can identify potential laws from a large amount of experimental data and optimize the prediction and design of material properties. By constructing an efficient model based on material data, ML can accurately predict the target performance of undetermined samples [32]. It also reveals the relationship between material structure and physical and chemical properties, screens out high-performance materials that meet specific needs [31], and identifies the optimal experimental conditions.

To the best of our knowledge, ML-based prediction of CO yield in thermochemical CO₂ splitting using ABO₃ perovskites remains limited, and this work provides a practical framework to bridge experimental datasets and catalyst design. This study investigates the feasibility of applying machine learning (ML) to a two-step thermochemical CO₂ splitting process to predict experimental CO yields for ABO₃ perovskite catalysts. By integrating experimental data with feature engineering and model development, we establish data-driven models to (i) accelerate catalyst screening, (ii) optimize reaction conditions, and (iii) identify the key factors governing CO production. Compared with conventional trial-and-error experimentation, the proposed ML workflow—data collection, feature engineering, model selection, validation, and application—provides a systematic pathway to quantify the coupled effects of material composition and operating conditions, thereby supporting the rational design of highly redox-active perovskite oxides.

2. Datasets and Methods

2.1. Data Collection

During dataset collection, "perovskite", "thermochemical decomposition", and "CO₂ decomposition" were used as search keywords. Data for 227 (A₁A₂)(B₁B₂)O₃ perovskite samples published up to 2024 were collected from databases such as ScienceDirect and Web of Science. The data cover perovskite composition, CO yield, CO₂ partial pressure, CO₂ gas flow rate, reaction temperature, and reaction time. We collected 44 atomic parameters from the WebElements website as initial feature variables. Ensuring the reliability and comparability of the data is crucial to the credibility of the model. Therefore, in addition to the 44 atomic parameters, this paper also uses the experimental conditions of each sample as additional feature variables to reduce the influence of differences in experimental conditions on the ML model. To ensure high data quality, we applied strict inclusion and exclusion criteria. The inclusion criteria require that composition, T₁, T₂, P_CO2, t₁, t₂, and gas flow rate (GF) be reported, along with CO yield in convertible units. The exclusion criteria remove samples with unclear or non-convertible units, missing key variables such as T₁/T₂, anomalies in duplicate measurements under the same experimental conditions, and obvious data entry errors. Additionally, we harmonized the units for consistency, converting CO yield to μmol/g-material and standardizing the units for temperature, time, pressure, and flow.
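
The inclusion, exclusion, and unit-harmonization rules above can be sketched as a small filter. This is a minimal illustration, not the authors' pipeline: the record field names, the list of source units, and the choice of 0 °C and 1 atm for converting gas volumes are all assumptions made for the example.

```python
# Hypothetical field names for one literature record.
REQUIRED = ("composition", "T1", "T2", "P_CO2", "t1", "t2", "GF", "co_yield", "unit")

# Conversion factors to umol/g. The mL/g factor assumes gas volumes quoted at
# 0 degC and 1 atm (molar volume 22.414 L/mol), which is an assumption here.
TO_UMOL_PER_G = {"umol/g": 1.0, "mmol/g": 1e3, "mL/g": 1e3 / 22.414}


def harmonize(records):
    """Keep complete records with convertible units; express CO yield in umol/g."""
    kept = []
    for rec in records:
        if any(rec.get(key) is None for key in REQUIRED):
            continue  # exclusion: a key variable is missing
        factor = TO_UMOL_PER_G.get(rec["unit"])
        if factor is None:
            continue  # exclusion: unclear or non-convertible unit
        kept.append(dict(rec, co_yield=rec["co_yield"] * factor, unit="umol/g"))
    return kept


# Made-up records used only to exercise the filter.
base = {"composition": "sample", "T1": 1400, "T2": 1000, "P_CO2": 0.5,
        "t1": 45, "t2": 60, "GF": 200}
records = [
    dict(base, co_yield=0.25, unit="mmol/g"),
    dict(base, co_yield=5.0, unit="mL/g"),
    dict(base, T2=None, co_yield=120.0, unit="umol/g"),  # missing T2
    dict(base, co_yield=3.0, unit="a.u."),                # non-convertible
]
clean = harmonize(records)
```

Duplicate-measurement anomalies and data entry errors need case-by-case checks and are not covered by this sketch.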

We also acknowledge the potential publication bias in the literature-derived datasets, where studies with successful or high-yield results are more likely to be published, potentially skewing the CO yield data higher. This workflow is shown in Figure 1.

2.2. ML Model

In this study, we used five ML models: Decision Tree, Bagging, Random Forest, Extra Trees, and Gradient Boosting Regression (GBR). A Decision Tree predicts by recursively partitioning the data, but it is prone to overfitting. Bagging trains multiple models on resampled data, which reduces variance and improves stability [33]. Random Forest adds random feature selection to Bagging to further reduce the risk of overfitting [34]. Extra Trees further simplifies the splitting process [35], improving training speed at the possible cost of slightly lower accuracy. GBR handles complex nonlinear relationships by iteratively constructing weak learners, each of which corrects the errors of its predecessors [36]. Each of these models has its own advantages and suits different prediction needs.

2.3. Modeling Procedure

During the dataset collection phase, 55 parameters were collected, including perovskite reaction conditions and atomic properties. After data preprocessing, a two-stage feature selection strategy was adopted to reduce redundancy and improve model generalization.

Stage 1: mRMR ranking and preliminary redundancy removal. We first applied maximum relevance minimum redundancy (mRMR) to rank all candidate features by maximizing relevance to the target while minimizing redundancy among selected features, yielding an ordered feature list. We then examined pairwise correlations using the Pearson correlation coefficient matrix; for any feature pair with an absolute Pearson correlation coefficient exceeding 0.90, one of the two features was removed to mitigate multicollinearity.

Stage 2: Embedded feature selection combined with mRMR. To further refine the feature subset, an embedded feature selection scheme was combined with the mRMR ranking. Specifically, the number of selected features, K, was varied from 1 to the number of remaining features after Stage 1. For each K, the top-K features from the mRMR ranking were used as model inputs, and five tree-based learners (Decision Tree, Bagging, Random Forest, Extra Trees, and Gradient Boosting Regression) were employed as base models to evaluate predictive performance under different K values.
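
The two stages can be sketched in a few functions. This is a minimal illustration under stated assumptions: relevance and redundancy are both measured with the absolute Pearson correlation and combined as a difference, the higher-ranked feature of a correlated pair is the one kept, and `toy_cv_score` is a stand-in for the cross-validated score of a real tree model. None of these choices is specified in the text, and the toy data are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mrmr_rank(features, target):
    """Greedy mRMR: repeatedly pick the feature with the largest relevance
    minus mean redundancy with the features already ranked."""
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked, remaining = [], set(features)

    def score(name):
        if not ranked:
            return relevance[name]
        redundancy = sum(abs(pearson(features[name], features[s])) for s in ranked)
        return relevance[name] - redundancy / len(ranked)

    while remaining:
        best = max(sorted(remaining), key=score)  # sorted() keeps ties deterministic
        ranked.append(best)
        remaining.remove(best)
    return ranked


def drop_correlated(ranked, features, threshold=0.90):
    """Stage 1 filter: skip a feature if |r| with any kept feature exceeds the threshold."""
    kept = []
    for name in ranked:
        if all(abs(pearson(features[name], features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def sweep_top_k(ranked, cv_score):
    """Stage 2: score the top-K prefix of the ranking for K = 1..len(ranked)."""
    curve = [(k, cv_score(ranked[:k])) for k in range(1, len(ranked) + 1)]
    best_k, best_score = max(curve, key=lambda item: item[1])
    return best_k, best_score, curve


# Invented toy data: T1_copy nearly duplicates T1 and should be filtered out.
features = {
    "T1": [1000, 1100, 1200, 1300, 1400],
    "T1_copy": [1001, 1102, 1199, 1301, 1400],
    "C_b1": [0.2, 0.8, 0.4, 0.6, 0.5],
    "noise": [3, 1, 4, 1, 5],
}
target = [50, 120, 180, 260, 330]

ranked = mrmr_rank(features, target)
kept = drop_correlated(ranked, features)


def toy_cv_score(subset):
    """Stand-in for a cross-validated R^2: informative features help, noise hurts."""
    informative = {"T1", "T1_copy", "C_b1"}
    hits = sum(name in informative for name in subset)
    return 0.3 * hits - 0.1 * (len(subset) - hits)


best_k, best_score, curve = sweep_top_k(kept, toy_cv_score)
```

In the study itself the sweep is repeated for each of the five learners, and the final subset is chosen by weighing R² against MAE and MAPE rather than by a single score as in this sketch.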

The dataset was split into training and test sets at a ratio of 9:1. A 10-fold cross-validation was conducted on the training set: the training data were divided into 10 folds, each fold was used once as the validation set, and the remaining folds were used for training [37]. The average performance across the 10 folds was used as the cross-validated evaluation of each model. Model performance was assessed using the coefficient of determination (R²) and error metrics, including mean absolute error (MAE) and mean absolute percentage error (MAPE). The optimal feature subset was determined by jointly considering a higher R² and lower MAE/MAPE.
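
A minimal sketch of the 9:1 hold-out split followed by 10-fold partitioning of the training indices is given below. The shuffling seed and the way folds are formed (striding through the shuffled indices) are assumptions for illustration; the text does not say how samples were assigned.

```python
import random


def holdout_split(n_samples, test_fraction=0.1, seed=0):
    """Shuffle sample indices and reserve a fraction as the test set."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def kfold(indices, k=10):
    """Yield (train, validation) index lists; each fold is the validation set once."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation


train_idx, test_idx = holdout_split(227)
splits = list(kfold(train_idx, k=10))
```

With 227 samples this gives 204 training and 23 test samples, and each validation fold holds 20 or 21 training samples. The cross-validated score is then the mean of the per-fold metrics.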

For regression performance evaluation, R² measures the goodness of fit of the model and is calculated as follows:

R^{2} = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}} \qquad (3)

where SS_res represents the residual sum of squares and SS_tot represents the total sum of squares.

MAE measures the average absolute difference between predicted and observed values and is defined as follows:

\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_{i} - y_{i} \right| \qquad (4)

where \hat{y}_{i} represents the predicted value and y_{i} represents the actual value.

MAPE quantifies the average relative error between predicted and observed values and is defined as follows:

\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{\hat{y}_{i} - y_{i}}{y_{i}} \right| \times 100\% \qquad (5)

where a smaller MAPE indicates better predictive accuracy in relative terms.
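
Equations (3) to (5) translate directly into code. The sketch below is a plain-Python transcription with made-up yields, intended only to make the definitions concrete.

```python
def r2_score(y_true, y_pred):
    """Equation (3): 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mae(y_true, y_pred):
    """Equation (4): mean absolute error, in the units of y."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Equation (5): mean absolute percentage error; undefined if any y_true is 0."""
    return 100.0 * sum(abs((p - t) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up CO yields in umol/g.
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 190.0, 380.0]
scores = (r2_score(y_true, y_pred), mae(y_true, y_pred), mape(y_true, y_pred))
```

For these three points the absolute errors are 10, 10, and 20 μmol/g, so the MAE is about 13.3 μmol/g while the MAPE is about 6.7%, because the largest absolute error falls on the largest yield.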

2.4. SHAP Method

The SHAP method is an efficient framework for explaining the contributions of input features to any prediction algorithm. SHAP values calculated by game theory quantify the influence of each feature on the model’s prediction, revealing how each feature changes the model’s expected output [38,39]. Combined with ML models, the SHAP method can be used to explore the correlation between input features and output features [40]. Compared with the traditional sensitivity analysis method, the SHAP method can provide local interpretability and identify the significant impact of each data point on the prediction result. Additionally, it can evaluate the overall contribution of each feature to the output feature by accumulating the absolute value of the SHAP value of each data point [41].
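
To make the game-theoretic idea concrete, the sketch below computes exact Shapley values for one sample by enumerating every feature coalition. It is an illustration only: features outside a coalition are replaced by a fixed baseline value, which simplifies how SHAP treats absent features, and `toy_model` with its coefficients is a hypothetical function, not the trained Random Forest.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for sample x by enumerating all coalitions."""
    n = len(x)

    def value(coalition):
        # Features in the coalition take their real value, the rest the baseline.
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(features):
    """Hypothetical CO-yield model with a T1 x C_b1 interaction term."""
    t1, c_b1, t1_time = features
    return 0.5 * (t1 - 1000) + 100 * c_b1 + 0.2 * (t1 - 1000) * c_b1 + 2 * t1_time


x = [1400.0, 0.8, 30.0]          # sample to explain
baseline = [1200.0, 0.5, 20.0]   # reference point
phi = shapley_values(toy_model, x, baseline)
```

The values sum to the difference between the prediction for `x` and the prediction for the baseline, which is the additivity property that lets SHAP split a prediction among features. Averaging the absolute values over all samples gives the global importance described above. Enumeration costs 2ⁿ model calls, so it is only practical for a handful of features.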

3. Results and Discussion

3.1. Establishment and Preprocessing of the Dataset

In this study, we constructed a dataset containing 55 independent variables as initial feature variables: 44 atomic parameters, seven experimental conditions, and four variables expressing the stoichiometric numbers of the site cations (x = a₁, a₂, b₁, b₂). The performance evaluation index is the CO yield. The complete list of initial features is provided in Table 1 and Table 2. Figure 2 presents violin plots of the experimental conditions and the CO yield distribution. Notably, the CO yield (Figure 2h) shows a right-skewed, long-tailed distribution. Approximately 68.8% of the samples are concentrated in the low-yield range (<300 μmol/g), suggesting that the model is trained with sufficient data in this region and is therefore expected to provide more reliable predictions for low-to-mid CO yields. Classical geometric descriptors, such as the Goldschmidt tolerance factor (τ) and the octahedral factor, are widely used to characterize perovskite formability. In this study, the feature set already includes A/B-site ionic radii, which implicitly contain the geometric information required to compute these descriptors; moreover, defining a unique τ for multi-site substituted (A₁A₂)(B₁B₂)O₃ compositions can be ambiguous. As a sanity check, the calculated τ values of the test samples fall within the commonly reported stability range (0.8–1.05) (Supplementary Table S3).
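
One common way to compute τ for a substituted composition is to use stoichiometry-weighted mean radii on each site, as sketched below. The averaging rule is an assumption (the text notes that the definition is ambiguous for multi-site substitution), the radii are Shannon values entered by hand for illustration, and the mixed composition is hypothetical.

```python
from math import sqrt

# Shannon ionic radii in angstroms (A site: 12-coordinate, B site: 6-coordinate).
# Entered by hand for illustration; a real workflow should use a vetted table.
R_O = 1.40
RADII_A = {"Sr": 1.44, "La": 1.36}     # Sr2+, La3+
RADII_B = {"Ti": 0.605, "Al": 0.535}   # Ti4+, Al3+


def site_radius(site: dict[str, float], radii: dict[str, float]) -> float:
    """Stoichiometry-weighted mean radius of one site, e.g. {"La": 0.5, "Sr": 0.5}."""
    total = sum(site.values())
    return sum(radii[el] * frac for el, frac in site.items()) / total


def tolerance_factor(a_site: dict[str, float], b_site: dict[str, float]) -> float:
    """Goldschmidt tolerance factor (r_A + r_O) / (sqrt(2) * (r_B + r_O))."""
    r_a = site_radius(a_site, RADII_A)
    r_b = site_radius(b_site, RADII_B)
    return (r_a + R_O) / (sqrt(2) * (r_b + R_O))


tau_srtio3 = tolerance_factor({"Sr": 1.0}, {"Ti": 1.0})
tau_mixed = tolerance_factor({"La": 0.5, "Sr": 0.5}, {"Al": 0.5, "Ti": 0.5})  # hypothetical
```

Both values come out close to 1, inside the 0.8–1.05 range quoted above. Because τ is a deterministic function of radii that are already in the feature set, adding it would mainly re-encode existing information.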

Catalyst surface area (e.g., BET) is also important for accessible active sites, but it is not consistently reported in the literature sources used to build the dataset; therefore, it was not included as an input feature. Future work will systematically collect surface-area data to enrich the feature set.

3.2. Feature Correlation Analysis and Screening

Using the mRMR method, we calculated the relevance of each feature to the target variable as well as the redundancy among features, and ranked the features by the resulting score (Figure 3). We then screened the features further with a Pearson correlation coefficient matrix, which measures the linear correlation between each pair of features. The coefficient is computed from the covariance and the standard deviations of the two variables and ranges from −1 to 1; a positive value indicates a positive correlation and a negative value a negative correlation. The coefficient P(X, Y) between two input features is calculated as follows:

P(X, Y) = \frac{\mathrm{cov}(X, Y)}{\sigma_{X} \sigma_{Y}} \qquad (6)

where P(X, Y) is the Pearson correlation coefficient between input features X and Y, cov(X, Y) is their covariance, and σ_X and σ_Y are the standard deviations of X and Y, respectively.

The correlation between features and target variables was calculated using mRMR; redundant features (threshold > 0.9) were removed using the Pearson correlation coefficient matrix; and finally, 34 features were retained. The correlation matrices before and after the preliminary feature screening are shown in Figure 4, and the results indicate that the correlation between these features is low.

To further improve the performance and generalization ability of the model, this study uses an embedded method combined with the mRMR algorithm for the second round of feature screening. Five ML algorithms (Decision Tree, Bagging, Random Forest, Extra Trees, and GBR) serve as the base models for embedded feature screening. For each value of K (ranging from 1 to 34), the top-K feature subset from the mRMR ranking is generated and evaluated by the embedded models. The relationships between R² and K for the five ML models are shown in Figure 5, and Table 3 lists the final selected feature variables for each model. In this way, the feature combinations that most improve model performance can be identified, providing a reliable feature set for subsequent modeling.

Competitive validation of the feature-selection scheme. To demonstrate that the proposed feature-selection procedure (mRMR ranking, Pearson filtering, and embedded selection, referred to here as the three-step procedure) is not an arbitrary choice, we performed a controlled comparison using the same Random Forest model with identical hyperparameters and the same test split. Only the feature-selection strategy was changed among four schemes: using all 55 features, the proposed three-step procedure, LASSO selection, and RFE selection. As shown in Table S4, the proposed three-step scheme achieves the best test performance (highest R² and lowest MAE), indicating that it is a competitive and reproducible choice for subsequent modeling.

3.3. Model Selection

A two-stage model selection workflow was adopted in this study. First, after feature selection, we conducted 10-fold cross-validation across five commonly used regression algorithms (Decision Tree, Bagging, Random Forest, Extra Trees, and Gradient Boosting Regressor, GBR) to establish a baseline. Second, the most promising and robust model was selected for further hyperparameter optimization.

The cross-validation results are summarized in Table 4 and Figure 6. Extra Trees (R² = 0.880, MAE = 44.789) and Decision Tree (R² = 0.872, MAE = 44.296) achieved strong cross-validated performance; however, both models showed an almost perfect fit on the training data (training R² = 0.997), suggesting a larger risk of overfitting and sensitivity to data partitioning. Bagging performed worse overall (R² = 0.741, MAE = 57.796) and was therefore not selected. GBR exhibited relatively weak performance on this dataset (R² = 0.766, MAE = 70.427). We attribute this, at least in part, to the dataset characteristics and the sequential boosting mechanism: the CO-yield distribution is strongly right-skewed and long-tailed, and extremely high-yield samples may exert disproportionate influence during iterative residual fitting, thereby impairing generalization in small datasets. In contrast, Random Forest relies on parallel bagging-based aggregation and is often more robust under limited sample sizes and imbalanced target distributions.

Considering both predictive performance and robustness, Random Forest was selected as the final model and subsequently optimized using an evolutionary algorithm to further improve performance.

3.4. Hyperparameter Optimization

Hyperparameters [42] are parameters of an ML algorithm that are not learned directly from the training data and therefore need to be specified or tuned before or during model training. The choice of hyperparameters strongly influences the performance and generalization ability of the model and is an essential part of training ML models [43]. The evolutionary algorithm is a global optimization algorithm based on natural selection and genetic mechanisms, and it has been widely used for hyperparameter optimization of ML models in recent years. Its core idea is to search for the optimal hyperparameter combination step by step by simulating selection, crossover, and mutation during biological evolution (as shown in Figure 7). The advantage of the evolutionary algorithm lies in its global search ability, which reduces the risk of becoming trapped in a local optimum.

Hyperparameters of the Random Forest model were optimized using an evolutionary algorithm to reduce manual tuning and improve generalization. Each individual in the population represents a candidate set of hyperparameters, and model performance under 10-fold cross-validation was used as the fitness criterion. The algorithm iteratively updates the population through selection, crossover, and mutation until convergence or the maximum number of generations is reached. In this study, the population size was set to 50, the crossover probability to 0.7, the mutation probability to 0.3, and the maximum number of generations to 60. The optimal hyperparameter configuration obtained by the evolutionary search is reported in Table 5, while additional implementation details are provided in the Supplementary Material.
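
The search loop can be sketched as follows, using the reported settings (population 50, crossover probability 0.7, mutation probability 0.3, 60 generations). Everything else is an assumption made for illustration: the search-space bounds, tournament selection, uniform crossover, single-gene mutation, elitism, and above all `toy_fitness`, which stands in for the 10-fold cross-validated score of a Random Forest.

```python
import random

# Hypothetical search space; the bounds are illustrative, not taken from the paper.
SPACE = {"n_estimators": (50, 500), "max_depth": (2, 30), "min_samples_leaf": (1, 10)}


def random_individual(rng):
    """One candidate set of hyperparameters drawn uniformly from SPACE."""
    return {name: rng.randint(lo, hi) for name, (lo, hi) in SPACE.items()}


def toy_fitness(ind):
    """Stand-in for a 10-fold cross-validated score (higher is better)."""
    return -(
        ((ind["n_estimators"] - 300) / 450) ** 2
        + ((ind["max_depth"] - 12) / 28) ** 2
        + ((ind["min_samples_leaf"] - 2) / 9) ** 2
    )


def evolve(fitness, pop_size=50, p_cross=0.7, p_mut=0.3, generations=60, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def tournament():
        # Selection: the fittest of three randomly drawn individuals.
        return max(rng.sample(pop, 3), key=fitness)

    for _ in range(generations):
        children = [dict(best)]  # elitism: the best individual always survives
        while len(children) < pop_size:
            a, b = tournament(), tournament()
            child = dict(a)
            if rng.random() < p_cross:  # uniform crossover between two parents
                child = {name: rng.choice((a[name], b[name])) for name in SPACE}
            if rng.random() < p_mut:    # mutation: resample one gene
                gene = rng.choice(list(SPACE))
                child[gene] = rng.randint(*SPACE[gene])
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


best = evolve(toy_fitness)
initial_best = evolve(toy_fitness, generations=0)  # best of the random start
```

Because of elitism, the best fitness never decreases from one generation to the next. In a real run each fitness evaluation trains the model ten times, so the 50 × 60 budget dominates the cost and caching repeated candidates is worthwhile.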

3.5. Model Validation

In this study, the dataset was divided into a training set and a test set using a 9:1 hold-out strategy. Feature engineering, model selection, and hyperparameter optimization were conducted on the training set, and the remaining 10% of samples were used to evaluate generalization performance. The model performance before and after optimization is summarized in Table 6.

After optimization, the Random Forest model achieved R² = 0.996 and MAE = 7.700 μmol/g on the training set, and R² = 0.910 and MAE = 41.528 μmol/g on the test set, indicating improved test performance but also a remaining train–test gap.

To assess robustness and provide uncertainty estimates, we repeated the 9:1 split using 10 independent random seeds. The model showed stable performance across splits, with a mean test R² = 0.942 ± 0.010 and a mean test MAE = 49.27 ± 7.21 μmol/g, corresponding to 95% confidence intervals of [0.935, 0.948] (R²) and [44.11, 54.42] μmol/g (MAE). Figure 8 compares the predicted and experimental CO yields on the test set, demonstrating the predictive performance of the optimized model. Figure 9 presents the learning curve, which further illustrates the model’s training behavior and generalization ability. Table 7 compares the experimental and predicted CO yields for six representative test samples to provide an intuitive illustration of the prediction error. These examples span low- and high-yield regimes, showing that the optimized model captures the overall CO yield trend with reasonable accuracy.
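
The text does not state how the confidence intervals were derived. One common choice for 10 repeated splits is a Student-t interval on the mean with 9 degrees of freedom, sketched below using the reported means and standard deviations; the method is an assumption.

```python
from math import sqrt

# Two-sided 95% Student-t critical value for 9 degrees of freedom (10 seeds).
T_CRIT_95_DF9 = 2.262


def t_interval(mean, std, n, t_crit):
    """Confidence interval for a mean: mean +/- t_crit * std / sqrt(n)."""
    half_width = t_crit * std / sqrt(n)
    return mean - half_width, mean + half_width


mae_low, mae_high = t_interval(49.27, 7.21, 10, T_CRIT_95_DF9)
r2_low, r2_high = t_interval(0.942, 0.010, 10, T_CRIT_95_DF9)
print(f"MAE: [{mae_low:.2f}, {mae_high:.2f}]  R2: [{r2_low:.3f}, {r2_high:.3f}]")
```

Under this assumption the intervals land close to the reported ones; small differences in the last digit are expected because the inputs are already rounded. Note that the interval describes uncertainty in the mean across splits, not the spread of individual predictions.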

To further assess the reliability of the optimized Random Forest model at different CO yield levels, test samples were grouped into CO yield intervals, and the mean absolute percentage error (MAPE) was calculated for each interval. MAPE complements MAE by reflecting relative deviation, but it may be amplified when measured yields are close to zero.

Results show that samples with CO yields > 100 μmol/g account for 82.8% of the dataset, with MAPE ranging from 0.47% to 7.31%, indicating strong reliability in the dominant yield range. The low-yield interval (0–100 μmol/g) exhibits larger percentage errors, likely due to the small-denominator effect and higher uncertainty. Overall, the optimized model is robust for most samples. The interval-wise MAPE results are provided in the Supplementary Material (Table S5).
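
The interval-wise calculation and the small-denominator effect can be illustrated with a short sketch. The yields and interval edges below are invented so that every sample has the same absolute error of 10 μmol/g; they are not the values behind Table S5.

```python
def mape_by_interval(y_true, y_pred, edges):
    """MAPE (%) for each half-open yield interval [edges[i], edges[i+1])."""
    result = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        errors = [abs((p - t) / t) for t, p in zip(y_true, y_pred) if lo <= t < hi]
        # None marks an interval that contains no samples.
        result[(lo, hi)] = 100.0 * sum(errors) / len(errors) if errors else None
    return result


# Invented yields (umol/g); every prediction is off by exactly 10 umol/g.
y_true = [20.0, 80.0, 150.0, 400.0]
y_pred = [30.0, 90.0, 160.0, 410.0]
edges = [0, 100, 300, float("inf")]

by_interval = mape_by_interval(y_true, y_pred, edges)
```

The same 10 μmol/g error produces a MAPE of about 31% in the 0–100 interval but only 2.5% above 300 μmol/g, which is why the low-yield interval looks worse in relative terms even when the absolute error is unchanged.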

3.6. SHAP-Based Model Interpretation

To further explore the relationship between the input features and CO generation, we use SHAP to interpret the predictions of the optimized Random Forest model and to quantify the importance and contribution of each input feature. Feature importance is shown in Figure 10. T₁ and C_b₁ have the largest effects on the predicted CO yield, followed by three reaction conditions: t₁, t₂, and HR.

Given the limited contribution of lower-ranked variables, we summarize the top 10 globally important features using a SHAP summary (beeswarm) plot (Figure 11), which shows how each feature affects the predicted CO yield across all samples. The x-axis denotes SHAP values (negative to positive impact on the model output), each point represents one sample, and the color encodes the feature value (red: high; blue: low).

As shown in Figure 11, T₁ exhibits the widest SHAP-value range, confirming it as the dominant predictor: high T₁ values are mainly associated with positive contributions, whereas low T₁ values concentrate in the negative region, indicating that increasing T₁ generally increases the predicted CO yield. C_b₁ and t₁ are the next most influential features and exhibit similar monotonic behavior: higher values contribute positively, and lower values contribute negatively. Several features (e.g., t₂ and IEs_b₂) display asymmetric SHAP distributions, suggesting conditional or threshold-like effects rather than a purely linear relationship. To evaluate whether such asymmetry reflects genuine interactions or sparse-data artifacts, we further computed pairwise SHAP interaction strengths and ranked all feature pairs; the results are provided in the Supplementary Material (Figure S2). In addition, HR and P_CO2 exhibit compact, near-symmetric SHAP distributions, indicating relatively stable, weaker marginal effects within the investigated range.

To further elucidate the mechanistic roles of the key features, we combine SHAP analysis results with experimental observations and discuss the underlying effects from the perspective of catalytic reaction kinetics and the electronic structure of perovskite materials. Figure 12 shows the SHAP dependence plots for T₁ and C_b₁, illustrating how the predicted CO yield varies with each feature across all samples.

From the SHAP dependence graph, it can be seen that the contribution of T₁ to the model prediction presents a nonlinear trend. When T₁ < 1300 °C, the SHAP values are negative, indicating that lower temperatures contribute less to CO generation and may even play an inhibitory role. When T₁ > 1300 °C, the SHAP value gradually turns positive, indicating that higher temperatures can promote CO generation. This trend has been verified in the experimental data. Combined with the changes in CO yield at different T₁ temperatures shown in Figure 13, it can be observed that CO yield initially rises and then falls as the temperature increases. To illustrate the influence of T₁ more clearly, Figure 14 presents CO yield data for various perovskites in the database, with temperature as the independent variable and CO yield as the dependent variable, while keeping other experimental conditions constant. When T₁ increases from 1000 °C to 1400 °C, the CO yield shows an upward trend. For example, taking La_0.7Sr_0.3Mn_0.9Cr_0.1O₃, the CO yield peaks at 228 μmol/g when T₁ = 1400 °C. This phenomenon can be explained by oxygen vacancy kinetics: an increase in the thermal reduction reaction temperature (T₁) accelerates the desorption of lattice oxygen, forming more oxygen vacancies. Oxygen vacancies are key active sites for CO₂ adsorption and activation, and their concentration directly affects the reduction efficiency of CO₂ [47,48,49]. However, excessively high T₁ (>1400 °C) may lead to material sintering or structural collapse, reducing the stability of active sites. The high SHAP values of T₁ in the positive contribution region confirm the importance of its optimization range.

Similarly, when C_b₁ is between 0.6 and 0.8, it shows a more significant SHAP value (Figure 12), indicating that the promotion of CO yield is most pronounced in this range. This aligns with the trend of CO yield changing with C_b₁, as shown in Figure 13. When C_b₁ is adjusted within the range of 0–1, CO yield first increases and then decreases, reaching its peak when the stoichiometric number of B₁ sites is 0.6–0.8. Experimental data in Figure 14 demonstrate that, under constant experimental conditions, with C_b₁ as the independent variable and CO yield as the dependent variable, the CO yield of Sr_0.6Ce_0.4Mn_0.8Al_0.2O₃ (C_b₁ = 0.8) is significantly higher (799.34 μmol/g) compared to Sr_0.6Ce_0.4Mn_0.2Al_0.8O₃ (C_b₁ = 0.2) (302.61 μmol/g). During the CO₂ reduction process, the stoichiometric ratio of B₁ sites significantly influences the catalyst’s electronic structure, oxygen vacancy concentration, and the distribution of surface-active sites [50]. When C_b₁ is between 0.6 and 0.8, perovskite oxides can form an appropriate amount of oxygen vacancies, which not only serve as adsorption and activation centers for CO₂ molecules but also optimize the stability of CO₂ reduction intermediates (such as COOH) by providing additional charge transport pathways, thereby accelerating CO generation. Additionally, the electronic structure adjustment at the B1 site may optimize the electron filling of the d-orbital, facilitating CO₂ adsorption on the catalyst surface and promoting C-O bond cleavage, thereby improving the conversion efficiency of CO₂ to CO.

Figure 15 shows how CO yield changes under the combined influence of T₁ and C_b₁. The CO yield peaks when T₁ > 1300 °C and C_b₁ is between 0.6 and 0.8, indicating that a high thermal reduction temperature combined with an appropriate stoichiometric number of the B₁ site promotes CO yield. This synergistic effect arises from the enhanced oxygen vacancy formation at high temperatures (T₁ > 1300 °C) together with the optimized electronic structure of the B₁ site (C_b₁ = 0.6–0.8), which collectively facilitate CO₂ adsorption and C–O bond cleavage. This finding is consistent with the conclusions drawn from the SHAP dependence plots.

It should be noted that SHAP values reflect associations learned from the available dataset and do not establish causality. The identified key features may also act as proxies for unmeasured factors (e.g., morphology, synthesis atmosphere, or correlated compositional descriptors). To provide an additional, model-consistent visualization of the interaction highlighted by SHAP, we include a two-dimensional partial dependence plot (2D PDP) for T₁ and C_b₁ (Figure 16). The 2D PDP shows that the predicted CO yield increases markedly when T₁ exceeds 1300 °C and C_b₁ falls within approximately 0.6–0.8, indicating a strong coupled effect between the thermal reduction temperature and the B₁-site stoichiometric descriptor. This interaction-based visualization supports the SHAP interpretation by revealing the high-yield region of the model’s response surface, while the causal mechanisms remain to be confirmed through controlled experiments.
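
The idea behind a 2D PDP can be sketched in a few lines: the two features of interest are forced to each grid pair, and the model output is averaged over the remaining features of the dataset. In the sketch below, `predict`, its coefficients, and the rows in `ROWS` are hypothetical stand-ins chosen for illustration; they are not the optimized Random Forest or the curated database.

```python
from statistics import mean

# Hypothetical stand-in for a fitted regressor (NOT the model from this work).
# Features: T1 (deg C), C_b1 (fraction), T2 (deg C).
def predict(t1, c_b1, t2):
    boost = 1.5 if 0.6 <= c_b1 <= 0.8 else 1.0  # assumed coupling with T1
    return boost * max(t1 - 1000.0, 0.0) * 0.5 + 0.05 * (1200.0 - t2)

# Hypothetical samples; only the T2 column matters once T1 and C_b1 are overridden.
ROWS = [(1200.0, 0.5, 800.0), (1300.0, 0.7, 1000.0), (1400.0, 0.9, 1100.0)]

def partial_dependence_2d(t1_grid, cb1_grid):
    """Average prediction over ROWS with T1 and C_b1 forced to each grid pair."""
    return {
        (t1, cb1): mean(predict(t1, cb1, row[2]) for row in ROWS)
        for t1 in t1_grid
        for cb1 in cb1_grid
    }

pdp = partial_dependence_2d([1100.0, 1300.0, 1400.0], [0.2, 0.7])
for key in sorted(pdp):
    print(key, round(pdp[key], 2))
```

Because the averaging runs over the empirical rows, the resulting surface describes the model response, not a causal effect, which is the caveat stated above.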

4. Conclusions

In this work, we curated 227 literature-reported ABO₃ perovskite catalysts for thermochemical CO₂ splitting and constructed a dataset containing compositional descriptors, atomic parameters, and experimental conditions. After feature screening and embedded selection, Random Forest was identified as the most robust model among the five evaluated regressors. Evolutionary hyperparameter optimization further improved generalization, yielding a test-set performance of R² = 0.910 and MAE = 41.528 μmol/g-material. SHAP and partial dependence analyses consistently indicate that the thermal reduction temperature (T₁) and the B1-site stoichiometric fraction (C_b₁) dominate the model predictions, with higher CO yields associated with T₁ > 1300 °C and C_b₁ ≈ 0.6–0.8. These results provide a data-driven basis for prioritizing perovskite compositions and operating conditions in catalyst screening, while mechanistic causality should be further validated experimentally.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/en19061388/s1. Table S1. Key dependencies for machine learning. Table S2. AB-site cation atomic parameters. Table S3. Summary statistics of the Goldschmidt tolerance factor (τ) for the test set. Table S4. Competitive comparison of feature-selection schemes using a fixed Random Forest model. Table S5. Interval-wise MAPE (%) for the optimized Random Forest model. Figure S1. SHAP analysis of other important features. Figure S2. Top 15 pairwise SHAP interaction strengths measured by mean(|ϕ_ij|).

Author Contributions

Conceptualization, K.W.; Methodology, H.X.; Formal analysis, J.C.; Investigation, K.L.; Data curation, K.M.; Writing—original draft, J.C.; Writing—review and editing, K.W.; Funding acquisition, K.W. and H.X. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (52276102 and 52476181).

Data Availability Statement

The data supporting the findings of this study, along with the code used for model training and evaluation, are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

Sarkodie, S.A.; Owusu, P.A.; Leirvik, T. Global Effect of Urban Sprawl, Industrialization, Trade and Economic Development on Carbon Dioxide Emissions. Environ. Res. Lett. 2020, 15, 34049.
Capellán-Pérez, I.; Arto, I.; Polanco-Martínez, J.M.; González-Eguino, M.; Neumann, M.B. Likelihood of Climate Change Pathways Under Uncertainty on Fossil Fuel Resource Availability. Energy Environ. Sci. 2016, 9, 2482–2496.
Li, D.; Kassymova, M.; Cai, X.; Zang, S.-Q.; Jiang, H.-L. Photocatalytic CO₂ Reduction over Metal-Organic Framework-Based Materials. Coord. Chem. Rev. 2020, 412, 213262.
Ola, O.; Maroto-Valer, M.M. Review of Material Design and Reactor Engineering on TiO₂ Photocatalysis for CO₂ Reduction. J. Photochem. Photobiol. C 2015, 24, 16–42.
Erb, T.J.; Zarzycki, J. Biochemical and Synthetic Biology Approaches to Improve Photosynthetic CO₂-Fixation. Curr. Opin. Chem. Biol. 2016, 34, 72–79.
Appel, A.M.; Bercaw, J.E.; Bocarsly, A.B.; Dobbek, H.; DuBois, D.L.; Dupuis, M.; Ferry, J.G.; Fujita, E.; Hille, R.; Kenis, P.J.A.; et al. Frontiers, Opportunities, and Challenges in Biochemical and Chemical Catalysis of CO₂ Fixation. Chem. Rev. 2013, 113, 6621–6658.
Qiao, J.L.; Liu, Y.Y.; Hong, F.F.; Zhang, J. A Review of Catalysts for the Electroreduction of Carbon Dioxide to Produce Low-Carbon Fuels. Chem. Soc. Rev. 2014, 43, 631–675.
Hao, Y.; Steinfeld, A. Fuels from Water, CO₂ and Solar Energy. Sci. Bull. 2017, 62, 1099–1101.
Kodama, T. High-Temperature Solar Chemistry for Converting Solar Heat to Chemical Fuels. Prog. Energy Combust. Sci. 2003, 29, 567–597.
Yadav, D.; Banerjee, R. A Review of Solar Thermochemical Processes. Renew. Sustain. Energy Rev. 2016, 54, 497–532.
Loutzenhiser, P.G.; Meier, A.; Steinfeld, A. Review of the Two-Step H₂O/CO₂-Splitting Solar Thermochemical Cycle Based on Zn/ZnO Redox Reactions. Materials 2010, 3, 4922–4938.
Scheffe, J.R.; Steinfeld, A. Oxygen Exchange Materials for Solar Thermochemical Splitting of H₂O and CO₂: A Review. Mater. Today 2014, 17, 341–348.
Riaz, A.; Kreider, P.; Kremer, F.; Tabassum, H.; Yeoh, J.S.; Lipiński, W.; Lowe, A. Electrospun Manganese-Based Perovskites as Efficient Oxygen Exchange Redox Materials for Improved Solar Thermochemical CO₂ Splitting. ACS Appl. Energy Mater. 2019, 2, 2494–2505.
Tabish, A.; Varghese, A.M.; Wahab, M.A.; Karanikolos, G.N. Perovskites in the Energy Grid and CO₂ Conversion: Current Context and Future Directions. Catalysts 2020, 10, 95.
Zhang, K.; Sunarso, J.; Shao, Z.; Zhou, W.; Sun, C.; Wang, S.; Liu, S. Research Progress and Materials Selection Guidelines on Mixed Conducting Perovskite-Type Ceramic Membranes for Oxygen Production. RSC Adv. 2011, 1, 1661–1676.
Nair, M.M.; Abanades, S. Correlating Oxygen with Thermochemical CO₂-Splitting Efficiency in A-Site Substituted Manganite Perovskites. Sustain. Energy Fuels 2021, 5, 4570–4574.
Jing, Y.; Aluru, N.R. The Role of A-Site Ion on Proton Diffusion in Perovskite Oxides (ABO₃). J. Power Sources 2020, 445, 227327.
Nair, M.M.; Abanades, S. Experimental Screening of Perovskite Oxides as Efficient Redox Materials for Solar Thermochemical CO₂ Conversion. Sustain. Energy Fuels 2018, 2, 843–854.
Dey, S.; Naidu, B.S.; Rao, C.N.R. Ln_0.5A_0.5MnO₃ (Ln = Lanthanide, A = Ca, Sr) Perovskites Exhibiting Remarkable Performance in the Thermochemical Generation of CO and H₂ from CO₂ and H₂O. Chem. Eur. J. 2015, 21, 7077–7081.
McDaniel, A.H.; Ambrosini, A.; Coker, E.; Miller, J.; Chueh, W.; O’Hayre, R.; Tong, J. Nonstoichiometric Perovskite Oxides for Solar Thermochemical H₂ and CO Production. Energy Procedia 2014, 49, 2009–2018.
Gokon, N.; Hara, K.; Ito, N.; Sawaguri, H.; Bellan, S.; Kodama, T.; Cho, H.-S. Thermochemical H₂O Splitting Using LaSrMnCrO₃ of Perovskite Oxides for Solar Hydrogen Production. In Proceedings of the SOLARPACES 2019: International Conference on Concentrating Solar Power and Chemical Energy Systems, Daegu, Republic of Korea, 1–4 October 2019; p. 170007.
Gokon, N.; Hara, K.; Sugiyama, Y.; Bellan, S.; Kodama, T.; Cho, H.-S. Thermochemical Two-Step Water Splitting Cycle Using Perovskite Oxides Based on LaSrMnO₃ Redox System for Solar H₂ Production. Thermochim. Acta 2019, 680, 178374.
Demont, A.; Abanades, S. High Redox Activity of Sr-Substituted Lanthanum Manganite Perovskites for Two-Step Thermochemical Dissociation of CO₂. RSC Adv. 2014, 4, 54885–54891.
Rao, C.N.R.; Dey, S. Generation of H₂ and CO by Solar Thermochemical Splitting of H₂O and CO₂ by Employing Metal Oxides. J. Solid State Chem. 2016, 242, 107–115.
McDaniel, A.H.; Miller, E.C.; Arifin, D.; Ambrosini, A.; Coker, E.N.; O’Hayre, R.; Chueh, W.C.; Tong, J. Sr- and Mn-Doped LaAlO_3−δ for Solar Thermochemical H₂ and CO Production. Energy Environ. Sci. 2013, 6, 2424.
Dey, S.; Naidu, B.S.; Rao, C.N.R. Beneficial Effects of Substituting Trivalent Ions in the B-Site of La_0.5Sr_0.5Mn_1−xA_xO₃ (A = Al, Ga, Sc) on the Thermochemical Generation of CO and H₂ from CO₂ and H₂O. Dalton Trans. 2016, 45, 2430–2435.
Takalkar, G.; Bhosale, R.; AlMomani, F. Combustion Synthesized A_0.5Sr_0.5MnO_3−δ Perovskites (Where, A = La, Nd, Sm, Gd, Tb, Pr, Dy, and Y) as Redox Materials for Thermochemical Splitting of CO₂. Appl. Surf. Sci. 2019, 489, 80–91.
Bork, A.H.; Kubicek, M.; Struzik, M.; Rupp, J.L.M. Perovskite La_0.6Sr_0.4Cr_1−xCo_xO_3−δ Solid Solutions for Solar-Thermochemical Fuel Production: Strategies to Lower the Operation Temperature. J. Mater. Chem. A 2015, 3, 15546–15557.
Takalkar, G.; Bhosale, R.R.; AlMomani, F.; Kumar, A.; Banu, A.; Ashok, A.; Rashid, S.; Khraisheh, M.; Shakoor, A.; al Ashraf, A. Thermochemical Splitting of CO₂ Using Solution Combustion Synthesized LaMO₃ (Where, M = Co, Fe, Mn, Ni, Al, Cr, Sr). Appl. Surf. Sci. 2020, 509, 144908.
Demont, A.; Abanades, S. Solar Thermochemical Conversion of CO₂ into Fuel via Two-Step Redox Cycling of Non-Stoichiometric Mn-Containing Perovskite Oxides. J. Mater. Chem. A 2015, 3, 3536–3546.
Agrawal, A.; Choudhary, A. Perspective: Materials Informatics and Big Data: Realization of the “Fourth Paradigm” of Science in Materials Science. APL Mater. 2016, 4, 053208.
Zhou, T.; Song, Z.; Sundmacher, K. Big Data Creates New Opportunities for Materials Research: A Review on Methods and Applications of Machine Learning for Materials Design. Engineering 2019, 5, 1017–1026.
Sashidhar, D.; Kutz, J.N. Bagging, Optimized Dynamic Mode Splitting for Robust, Stable Forecasting with Spatial and Temporal Uncertainty Quantification. Philos. Trans. R. Soc. A 2022, 380, 20210198.
Svetnik, V.; Liaw, A.; Tong, C.; Culberson, J.C.; Sheridan, R.P.; Feuston, B.P. Random Forest: A Classification and Regression Tool for Compound Classification and QSAR Modeling. J. Chem. Inf. Comput. Sci. 2003, 43, 1947–1958.
Melanson, D. Extremely Randomized Trees with Multiparty Computation. Ph.D. Thesis, University of Washington Tacoma, Tacoma, WA, USA, 2020.
Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
Picard, R.R.; Cook, R.D. Cross-Validation of Regression Models. J. Am. Stat. Assoc. 1984, 79, 575–583.
Vega García, M.; Aznarte, J.L. Shapley Additive Explanations for NO₂ Forecasting. Ecol. Inform. 2020, 56, 101039.
Li, J.; Pan, L.; Suvarna, M.; Tong, Y.W.; Wang, X. Fuel Properties of Hydrochar and Pyrochar: Prediction and Exploration with Machine Learning. Appl. Energy 2020, 269, 115166.
Zhu, X.; Wang, X.; Ok, Y.S. The Application of Machine Learning Methods for Prediction of Metal Sorption onto Biochars. J. Hazard. Mater. 2019, 378, 120727.
Mu, L.; Wang, Z.; Wu, D.; Zhao, L.; Yin, H. Prediction and Evaluation of Fuel Properties of Hydrochar from Waste Solid Biomass: Machine Learning Algorithm Based on Proposed PSO–NN Model. Fuel 2022, 318, 123644.
Sampson, J.R. Adaptation in Natural and Artificial Systems (John H. Holland). SIAM Rev. 1976, 18, 529–530.
Carrillo, A.J.; Bork, A.H.; Moser, T.; Sediva, E.; Hood, Z.D.; Rupp, J.L.M. Modifying La_0.6Sr_0.4MnO₃ Perovskites with Cr Incorporation for Fast Isothermal CO₂-Splitting Kinetics in Solar-Driven Thermochemical Cycles. Adv. Energy Mater. 2019, 9, 1803886.
Sawaguri, H.; Gokon, N.; Hayashi, K.; Iwamura, Y.; Yasuhara, D. Two-Step Thermochemical CO₂ Splitting Using Partially-Substituted Perovskite Oxides of La_0.7Sr_0.3Mn_0.9X_0.1O₃ for Solar Fuel Production. Front. Energy Res. 2022, 10, 872959.
Luciani, G.; Landi, G.; Aronne, A.; Di Benedetto, A. Partial Substitution of B Cation in La_0.6Sr_0.4MnO₃ Perovskites: A Promising Strategy to Improve the Redox Properties Useful for Solar Thermochemical Water and Carbon Dioxide Splitting. Sol. Energy 2018, 171, 1–7.
Yang, C.; Yin, L.-L.; Bebensee, F.; Buchholz, M.; Sezen, H.; Heissler, S.; Chen, J.; Nefedov, A.; Idriss, H.; Gong, X.-Q.; et al. Chemical Activity of Oxygen Vacancies on Ceria: A Combined Experimental and Theoretical Study on CeO₂(111). Phys. Chem. Chem. Phys. 2014, 16, 24165–24168.
Schaub, R.; Thostrup, P.; Lopez, N.; Lægsgaard, E.; Stensgaard, I.; Nørskov, J.K.; Besenbacher, F. Oxygen Vacancies as Active Sites for Water Dissociation on Rutile TiO₂(110). Phys. Rev. Lett. 2001, 87, 266104.
Daza, Y.A.; Maiti, D.; Kent, R.A.; Bhethanabotla, V.R.; Kuhn, J.N. Isothermal Reverse Water Gas Shift Chemical Looping on La_0.75Sr_0.25Co_1-xFeO₃ Perovskite-Type Oxides. Catal. Today 2015, 258, 691–698.
Kozokaro, V.F.; Biswas, S.; Toroker, M.C. Changing Your Tune on Catalytic Efficiency: Tuning Cr Concentration in La_0.3Sr_0.7Fe_1-xCr_xO_3−δ Perovskite as a Cathode in Solid Oxide Electrolysis Cell. Comput. Mater. Sci. 2022, 210, 111462.
Connor, B.A.; Smaha, R.W.; Li, J.; Gold-Parker, A.; Heyer, A.J.; Toney, M.F.; Lee, Y.S.; Karunadasa, H.I. Alloying a Single and a Double Perovskite: A Cu^+/2+ Mixed-Valence Layered Halide Perovskite with Strong Optical Absorption. Chem. Sci. 2021, 12, 8689–8697.

Figure 1. ML algorithm flow in this work.

Figure 2. Violin plots of experimental conditions and CO yield distribution. (a) HR; (b) T₁; (c) T₂; (d) P_CO2; (e) t₁; (f) t₂; (g) GF; (h) CO Production.

Figure 3. Ranking of mRMR scores for 55 initial characteristic variables.

Figure 4. Pearson correlation matrices before and after preliminary screening (55 features vs. 34 retained features).

Figure 5. Feature selection results of five ML models.

Figure 6. Verification results of 10-fold cross-validation of five regression models. (a) Comparison of R² values of the five models; (b) Comparison of MAE values of the five models.

Figure 7. Evolutionary algorithm model.

Figure 8. Random Forest model performance before and after optimization.

Figure 9. Learning curve of the optimized Random Forest model based on multi-seed hold-out validation.

Figure 10. Importance ranking of Random Forest features based on SHAP.

Figure 11. SHAP value visualization of the Random Forest model. (a) SHAP summary plot of the top 10 features; (b) Top 10 feature importances of the Random Forest model.

Figure 12. Dependence plots of T₁ and C_b₁ features on CO yield. (a) SHAP dependence plot of T₁; (b) SHAP dependence plot of C_b₁.

Figure 13. LOESS scatter plots of T₁ and C_b₁. (a) Relationship between T₁ and CO yield; (b) Relationship between C_b₁ and CO yield. In each panel, the solid line represents the LOESS-fitted trend.

Figure 14. Selected experimental data showing the effect of T₁ and C_b₁ on CO yield.

Figure 15. Effect of the interaction between T₁ and C_b₁ on CO yield.

Figure 16. Two-dimensional partial dependence plot (2D PDP) showing the interaction between T₁ and C_b₁ on the predicted CO yield.

Table 1. Atomic parameters and their physical significance.

No.	Characteristics	Physical Meaning
1	C_x	Stoichiometric number of X site (0–1)
2	Z_x	Atomic number of the X site
3	m_x	Relative atomic mass of the X site
4	rf_x	Van der Waals radius of the X site (pm)
5	rc_x	Covalent radius of the X site (pm)
6	X_x	Pauling electronegativity of the X site
7	IEf_x	First ionization energy of the X site (kJ/mol)
8	IEs_x	Second ionization energy of the X site (kJ/mol)
9	IEt_x	Third ionization energy of the X site (kJ/mol)
10	Hf_x	Fusion enthalpy of the X site (kJ/mol)
11	Hv_x	Vaporization enthalpy of the X site (kJ/mol)
12	Ha_x	Atomization enthalpy of the X site (kJ/mol)

Table 2. Experimental conditions and their physical significance.

No.	Characteristics	Physical Meaning
1	HR	Heating rate (°C/min)
2	T₁	TR temperature (°C)
3	T₂	CDS temperature (°C)
4	P_CO2	CO₂ partial pressure (atm)
5	t₁	TR duration (min)
6	t₂	CDS duration (min)
7	GF	CO₂ gas flowrate (mL/min)

Table 3. Optimal feature subsets selected by the embedded scheme for each regression model.

Algorithm	Number of Features (K)	Best Feature Subset
Decision Tree, GBR, Random Forest	29	rf_b₁, t₁, C_b₁, X_a₂, rc_b₂, Hf_a₂, rc_a₂, rc_b₁, Z_b₁, t₂, Hv_a₂, IEf_b₂, Z_a₂, IEf_b₁, Z_a₁, HR, rf_a₂, Z_b₂, IEs_b₂, T₂, C_a₁, Hv_b₂, T₁, IEs_a₁, Hf_b₂, IEf_a₂, X_b₁, P_CO₂, rf_b₂
Extra Trees	25	rf_b₁, t₁, C_b₁, X_a₂, rc_b₂, Hf_a₂, rc_a₂, rc_b₁, Z_b₁, t₂, Hv_a₂, IEf_b₂, Z_a₂, IEf_b₁, Z_a₁, HR, rf_a₂, Z_b₂, IEs_b₂, T₂, Hv_b₂, C_a₁, T₁, IEs_a₁, P_CO₂
Bagging	31	rf_b₁, t₁, C_b₁, X_a₂, rc_b₂, Hf_a₂, rc_a₂, rc_b₁, Z_b₁, t₂, Hv_a₂, IEf_b₂, Z_a₂, IEf_b₁, Z_a₁, HR, rf_a₂, Z_b₂, IEs_b₂, T₂, C_a₁, Hv_b₂, T₁, IEs_a₁, Hf_b₂, IEf_a₂, X_b₁, P_CO₂, rf_b₂, IEt_b₂, Hf_b₁

Table 4. 10-fold cross-validation results of different algorithms.

Algorithm	R²	MAE (μmol/g-material)
Decision Tree	0.872	44.296
Bagging	0.741	57.796
Random Forest	0.870	49.758
Extra Trees	0.880	44.789
GBR	0.766	70.427
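
The 10-fold protocol behind Table 4 can be sketched as follows. This is a minimal, library-free illustration that only shows how 227 samples are partitioned into training and validation folds; the shuffling seed and the helper name `k_fold_indices` are assumptions, and the authors' actual splitting code is not reproduced here.

```python
import random

N_SAMPLES = 227  # dataset size reported in this work
N_FOLDS = 10

def k_fold_indices(n_samples, n_folds, seed=0):
    """Shuffle the sample indices and deal them into n_folds nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seed is an illustrative assumption
    return [indices[i::n_folds] for i in range(n_folds)]

folds = k_fold_indices(N_SAMPLES, N_FOLDS)
for k, validation_idx in enumerate(folds):
    held_out = set(validation_idx)
    train_idx = [i for i in range(N_SAMPLES) if i not in held_out]
    # A model would be fitted on train_idx and scored (R2, MAE) on validation_idx.
    print(f"fold {k}: train={len(train_idx)}, validation={len(validation_idx)}")
```

Each sample is held out exactly once, and the reported cross-validation score is then the average of the per-fold scores.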

Table 5. Optimal hyperparameter values of the Random Forest model.

Parameter	Optimum Value
Max depth	20
Number of estimators	62
Min samples split	2
Min samples leaf	1
Max features	sqrt
Bootstrap	False
Min impurity decrease	0.3
Random state	44
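
For readers who wish to reuse these settings, the values in Table 5 can be written as keyword arguments. The sketch below assumes that the table entries map onto the scikit-learn `RandomForestRegressor` argument names shown (for example, "Estimators" to `n_estimators`); this mapping is an assumption based on the cited scikit-learn library, not something stated explicitly in the table.

```python
# Hyperparameters transcribed from Table 5, using scikit-learn argument names
# (the name mapping is an assumption, see the text above).
RF_PARAMS = {
    "n_estimators": 62,
    "max_depth": 20,
    "min_samples_split": 2,
    "min_samples_leaf": 1,
    "max_features": "sqrt",
    "bootstrap": False,
    "min_impurity_decrease": 0.3,
    "random_state": 44,
}

try:
    from sklearn.ensemble import RandomForestRegressor
except ImportError:  # scikit-learn is optional for reading this sketch
    RandomForestRegressor = None

# Build the (unfitted) regressor only when scikit-learn is installed.
model = RandomForestRegressor(**RF_PARAMS) if RandomForestRegressor else None
print(model if model is not None else RF_PARAMS)
```

With `bootstrap=False`, every tree is grown on the full training set, so the diversity among trees comes from the random feature subsets (`max_features="sqrt"`) considered at each split.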

Table 6. Performance of the Random Forest model.

Dataset	R²	MAE (μmol/g-material)
Training set before optimization	0.946	34.155
Test set before optimization	0.870	49.758
Training set after optimization	0.996	7.700
Test set after optimization	0.910	41.528

Table 7. Comparison between experimental and predicted CO yields for six representative test samples.

Materials	TR Temperature (°C)	CDS Temperature (°C)	Experimental CO Yield (μmol/g-Material)	Predicted CO Yield (μmol/g-Material)	References
La_0.6Sr_0.4Mn_0.5Cr_0.5O₃	1200	1200	122.32	125.16	[44]
La_0.7Sr_0.3Mn_0.9Co_0.1O₃	1400	800	325	317.17	[45]
La_0.6Sr_0.4Mn_0.6Al_0.4O₃	1400	1050	196	207.26	[30]
La_0.5Sr_0.5Mn_0.75Al_0.25O₃	1400	1100	330.00	342.50	[26]
La_0.6Sr_0.4MnO₃	1100	1100	83.93	67.90	[44]
La_0.6Sr_0.4Mn_0.8Fe_0.2O₃	1350	1000	329.9	317.50	[46]
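
As a worked check of how the two evaluation metrics are computed, the sketch below applies the standard definitions of MAE and R² to the six experimental/predicted pairs listed in Table 7. These six samples are only a representative subset, so the resulting values are not expected to match the full test-set figures in Table 6.

```python
# (experimental, predicted) CO yields in umol/g-material, copied from Table 7.
PAIRS = [
    (122.32, 125.16),
    (325.0, 317.17),
    (196.0, 207.26),
    (330.00, 342.50),
    (83.93, 67.90),
    (329.9, 317.50),
]

def mae(pairs):
    """Mean absolute error between experimental and predicted values."""
    return sum(abs(y - y_hat) for y, y_hat in pairs) / len(pairs)

def r2(pairs):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_mean = sum(y for y, _ in pairs) / len(pairs)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in pairs)
    ss_tot = sum((y - y_mean) ** 2 for y, _ in pairs)
    return 1.0 - ss_res / ss_tot

print(f"MAE = {mae(PAIRS):.2f} umol/g-material, R2 = {r2(PAIRS):.3f}")
```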

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
