Article

Performance Prediction of Perovskite-Catalyzed CO2 Decomposition Based on Machine-Learning Method

1 National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Northeastern University, Shenyang 110819, China
2 State Environmental Protection Key Laboratory of Eco-Industry, School of Metallurgy, Northeastern University, Shenyang 110819, China
3 Key Laboratory of Data Analytics and Optimization for Smart Industry (Northeastern University), Ministry of Education, Shenyang 110819, China
* Author to whom correspondence should be addressed.
Energies 2026, 19(6), 1388; https://doi.org/10.3390/en19061388
Submission received: 20 January 2026 / Revised: 23 February 2026 / Accepted: 7 March 2026 / Published: 10 March 2026
(This article belongs to the Special Issue Innovative Catalytic Approaches for Energy Conversion and Storage)

Abstract

Perovskite oxides show excellent catalytic performance for thermochemical CO2 splitting, and A/B-site cation substitution can further enhance their redox activity. Because first-principles methods are computationally expensive, machine learning (ML) offers an efficient alternative for perovskite optimization. In this paper, ML is employed to predict the performance of perovskite catalysts in CO2 decomposition reactions. Based on 227 (A1A2)(B1B2)O3 perovskite compositions curated from the experimental literature, five ML models were evaluated: Decision Tree, Bagging, Random Forest, Extra Trees, and Gradient Boosting Regression (GBR). Random Forest offered the best balance of predictive performance and robustness. After hyperparameter optimization, it achieved an R2 of 0.910 and an MAE of 41.528 μmol/g on an independent test set. SHAP analysis indicated that the thermal reduction temperature (T1) and the B1-site stoichiometric fraction (C_b1) are the most influential features governing the predicted CO yield. A higher CO yield is predicted when T1 exceeds 1300 °C and C_b1 ranges from 0.6 to 0.8. This behavior can be attributed to enhanced oxygen-vacancy formation at elevated temperatures and to the optimized electronic structure induced by an appropriate B-site stoichiometry.

1. Introduction

In recent years, greenhouse gas emissions, especially carbon dioxide (CO2) from fossil-fuel combustion, have become a major driver of global climate change [1,2]. To address the increasingly severe climate crisis, researchers have proposed a variety of CO2 decomposition technologies, including photocatalysis [3,4], biocatalysis [5,6], electrocatalysis [7], and the solar thermochemical cycle [8,9]. Among these, two-step thermochemical cycles based on metal oxides, known as chemical looping processes, have received extensive attention as a means of reducing CO2 emissions and regulating atmospheric CO2 concentration, owing to their harmless products, facile gas separation, and relatively simple operating conditions [10,11]. The cycle divides CO2 decomposition into two high-temperature stages. First, at a high temperature (T1), the metal oxide undergoes thermal reduction and releases oxygen, as shown in Equation (1). Then, at a lower temperature (T2), the reduced metal oxide is re-oxidized by CO2, producing CO and regenerating the original oxide phase, as shown in Equation (2). This process not only decomposes CO2 effectively but also offers flexible operation and controllable reaction conditions.
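Thermal Reduction (TR) Step:
$$\mathrm{MO}_x \rightarrow \mathrm{MO}_{x-\delta} + \frac{\delta}{2}\,\mathrm{O}_2 \tag{1}$$
Carbon Dioxide Splitting (CDS) Step:
$$\mathrm{MO}_{x-\delta} + \delta\,\mathrm{CO}_2 \rightarrow \mathrm{MO}_x + \delta\,\mathrm{CO} \tag{2}$$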
where MOx represents the metal oxide and δ represents the oxygen non-stoichiometry, i.e., the amount of lattice oxygen released per formula unit during thermal reduction.
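Equation (2) implies a simple conversion from δ to the CO yield used throughout this work: each mole of reduced oxide produces δ mol of CO, so dividing by the molar mass gives mol of CO per gram. The sketch below is a minimal illustration that assumes complete re-oxidation and normalizes by the molar mass of the fully oxidized formula; the example composition and δ value are hypothetical and not taken from the dataset.

```python
# Minimal sketch: convert oxygen non-stoichiometry (delta) into CO yield in
# umol per gram of fully oxidized material, assuming complete re-oxidation
# (Eq. 2: delta mol CO per mol of oxide).

# Standard atomic weights (g/mol) for the elements in the example below.
ATOMIC_WEIGHTS = {"La": 138.905, "Sr": 87.62, "Mn": 54.938, "O": 15.999}


def molar_mass(formula):
    """Molar mass (g/mol) of a composition given as {element: moles per formula unit}."""
    return sum(ATOMIC_WEIGHTS[el] * n for el, n in formula.items())


def co_yield_umol_per_g(delta, formula):
    """CO yield (umol/g) when delta mol CO are produced per mol of oxide."""
    return delta / molar_mass(formula) * 1e6


# Hypothetical example: La0.6Sr0.4MnO3 with delta = 0.05.
lsm = {"La": 0.6, "Sr": 0.4, "Mn": 1.0, "O": 3.0}
print(round(co_yield_umol_per_g(0.05, lsm), 1))  # prints 225.9
```

Even a modest δ of 0.05 thus corresponds to a CO yield of a few hundred μmol/g, the same order of magnitude as the yields collected in the dataset.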
Current studies have mainly focused on developing and evaluating redox materials (oxygen carriers) that perform well in thermochemical cycles in order to enhance CO production [12]. In this reaction system, non-stoichiometric metal oxides, such as ceria-based materials (CeO2/CeO2-δ and doped CeO2) and perovskite-structured ABO3/ABO3-δ [13], have attracted significant research attention, with perovskite oxides standing out for their excellent redox properties. In perovskite oxides, the A site is occupied by cations with larger ionic radii, such as alkali metals, alkaline earth metals, and lanthanides, whereas the B site is usually occupied by cations with smaller ionic radii, generally transition metals, Al, Sn, and others [14,15]. Generally, A-site cations contribute to structural stability, while B-site cations control material reactivity [16]. Both A- and B-site cations can be partially substituted, allowing the perovskite structure to adopt diverse cation combinations that help regulate the oxygen non-stoichiometry and its kinetic and thermodynamic properties. In addition, perovskite oxides remain solid throughout the two-step thermochemical process and exhibit good structural stability at high temperatures, making them well suited as redox materials.
The catalytic activity of perovskite oxides depends mainly on the B-site element, whereas the A site regulates the valence state and dispersion of the B-site element [17]. Studies have shown that partial substitution of A- and B-site cations significantly improves the CO yield of perovskite oxides in thermochemical CO2 decomposition [18,19,20,21,22]. To further optimize their redox properties, researchers worldwide have conducted extensive redox activity evaluations on a variety of perovskite materials [23,24,25,26,27,28,29,30]. McDaniel et al. [25] demonstrated that perovskites with partial substitution of Al for Mn at the B site can efficiently decompose CO2, achieving a high yield and a fast reaction rate. Perovskites with different B-site cation combinations, such as La(B1, B2)O3 (where B1 and B2 are Mn, Co, or Ni), or with combined A- and B-site substitution, such as (A1A2)(B1B2)O3 (where A1 and A2 are La, Sr, or Ca, and B1 and B2 are Mn, Fe, or Co), showed significantly increased CO yields. Therefore, studying how partial substitution of A–B cation combinations affects the redox performance of perovskites remains an important approach to improving material redox activity. However, systematic studies are still limited, and most rely on first-principles calculations and experiments. Although these methods reveal the potential properties of perovskite materials, they face challenges such as high computational cost, limited data, and low experimental throughput. Efficient screening of high-performance perovskite materials therefore remains a key issue in current research.
As science and technology continue to advance, materials science is becoming increasingly important in many fields. Traditional materials research and development relies on trial and error, which consumes substantial experimental time and resources and slows the discovery and design of new materials. To improve R&D efficiency, researchers have introduced the concept of materials design, which predicts and designs materials with specific properties by integrating computer simulation with theory from materials science, chemistry, and physics. This reduces trial and error and enhances R&D efficiency [31]. The advantages of materials design include lower cost and shorter development time, fewer unproductive experiments, and closer agreement between experimental results and expectations.
Through computer simulations, researchers can reproduce various experimental conditions, such as temperature, pressure, and chemical environment, to test the properties and behavior of materials in a virtual environment. This approach not only quickly screens out potential high-performance materials but also helps optimize experimental parameters and reduce uncertainty in actual experiments. The information age, particularly the advancement of big data technology, enables rapid data collection and sharing, which creates new opportunities for materials researchers, accelerates materials design, and improves the accuracy and reliability of property prediction. In recent years, machine-learning (ML) technology has brought breakthroughs to materials design. As a powerful data mining tool, ML can identify latent patterns in large amounts of experimental data and improve the prediction and design of material properties. A well-constructed model trained on materials data can accurately predict the target property of unseen samples [32]. ML can also reveal relationships between material structure and physicochemical properties, screen out high-performance materials that meet specific needs [31], and identify optimal experimental conditions.
To the best of our knowledge, ML-based prediction of CO yield in thermochemical CO2 splitting using ABO3 perovskites remains limited, and this work provides a practical framework to bridge experimental datasets and catalyst design. This study investigates the feasibility of applying ML to a two-step thermochemical CO2 splitting process to predict experimental CO yields for ABO3 perovskite catalysts. By integrating experimental data with feature engineering and model development, we establish data-driven models to (i) accelerate catalyst screening, (ii) optimize reaction conditions, and (iii) identify the key factors governing CO production. Compared with conventional trial-and-error experimentation, the proposed ML workflow (data collection, feature engineering, model selection, validation, and application) provides a systematic pathway to quantify the coupled effects of material composition and operating conditions, thereby supporting the rational design of highly redox-active perovskite oxides.

2. Datasets and Methods

2.1. Data Collection

During dataset collection, the keywords perovskite, thermochemical decomposition, and CO2 decomposition were used. Data for 227 (A1A2)(B1B2)O3 perovskite materials published up to 2024 were collected from databases such as ScienceDirect and Web of Science. The data cover perovskite composition, CO yield, CO2 partial pressure, CO2 gas flow rate, reaction temperature, and reaction time. We collected 44 atomic parameters from the WebElements website as initial feature variables. Because the reliability and comparability of the data are crucial to the model’s credibility, the experimental conditions of each sample were also included as input features alongside the 44 atomic parameters, reducing the influence of differences in experimental conditions on the ML model. To ensure high data quality, we applied strict inclusion and exclusion criteria. A sample was included only if its composition, T1, T2, PCO2, t1, t2, and gas flow rate (GF) were reported, together with a CO yield in convertible units. Samples were excluded if they had unclear or non-convertible units, missing key variables such as T1/T2, anomalous values among duplicate measurements under the same experimental conditions, or obvious data entry errors. Finally, units were harmonized: CO yield was converted to μmol/g-material, and temperature, time, pressure, and flow were standardized to common units.
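The inclusion filter and unit harmonization can be sketched as follows. This is a minimal illustration, not the authors' curation script: the record layout (dictionary keys) and the set of accepted source units are hypothetical, and gas volumes reported per gram are assumed to refer to standard temperature and pressure (0 °C, 1 atm, 22,414 mL/mol).

```python
# Minimal sketch of the inclusion filter and CO-yield unit harmonization.
# Record keys and accepted units are hypothetical illustrations.

MOLAR_VOLUME_STP_ML = 22414.0  # mL/mol of ideal gas at 0 °C and 1 atm

# Variables that must be reported for a sample to be included.
REQUIRED_KEYS = ("composition", "T1", "T2", "PCO2", "t1", "t2", "GF", "co_yield", "co_unit")

# Factors converting each accepted unit into umol CO per gram of material.
TO_UMOL_PER_G = {
    "umol/g": 1.0,
    "mmol/g": 1000.0,
    "mL/g (STP)": 1e6 / MOLAR_VOLUME_STP_ML,  # mL -> mol -> umol
}


def harmonize(record):
    """Return a cleaned copy of record, or None if it fails the inclusion criteria."""
    if any(record.get(k) is None for k in REQUIRED_KEYS):
        return None  # missing key variable (e.g. T1/T2): excluded
    factor = TO_UMOL_PER_G.get(record["co_unit"])
    if factor is None:
        return None  # unclear or non-convertible unit: excluded
    cleaned = dict(record)
    cleaned["co_yield"] = record["co_yield"] * factor
    cleaned["co_unit"] = "umol/g"
    return cleaned


# Usage: a hypothetical record reporting 1.0 mL CO per gram at STP.
example = {"composition": "La0.6Sr0.4MnO3", "T1": 1400, "T2": 1000, "PCO2": 0.5,
           "t1": 30, "t2": 30, "GF": 100, "co_yield": 1.0, "co_unit": "mL/g (STP)"}
print(harmonize(example)["co_yield"])  # roughly 44.6 umol/g
```

Returning `None` rather than raising keeps the filter composable: a list comprehension over all extracted records can drop excluded samples in one pass.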
We also acknowledge potential publication bias in literature-derived datasets: studies reporting successful or high-yield results are more likely to be published, which may skew the CO yield data upward. The overall workflow is shown in Figure 1.

2.2. ML Model

In this study, we used five ML models: Decision Tree, Bagging, Random Forest, Extra Trees, and Gradient Boosting Regression (GBR). Decision Trees predict by recursively partitioning the data, but they are prone to overfitting. Bagging trains multiple models on bootstrap resamples of the training data and averages their predictions, which reduces variance and improves stability [33]. Random Forest adds random feature selection at each split on top of Bagging, further reducing the risk of overfitting [34]. Extra Trees further randomizes the splitting process by drawing split thresholds at random [35], which speeds up training but may slightly reduce accuracy. GBR handles complex nonlinear relationships by iteratively fitting weak learners to the residual errors of the current ensemble [36]. Each of these models has its own advantages and suits different forecasting needs.
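The bagging idea can be sketched in a few lines of plain Python. The base learner here is a one-split regression tree (a decision stump) on a single input, chosen only to keep the example short. It illustrates bootstrap resampling and averaging; it is not the scikit-learn implementation, and the random per-split feature selection that distinguishes Random Forest cannot be shown with one input feature.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree on 1-D inputs by minimizing the squared error."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (sse, threshold, ml, mr)
    if best is None:  # all x identical: predict the overall mean
        overall = mean(ys)
        return lambda x: overall
    _, thr, ml, mr = best
    return lambda x: ml if x <= thr else mr


def fit_bagging(xs, ys, n_estimators=25, seed=0):
    """Bagging: fit one stump per bootstrap resample and average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n rows with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(model(x) for model in models)


# Usage on toy step data: low values below x = 10, high values above.
xs = list(range(20))
ys = [0.0] * 10 + [10.0] * 10
bag = fit_bagging(xs, ys)
print(bag(2), bag(17))
```

Because each stump sees a different bootstrap sample, their split thresholds differ slightly; averaging them smooths the ensemble's response, which is the variance-reduction effect described above.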

2.3. Modeling Procedure

During the dataset collection phase, 55 parameters were collected, including perovskite reaction conditions and atomic properties. After data preprocessing, a two-stage feature selection strategy was adopted to reduce redundancy and improve model generalization.
Stage 1: mRMR ranking and preliminary redundancy removal. We first applied maximum relevance minimum redundancy (mRMR) to rank all candidate features by maximizing relevance to the target while minimizing redundancy among selected features, yielding an ordered feature list. We then examined pairwise correlations using the Pearson correlation coefficient matrix; for any feature pair with an absolute Pearson correlation coefficient exceeding 0.90, one of the two features was removed to mitigate multicollinearity.
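Stage 1 can be sketched as a greedy mRMR ordering followed by Pearson-based pruning. This is a minimal sketch under stated assumptions. The paper does not specify its relevance and redundancy measures, so absolute Pearson correlation stands in for both here, with the difference form (relevance minus mean redundancy); many mRMR implementations use mutual information or an F-statistic instead. When two features exceed the 0.90 threshold, the sketch keeps the one ranked higher by mRMR, which is also an assumption.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation P(X, Y) = cov(X, Y) / (sigma_X * sigma_Y); inputs must be non-constant."""
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (pstdev(x) * pstdev(y))


def mrmr_rank(features, target):
    """Greedy mRMR ordering (difference form): relevance minus mean redundancy.

    Assumption: |Pearson r| is used for both relevance and redundancy.
    """
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    remaining, ranked = list(features), []
    while remaining:
        scores = {}
        for name in remaining:
            redundancy = (
                mean([abs(pearson(features[name], features[s])) for s in ranked])
                if ranked else 0.0
            )
            scores[name] = relevance[name] - redundancy
        best = max(remaining, key=scores.get)
        ranked.append(best)
        remaining.remove(best)
    return ranked


def prune_correlated(ranked, features, threshold=0.90):
    """Walk the mRMR order, dropping any feature with |r| > threshold to an already kept one."""
    kept = []
    for name in ranked:
        if all(abs(pearson(features[name], features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Usage with hypothetical columns: "a_dup" is nearly collinear with "a".
features = {"a": [1, 2, 3, 4, 5], "a_dup": [2, 4, 6, 8, 10.5], "b": [5, 3, 4, 1, 2]}
target = [1.1, 2.0, 3.2, 3.9, 5.1]
print(prune_correlated(mrmr_rank(features, target), features))
```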
Stage 2: Embedded feature selection combined with mRMR. To further refine the feature subset, an embedded feature selection scheme was combined with the mRMR ranking. Specifically, the number of selected features, K, was varied from 1 to the number of remaining features after Stage 1. For each K, the top-K features from the mRMR ranking were used as model inputs, and five tree-based learners (Decision Tree, Bagging, Random Forest, Extra Trees, and Gradient Boosting Regression) were employed as base models to evaluate predictive performance under different K values.
The dataset was split into training and test sets at a ratio of 9:1. A 10-fold cross-validation was conducted on the training set: the training data were divided into 10 folds, each fold was used once as the validation set, and the remaining folds were used for training [37]. The average performance across the 10 folds was used as the cross-validated evaluation of each model. Model performance was assessed using the coefficient of determination (R2) and error metrics, including mean absolute error (MAE) and mean absolute percentage error (MAPE). The optimal feature subset was determined by jointly considering a higher R2 and lower MAE/MAPE.
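The split, cross-validation, and top-K search can be sketched as follows. This is a minimal sketch. The function names, the seed, and the interleaved fold assignment are hypothetical choices, and `cv_score` stands for any callable that returns a single cross-validated score (for example, mean R2 over the 10 folds) for a given feature list. The paper's joint weighing of R2 against MAE/MAPE is simplified here to maximizing one score.

```python
import random


def train_test_split_indices(n, test_fraction=0.1, seed=42):
    """Shuffle row indices and hold out test_fraction of them (a 9:1 split by default)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def kfold_indices(indices, k=10):
    """Yield (train, validation) index lists; each fold serves as validation exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


def select_top_k(ranked_features, cv_score):
    """Return (best_k, best_score) maximizing cv_score over the top-k mRMR features."""
    best_k, best_score = None, float("-inf")
    for k in range(1, len(ranked_features) + 1):
        score = cv_score(ranked_features[:k])  # e.g. mean R2 over the 10 folds
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


# Usage: 227 samples give 204 training and 23 test rows.
train, test = train_test_split_indices(227)
print(len(train), len(test))
```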
For regression performance evaluation, R2 measures the goodness of fit of the model and is calculated as follows:
$$R^2 = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}}$$
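where $SS_{\mathrm{res}} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ is the residual sum of squares and $SS_{\mathrm{tot}} = \sum_{i=1}^{n} (y_i - \bar{y})^2$ is the total sum of squares, with $\bar{y}$ the mean of the observed values.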
MAE measures the average absolute difference between predicted and observed values and is defined as follows:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| \hat{y}_i - y_i \right|$$
where $\hat{y}_i$ is the predicted value, $y_i$ is the actual value, and $n$ is the number of samples.
MAPE quantifies the average relative error between predicted and observed values and is defined as follows:
$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n} \left| \frac{\hat{y}_i - y_i}{y_i} \right| \times 100\%$$
where a smaller MAPE indicates better predictive accuracy in relative terms.
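The three metrics translate directly into code. The sketch below follows the formulas above term by term; the sample values are hypothetical and chosen so that the results can be checked by hand.

```python
def r2_score(y_true, y_pred):
    """R2 = 1 - SS_res / SS_tot."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


def mae(y_true, y_pred):
    """Mean absolute error, in the units of the target (here umol/g)."""
    return sum(abs(p - y) for y, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error (%); undefined when an observed value is zero."""
    if any(y == 0 for y in y_true):
        raise ValueError("MAPE is undefined when an observed value is 0")
    return 100 * sum(abs((p - y) / y) for y, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical CO yields (umol/g): SS_res = 1500, SS_tot = 50000.
y_true = [100, 200, 300, 400]
y_pred = [110, 190, 330, 380]
print(round(r2_score(y_true, y_pred), 3), mae(y_true, y_pred), round(mape(y_true, y_pred), 2))
```

For these values R2 = 1 − 1500/50000 = 0.97, MAE = 17.5 μmol/g, and MAPE = 7.5%. The same 10 μmol/g error counts as 10% at a yield of 100 μmol/g but only 5% at 200 μmol/g, which is why MAPE is sensitive to small observed yields.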

2.4. SHAP Method

SHAP (SHapley Additive exPlanations) is an efficient, model-agnostic framework for explaining how input features contribute to the predictions of any ML model. SHAP values, derived from Shapley values in cooperative game theory, quantify the influence of each feature on a prediction by measuring how that feature shifts the model output away from its expected value [38,39]. Combined with ML models, SHAP can be used to explore the relationship between the input features and the target [40]. Compared with traditional sensitivity analysis, SHAP provides local interpretability by quantifying each feature's contribution to every individual prediction. In addition, the overall contribution of each feature to the output can be evaluated by aggregating the absolute SHAP values over all data points [41].
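To make the game-theoretic definition concrete, the sketch below computes exact Shapley values by enumerating every feature coalition, treating absent features as set to a single baseline vector. This is a simplification. The paper does not state which SHAP explainer was used, practical SHAP explainers average over a background dataset, and for tree ensembles such as Random Forest the TreeSHAP algorithm in the `shap` package is the usual, far cheaper choice. The toy model is hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, with 'absent' features taken from baseline.

    The cost grows as 2**n_features, so this is only practical for a few features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S without feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                s = set(s)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical model with one additive term and one two-feature interaction.
f = lambda z: 2 * z[0] + z[1] * z[2]
print(shapley_values(f, [1, 2, 3], [0, 0, 0]))  # [2.0, 3.0, 3.0]
```

The values sum to f(x) − f(baseline) = 8 (the efficiency property), and the interaction term z1·z2 = 6 is split equally between features 1 and 2. This is the sense in which SHAP values distribute a prediction among the input features.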

3. Results and Discussion

3.1. Establishment and Preprocessing of the Dataset

In this study, we constructed a dataset with 55 independent variables as initial features: 44 atomic parameters, seven experimental conditions, and four stoichiometric fractions describing the site composition (x = a1, a2, b1, b2; e.g., C_b1 for the B1 site). The performance evaluation index is the CO yield. The complete list of initial features is provided in Table 1 and Table 2. Figure 2 presents violin plots of the experimental conditions and the CO yield distribution. Notably, the CO yield (Figure 2h) shows a right-skewed, long-tailed distribution. Approximately 68.8% of the samples fall in the low-yield range (<300 μmol/g), suggesting that the model is trained with sufficient data in this region and is therefore expected to provide more reliable predictions for low-to-mid CO yields. Classical geometric descriptors, such as the Goldschmidt tolerance factor (τ) and the octahedral factor, are widely used to characterize perovskite formability. In this study, these descriptors were not included as separate features: the feature set already contains the A/B-site ionic radii, which implicitly carry the geometric information needed to compute them, and defining a unique τ for multi-site substituted (A1A2)(B1B2)O3 compositions can be ambiguous. As a sanity check, the calculated τ values of the test samples fall within the commonly reported stability range (0.8–1.05) (Supplementary Table S3).
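The Goldschmidt tolerance factor is τ = (r_A + r_O) / (√2 (r_B + r_O)). One common way to handle mixed sites, assumed here rather than taken from the paper, is to use the stoichiometry-weighted mean radius of each site. The radii below are illustrative Shannon values (A site in 12-fold and B site in 6-fold coordination, high-spin Mn3+). They should be checked against the Shannon tables for the actual coordination numbers and oxidation states, since, for example, Sr doping shifts part of the Mn toward Mn4+. That shift is one source of the ambiguity noted above.

```python
from math import sqrt

# Illustrative Shannon ionic radii in angstrom (verify for your coordination and spin state).
RADII = {"La3+": 1.36, "Sr2+": 1.44, "Mn3+": 0.645, "Al3+": 0.535, "O2-": 1.40}


def site_radius(occupancy):
    """Stoichiometry-weighted mean radius of a mixed site, e.g. {"La3+": 0.7, "Sr2+": 0.3}."""
    return sum(RADII[ion] * frac for ion, frac in occupancy.items())


def tolerance_factor(a_site, b_site, r_o=RADII["O2-"]):
    """Goldschmidt tau = (r_A + r_O) / (sqrt(2) * (r_B + r_O))."""
    return (site_radius(a_site) + r_o) / (sqrt(2) * (site_radius(b_site) + r_o))


# Hypothetical (A1A2)(B1B2)O3 example with C_b1 = 0.8.
tau = tolerance_factor({"La3+": 0.7, "Sr2+": 0.3}, {"Mn3+": 0.8, "Al3+": 0.2})
print(round(tau, 3))
```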
Catalyst surface area (e.g., BET) is also important for accessible active sites, but it is not consistently reported in the literature sources used to build the dataset; therefore, it was not included as an input feature. Future work will systematically collect surface-area data to enrich the feature set.

3.2. Feature Correlation Analysis and Screening

Using the mRMR method, we calculated the relevance of each feature to the target variable and the redundancy among features, and we ranked the features by their combined mRMR score (Figure 3). We then performed a further screening step using the Pearson correlation coefficient matrix, which measures the linear correlation between two variables from their covariance and standard deviations. The coefficient ranges from −1 to 1: positive values indicate a positive linear correlation, negative values a negative one, and values near 0 little linear relationship. The coefficient $P(X, Y)$ between two input features is calculated as follows:
$$P(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}$$
where $P(X, Y)$ is the Pearson correlation coefficient between input features X and Y, $\operatorname{cov}(X, Y)$ is their covariance, and $\sigma_X$ and $\sigma_Y$ are the standard deviations of X and Y, respectively.
In summary, feature–target relevance was computed with mRMR, redundant features (|P| > 0.90) were removed using the Pearson correlation coefficient matrix, and 34 features were retained. The correlation matrices before and after this preliminary screening are shown in Figure 4; the retained features exhibit low mutual correlation.
To further improve the performance and generalization ability of the model, this study uses an embedded method combined with the mRMR ranking for a second round of feature screening. Five ML algorithms serve as the base models: Decision Tree, Bagging, Random Forest, Extra Trees, and GBR. For each value of K (from 1 to 34), the top-K features from the mRMR ranking form a candidate subset, which is then evaluated by each base model. The relationships between R2 and K for the five models are shown in Figure 5, and Table 3 lists the final selected features for each model. This procedure identifies the feature combinations that most improve model performance and provides a reliable feature set for subsequent modeling.
Competitive validation of the feature-selection scheme. To demonstrate that the proposed three-step feature selection (mRMR ranking, Pearson-based redundancy removal, and embedded top-K selection) is not an arbitrary choice, we performed a controlled comparison using the same Random Forest model with identical hyperparameters and the same test split. Only the feature-selection strategy was varied across four schemes: using all 55 features, the proposed three-step procedure, LASSO selection, and recursive feature elimination (RFE). As shown in Table S4, the proposed scheme achieves the best test performance (highest R2 and lowest MAE), indicating that it is a competitive and reproducible choice for subsequent modeling.

3.3. Model Selection

A two-stage model selection workflow was adopted in this study. First, after feature selection, we conducted 10-fold cross-validation across five commonly used regression algorithms (Decision Tree, Bagging, Random Forest, Extra Trees, and Gradient Boosting Regressor, GBR) to establish a baseline. Second, the most promising and robust model was selected for further hyperparameter optimization.
The cross-validation results are summarized in Table 4 and Figure 6. Extra Trees (R2 = 0.880, MAE = 44.789) and Decision Tree (R2 = 0.872, MAE = 44.296) achieved strong cross-validated performance; however, both models showed an almost perfect fit on the training data (training R2 = 0.997), suggesting a larger risk of overfitting and sensitivity to data partitioning. Bagging performed worse overall (R2 = 0.741, MAE = 57.796) and was therefore not selected. GBR exhibited relatively weak performance on this dataset (R2 = 0.766, MAE = 70.427). We attribute this, at least in part, to the dataset characteristics and the sequential boosting mechanism: the CO-yield distribution is strongly right-skewed and long-tailed, and extremely high-yield samples may exert disproportionate influence during iterative residual fitting, thereby impairing generalization in small datasets. In contrast, Random Forest relies on parallel bagging-based aggregation and is often more robust under limited sample sizes and imbalanced target distributions.
Considering both predictive performance and robustness, Random Forest was selected as the final model and subsequently optimized using an evolutionary algorithm to further improve performance.

3.4. Hyperparameter Optimization

Hyperparameters [42] are parameters of ML algorithms that are not learned directly from the training data and must therefore be specified or tuned before or during training. Their selection strongly influences the performance and generalization ability of the model, which makes hyperparameter tuning an essential step in training ML models [43]. Evolutionary algorithms are global optimization methods inspired by natural selection and genetic mechanisms, and in recent years they have been widely used to optimize ML hyperparameters. The core idea is to search progressively for the optimal hyperparameter combination by simulating selection, crossover, and mutation in biological evolution (Figure 7). Their main advantage is a strong global search ability, which reduces the risk of becoming trapped in local optima.
Hyperparameters of the Random Forest model were optimized using an evolutionary algorithm to reduce manual tuning and improve generalization. Each individual in the population represents a candidate set of hyperparameters, and model performance under 10-fold cross-validation was used as the fitness criterion. The algorithm iteratively updates the population through selection, crossover, and mutation until convergence or the maximum number of generations is reached. In this study, the population size was set to 50, the crossover probability to 0.7, the mutation probability to 0.3, and the maximum number of generations to 60. The optimal hyperparameter configuration obtained by the evolutionary search is reported in Table 5, while additional implementation details are provided in the Supplementary Material.
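The evolutionary search can be sketched as a small genetic algorithm over discrete hyperparameter grids. The population size, crossover probability, mutation probability, and number of generations default to the values reported above (50, 0.7, 0.3, 60). Tournament selection, uniform crossover, single-gene mutation, and elitism are assumptions of this sketch, because the operators actually used are described only in the Supplementary Material. In practice `fitness` would return the mean 10-fold cross-validated score of a Random Forest trained with the candidate hyperparameters; the toy fitness and grid below are hypothetical.

```python
import random


def evolve(fitness, space, pop_size=50, p_cx=0.7, p_mut=0.3, generations=60, seed=0):
    """Tiny genetic algorithm maximizing fitness(candidate) over discrete grids.

    space maps each hyperparameter name to its list of allowed values.
    """
    rng = random.Random(seed)
    names = list(space)

    def random_individual():
        return {k: rng.choice(space[k]) for k in names}

    def tournament(pop, scores, size=3):
        contenders = rng.sample(range(len(pop)), size)
        return pop[max(contenders, key=lambda i: scores[i])]

    pop = [random_individual() for _ in range(pop_size)]
    best, best_score = None, float("-inf")
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        for ind, s in zip(pop, scores):
            if s > best_score:
                best, best_score = dict(ind), s
        children = [dict(best)]  # elitism: carry the best candidate forward
        while len(children) < pop_size:
            a, b = tournament(pop, scores), tournament(pop, scores)
            if rng.random() < p_cx:  # uniform crossover
                child = {k: (a[k] if rng.random() < 0.5 else b[k]) for k in names}
            else:
                child = dict(a)
            if rng.random() < p_mut:  # mutate one randomly chosen gene
                k = rng.choice(names)
                child[k] = rng.choice(space[k])
            children.append(child)
        pop = children
    return best, best_score


# Usage with a hypothetical grid and a toy fitness peaking at (200, 8).
space = {"n_estimators": [50, 100, 200, 400], "max_depth": [2, 4, 8, 16]}
toy_fitness = lambda c: -abs(c["n_estimators"] - 200) / 100 - abs(c["max_depth"] - 8)
print(evolve(toy_fitness, space))
```

Elitism guarantees that the best candidate found so far is never lost between generations, so the returned configuration is the best one ever evaluated.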

3.5. Model Validation

In this study, the dataset was divided into a training set and a test set using a 9:1 hold-out strategy. Feature engineering, model selection, and hyperparameter optimization were conducted on the training set, and the remaining 10% of samples were used to evaluate generalization performance. The model performance before and after optimization is summarized in Table 6.
After optimization, the Random Forest model achieved R2 = 0.996 and MAE = 7.700 μmol/g on the training set, and R2 = 0.910 and MAE = 41.528 μmol/g on the test set, indicating improved test performance but also a remaining train–test gap.
To assess robustness and provide uncertainty estimates, we repeated the 9:1 split using 10 independent random seeds. The model showed stable performance across splits, with a mean test R2 = 0.942 ± 0.010 and a mean test MAE = 49.27 ± 7.21 μmol/g, corresponding to 95% confidence intervals of [0.935, 0.948] (R2) and [44.11, 54.42] μmol/g (MAE). Figure 8 compares the predicted and experimental CO yields on the test set, demonstrating the predictive performance of the optimized model. Figure 9 presents the learning curve, which further illustrates the model’s training behavior and generalization ability. Table 7 compares the experimental and predicted CO yields for six representative test samples to provide an intuitive illustration of the prediction error. These examples span low- and high-yield regimes, showing that the optimized model captures the overall CO yield trend with reasonable accuracy.
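The reported intervals are consistent with a Student-t interval on the 10 per-seed scores, mean ± t(0.975, 9) · s/√10, where s is the sample standard deviation. This formula is our assumption, since the text does not state how the intervals were computed. Applied to the reported summary statistics, it reproduces the published bounds to within rounding (for MAE, about [44.11, 54.43] versus the reported [44.11, 54.42]).

```python
from math import sqrt
from statistics import mean, stdev

T_975_DF9 = 2.262  # two-sided 95% Student-t critical value for 9 degrees of freedom


def t_interval(m, sd, n, t_crit=T_975_DF9):
    """95% CI for a mean from summary statistics; t_crit must match n - 1 degrees of freedom."""
    half = t_crit * sd / sqrt(n)
    return m - half, m + half


def t_interval_from_scores(scores, t_crit=T_975_DF9):
    """Same interval computed from raw per-seed scores (sample standard deviation)."""
    return t_interval(mean(scores), stdev(scores), len(scores), t_crit)


# Reported 10-seed summaries: test R2 = 0.942 +/- 0.010, test MAE = 49.27 +/- 7.21 umol/g.
print(t_interval(0.942, 0.010, 10))
print(t_interval(49.27, 7.21, 10))
```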
To further assess the reliability of the optimized Random Forest model at different CO yield levels, test samples were grouped into CO yield intervals, and the mean absolute percentage error (MAPE) was calculated for each interval. MAPE complements MAE by reflecting relative deviation, but it may be amplified when measured yields are close to zero.
Results show that samples with CO yields > 100 μmol/g account for 82.8% of the dataset, with MAPE ranging from 0.47% to 7.31%, indicating strong reliability in the dominant yield range. The low-yield interval (0–100 μmol/g) exhibits larger percentage errors, likely due to the small-denominator effect and higher uncertainty. Overall, the optimized model is robust for most samples. The interval-wise MAPE results are provided in the Supplementary Material (Table S5).
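The interval-wise MAPE calculation can be sketched as below. The bin edges used in the usage example are hypothetical, since only the 0–100 μmol/g interval is named in the text, and samples with a measured yield of exactly zero are skipped to avoid division by zero, which is also an assumption.

```python
def mape_by_interval(y_true, y_pred, edges):
    """MAPE (%) per measured-yield bin [edges[i], edges[i + 1]); empty bins are omitted."""
    out = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        pairs = [(y, p) for y, p in zip(y_true, y_pred) if lo <= y < hi and y != 0]
        if pairs:
            out[(lo, hi)] = 100 * sum(abs((p - y) / y) for y, p in pairs) / len(pairs)
    return out


# Hypothetical test-set values (umol/g) and bin edges.
y_true = [50, 80, 150, 400]
y_pred = [60, 80, 150, 420]
print(mape_by_interval(y_true, y_pred, [0, 100, 300, 1000]))
```

In this toy example, a 10 μmol/g error at a yield of 50 μmol/g already produces a 20% error for that sample, illustrating the small-denominator effect that inflates MAPE in the low-yield interval.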

3.6. SHAP-Based Model Interpretation

To further explore the relationship between the input features and CO generation, we use SHAP to interpret the predictions of the optimized Random Forest model, which lets us analyze the importance and contribution of each input feature. Feature importance is shown in Figure 10. T1 and C_b1 have the most significant effects on the predicted CO yield, followed by three reaction conditions: t1, t2, and HR.
Given the limited contribution of lower-ranked variables, we summarize the top 10 globally important features using a SHAP summary (beeswarm) plot (Figure 11), which shows how each feature affects the predicted CO yield across all samples. The x-axis denotes SHAP values (negative to positive impact on the model output), each point represents one sample, and the color encodes the feature value (red: high; blue: low).
As shown in Figure 11, T1 exhibits the widest SHAP-value range, confirming it as the dominant predictor: high T1 values are mainly associated with positive contributions, whereas low T1 values concentrate in the negative region, indicating that increasing T1 generally increases the predicted CO yield. C_b1 and t1 are the next most influential features and exhibit similar monotonic behavior: higher values contribute positively, and lower values contribute negatively. Several features (e.g., t2 and IEs_b2) display asymmetric SHAP distributions, suggesting conditional or threshold-like effects rather than a purely linear relationship. To evaluate whether such asymmetry reflects genuine interactions or sparse-data artifacts, we further computed pairwise SHAP interaction strengths and ranked all feature pairs; the results are provided in the Supplementary Material (Figure S2). In addition, HR and PCO2 exhibit compact, near-symmetric SHAP distributions, indicating relatively stable, weaker marginal effects within the investigated range.
To further elucidate the mechanistic roles of the key features, we combine SHAP analysis results with experimental observations and discuss the underlying effects from the perspective of catalytic reaction kinetics and the electronic structure of perovskite materials. Figure 12 shows the SHAP dependence plots for T1 and C_b1, illustrating how the predicted CO yield varies with each feature across all samples.
The SHAP dependence plot shows that the contribution of T1 to the model prediction follows a nonlinear trend. When T1 < 1300 °C, the SHAP values are negative, indicating that lower temperatures contribute less to CO generation and may even have an inhibitory effect. When T1 > 1300 °C, the SHAP values gradually turn positive, indicating that higher temperatures promote CO generation. This trend is consistent with the experimental data: Figure 13 shows that CO yield first rises and then falls as T1 increases. To illustrate the influence of T1 more clearly, Figure 14 plots CO yield against T1 for various perovskites in the database while other experimental conditions are held constant. As T1 increases from 1000 °C to 1400 °C, the CO yield shows an upward trend; for example, La0.7Sr0.3Mn0.9Cr0.1O3 reaches its peak CO yield of 228 μmol/g at T1 = 1400 °C. This behavior can be explained by oxygen-vacancy kinetics: raising the thermal reduction temperature (T1) accelerates the release of lattice oxygen, forming more oxygen vacancies. Oxygen vacancies are key active sites for CO2 adsorption and activation, and their concentration directly affects the efficiency of CO2 reduction [47,48,49]. However, an excessively high T1 (>1400 °C) may cause material sintering or structural collapse, reducing the stability of the active sites. The large positive SHAP values of T1 above 1300 °C underline the importance of optimizing the reduction temperature within this range.
Similarly, the SHAP values of C_b1 are highest when C_b1 lies between 0.6 and 0.8 (Figure 12), indicating that its promoting effect on CO yield is most pronounced in this range. This aligns with the trend of CO yield with C_b1 shown in Figure 13: as C_b1 varies from 0 to 1, CO yield first increases and then decreases, peaking at a B1-site stoichiometric fraction of 0.6–0.8. The experimental data in Figure 14, with C_b1 as the independent variable and all other conditions held constant, show that the CO yield of Sr0.6Ce0.4Mn0.8Al0.2O3 (C_b1 = 0.8; 799.34 μmol/g) is significantly higher than that of Sr0.6Ce0.4Mn0.2Al0.8O3 (C_b1 = 0.2; 302.61 μmol/g). During CO2 reduction, the B1-site stoichiometric ratio strongly influences the catalyst’s electronic structure, oxygen-vacancy concentration, and distribution of surface-active sites [50]. When C_b1 is between 0.6 and 0.8, the perovskite oxide can form an appropriate amount of oxygen vacancies, which not only serve as adsorption and activation centers for CO2 molecules but also stabilize CO2 reduction intermediates (such as COOH) by providing additional charge-transport pathways, thereby accelerating CO generation. In addition, electronic-structure tuning at the B1 site may optimize d-orbital filling, facilitating CO2 adsorption on the catalyst surface and promoting C–O bond cleavage, thereby improving the conversion of CO2 to CO.
Figure 15 shows how CO yield varies under the combined influence of T1 and C_b1. CO yield is highest when T1 > 1300 °C and C_b1 is between 0.6 and 0.8. This indicates that a high reduction temperature combined with an appropriate B1-site stoichiometric fraction promotes CO production. The synergy can be attributed to enhanced oxygen-vacancy formation at high temperature (T1 > 1300 °C) together with the optimized electronic structure of the B1 site (C_b1 = 0.6–0.8), which jointly facilitate CO2 adsorption and C–O bond cleavage. This finding is consistent with the SHAP dependence plots.
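A simple descriptive check of this window against the curated records is to compare the mean CO yield of entries inside it (T1 > 1300 °C and 0.6 ≤ C_b1 ≤ 0.8) with the mean for entries outside it. The sketch assumes that the records are available as (T1, C_b1, CO yield) tuples, and the function names are hypothetical. A difference in means is only an association and is subject to the same confounding caveats as the SHAP analysis.

```python
from statistics import mean


def in_window(t1, c_b1, t1_min=1300.0, c_b1_range=(0.6, 0.8)):
    """True if a record lies in the window T1 > t1_min and c_lo <= C_b1 <= c_hi."""
    c_lo, c_hi = c_b1_range
    return t1 > t1_min and c_lo <= c_b1 <= c_hi


def window_contrast(records, **window):
    """Mean CO yield inside vs. outside the window; None where a group is empty.

    records: iterable of (T1 in °C, C_b1, CO yield) tuples.
    window: optional t1_min / c_b1_range overrides passed to in_window.
    """
    records = list(records)
    inside = [y for t1, c_b1, y in records if in_window(t1, c_b1, **window)]
    outside = [y for t1, c_b1, y in records if not in_window(t1, c_b1, **window)]
    return (mean(inside) if inside else None, mean(outside) if outside else None)
```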
It should be noted that SHAP values reflect associations learned from the available dataset and do not establish causality. The identified key features may also act as proxies for unmeasured factors (e.g., morphology, synthesis atmosphere, or correlated compositional descriptors). Figure 16 provides an additional, model-consistent view of the interaction highlighted by SHAP: a two-dimensional partial dependence plot (2D PDP) for T1 and C_b1. The 2D PDP shows that the predicted CO yield increases markedly when T1 exceeds 1300 °C and C_b1 falls within approximately 0.6–0.8. This indicates a strong coupled effect between the thermal reduction temperature and the B-site stoichiometric descriptor. By revealing the high-yield region of the model's response surface, this visualization supports the SHAP interpretation, although the underlying causal mechanisms remain to be confirmed through controlled experiments.
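A 2D PDP shows the model prediction averaged over the dataset while the two features of interest are fixed at each grid point. The dependency-free sketch below computes that quantity. The `predict` callable, the column indices, and the grid values are assumptions for illustration. With scikit-learn, a comparable plot can be produced with `sklearn.inspection.PartialDependenceDisplay.from_estimator` by passing a feature pair such as `[(i, j)]`.

```python
from itertools import product
from statistics import mean


def partial_dependence_2d(predict, rows, i, j, grid_i, grid_j):
    """Brute-force two-way partial dependence.

    For every grid pair (a, b), set feature i to a and feature j to b in each row,
    predict, and average over the rows. Returns {(a, b): mean prediction}.
    """
    surface = {}
    for a, b in product(grid_i, grid_j):
        predictions = []
        for row in rows:
            x = list(row)  # copy so the original data are not modified
            x[i], x[j] = a, b
            predictions.append(predict(x))
        surface[(a, b)] = mean(predictions)
    return surface
```

Here `predict` takes one feature vector. A fitted scikit-learn model could be wrapped as `lambda x: model.predict([x])[0]`, at the cost of one call per row and grid point.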

4. Conclusions

In this work, we curated 227 literature-reported ABO3 perovskite catalysts for thermochemical CO2 splitting and constructed a dataset containing compositional descriptors, atomic parameters, and experimental conditions. After feature screening and embedded selection, Random Forest was identified as the most robust model among the five evaluated regressors. Evolutionary hyperparameter optimization further improved generalization, yielding a test-set performance of R2 = 0.910 and MAE = 41.528 μmol/g-material. SHAP and partial dependence analyses consistently indicate that the thermal reduction temperature (T1) and the B1-site stoichiometric fraction (C_b1) dominate the model predictions, with higher CO yields associated with T1 > 1300 °C and C_b1 ≈ 0.6–0.8. These results provide a data-driven basis for prioritizing perovskite compositions and operating conditions in catalyst screening, while mechanistic causality should be further validated experimentally.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/en19061388/s1. Table S1. Key dependencies for machine learning. Table S2. AB-site cation atomic parameters. Table S3. Summary statistics of the Goldschmidt tolerance factor (τ) for the test set. Table S4. Competitive comparison of feature-selection schemes using a fixed Random Forest model. Table S5. Interval-wise MAPE (%) for the optimized Random Forest model. Figure S1. SHAP analysis of other important features. Figure S2. Top 15 pairwise SHAP interaction strengths, measured by mean(|ϕij|).

Author Contributions

Conceptualization, K.W.; Methodology, H.X.; Formal analysis, J.C.; Investigation, K.L.; Data curation, K.M.; Writing—original draft, J.C.; Writing—review and editing, K.W.; Funding acquisition, K.W. and H.X. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (52276102 and 52476181).

Data Availability Statement

The data supporting the findings of this study, along with the code used for model training and evaluation, are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Sarkodie, S.A.; Owusu, P.A.; Leirvik, T. Global Effect of Urban Sprawl, Industrialization, Trade and Economic Development on Carbon Dioxide Emissions. Environ. Res. Lett. 2020, 15, 34049.
2. Capellán-Pérez, I.; Arto, I.; Polanco-Martínez, J.M.; González-Eguino, M.; Neumann, M.B. Likelihood of Climate Change Pathways Under Uncertainty on Fossil Fuel Resource Availability. Energy Environ. Sci. 2016, 9, 2482–2496.
3. Li, D.; Kassymova, M.; Cai, X.; Zang, S.-Q.; Jiang, H.-L. Photocatalytic CO2 Reduction over Metal-Organic Framework-Based Materials. Coord. Chem. Rev. 2020, 412, 213262.
4. Ola, O.; Maroto-Valer, M.M. Review of Material Design and Reactor Engineering on TiO2 Photocatalysis for CO2 Reduction. J. Photochem. Photobiol. C 2015, 24, 16–42.
5. Erb, T.J.; Zarzycki, J. Biochemical and Synthetic Biology Approaches to Improve Photosynthetic CO2-Fixation. Curr. Opin. Chem. Biol. 2016, 34, 72–79.
6. Appel, A.M.; Bercaw, J.E.; Bocarsly, A.B.; Dobbek, H.; DuBois, D.L.; Dupuis, M.; Ferry, J.G.; Fujita, E.; Hille, R.; Kenis, P.J.A.; et al. Frontiers, Opportunities, and Challenges in Biochemical and Chemical Catalysis of CO2 Fixation. Chem. Rev. 2013, 113, 6621–6658.
7. Qiao, J.L.; Liu, Y.Y.; Hong, F.F.; Zhang, J. A Review of Catalysts for the Electroreduction of Carbon Dioxide to Produce Low-Carbon Fuels. Chem. Soc. Rev. 2014, 43, 631–675.
8. Hao, Y.; Steinfeld, A. Fuels from Water, CO2 and Solar Energy. Sci. Bull. 2017, 62, 1099–1101.
9. Kodama, T. High-Temperature Solar Chemistry for Converting Solar Heat to Chemical Fuels. Prog. Energy Combust. Sci. 2003, 29, 567–597.
10. Yadav, D.; Banerjee, R. A Review of Solar Thermochemical Processes. Renew. Sustain. Energy Rev. 2016, 54, 497–532.
11. Loutzenhiser, P.G.; Meier, A.; Steinfeld, A. Review of the Two-Step H2O/CO2-Splitting Solar Thermochemical Cycle Based on Zn/ZnO Redox Reactions. Materials 2010, 3, 4922–4938.
12. Scheffe, J.R.; Steinfeld, A. Oxygen Exchange Materials for Solar Thermochemical Splitting of H2O and CO2: A Review. Mater. Today 2014, 17, 341–348.
13. Riaz, A.; Kreider, P.; Kremer, F.; Tabassum, H.; Yeoh, J.S.; Lipiński, W.; Lowe, A. Electrospun Manganese-Based Perovskites as Efficient Oxygen Exchange Redox Materials for Improved Solar Thermochemical CO2 Splitting. ACS Appl. Energy Mater. 2019, 2, 2494–2505.
14. Tabish, A.; Varghese, A.M.; Wahab, M.A.; Karanikolos, G.N. Perovskites in the Energy Grid and CO2 Conversion: Current Context and Future Directions. Catalysts 2020, 10, 95.
15. Zhang, K.; Sunarso, J.; Shao, Z.; Zhou, W.; Sun, C.; Wang, S.; Liu, S. Research Progress and Materials Selection Guidelines on Mixed Conducting Perovskite-Type Ceramic Membranes for Oxygen Production. RSC Adv. 2011, 1, 1661–1676.
16. Nair, M.M.; Abanades, S. Correlating Oxygen with Thermochemical CO2-Splitting Efficiency in A-Site Substituted Manganite Perovskites. Sustain. Energy Fuels 2021, 5, 4570–4574.
17. Jing, Y.; Aluru, N.R. The Role of A-Site Ion on Proton Diffusion in Perovskite Oxides (ABO3). J. Power Sources 2020, 445, 227327.
18. Nair, M.M.; Abanades, S. Experimental Screening of Perovskite Oxides as Efficient Redox Materials for Solar Thermochemical CO2 Conversion. Sustain. Energy Fuels 2018, 2, 843–854.
19. Dey, S.; Naidu, B.S.; Rao, C.N.R. Ln0.5A0.5MnO3 (Ln = Lanthanide, A = Ca, Sr) Perovskites Exhibiting Remarkable Performance in the Thermochemical Generation of CO and H2 from CO2 and H2O. Chem. Eur. J. 2015, 21, 7077–7081.
20. McDaniel, A.H.; Ambrosini, A.; Coker, E.; Miller, J.; Chueh, W.; O’Hayre, R.; Tong, J. Nonstoichiometric Perovskite Oxides for Solar Thermochemical H2 and CO Production. Energy Procedia 2014, 49, 2009–2018.
21. Gokon, N.; Hara, K.; Ito, N.; Sawaguri, H.; Bellan, S.; Kodama, T.; Cho, H.-S. Thermochemical H2O Splitting Using LaSrMnCrO3 of Perovskite Oxides for Solar Hydrogen Production. In Proceedings of the SOLARPACES 2019: International Conference on Concentrating Solar Power and Chemical Energy Systems, Daegu, Republic of Korea, 1–4 October 2019; p. 170007.
22. Gokon, N.; Hara, K.; Sugiyama, Y.; Bellan, S.; Kodama, T.; Hyun-Seok, C. Thermochemical Two-Step Water Splitting Cycle Using Perovskite Oxides Based on LaSrMnO3 Redox System for Solar H2 Production. Thermochim. Acta 2019, 680, 178374.
23. Demont, A.; Abanades, S. High Redox Activity of Sr-Substituted Lanthanum Manganite Perovskites for Two-Step Thermochemical Dissociation of CO2. RSC Adv. 2014, 4, 54885–54891.
24. Rao, C.N.R.; Dey, S. Generation of H2 and CO by Solar Thermochemical Splitting of H2O and CO2 by Employing Metal Oxides. J. Solid State Chem. 2016, 242, 107–115.
25. McDaniel, A.H.; Miller, E.C.; Arifin, D.; Ambrosini, A.; Coker, E.N.; O’Hayre, R.; Chueh, W.C.; Tong, J. Sr- and Mn-Doped LaAlO3−δ for Solar Thermochemical H2 and CO Production. Energy Environ. Sci. 2013, 6, 2424.
26. Dey, S.; Naidu, B.S.; Rao, C.N.R. Beneficial Effects of Substituting Trivalent Ions in the B-Site of La0.5Sr0.5Mn1−xAxO3 (A = Al, Ga, Sc) on the Thermochemical Generation of CO and H2 from CO2 and H2O. Dalton Trans. 2016, 45, 2430–2435.
27. Takalkar, G.; Bhosale, R.; AlMomani, F. Combustion Synthesized A0.5Sr0.5MnO3−δ Perovskites (Where, A = La, Nd, Sm, Gd, Tb, Pr, Dy, and Y) as Redox Materials for Thermochemical Splitting of CO2. Appl. Surf. Sci. 2019, 489, 80–91.
28. Bork, A.H.; Kubicek, M.; Struzik, M.; Rupp, J.L.M. Perovskite La0.6Sr0.4Cr1−xCoxO3−δ Solid Solutions for Solar-Thermochemical Fuel Production: Strategies to Lower the Operation Temperature. J. Mater. Chem. A 2015, 3, 15546–15557.
29. Takalkar, G.; Bhosale, R.R.; AlMomani, F.; Kumar, A.; Banu, A.; Ashok, A.; Rashid, S.; Khraisheh, M.; Shakoor, A.; al Ashraf, A. Thermochemical Splitting of CO2 Using Solution Combustion Synthesized LaMO3 (Where, M = Co, Fe, Mn, Ni, Al, Cr, Sr). Appl. Surf. Sci. 2020, 509, 144908.
30. Demont, A.; Abanades, S. Solar Thermochemical Conversion of CO2 into Fuel via Two-Step Redox Cycling of Non-Stoichiometric Mn-Containing Perovskite Oxides. J. Mater. Chem. A 2015, 3, 3536–3546.
31. Agrawal, A.; Choudhary, A. Perspective: Materials Informatics and Big Data: Realization of the “Fourth Paradigm” of Science in Materials Science. APL Mater. 2016, 4, 053208.
32. Zhou, T.; Song, Z.; Sundmacher, K. Big Data Creates New Opportunities for Materials Research: A Review on Methods and Applications of Machine Learning for Materials Design. Engineering 2019, 5, 1017–1026.
33. Sashidhar, D.; Kutz, J.N. Bagging, Optimized Dynamic Mode Decomposition for Robust, Stable Forecasting with Spatial and Temporal Uncertainty Quantification. Philos. Trans. R. Soc. A 2022, 380, 20210198.
34. Svetnik, V.; Liaw, A.; Tong, C.; Culberson, J.C.; Sheridan, R.P.; Feuston, B.P. Random Forest: A Classification and Regression Tool for Compound Classification and QSAR Modeling. J. Chem. Inf. Comput. Sci. 2003, 43, 1947–1958.
35. Melanson, D. Extremely Randomized Trees with Multiparty Computation. Ph.D. Thesis, University of Washington Tacoma, Tacoma, WA, USA, 2020.
36. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
37. Picard, R.R.; Cook, R.D. Cross-Validation of Regression Models. J. Am. Stat. Assoc. 1984, 79, 575–583.
38. Vega García, M.; Aznarte, J.L. Shapley Additive Explanations for NO2 Forecasting. Ecol. Inform. 2020, 56, 101039.
39. Li, J.; Pan, L.; Suvarna, M.; Tong, Y.W.; Wang, X. Fuel Properties of Hydrochar and Pyrochar: Prediction and Exploration with Machine Learning. Appl. Energy 2020, 269, 115166.
40. Zhu, X.; Wang, X.; Ok, Y.S. The Application of Machine Learning Methods for Prediction of Metal Sorption onto Biochars. J. Hazard. Mater. 2019, 378, 120727.
41. Mu, L.; Wang, Z.; Wu, D.; Zhao, L.; Yin, H. Prediction and Evaluation of Fuel Properties of Hydrochar from Waste Solid Biomass: Machine Learning Algorithm Based on Proposed PSO–NN Model. Fuel 2022, 318, 123644.
42. Sampson, J.R. Adaptation in Natural and Artificial Systems (John H. Holland). SIAM Rev. 1976, 18, 529–530.
43. Carrillo, A.J.; Bork, A.H.; Moser, T.; Sediva, E.; Hood, Z.D.; Rupp, J.L.M. Modifying La0.6Sr0.4MnO3 Perovskites with Cr Incorporation for Fast Isothermal CO2-Splitting Kinetics in Solar-Driven Thermochemical Cycles. Adv. Energy Mater. 2019, 9, 1803886.
44. Sawaguri, H.; Gokon, N.; Hayashi, K.; Iwamura, Y.; Yasuhara, D. Two-Step Thermochemical CO2 Splitting Using Partially-Substituted Perovskite Oxides of La0.7Sr0.3Mn0.9X0.1O3 for Solar Fuel Production. Front. Energy Res. 2022, 10, 872959.
45. Luciani, G.; Landi, G.; Aronne, A.; Di Benedetto, A. Partial Substitution of B Cation in La0.6Sr0.4MnO3 Perovskites: A Promising Strategy to Improve the Redox Properties Useful for Solar Thermochemical Water and Carbon Dioxide Splitting. Sol. Energy 2018, 171, 1–7.
46. Yang, C.; Yin, L.-L.; Bebensee, F.; Buchholz, M.; Sezen, H.; Heissler, S.; Chen, J.; Nefedov, A.; Idriss, H.; Gong, X.-Q.; et al. Chemical Activity of Oxygen Vacancies on Ceria: A Combined Experimental and Theoretical Study on CeO2(111). Phys. Chem. Chem. Phys. 2014, 16, 24165–24168.
47. Schaub, R.; Thostrup, P.; Lopez, N.; Lægsgaard, E.; Stensgaard, I.; Nørskov, J.K.; Besenbacher, F. Oxygen Vacancies as Active Sites for Water Dissociation on Rutile TiO2(110). Phys. Rev. Lett. 2001, 87, 266104.
48. Daza, Y.A.; Maiti, D.; Kent, R.A.; Bhethanabotla, V.R.; Kuhn, J.N. Isothermal Reverse Water Gas Shift Chemical Looping on La0.75Sr0.25Co1-xFeO3 Perovskite-Type Oxides. Catal. Today 2015, 258, 691–698.
49. Kozokaro, V.F.; Biswas, S.; Toroker, M.C. Changing Your Tune on Catalytic Efficiency: Tuning Cr Concentration in La0.3Sr0.7Fe1-xCrxO3−δ Perovskite as a Cathode in Solid Oxide Electrolysis Cell. Comput. Mater. Sci. 2022, 210, 111462.
50. Connor, B.A.; Smaha, R.W.; Li, J.; Gold-Parker, A.; Heyer, A.J.; Toney, M.F.; Lee, Y.S.; Karunadasa, H.I. Alloying a Single and a Double Perovskite: A Cu+/2+ Mixed-Valence Layered Halide Perovskite with Strong Optical Absorption. Chem. Sci. 2021, 12, 8689–8697.
Figure 1. Machine-learning workflow used in this work.
Figure 2. Violin plots of experimental conditions and CO yield distribution. (a) HR; (b) T1; (c) T2; (d) PCO2; (e) t1; (f) t2; (g) GF; (h) CO Production.
Figure 3. Ranking of mRMR scores for the 55 initial feature variables.
Figure 4. Pearson correlation matrices before and after preliminary screening (55 features vs. 34 retained features).
Figure 5. Feature selection results of five ML models.
Figure 6. Validation results from 10-fold cross-validation of the five regression models. (a) Comparison of R2 values; (b) Comparison of MAE values.
Figure 7. Schematic of the evolutionary algorithm used for hyperparameter optimization.
Figure 8. Random Forest model performance before and after optimization.
Figure 9. Learning curve of the optimized Random Forest model based on multi-seed hold-out validation.
Figure 10. Importance ranking of Random Forest features based on SHAP.
Figure 11. SHAP value visualization of the Random Forest model. (a) SHAP summary plot of the top 10 features; (b) Top 10 feature importances of the Random Forest model.
Figure 12. Dependence plots of T1 and C_b1 features on CO yield. (a) SHAP dependence plot of T1; (b) SHAP dependence plot of C_b1.
Figure 13. LOESS scatter plots of T1 and C_b1. (a) Relationship between T1 and CO yield; (b) Relationship between C_b1 and CO yield. In each panel, the solid line represents the LOESS-fitted trend.
Figure 14. Selected experimental data showing the effect of T1 and C_b1 on CO yield.
Figure 15. Combined effect of T1 and C_b1 on CO yield.
Figure 16. Two-dimensional partial dependence plot (2D PDP) showing the interaction between T1 and C_b1 on the predicted CO yield.
Table 1. Atomic parameters and their physical significance (X denotes the A1, A2, B1, or B2 site; e.g., C_b1 is the B1-site stoichiometric fraction).
No. | Feature | Physical Meaning
--- | --- | ---
1 | C_x | Stoichiometric fraction of the X site (0–1)
2 | Z_x | Atomic number of the X site
3 | m_x | Relative atomic mass of the X site
4 | rf_x | Van der Waals radius of the X site (pm)
5 | rc_x | Covalent radius of the X site (pm)
6 | X_x | Pauling electronegativity of the X site
7 | IEf_x | First ionization energy of the X site (kJ/mol)
8 | IEs_x | Second ionization energy of the X site (kJ/mol)
9 | IEt_x | Third ionization energy of the X site (kJ/mol)
10 | Hf_x | Fusion enthalpy of the X site (kJ/mol)
11 | Hv_x | Vaporization enthalpy of the X site (kJ/mol)
12 | Ha_x | Atomization enthalpy of the X site (kJ/mol)
Table 2. Experimental conditions and their physical significance.
No. | Feature | Physical Meaning
--- | --- | ---
1 | HR | Heating rate (°C/min)
2 | T1 | Thermal reduction (TR) temperature (°C)
3 | T2 | CDS temperature (°C)
4 | PCO2 | CO2 partial pressure (atm)
5 | t1 | TR duration (min)
6 | t2 | CDS duration (min)
7 | GF | CO2 gas flow rate (mL/min)
Table 3. Optimal feature subsets selected by the embedded scheme for each regression model.
Algorithm | Number of Features (K) | Best Feature Subset
--- | --- | ---
Decision Tree, GBR, Random Forest | 29 | rf_b1, t1, C_b1, X_a2, rc_b2, Hf_a2, rc_a2, rc_b1, Z_b1, t2, Hv_a2, IEf_b2, Z_a2, IEf_b1, Z_a1, HR, rf_a2, Z_b2, IEs_b2, T2, C_a1, Hv_b2, T1, IEs_a1, Hf_b2, IEf_a2, X_b1, PCO2, rf_b2
Extra Trees | 25 | rf_b1, t1, C_b1, X_a2, rc_b2, Hf_a2, rc_a2, rc_b1, Z_b1, t2, Hv_a2, IEf_b2, Z_a2, IEf_b1, Z_a1, HR, rf_a2, Z_b2, IEs_b2, T2, Hv_b2, C_a1, T1, IEs_a1, PCO2
Bagging | 31 | rf_b1, t1, C_b1, X_a2, rc_b2, Hf_a2, rc_a2, rc_b1, Z_b1, t2, Hv_a2, IEf_b2, Z_a2, IEf_b1, Z_a1, HR, rf_a2, Z_b2, IEs_b2, T2, C_a1, Hv_b2, T1, IEs_a1, Hf_b2, IEf_a2, X_b1, PCO2, rf_b2, IEt_b2, Hf_b1
Table 4. 10-fold cross-validation results of different algorithms.
Algorithm | R2 | MAE (μmol/g-material)
--- | --- | ---
Decision Tree | 0.872 | 44.296
Bagging | 0.741 | 57.796
Random Forest | 0.870 | 49.758
Extra Trees | 0.880 | 44.789
GBR | 0.766 | 70.427
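The 10-fold protocol behind Table 4, together with the R2 and MAE metrics, can be sketched without external libraries. The shuffling seed and the round-robin fold assignment are assumptions, and the authors' exact splits are not reproduced. In practice, scikit-learn's `KFold` and `cross_validate` implement the same protocol. Averaging the per-fold scores, as done here, is one common convention for reporting cross-validated R2 and MAE.

```python
import random
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination, R2 = 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mae(y_true, y_pred):
    """Mean absolute error, in the units of y (here μmol/g-material)."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def k_fold_scores(X, y, fit_predict, k=10, seed=0):
    """Mean R2 and mean MAE over k folds.

    fit_predict(X_train, y_train, X_test) must train a fresh model and return
    predictions for X_test.
    """
    if not 2 <= k <= len(y):
        raise ValueError("k must be between 2 and the number of samples")
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)       # reproducible shuffle
    folds = [idx[f::k] for f in range(k)]  # round-robin assignment gives disjoint folds
    r2s, maes = [], []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [n for n in idx if n not in held_out]
        preds = fit_predict([X[n] for n in train_idx], [y[n] for n in train_idx],
                            [X[n] for n in test_idx])
        y_test = [y[n] for n in test_idx]
        r2s.append(r2_score(y_test, preds))
        maes.append(mae(y_test, preds))
    return mean(r2s), mean(maes)
```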
Table 5. Optimal hyperparameter values of the Random Forest model.
Parameter | Optimum Value
--- | ---
Max depth | 20
Number of estimators | 62
Min samples split | 2
Min samples leaf | 1
Max features | sqrt
Bootstrap | False
Min impurity decrease | 0.3
Random state | 44
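Assume the model was built with scikit-learn [36] and that the Table 5 entries map onto the `RandomForestRegressor` arguments of the same meaning. Under those assumptions, the optimized configuration can be written as follows. `RF_PARAMS` and `build_optimized_rf` are hypothetical names, and the authors' training/test split is not reproduced.

```python
from sklearn.ensemble import RandomForestRegressor

# Table 5 values mapped onto scikit-learn argument names (assumed mapping).
RF_PARAMS = {
    "max_depth": 20,
    "n_estimators": 62,
    "min_samples_split": 2,
    "min_samples_leaf": 1,
    "max_features": "sqrt",        # features considered at each split
    "bootstrap": False,            # every tree is grown on the full training set
    "min_impurity_decrease": 0.3,  # minimum weighted impurity decrease required to split
    "random_state": 44,
}


def build_optimized_rf():
    """Return an unfitted RandomForestRegressor configured with the Table 5 values."""
    return RandomForestRegressor(**RF_PARAMS)


# Usage (X_train, y_train are placeholders for the selected features and CO yields):
# model = build_optimized_rf().fit(X_train, y_train)
```

With `bootstrap=False`, the trees differ from one another only through the random feature subsets drawn at each split.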
Table 6. Performance of the Random Forest model.
Dataset | R2 | MAE (μmol/g-material)
--- | --- | ---
Training set, before optimization | 0.946 | 34.155
Test set, before optimization | 0.870 | 49.758
Training set, after optimization | 0.996 | 7.700
Test set, after optimization | 0.910 | 41.528
Table 7. Comparison between experimental and predicted CO yields for six representative test samples.
Material | TR Temperature (°C) | CDS Temperature (°C) | Experimental CO Yield (μmol/g-material) | Predicted CO Yield (μmol/g-material) | Reference
--- | --- | --- | --- | --- | ---
La0.6Sr0.4Mn0.5Cr0.5O3 | 1200 | 1200 | 122.32 | 125.16 | [44]
La0.7Sr0.3Mn0.9Co0.1O3 | 1400 | 800 | 325 | 317.17 | [45]
La0.6Sr0.4Mn0.6Al0.4O3 | 1400 | 1050 | 196 | 207.26 | [30]
La0.5Sr0.5Mn0.75Al0.25O3 | 1400 | 1100 | 330.00 | 342.50 | [26]
La0.6Sr0.4MnO3 | 1100 | 1100 | 83.93 | 67.90 | [44]
La0.6Sr0.4Mn0.8Fe0.2O3 | 1350 | 1000 | 329.9 | 317.50 | [46]
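The agreement in Table 7 can be quantified directly from the tabulated values. The sketch below recomputes the absolute and relative errors for these six samples as a worked check of the table only. The resulting mean absolute error of about 10.5 μmol/g-material describes these six selected samples, not the full test set, whose MAE is 41.528 μmol/g-material. The largest relative deviation, about 19%, occurs for La0.6Sr0.4MnO3. The values are transcribed from the table above, and `error_summary` is a hypothetical name.

```python
from statistics import mean

# (material, experimental CO yield, predicted CO yield) in μmol/g-material, from Table 7.
TABLE7_ROWS = [
    ("La0.6Sr0.4Mn0.5Cr0.5O3", 122.32, 125.16),
    ("La0.7Sr0.3Mn0.9Co0.1O3", 325.0, 317.17),
    ("La0.6Sr0.4Mn0.6Al0.4O3", 196.0, 207.26),
    ("La0.5Sr0.5Mn0.75Al0.25O3", 330.00, 342.50),
    ("La0.6Sr0.4MnO3", 83.93, 67.90),
    ("La0.6Sr0.4Mn0.8Fe0.2O3", 329.9, 317.50),
]


def error_summary(rows):
    """Return the mean absolute error and the (material, relative error in %) pair with the largest relative error."""
    abs_errors = [abs(exp - pred) for _, exp, pred in rows]
    rel_errors = [(name, abs(exp - pred) / exp * 100.0) for name, exp, pred in rows]
    worst = max(rel_errors, key=lambda item: item[1])
    return mean(abs_errors), worst
```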
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Chen, J.; Wang, K.; Xie, H.; Ma, K.; Li, K. Performance Prediction of Perovskite-Catalyzed CO2 Decomposition Based on Machine-Learning Method. Energies 2026, 19, 1388. https://doi.org/10.3390/en19061388

AMA Style

Chen J, Wang K, Xie H, Ma K, Li K. Performance Prediction of Perovskite-Catalyzed CO2 Decomposition Based on Machine-Learning Method. Energies. 2026; 19(6):1388. https://doi.org/10.3390/en19061388

Chicago/Turabian Style

Chen, Jiayi, Kun Wang, Huaqing Xie, Kerong Ma, and Kunlun Li. 2026. "Performance Prediction of Perovskite-Catalyzed CO2 Decomposition Based on Machine-Learning Method" Energies 19, no. 6: 1388. https://doi.org/10.3390/en19061388

APA Style

Chen, J., Wang, K., Xie, H., Ma, K., & Li, K. (2026). Performance Prediction of Perovskite-Catalyzed CO2 Decomposition Based on Machine-Learning Method. Energies, 19(6), 1388. https://doi.org/10.3390/en19061388

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
