Analysis of the Descriptors for the Oxidative Coupling of Methane Reaction, Using Varying Machine Learning Approaches

Ugwu, Lord; Morgan, Yasser; Ibrahim, Hussameldin

doi:10.3390/engproc2024076100

Open Access Proceeding Paper

Analysis of the Descriptors for the Oxidative Coupling of Methane Reaction, Using Varying Machine Learning Approaches †

by Lord Ugwu ¹, Yasser Morgan ² and Hussameldin Ibrahim ¹,*

¹ Clean Energy Technologies Research Institute (CETRI), Process Systems Engineering, Faculty of Engineering and Applied Science, University of Regina, 3737 Wascana Parkway, Regina, SK S4S 0A2, Canada

² Department of Systems and Enterprises, Stevens Institute of Technology, 1 Castle Point, Hoboken, NJ 07030, USA

* Author to whom correspondence should be addressed.

† Presented at the 1st International Conference on Industrial, Manufacturing, and Process Engineering (ICIMP-2024), Regina, Canada, 27–29 June 2024.

Eng. Proc. 2024, 76(1), 100; https://doi.org/10.3390/engproc2024076100

Published: 5 December 2024

(This article belongs to the Proceedings of 1st International Conference on Industrial, Manufacturing, and Process Engineering (ICIMP-2024))


Abstract

Combining catalytic and electronic properties with empirical data offers richer insight into catalyst evaluation and design, helping to advance heterogeneous catalytic reactions such as the oxidative coupling of methane (OCM). A comparative assessment of several machine learning methods on OCM reaction datasets shows that the Random Forest regression (RFR) model gives the most accurate predictions of the combined C₂H₄ and C₂H₆ yield (C₂y), with an average R² of 0.98. The models rank as follows: RFR > XGBR > SVR > DNN. The MSE and MAE of the RFR models were lower than those of the alternative models, ranging from 0.12 to 9.03 for MSE and from 0.21 to 2.02 for MAE. Model accuracy across the predicted labels follows the order C₂H₆y > C₂H₄y > C₂y > CO₂y > CH₄_conv (methane conversion). In the feature-impact analysis, C₂y increases with the number of moles of alkali/alkaline-earth metal in the catalyst (13.69%), the atomic number of the catalyst promoter (6.24%), and the Fermi energy of the promoter metal, although each of these has a smaller impact than temperature (33.70%). This suggests a highly nonlinear relationship between the combined ethylene and ethane yield and temperature. Other factors, such as the bandgaps of the active metal oxide and the support and the Fermi energy of the catalyst support, had a relatively modest effect on the predictive models for combined ethylene and ethane yield and methane conversion.

Keywords:

methane; catalyst; random forest regression; machine learning; comparison

1. Introduction

In the search for efficient and novel catalysts for the OCM reaction, computational approaches have recently emerged as a strategy for examining the reaction. Fang et al. [1] identified the Mn/K₂MoO₄/Al₂O₃ catalyst as a leading OCM catalyst, achieving 67% C₂ selectivity at 38% CH₄ conversion. New OCM catalysts could significantly improve the feasibility of the process, providing a viable route for converting CH₄, an abundant greenhouse gas, into ethylene, an essential petrochemical.

Artificial intelligence (AI), in the form of machine learning (ML), continues to support OCM-related studies. ML facilitates the investigation of complex OCM reaction pathways by training algorithms on diverse datasets that include both experimental and computational information, such as quantum mechanical (QM) data from density functional theory (DFT) calculations.

ML and data reconstruction have supported the targeted development of catalysts that improve OCM efficiency at lower temperatures. Ohyama et al. [2] synthesized and evaluated 63 OCM catalysts and used unsupervised ML to categorize the resulting data. The analysis revealed a subset of catalysts that were effective at low temperatures, and their effectiveness was subsequently confirmed experimentally. The discovery and validation of three previously unreported low-temperature OCM catalysts showcased the capacity of AI to assist in catalyst synthesis.

This study aims to combine catalyst electronic characteristics with extensive experimental data to construct and compare predictive models and to identify the predominant catalytic and electronic descriptors for efficient catalyst design.

2. Methodology

The dataset for this study was built by merging high-throughput (HTP) experimental data retrieved from the Catalyst Acquisition by Data Science (CADS) repository [3] with electronic properties computed by Ugwu et al. [4,5], including the Fermi energy, bandgap energy, and magnetic moment of the catalyst constituents (the catalyst promoter, the active metallic/bimetallic oxide, and the catalyst support). Alongside these computed properties, the HTP OCM dataset pairs each set of experimental conditions with its corresponding reaction results.
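The merge described above amounts to a lookup join: each experimental row names its promoter, active oxide, and support, and the DFT table supplies properties for each material. The sketch below assumes hypothetical column names (`promoter`, `active_oxide`, `support`, `fermi_energy`, `bandgap`, `magnetic_moment`) and dummy property values; it is not the authors' pipeline, and the CADS column layout may differ.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def merge_htp_with_dft(
    htp_rows: List[Row],
    dft_props: Dict[str, Dict[str, float]],
    component_keys: Tuple[str, ...] = ("promoter", "active_oxide", "support"),
    prop_names: Tuple[str, ...] = ("fermi_energy", "bandgap", "magnetic_moment"),
) -> List[Row]:
    """Return copies of the HTP rows with the DFT properties of each component appended.

    All key and property names are hypothetical placeholders.
    """
    merged: List[Row] = []
    for row in htp_rows:
        out = dict(row)  # copy so the caller's rows are not mutated
        for key in component_keys:
            props = dft_props.get(str(row.get(key)), {})
            for prop in prop_names:
                # None marks a property that is unavailable for this material.
                out[f"{key}_{prop}"] = props.get(prop)
        merged.append(out)
    return merged


# Dummy values for illustration only (not taken from the paper or from CADS).
dft = {
    "Li": {"fermi_energy": -1.0, "bandgap": 0.0, "magnetic_moment": 0.0},
    "MnO2": {"fermi_energy": -2.0, "bandgap": 1.0, "magnetic_moment": 3.0},
}
htp = [{"promoter": "Li", "active_oxide": "MnO2", "support": "SiO2", "Temp": 800}]
rows = merge_htp_with_dft(htp, dft)
print(rows[0]["promoter_fermi_energy"], rows[0]["support_bandgap"])  # -1.0 None
```

Missing materials yield `None` rather than dropping columns, so every merged row keeps the same set of attributes, which most regressors require.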

The dataset covers 12,708 reactions across 59 distinct catalyst compositions and a control sample, and includes the electronic characteristics of the catalyst components. It has 34 attributes: 8 DFT-computed electronic properties of the catalyst promoter, active metal oxide, and support, and 26 features derived from the HTP experimental data. Four ML methods were assessed for analyzing the dataset:

Deep Neural Networks, specifically Deep Feed-Forward Neural Networks (DNN)
Random Forest Regression (RFR)
Support Vector Regression (SVR)
Extreme Gradient Boosting Regression (XGBR)

The ML models predict five targets (labels): the yields of CO₂, C₂H₄, C₂H₆, and C₂ (combined C₂H₄ and C₂H₆ yield), and CH₄ conversion. They were trained on 80% of the dataset, which was used for training and validation, and were optimized with hyperparameter tuning and cross-validation. The trained models were then evaluated on a separate test set (the remaining 20% of the dataset) that was not used during training. Performance was quantified with the coefficient of determination (R²), the Mean Squared Error (MSE), and the Mean Absolute Error (MAE), and the models were compared on these metrics.
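The evaluation protocol above (an 80/20 hold-out split, cross-validation on the training portion, and R², MSE, and MAE on the test set) can be sketched in plain Python. The number of folds (5) and the random seed are assumptions; the paper does not report them.

```python
import random
from typing import List, Sequence, Tuple


def train_test_split_indices(n: int, test_frac: float = 0.2, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle row indices and hold out `test_frac` of them as the untouched test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    return idx[n_test:], idx[:n_test]


def kfold_indices(train_idx: Sequence[int], k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split the training indices into k (fit, validation) pairs for cross-validation."""
    folds = [list(train_idx[i::k]) for i in range(k)]
    return [([j for fold in folds[:i] + folds[i + 1:] for j in fold], folds[i]) for i in range(k)]


def mse(y: Sequence[float], p: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(y, p)) / len(y)


def mae(y: Sequence[float], p: Sequence[float]) -> float:
    return sum(abs(a - b) for a, b in zip(y, p)) / len(y)


def r2(y: Sequence[float], p: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, p))
    return 1.0 - ss_res / ss_tot


train, test = train_test_split_indices(12708)  # 12,708 reactions, as in the dataset
folds = kfold_indices(train, k=5)
print(len(train), len(test), len(folds))
print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]), r2([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
```

Hyperparameters would be chosen by the mean validation score over `folds`; the test indices are touched only once, for the final scores.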

3. Results and Discussion

3.1. Analysis of the SVR Models

An SVR model was first built to predict C₂y using only reaction conditions as dataset features (D1). Consistent with the findings of Ohyama et al. [6], the R² values obtained with the linear and radial basis function (rbf) kernels indicated a nonlinear relationship between C₂y and the dataset features. Polynomial kernels of different degrees were also tested alongside the rbf kernel. The model score increased from 0.77 with reaction conditions only (D1) to 0.93 with reaction conditions plus catalyst electronic properties (D2), confirming that including the catalysts' electronic properties improves the model. The residual plot (Figure 1a) shows an approximately normal distribution of residuals for both the training and test data in D2, and the parity plot (Figure 1b) shows the agreement between real and predicted values. R² for the SVR (rbf) models ranged from 0.76 (C₂H₄y) to 0.93 (C₂y), with MSE and MAE ranging from 2.83 to 2.44 and from 1.11 to 0.95, respectively, in the order C₂y > CH₄_conv > C₂H₆y > CO₂y > C₂H₄y.
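The kernels compared above differ only in how they measure similarity between two feature vectors. The sketch below writes out their standard forms; the values of `gamma`, `degree`, and `coef0` are placeholders, since the tuned SVR hyperparameters are not reported.

```python
import math
from typing import Sequence


def linear_kernel(x: Sequence[float], z: Sequence[float]) -> float:
    """K(x, z) = <x, z>: an SVR with this kernel can only fit linear trends."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x: Sequence[float], z: Sequence[float],
                      degree: int = 3, gamma: float = 1.0, coef0: float = 1.0) -> float:
    """K(x, z) = (gamma * <x, z> + coef0) ** degree."""
    return (gamma * linear_kernel(x, z) + coef0) ** degree


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float = 0.5) -> float:
    """K(x, z) = exp(-gamma * ||x - z||^2): similarity decays with squared distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


x, z = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(x, z))                # 11.0
print(polynomial_kernel(x, z, degree=2))  # (11 + 1) ** 2 = 144.0
print(rbf_kernel(x, x), rbf_kernel(x, z)) # 1.0, then exp(-0.5 * 8) ≈ 0.0183
```

Because the rbf kernel depends on Euclidean distance, features with large numeric ranges (such as temperature in °C) dominate it unless the features are scaled first.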

3.2. Analysis of the RFR Models

The RFR model was applied to the different datasets, with the number of trees optimized to 300 for the entire dataset. Figure 2a,b show the residuals plotted against the predicted values for the C₂y predictive model, together with the parity plot. As with SVR, R² for D2 (0.97) exceeded that for D1 (0.85), consistent with the results of Ohyama et al. [6] (0.78), which supports the model's validity and the positive impact of the catalyst electronic properties. As Ohyama et al. [6] also noted, the RFR models are clearly more accurate than SVR. The RFR predictive models for C₂H₄y, C₂H₆y, CO₂y, and CH₄_conv using D2 also achieved high R² values. The order of R² for the labels was CH₄_conv > C₂H₆y > CO₂y > C₂H₄y > C₂y. The MSE and MAE of the RFR models were higher than those of SVR, ranging from 0.12 to 9.03 for MSE and from 0.21 to 2.02 for MAE.
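A random forest averages many trees, each grown on a bootstrap sample of the rows and restricted to a random subset of features. The simplified sketch below uses depth-1 trees (stumps) so it stays short; a full RFR grows deep trees and re-samples features at every split. The 300 trees match the text; the `d // 3` feature-subset size and the toy data are assumptions.

```python
import random
from statistics import mean
from typing import List, Optional, Sequence, Tuple

# (feature index, threshold, value if row[f] <= threshold, value otherwise)
Stump = Tuple[int, float, float, float]


def fit_stump(X: List[List[float]], y: List[float], features: Sequence[int]) -> Stump:
    """Least-squares single split over the candidate `features`."""
    best: Optional[Tuple[float, Stump]] = None
    for f in features:
        for t in sorted({row[f] for row in X})[:-1]:  # skip the max so both sides are non-empty
            left = [v for row, v in zip(X, y) if row[f] <= t]
            right = [v for row, v in zip(X, y) if row[f] > t]
            lm, rm = mean(left), mean(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, (f, t, lm, rm))
    if best is None:  # every candidate feature is constant in this sample
        m = mean(y)
        return (features[0], float("inf"), m, m)
    return best[1]


def predict_stump(stump: Stump, row: List[float]) -> float:
    f, t, left_value, right_value = stump
    return left_value if row[f] <= t else right_value


def fit_forest(X, y, n_trees: int = 300, max_features: Optional[int] = None, seed: int = 0) -> List[Stump]:
    """Bagged stumps: each tree sees a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    k = max_features or max(1, d // 3)  # features considered for the (single) split
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feats = rng.sample(range(d), k)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_forest(forest: List[Stump], row: List[float]) -> float:
    return mean(predict_stump(s, row) for s in forest)  # average the trees' predictions


# Toy step function, not OCM data.
X = [[float(i)] for i in range(20)]
y = [0.0] * 10 + [10.0] * 10
forest = fit_forest(X, y, n_trees=300)
print(predict_forest(forest, [2.0]), predict_forest(forest, [17.0]))
```

Averaging over bootstrap samples is what lowers the variance of individual trees; increasing `n_trees` beyond a few hundred usually changes predictions only slightly.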

3.3. Analysis of the DNN Models

Several network architectures were evaluated to optimize the neural network configuration. A four-layer configuration trained on normalized data with the ReLU activation function proved effective, and the Adam optimizer was used for network optimization. Dropout layers were added to counter overfitting. R² for the C₂y predictive model was 0.88, with the other labels ranging from 0.84 to 0.92. As with RFR and XGBR, MAE and MSE varied across labels. Figure 3a,b compare the training and validation loss curves of the overfitted model and of the model with dropout layers.
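Dropout counters overfitting by randomly silencing units during training, so the network cannot rely on any single activation. A minimal sketch of the common "inverted" variant follows; the paper does not state its framework or dropout rate, so the rate of 0.5 here is only an example.

```python
import random
from typing import List


def dropout(activations: List[float], rate: float, rng: random.Random, training: bool = True) -> List[float]:
    """Inverted dropout: zero each unit with probability `rate` during training and
    scale the survivors by 1 / (1 - rate) so the expected activation is unchanged."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)  # identity at inference time
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)
out = dropout([1.0] * 10000, rate=0.5, rng=rng)
print(sum(out) / len(out))  # approximately 1.0, the mean of the undropped input
```

Because the scaling happens during training, no rescaling is needed at inference, which is why the function is the identity when `training=False`.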

3.4. Analysis of the XGBR Models

The XGBR model's performance for C₂y prediction was compared with the results of Ohyama et al. [6]. R² was 0.83 for D1 and 0.92 for D2, again showing the improvement gained by including catalyst electronic properties. The models for the other labels also performed well. The comparison showed that XGBR and RFR performed similarly on D1 and D2, while SVR and DNN were comparable, particularly in terms of R². The overall performance order was RFR > XGBR > SVR > DNN. Figure 4a,b show the residuals plotted against the predicted values for the C₂y predictive model, together with a parity plot.
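Gradient boosting builds its ensemble sequentially: each new tree fits the residual errors left by the trees before it, scaled by a learning rate. XGBoost adds second-order gradient information, regularization of leaf weights, and deeper trees; the sketch below is plain first-order gradient boosting with depth-1 trees on squared error, so it illustrates the boosting idea rather than XGBoost's exact algorithm. The round count, learning rate, and toy data are assumptions.

```python
from statistics import mean
from typing import List, Optional, Tuple

Stump = Tuple[int, float, float, float]  # (feature, threshold, left value, right value)


def fit_stump(X: List[List[float]], r: List[float]) -> Stump:
    """Least-squares single split fitted to the current residuals `r`."""
    best: Optional[Tuple[float, Stump]] = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X})[:-1]:  # both sides stay non-empty
            left = [v for row, v in zip(X, r) if row[f] <= t]
            right = [v for row, v in zip(X, r) if row[f] > t]
            lm, rm = mean(left), mean(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, (f, t, lm, rm))
    if best is None:  # all features constant: fall back to a constant update
        m = mean(r)
        return (0, float("inf"), m, m)
    return best[1]


def predict_stump(s: Stump, row: List[float]) -> float:
    f, t, lv, rv = s
    return lv if row[f] <= t else rv


def fit_boosting(X, y, n_rounds: int = 100, learning_rate: float = 0.1):
    """First-order gradient boosting for squared error: each stump fits the residuals."""
    base = mean(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of 0.5 * (y - F)^2
        s = fit_stump(X, residuals)
        stumps.append(s)
        pred = [pi + learning_rate * predict_stump(s, row) for pi, row in zip(pred, X)]
    return base, learning_rate, stumps


def predict_boosting(model, row) -> float:
    base, lr, stumps = model
    return base + lr * sum(predict_stump(s, row) for s in stumps)


# Toy quadratic, not OCM data.
X = [[float(i)] for i in range(10)]
y = [float(i * i) for i in range(10)]
model = fit_boosting(X, y)
train_mse = mean((yi - predict_boosting(model, row)) ** 2 for row, yi in zip(X, y))
baseline = mean((yi - mean(y)) ** 2 for yi in y)
print(round(baseline, 2), round(train_mse, 2))
```

A smaller learning rate needs more rounds but tends to generalize better, which is why the two are usually tuned together.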

3.5. Model Assessment

An evaluation of model performance on datasets D1 and D2 shows comparable results for XGBR and RFR on both datasets across the five target labels. Likewise, the SVR and DNN models on D2 perform similarly, particularly in terms of R². For predicting C₂y from D2, ranking the models by R², then MSE, then MAE gives RFR > XGBR > SVR > DNN, as detailed in Figure 5. In terms of data fit, the labels rank C₂H₆y > C₂H₄y > C₂y > CO₂y > CH₄_conv, as depicted in Figure 6. The overall model performance ranking is RFR > XGBR > SVR > DNN.
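One reading of the "R² > MSE > MAE" ranking criterion is a lexicographic sort: highest R² first, with ties broken by lower MSE and then lower MAE. The sketch below assumes that reading, and the model names and scores are hypothetical placeholders, not the paper's results.

```python
from typing import Dict, List


def rank_models(scores: Dict[str, Dict[str, float]]) -> List[str]:
    """Order models by highest R², breaking ties by lowest MSE, then lowest MAE."""
    return sorted(scores, key=lambda m: (-scores[m]["r2"], scores[m]["mse"], scores[m]["mae"]))


# Hypothetical scores for illustration only; these are NOT the paper's results.
example = {
    "model_a": {"r2": 0.90, "mse": 3.0, "mae": 1.2},
    "model_b": {"r2": 0.95, "mse": 2.0, "mae": 1.0},
    "model_c": {"r2": 0.90, "mse": 2.5, "mae": 1.5},
    "model_d": {"r2": 0.90, "mse": 2.5, "mae": 1.1},
}
print(rank_models(example))  # ['model_b', 'model_d', 'model_c', 'model_a']
```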

3.6. Feature Impact Assessment

Features with a strong impact in both the RFR and XGBR models include temperature, the CH₄/O₂ ratio, the number of moles of the alkali/alkaline-earth metal in the catalyst, the bandgap of the active metal oxide, the atomic number of the catalyst promoter, and the Fermi energy of the promoter metal, which show near-linear relationships with C₂y (Figure 7a,b). Figure 8a,b are bar charts of the relative impact of the dataset features on the C₂y RFR and XGBR predictive models, respectively.
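The paper reports feature impacts as percentages (for example, temperature at 33.70%) without stating how they were computed; tree ensembles commonly expose impurity-based importances. The sketch below uses a different, model-agnostic technique, permutation importance: shuffle one feature column, measure the drop in R², and rescale the drops to sum to 100%. The toy model and data are hypothetical.

```python
import random
from typing import Callable, List

Row = List[float]


def r2(y: List[float], p: List[float]) -> float:
    mean_y = sum(y) / len(y)
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, p))
    return 1.0 - ss_res / ss_tot


def permutation_importance_pct(
    predict: Callable[[Row], float], X: List[Row], y: List[float],
    n_repeats: int = 5, seed: int = 0,
) -> List[float]:
    """Mean drop in R² when each feature column is shuffled, rescaled to sum to 100%."""
    rng = random.Random(seed)
    base = r2(y, [predict(row) for row in X])
    drops = []
    for f in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            col = [row[f] for row in X]
            rng.shuffle(col)  # break the link between feature f and the target
            X_perm = [row[:f] + [v] + row[f + 1:] for row, v in zip(X, col)]
            total += base - r2(y, [predict(row) for row in X_perm])
        drops.append(max(total / n_repeats, 0.0))  # treat negative (noise) drops as zero
    s = sum(drops)
    return [100.0 * d / s if s > 0 else 0.0 for d in drops]


# Hypothetical model that uses only feature 0; feature 1 is irrelevant by construction.
X = [[float(i), float(i % 3)] for i in range(20)]
y = [3.0 * row[0] for row in X]
pct = permutation_importance_pct(lambda row: 3.0 * row[0], X, y)
print(pct)
```

Permutation importance should be computed on held-out data; on training data it can overstate features that the model has overfitted.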

Comparative 3D surface plots (Figure 9a–c) show the impact of the number of M2 moles and the promoter Fermi energy on C₂y at 700, 800, and 900 °C. A larger number of M2 moles amplifies the effect of the atomic number and Fermi energy of the promoter, indicating increased reactivity with more alkali/alkaline-earth metal.
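Surface plots like those in Figure 9 can be produced by evaluating a trained model on a grid of two features while holding the remaining features at a reference point, once per temperature. The sketch below assumes that approach (a single slice at a fixed reference row, not an averaged partial-dependence curve); the feature names `M2_mol`, `M1_fermi`, and `Temp` and the toy predictor are hypothetical stand-ins for a trained C₂y model.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def surface_grid(predict: Callable[[Row], float], base_row: Row,
                 x_name: str, x_values: Sequence[float],
                 y_name: str, y_values: Sequence[float]) -> List[List[float]]:
    """Evaluate predict() on a 2-D grid of two features, all others held at base_row."""
    grid = []
    for yv in y_values:
        line = []
        for xv in x_values:
            row = dict(base_row)  # copy of the reference operating point
            row[x_name], row[y_name] = xv, yv
            line.append(predict(row))
        grid.append(line)
    return grid


def toy_predict(r: Row) -> float:  # hypothetical stand-in for a trained C₂y model
    return r["M2_mol"] * r["M1_fermi"] + r["Temp"] / 100.0


surfaces = {
    T: surface_grid(toy_predict, {"Temp": T, "M2_mol": 0.0, "M1_fermi": 0.0},
                    "M2_mol", [0.0, 0.5, 1.0], "M1_fermi", [1.0, 2.0])
    for T in (700.0, 800.0, 900.0)
}
print(surfaces[800.0])  # rows: M1_fermi values; columns: M2_mol values
```

Each grid can then be passed to a 3D plotting routine; interactions such as the M2 amplification described above appear as a change in slope along one axis as the other increases.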

4. Conclusions

Combining catalyst electronic characteristics with reaction parameters in the dataset used to predict the reaction outcomes (C₂H₆y, C₂H₄y, C₂y, CO₂y, and CH₄_conv) improves model performance by roughly 10%, as shown by comparing the R² values of the C₂y predictive models built with SVR, RFR, DNN, and XGBR. Across the ML methods compared, the RFR models, with an average R² of 0.98 over the five reaction outcomes (labels), are more efficient and accurate than XGBR, SVR, and DNN, in that order. In general, the quality of data fit across the modeling techniques followed the order C₂H₆y > C₂H₄y > C₂y > CO₂y > CH₄_conv. The MSE and MAE of the RFR models tend to be lower than those of the other modeling techniques, ranging from 0.12 to 9.03 for MSE and from 0.21 to 2.02 for MAE. Several reaction conditions and catalyst electronic properties had a considerable influence on the C₂y predictive model, including temperature (33.70%), the number of moles of alkali/alkaline-earth metal (13.69%), the atomic number of the catalyst promoter (6.24%), the bandgap of the active metal oxide, the Fermi energy of the promoter (4.31%), and the methane-to-oxygen ratio.
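The improvement from adding electronic properties can be computed directly from the C₂y R² values reported for D1 and D2 in Sections 3.1, 3.2, and 3.4. DNN is omitted because no D1 value is reported for it.

```python
# C₂y R² values (D1, D2) as reported in Sections 3.1, 3.2 and 3.4.
r2_c2y = {"SVR": (0.77, 0.93), "RFR": (0.85, 0.97), "XGBR": (0.83, 0.92)}

gains = {}
for model, (d1, d2) in r2_c2y.items():
    abs_gain = d2 - d1                # absolute gain in R²
    rel_gain = 100.0 * abs_gain / d1  # gain relative to the D1 score, in %
    gains[model] = (round(abs_gain, 2), round(rel_gain, 1))
    print(f"{model}: +{abs_gain:.2f} R² ({rel_gain:.1f}% relative to D1)")
```

Read as absolute R² gains, the improvement spans 0.09 to 0.16; read relative to the D1 scores, it spans about 10.8% to 20.8%.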

Author Contributions

Conceptualization, L.U. and H.I.; methodology, L.U., Y.M. and H.I.; software, L.U.; validation, L.U. and H.I.; formal analysis, L.U.; investigation, L.U.; resources, L.U., H.I. and Y.M.; data curation, L.U.; writing—original draft preparation, L.U.; writing—review and editing, L.U. and H.I.; visualization, L.U.; supervision, H.I. and Y.M.; project administration, H.I. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Sciences and Engineering Research Council of Canada (NSERC DG: RGPIN-2024-04760), Canada Foundation for Innovation (CFI JELF: 37758), and the VPR Discretionary Fund at the U of R, which are gratefully acknowledged.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data used in this study are open-source data available for reuse at the Catalyst Acquisition by Data Science (CADS) repository—https://cads.eng.hokudai.ac.jp/datamanagement/datasources/21010bbe-0a5c-4d12-a5fa-84eea540e4be/ (accessed on 10 February 2024) and [4].

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Geerts, J.W.M.H.; Chen, Q.; van Kasteren, J.M.N.; van der Wiele, K. Thermodynamics and Kinetic Modeling of the Homogeneous Gas Phase Reactions of the Oxidative Coupling of Methane. Catal. Today 1990, 6, 519–526.
2. Fallah, B.; Falamaki, C. A New Nano-(2Li₂O/MgO) Catalyst/Porous Alpha-Alumina Composite for the Oxidative Coupling of Methane Reaction. AIChE J. 2010, 56, 717–728.
3. Fujima, J.; Tanaka, Y.; Miyazato, I.; Takahashi, L.; Takahashi, K. Catalyst Acquisition by Data Science (CADS): A Web-Based Catalyst Informatics Platform for Discovering Catalysts. React. Chem. Eng. 2020, 5, 903–911.
4. Ugwu, L.; Morgan, Y.; Ibrahim, H. Enhancing Ethene Production through Low-Temperature Oxidative Coupling of Methane: Leveraging DFT and Data Analysis for Crafting Innovative and Efficient Catalyst Compositions. Ind. Eng. Chem. Res. 2023, 62, 19658–19673.
5. Ugwu, L.I.; Morgan, Y.; Ibrahim, H. Increasing Ethene Yield via Oxidative Coupling of Methane at Low Temperature: An Application of Machine Learning and DFT in the Design and Innovation of Effective Catalyst Compositions. In Proceedings of the 2023 IEEE Canadian Conference on Electrical and Computer Engineering (CCECE), Regina, SK, Canada, 24–27 September 2023; pp. 342–347.
6. Ohyama, J.; Nishimura, S.; Takahashi, K. Data Driven Determination of Reaction Conditions in Oxidative Coupling of Methane via Machine Learning. ChemCatChem 2019, 11, 4307–4313.

Figure 1. (a) Residual plot of C₂y SVR (rbf) model (D2); (b) parity plot of C₂y SVR (rbf) model (D2).

Figure 2. (a) Residuals plotted against the predicted values for the training and test data for label C₂y; (b) parity plot comparing the predictions with the real values for D2.

Figure 3. (a) Loss and validation loss of the C₂y predictive model over 200 epochs; (b) loss and validation loss of the overfitted C₂y predictive model.

Figure 4. (a) Residuals plotted against the predicted values of the XGBR C₂y predictive model using reaction conditions and DFT-computed catalyst electronic properties as dataset features (D2); (b) parity plot comparing the real data with the model predictions.

Figure 5. C₂y SVR (rbf) model (D2) residual plot.

Figure 6. Comparison of predictive models based on the order of data fit.

Figure 7. (a) Fermi energy of the catalyst promoter metal vs. C₂y; (b) bandgap of the active metal oxide vs. C₂y.

Figure 8. Feature impact on the C₂y predictive models: (a) RFR; (b) XGBR.

Figure 9. Three-dimensional surface plots of C₂y against the number of moles of M2, the atomic number of M1, and the Fermi energy of M1 at (a) 700 °C, (b) 800 °C, and (c) 900 °C.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
