Article

Machine Learning-Guided Inverse Analysis for Optimal Catalytic Pyrolysis Parameters in Hydrogen Production from Biomass

by Vishal V. Persaud 1,*, Abderrachid Hamrani 2, Medeba Uzzi 1 and Norman D. H. Munroe 1
1 Department of Mechanical and Materials Engineering, Florida International University, Miami, FL 33174, USA
2 Independent Researcher, Miami, FL 33165, USA
* Author to whom correspondence should be addressed.
Catalysts 2026, 16(1), 105; https://doi.org/10.3390/catal16010105
Submission received: 13 December 2025 / Revised: 14 January 2026 / Accepted: 19 January 2026 / Published: 21 January 2026

Abstract

Catalytic pyrolysis (CP) of biomass is a promising method for producing sustainable hydrogen because lignocellulosic biomass is widely available, renewable, and approximately carbon-neutral. CP of biomass is influenced by complex, interdependent process parameters, making optimization challenging and time-consuming using traditional methods. This study investigated a two-stage machine learning (ML) framework coupled with Bayesian optimization to enhance hydrogen production from CP. The ML models were used to classify and predict hydrogen yield using a dataset of 306 points with 14 input features. The classification stage identified conditions favorable for good hydrogen yield, while the regression model (second stage) quantitatively predicted hydrogen yield. The random forest classifier and regressor demonstrated superior capabilities, achieving an accuracy of 1.0 and an R2 of 0.8, respectively. The model demonstrated strong agreement with experimental data and effectively captured the key factors driving hydrogen production. Shapley Additive exPlanation (SHAP) identified temperature and catalyst properties (nickel loading) as the most influential parameters. The inverse analysis framework was validated against experimental data reported in the literature, confirming the model’s ability to determine optimal conditions for targeted hydrogen yields. This AI-driven approach provides a scalable and data-efficient tool for optimizing processes in sustainable hydrogen production.


1. Introduction

Global energy demand is currently on the rise and is expected to increase significantly to support the projected 9.8 billion people by 2033 [1]. Approximately 80% of the world’s energy currently comes from fossil fuels, which are significant contributors to pollution, greenhouse gas emissions, and climate change [2]. The need for more environmentally friendly energy sources is greater than ever. Hydrogen (H2) has emerged as a key energy vector that can meet both current and future sustainable energy demands. However, current hydrogen production methods, although cost-effective, are far from carbon-free and primarily rely on fossil fuel sources (gray hydrogen). As the world transitions to more carbon-free hydrogen (green hydrogen), catalytic pyrolysis (CP) and electrolysis are seen as potential methods for facilitating this transformation [3]. Table 1 outlines the current and potential hydrogen production technologies, their cost per kilogram (kg) of hydrogen, and the carbon dioxide (CO2) emissions associated with each. From Table 1, it is evident that both pyrolysis and gasification of biomass are key candidates for driving the transition to ‘green hydrogen’. Global biomass production, estimated at 170 billion metric tons annually, serves as a critical renewable and carbon-neutral resource [4].
As summarized in Table 1, biomass-based pyrolysis and gasification exhibit hydrogen production cost in the range of approximately $2–$6 per kg H2 with carbon footprints below 1 kgCO2/kgH2, which makes them competitive with other low carbon pathways such as natural gas pyrolysis and renewable energy powered electrolysis, though steam methane reforming still offers lower direct cost at the expense of substantially higher emissions.
Catalytic pyrolysis of biomass involves the thermal degradation of biomass through anaerobic catalysis to produce three main products: gas, bio-oil, and bio-char [5]. It is deemed a promising method for producing carbon-neutral hydrogen [6]. The process is governed by a series of complex, cascading reactions, including cracking, steam reforming, and the water–gas shift reaction [7]. The products of these reactions are directly tied to process parameters such as feedstock composition, nature and composition of catalyst, and operational parameters (e.g., temperature and time) [8]. Biomass composition can be divided into three main components, namely cellulose (40–50%), hemicellulose (25–35%), and lignin (15–25%). Each of these degrades differently and produces different products that can be further cracked to enhance hydrogen production. The latter process is dependent on the properties of the catalyst. Cellulose, with its β-1,4-glycosidic bonds, undergoes depolymerization and dehydration at 315–355 °C to produce levoglucosan and organic acids. Hemicellulose, with its branched structure, is more thermally labile and decomposes at temperatures between 230 and 315 °C to produce acetic acid, light oxygenates, and light gases (CO2, CO, CH4, and C2–C3 hydrocarbons). Lignin, on the other hand, is more thermally stable and degrades over a broader range of temperatures (280–500 °C) through radical mechanisms to produce aromatic hydrocarbons that contribute to char formation [9]. The chemical reaction of CP occurs in two main stages: primary pyrolysis and secondary cracking. During primary pyrolysis, the biomass components are degraded to become feedstock for the secondary reactions, as shown in the following chemical equations [9,10].
Primary reactions:
Cellulose and hemicellulose:
$$(\mathrm{C_6H_{10}O_5})_n \xrightarrow{\text{Heat}} \text{Volatile gases} + \text{Bio-oil} + \text{Char} \tag{1}$$
Lignin:
$$\text{Lignin} \xrightarrow{\text{Heat}} \text{Aromatics} + \text{Char} + \text{Gases} \tag{2}$$
During the secondary reactions in CP, the catalyst is the heart of the process and directly influences product quality, specifically the selectivity of hydrogen from the biomass. The type of catalyst chosen for the process is vital, as it offers a variety of benefits but also has its limitations. Transition metals such as nickel, cobalt, and iron are the most used due to their cost, thermal stability, and surface mobility [11]. However, they can be poisoned by coke formation and sintering. Zeolite catalysts are also very common in biomass CP as they offer shape selectivity due to their molecular sieving properties and promotion of Brønsted and Lewis acid sites. However, they have limited bifunctionality, which is crucial for enhancing hydrogen production through the water–gas shift reaction [12]. Thus, tailoring the catalysts for the secondary reactions shown in Equations (3)–(6) is critical in achieving optimal hydrogen production [13].
Catalytic cracking:
$$\text{Volatiles}\ (\mathrm{C}_x\mathrm{H}_y\mathrm{O}_z) \xrightarrow{\text{Catalyst}} \mathrm{CO} + \mathrm{CO_2} + \mathrm{H_2} + \mathrm{CH_4} + \mathrm{C_2H_4} \tag{3}$$
Water–gas shift (WGS) reaction:
$$\mathrm{CO} + \mathrm{H_2O} \rightleftharpoons \mathrm{CO_2} + \mathrm{H_2} \quad (\Delta H = -41\ \mathrm{kJ/mol}) \tag{4}$$
Tar reforming:
$$\text{Tar}\ (\mathrm{C}_n\mathrm{H}_m) + n\,\mathrm{H_2O} \rightarrow n\,\mathrm{CO} + \left[n + \tfrac{m}{2}\right]\mathrm{H_2} \tag{5}$$
Methane reforming:
$$\mathrm{CH_4} + \mathrm{H_2O} \rightleftharpoons \mathrm{CO} + 3\,\mathrm{H_2} \quad (\Delta H = 206.4\ \mathrm{kJ/mol}) \tag{6}$$
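As a quick sanity check on the water–gas shift and methane reforming reactions listed above, the following minimal sketch counts atoms on each side of a reaction. The names `SPECIES`, `atom_totals`, and `is_balanced` are illustrative and not from the article; the combined reaction CH4 + 2 H2O → CO2 + 4 H2 is simply the sum of the two steps and is included only as an example.

```python
from collections import Counter

# Elemental composition of each species (atoms per molecule).
SPECIES = {
    "CO": {"C": 1, "O": 1},
    "CO2": {"C": 1, "O": 2},
    "H2": {"H": 2},
    "H2O": {"H": 2, "O": 1},
    "CH4": {"C": 1, "H": 4},
}


def atom_totals(side):
    """Sum atoms over one side of a reaction given as {species: coefficient}."""
    totals = Counter()
    for species, coeff in side.items():
        for element, count in SPECIES[species].items():
            totals[element] += coeff * count
    return totals


def is_balanced(reactants, products):
    """A reaction is balanced when both sides contain the same atoms."""
    return atom_totals(reactants) == atom_totals(products)


# Water-gas shift: CO + H2O <-> CO2 + H2
wgs = ({"CO": 1, "H2O": 1}, {"CO2": 1, "H2": 1})
# Methane steam reforming: CH4 + H2O <-> CO + 3 H2
smr = ({"CH4": 1, "H2O": 1}, {"CO": 1, "H2": 3})
# Sum of the two steps: CH4 + 2 H2O -> CO2 + 4 H2
overall = ({"CH4": 1, "H2O": 2}, {"CO2": 1, "H2": 4})

for name, (lhs, rhs) in {"WGS": wgs, "SMR": smr, "overall": overall}.items():
    print(name, "balanced:", is_balanced(lhs, rhs))
```

The combined reaction illustrates why coupling reforming with the shift reaction raises hydrogen output: each mole of methane can yield up to four moles of H2 instead of three.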
In the secondary stage of CP, nickel-based catalysts are particularly effective for promoting the steam reforming of hydrocarbons and tars, the water–gas shift reaction, and methane reforming. The high intrinsic activity of metallic nickel for C–C and C–H bond scission accelerates the conversion of oxygenated volatiles and heavier tars into lighter gases, thereby increasing the hydrogen content of the syngas. Nickel catalysts also facilitate the water–gas shift reaction by enhancing CO adsorption and its subsequent reaction with steam to form CO2 and H2.
CP products are directly influenced by process operational parameters such as reaction temperature, time, and the catalyst-to-biomass ratio. Temperature is crucial in determining reaction kinetics and pathways. Lower temperatures (250–450 °C) result in incomplete primary pyrolysis, thus producing more char and bio-oil, and less volatiles for reforming [14]. Intermediate temperatures (500–650 °C) provide optimal conditions for reforming reactions (WGS and methane) with good reaction rates and catalyst stability but can also lead to coke formation. Higher temperatures (700–1000 °C) enhance tar cracking and hydrogen evolution; however, they lead to catalyst deactivation through sintering [15,16,17]. The residence time is responsible for the trade-off between conversion and the extent of each reaction [3]. Short residence time limits tar conversion but also reduces coke formation. Optimal residence time enables excellent volatiles reforming and reduces undesirable H2-consuming reactions, such as carbon monoxide (CO) methanation. A longer residence time promotes better overall conversion of the biomass but can lead to decreased hydrogen selectivity and increased coke formation [18]. Catalyst-to-biomass ratios directly govern the reaction selectivity, the active sites available for biomass conversion, and hydrogen selectivity. Low ratios result in poor conversion and higher tar content. Producing hydrogen from biomass involves a complex, multidimensional optimization process, as the parameters are interdependent, exhibit nonlinear interactions, and change over time [19]. Traditional optimization approaches are labor-intensive and time-consuming, relying primarily on single-factor-at-a-time (SFAT) experimentation, which fails to capture the complex interactions between parameters. 
Thus, advanced computational methodologies, such as machine learning (ML), are needed to efficiently navigate the dynamic parameter space and identify the optimal parameters required for enhanced hydrogen production from the process.
ML is a subfield of artificial intelligence (AI), from which predictive models can be developed with experimental data [20]. ML models can capture complex relationships from multidimensional processes, such as CP, better than kinetic and computational fluid dynamics models [21]. Over the last six years, the application of ML models in the field of hydrogen production has grown significantly, and regression models have been at the forefront of the research. Models such as random forest regression (RFR), artificial neural networks (ANN), support vector regression (SVR), and physics-informed neural networks (PINN) are among the most popular in predicting hydrogen yield, reaction kinetics, and materials discovery. However, while these models offer a prediction in the forward pass, understanding the optimal parameters for desired outputs, such as maximum H2 yield and selectivity, presents another challenge to be overcome [22].
The recent advancements of ML in CP have enabled tremendous growth in predictive modelling and process optimization. A.B. Ehinmowo et al. investigated the hydrogen production optimization and methane conversion from methane derived from biomass using ML models. The Bayesian-optimized CatBoost regressor was found to be exceptional in predicting H2 yield with an accuracy score (R2) of 96.3% and the firefly-optimized SVR showed better promise for modeling methane conversion with an R2 of 95.5%. This study elucidated that reduction temperatures (500–700 °C) and calcination temperatures (400–600 °C), as well as catalyst weight (0.05–1.00 g), played a crucial role in achieving a high hydrogen yield. Smith et al. established that the reaction temperature of CP affected the water–gas shift reaction (WGS) by employing ANN and principal component analysis (PCA) [20]. Q. Tang et al. employed ML models to predict pyrolytic gas yield and its composition, with R2 greater than 85% and a root mean square error (RMSE) of less than 5.7%. This research established that temperature played the most significant role in predicting hydrogen yield, as determined by the feature importance analysis function for the models [22]. H.K. Balsora et al. predicted biomass pyrolysis kinetics with an R2 of 0.99 by integrating thermogravimetric analysis (TGA) into an ANN, highlighting the importance of biomass composition in estimating kinetics [21]. C. Ding et al. investigated hydrogen yield from biogas direct reforming using an automated ML algorithm (AutoGluon) to optimize process parameters. A temperature of 900–950 °C, a pressure of 0.15–0.3 bar, and a water flow rate of 24 g/h were predicted to improve the yield from 63.45% to 67.69%. The study employed Shapley Additive exPlanations (SHAP) for interpretability and multi-objective particle swarm optimization (MOPSO) for parameter optimization. This established ML as a pathway for enhancing hydrogen yield while reducing operational cost [23].
Unlike previous ML studies that focus solely on forward prediction of hydrogen yield from given process parameters, this study presents a novel two-stage inverse analysis ML framework adopted from cold spray technologies [24] and modified to determine the optimal process parameters for enhancing hydrogen yield through the CP of biomass. Experimental datasets were analyzed using supervised ML models. The dataset consisted of 14 input features, including catalyst properties, biomass composition, and reaction conditions, with the singular output being hydrogen yield. In the first stage, classification models identified operating regions that yielded ‘good’ hydrogen production (≥20%). In the second stage, regression algorithms quantitatively predicted hydrogen yield, while SHAP analysis was used to highlight the importance of the input features on hydrogen yield. The algorithm with the highest R2 value was chosen for the final model to develop the inverse ML framework. Bayesian optimization with Monte Carlo perturbations inverted the final model to determine the optimal operating conditions for the desired hydrogen yield. This AI-driven approach provides a practical solution for enhancing hydrogen production through guided experimental design, thereby reducing costs and labor. It also presents a unique approach to accelerate catalytic pyrolysis as the premier ‘green hydrogen’ production method.
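The two-stage forward pass described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `toy_classifier` and `toy_regressor` are hypothetical stand-ins for trained models, the feature names `final_temp_C` and `ni_loading_wt` are invented for the example, and only the 20% threshold for a 'good' yield comes from the text.

```python
GOOD_YIELD_THRESHOLD = 20.0  # 'good' hydrogen yield threshold stated in the text (>= 20%)


def label_yield(h2_yield):
    """Stage-1 training label: 1 for 'good' (>= threshold), 0 for 'poor'."""
    return 1 if h2_yield >= GOOD_YIELD_THRESHOLD else 0


def two_stage_predict(features, classifier, regressor):
    """Run the regressor only when the classifier flags the conditions as 'good'."""
    if classifier(features) == 0:
        return {"class": "poor", "h2_yield": None}
    return {"class": "good", "h2_yield": regressor(features)}


# Hypothetical stand-ins for the trained stage-1 and stage-2 models.
def toy_classifier(features):
    hot_enough = features["final_temp_C"] >= 500
    has_nickel = features["ni_loading_wt"] > 0
    return 1 if hot_enough and has_nickel else 0


def toy_regressor(features):
    # Arbitrary linear rule, used only to make the sketch runnable.
    return features["final_temp_C"] / 20.0 + features["ni_loading_wt"]


cold_run = two_stage_predict(
    {"final_temp_C": 400, "ni_loading_wt": 10}, toy_classifier, toy_regressor
)
hot_run = two_stage_predict(
    {"final_temp_C": 600, "ni_loading_wt": 10}, toy_classifier, toy_regressor
)
print(cold_run)
print(hot_run)
```

Gating the regressor this way means stage 2 only has to model the region of parameter space where hydrogen production is worthwhile.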

2. Results

2.1. Model Performance

In this study, a comprehensive assessment of ML models was conducted with a two-stage predictive framework encompassing both classification and regression analyses. In stage 1, eight classification models were examined to demonstrate their ability to predict good or poor hydrogen yield, thereby establishing their effectiveness in discerning process feasibility. In stage 2, several regression models were rigorously tested, providing quantitative insights into their performance. The integrated forward modelling strategy, which couples classification and regression, offers a unique perspective on the model’s ability to predict hydrogen yield as a function of CP process parameters. This framework confirms the suitability of ML models for real-world hydrogen production applications. Finally, the inverse analysis framework is validated against experimental ground-truth data, demonstrating its robustness and practical applicability for optimizing CP parameters to achieve a desired hydrogen yield.

2.1.1. Classification Models

The classification stage evaluated eight ML models, including the random forest classifier (RFC), support vector machines (SVM), AdaBoost classifier (AB), logistic regression (LR), K-nearest neighbors (KNN), decision tree (DT), gradient boosting (GB), and naïve Bayes (NB). To ensure statistical robustness and mitigate overfitting, each model was validated through 10-fold cross-validation (CV). The models’ performances were assessed using six metrics, including accuracy, area under the curve (AUC), precision, recall, F1 score, and Matthews correlation coefficient (MCC), shown in Figure 1. This evaluation was necessary for determining the most suitable model for predicting good or poor hydrogen yield.
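To make the 10-fold cross-validation step concrete, the sketch below splits 306 sample indices (the dataset size reported in the abstract) into ten folds. It assumes a plain shuffled split; whether the study used stratified folds is not stated, and the helper name `k_fold_indices` is illustrative.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(306, k=10))
for number, (train_idx, test_idx) in enumerate(folds, start=1):
    print(f"fold {number}: train={len(train_idx)} test={len(test_idx)}")
```

Each model would be fitted on the training indices of a fold and scored on the held-out indices, with the reported metric taken as the average over the ten folds.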
Accuracy measures the overall proportion of correct predictions (both positive and negative) for each model. The heatmap in Figure 1 shows that RFC (1.0) and AB (0.97) performed best. The AUC indicates the probability that the model ranks a randomly chosen positive instance higher than a negative one; a higher AUC therefore indicates stronger discrimination between good and poor hydrogen yields. On this metric, the RFC, logistic regression (LR), and decision tree (DT) classifiers outperformed the other classifiers, with values between 0.99 and 1.0. Precision measures the proportion of predicted positives that are true positives, that is, how well the algorithm avoids false positives. In this study, RFC achieved perfect precision (1.0), followed by DT (0.98), naïve Bayes (NB) (0.98), and AB (0.97), demonstrating RFC’s ability to predict cases with good hydrogen yield. Recall measures the model’s ability to identify the actual positive cases. All algorithms achieved perfect recall except NB, which reached only 0.88. The F1 score is the harmonic mean of precision and recall, thus balancing the two metrics. All models performed well in balancing precision and recall, with RFC having the best outcome, as shown in Figure 1. The final metric, MCC, measures the correlation between predicted and actual classes and is particularly informative for binary classification; RFC again performed best, with a value of 1.0.
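The threshold-based metrics discussed above follow directly from the confusion matrix. The sketch below computes accuracy, precision, recall, F1, and MCC for a small made-up set of labels (1 = good yield, 0 = poor yield); AUC is omitted because it requires predicted scores rather than hard labels. Function names and the example labels are illustrative only.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    # Matthews correlation coefficient; defined as 0 when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Made-up labels: one good-yield case missed and one poor-yield case misflagged.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this toy case accuracy, precision, recall, and F1 all equal 0.75 while MCC is 0.5, which shows why MCC is a stricter summary for binary classification.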
Overall, RFC was identified as the best-performing model due to its high accuracy, robustness, and excellent F1 and MCC scores, which are all crucial metrics for the classification stage. The ensemble nature of RFC allows it to handle data variability while reducing overfitting, especially for complex processes like CP. This clearly indicates that the model delivers accurate predictions with minimal error, which is vital for reliable hydrogen-yield classification and enhanced hydrogen-yield prediction in the regression stage.

2.1.2. Regression Models

In the regression stage, several algorithms were employed and assessed using 10-fold CV to enhance their robustness and reduce overfitting. The performances of the models are displayed in Figure 2. Six key performance metrics were used to assess each model’s ability to predict hydrogen yield. Figure 2 shows that the best-performing models were RFR, extra trees (ET), and decision trees (DT).
The ensemble models dominated by achieving a high accuracy score (R2), low root mean squared error (RMSE), low mean absolute error (MAE), and short training time (TT). RFR was identified as the best model because it provided a reliable balance of accuracy and computational efficiency, making it ideal for hydrogen yield prediction in CP applications. ET and DT demonstrated excellent performance; however, their accuracy, error rates, and TT were higher than those of RFR, as shown in Figure 2. Conversely, models such as SVR, AB, and several linear models showed lower predictive accuracy, indicating limitations in capturing the complex relationships among CP parameters and hydrogen yield.
The RFR model was chosen for this study because its ensemble architecture effectively captures the complex nonlinear relationships among process parameters for CP. Thus, it is ideal for this two-stage approach and aids in achieving the main goal of improving hydrogen yield by optimizing CP process parameters.
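For reference, the three accuracy metrics named in this section (R2, RMSE, and MAE) can be computed as in the sketch below. The yield values are invented for illustration and are not taken from the study's dataset; training time is not covered here.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, and MAE for paired observed and predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }


# Invented hydrogen yields (vol%) and predictions, for illustration only.
observed = [40.0, 50.0, 60.0, 70.0]
predicted = [42.0, 48.0, 63.0, 67.0]
scores = regression_metrics(observed, predicted)
print(scores)
```

RMSE and MAE are expressed in the same units as the hydrogen yield, whereas R2 is dimensionless, so the three are best read together.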

2.2. Feature Importance and SHAP Analysis

The SHapley Additive exPlanations (SHAP) methodology was adopted for this study to provide a fundamental understanding of the feature importance and interpretability of both the RF classifier and regressor. The SHAP framework quantitatively assesses the contribution of each feature to the model’s predictions by averaging its marginal effects across all possible feature combinations. The framework elucidates how each input feature impacts the model, which in turn provides a deeper understanding of the CP process. Figure 3 provides a detailed overview of the key features that significantly influence the model and the process.
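The idea of averaging a feature's marginal effect over all feature combinations can be illustrated with a brute-force Shapley computation on a three-feature toy model. This is a sketch of the underlying definition only; SHAP libraries use faster, model-specific algorithms. The function `toy_value` and its coefficients are hypothetical, and the feature labels Ni, Ftemp, and H are borrowed from the text purely as names.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values: the weighted average of each feature's marginal
    contribution over all subsets of the remaining features."""
    n = len(features)
    phi = {}
    for feature in features:
        others = [f for f in features if f != feature]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = frozenset(subset)
                gain = value_fn(present | {feature}) - value_fn(present)
                total += weight * gain
        phi[feature] = total
    return phi


def toy_value(present):
    """Hypothetical model output when only the features in `present` are known."""
    value = 10.0  # baseline prediction with no features known
    if "Ni" in present:
        value += 6.0
    if "Ftemp" in present:
        value += 2.0
    if "Ni" in present and "Ftemp" in present:
        value += 4.0  # interaction term, shared equally between the two
    if "H" in present:
        value += 3.0
    return value


features = ["Ni", "Ftemp", "H"]
phi = shapley_values(toy_value, features)
print(phi)
```

The interaction term is split evenly between the two features that share it, and the attributions sum to the difference between the full prediction and the baseline, which is the additivity property that makes SHAP values comparable across features.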
SHAP analysis of the classification model (Figure 3a) revealed that the final temperature (Ftemp) of the process had the most significant impact on the model’s ability to classify good or poor hydrogen yield with a SHAP value of 0.031. This indicated that shifts in the final temperature significantly affect the model’s result, identifying it as a critical parameter for enhancing hydrogen yield from CP. Nickel (Ni) loading in the catalyst also plays a vital role in improving hydrogen yield (SHAP value of 0.027), and this is followed by hydrogen (H) content in the biomass (SHAP value of 0.013). Ash content (AC), initial temperature (Itemp), and catalyst support (bio-char (BC) and NiO) also played an essential role in the model’s classification, albeit to a lesser extent, with SHAP values of 0.012, 0.010, and 0.008, respectively. The distribution indicates that while several parameters influence hydrogen yield, temperature and catalyst properties have the most significant impact.
The regression model (Figure 3b) focused on predicting hydrogen yield. Here, Ni loading emerged as the most influential feature with a SHAP value of 5.99, indicating its importance in achieving improved hydrogen yield. This was followed by the hydrogen (H) content in the biomass and then the final temperature, with values of 2.97 and 1.37, respectively. Although the order of feature importance shifts between the classification and regression models, the top three features remain the same in both, establishing their critical importance to hydrogen yield via CP.
In both models, catalyst (support) and biomass properties (AC, O, VM, and MC) also contribute to classifying and predicting hydrogen yield, although to a lesser extent, as shown by their lower SHAP values; they nonetheless remain relevant to the CP process. One critical parameter not shown in the SHAP analysis was the catalyst’s calcination temperature; this is likely due to the limited variation in calcination temperatures across the dataset.
The SHAP framework analysis provided the fundamental understanding of which process parameters play a critical role in hydrogen yield prediction. It also highlights the importance of parameters like temperature, nickel loading, and biomass properties in maximizing hydrogen yield in CP applications.

2.3. Inverse Analysis

This section presents the results of the inverse analysis conducted to determine the optimal CP process parameters necessary to achieve the targeted hydrogen yield. A systematic series of parametric studies was conducted to optimize specific parameters and achieve the targeted hydrogen yield (H yield). This optimization framework was guided by experimental data curated from the current literature (Table 2), enabling the validation and benchmarking of the predicted results. This approach provides a systematic investigation of how critical parameters, such as temperature, Ni loading, biomass properties, catalyst support, and calcination temperature, influence hydrogen yield enhancement.
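The inverse step, finding inputs whose predicted yield matches a target, can be sketched with a simple random-perturbation search. This is deliberately not the Bayesian optimization used in the study: it swaps in a plain accept-if-better Monte Carlo search to keep the example dependency-free. The function `forward_model`, its coefficients, and the parameter bounds are hypothetical placeholders for the trained regressor and the dataset ranges.

```python
import random
import statistics


def forward_model(temp_c, ni_wt):
    """Hypothetical stand-in for the trained regressor (not the study's model)."""
    return 0.08 * temp_c + 0.9 * ni_wt


def invert(target, bounds, n_restarts=20, n_steps=300, seed=0):
    """From several random starts, perturb the candidate inputs and keep any
    change that moves the predicted yield closer to the target."""
    rng = random.Random(seed)
    solutions = []
    for _ in range(n_restarts):
        x = {k: rng.uniform(lo, hi) for k, (lo, hi) in bounds.items()}
        err = abs(forward_model(**x) - target)
        for _ in range(n_steps):
            cand = {}
            for k, (lo, hi) in bounds.items():
                step = rng.gauss(0.0, 0.05 * (hi - lo))
                cand[k] = min(hi, max(lo, x[k] + step))  # stay inside the bounds
            cand_err = abs(forward_model(**cand) - target)
            if cand_err < err:
                x, err = cand, cand_err
        solutions.append((x, err))
    return solutions


# Assumed search ranges for reaction temperature (deg C) and Ni loading (wt%).
bounds = {"temp_c": (400.0, 900.0), "ni_wt": (0.0, 40.0)}
solutions = invert(60.0, bounds)

temps = [x["temp_c"] for x, _ in solutions]
mean_temp = statistics.mean(temps)
sd_temp = statistics.stdev(temps)
print(f"temperature for 60 vol% target: {mean_temp:.0f} +/- {sd_temp:.0f} deg C")
```

Because many input combinations can give the same predicted yield, different restarts settle on different temperature and Ni loading pairs. Summarizing them by a mean and standard deviation is one simple way to report an optimum together with its spread.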

2.3.1. Temperature Optimization

Reaction temperature has been identified as a critical parameter at both the classification and regression stages, demonstrating its profound influence on hydrogen yield from CP applications. The predicted optimal temperatures for the targeted hydrogen yield strongly align with the values reported in the literature, as shown in Figure 4. The model predicted an optimal temperature of 537 °C for achieving a 60.1 vol% H yield, which is marginally lower than the ground truth of 550 °C but still within the standard deviation (SD) range. Also, for an H yield of 80.4 vol%, the model predicted a temperature of 783 °C, which is moderately higher but still within the standard deviation range. Most of the model’s predictions fall within the acceptable SD range for the targeted hydrogen yield. The slight variations in the ranges for the predicted and ground-truth temperatures may be attributed to other dependent parameters, such as catalyst, Ni loading, and moisture content in biomass, as well as inherent uncertainties in the ML model. These discrepancies illuminate opportunities for CP process modifications to enhance hydrogen yield, and the need for model refinement to improve prediction accuracy.

2.3.2. Nickel Loading Optimization

Nickel (Ni) loading was identified as a key factor in both stages of the model development and the CP process for increasing hydrogen yield. The optimal Ni loading for CP of biomass to boost hydrogen yield was determined using the inverse analysis methodology and compared with experimental data from the literature. The predicted optimal Ni loading agrees well with the ground truth for most of the targeted hydrogen yields, as shown in Figure 5. For a hydrogen yield of 65.1 vol%, the optimal Ni loading was 19 wt%, which is a 4.9 wt% increase from the reported experimental value of 14.1 wt%. For the targeted yields of 70.1 vol% and 93.7 vol%, the predicted optimal values were 32 wt% and 38 wt%, respectively, whereas the literature reports a ground-truth value of 35 wt% for both targets. The minor discrepancies between the predicted values and those reported in the literature establish the need for CP process adjustments for enhancing hydrogen yield in practical applications. They also point to the need for refining the ML model. Overall, the inverse analysis methodology effectively captures the nickel loading needed to obtain the preset targeted hydrogen yield.

2.3.3. Hydrogen Content in Biomass

Figure 6 presents the relationship between targeted hydrogen yield and hydrogen content (H%) in biomass, comparing ML-optimized predictions with experimental (ground-truth) values. The predicted H% increased from 5% for a 42.5 vol% hydrogen yield to 11% for a 93 vol% hydrogen yield, revealing a positive dependency between hydrogen production and H% in biomass. The ground-truth values, however, range from 6.1% to 8.2%, indicating marginal discrepancies between the predicted values and those reported in the literature. This underscores that ML-guided optimization identifies conditions with enhanced hydrogen production potential. Notably, the standard deviation expanded with increasing yield, from ± 0.8% at low yields to approximately ± 1.5% at high yields, indicating growing uncertainty likely due to the nonlinear nature of CP. This suggests that parameters such as temperature and catalyst properties play a decisive role in enhancing hydrogen evolution from biomass. The model’s slight deviation from the ground truth reveals both the need for model refinement and the existence of unexplored parametric regions for maximizing hydrogen yield from CP applications.

2.3.4. Calcination Temperature Optimization

The catalyst calcination temperature plays a crucial role in enhancing the catalyst’s ability to facilitate hydrogen production from biomass. Figure 7 shows the correlation between catalyst calcination temperature and hydrogen yield, comparing the experimentally measured data with the model’s predictions. For targeted hydrogen yields of 70.0 vol%, 75.8 vol%, and 80.4 vol%, the model predicted optimal calcination temperatures of 796 °C, 900 °C, and 783 °C, respectively. These predicted values exhibit strong agreement with experimental observations, as evidenced by the minimal deviations and consistent uncertainty bounds depicted in Figure 7. This consistency highlights the model’s reliability in estimating the optimal calcination temperature for enhancing the catalyst, thereby improving the overall hydrogen yield.

2.3.5. Catalyst Support Optimization

The comparison between experimental and predicted catalyst supports for targeted hydrogen yield is presented in Table 3. Hydrogen yield ranged from 42.5 vol% to 93.7 vol%, with calcium oxide (CaO) producing the lowest yields and carbon nanotubes (CNTs) producing the highest yields. Notably, the models accurately predicted the high-performing CNT support for three samples, aligning with the trend toward increased hydrogen yield. Discrepancies between the predicted and experimental results for intermediate yields suggest areas where the model needs further improvement. This also opens opportunities to explore catalyst optimization to boost hydrogen production. These findings offer quantitative evidence of the model’s usefulness in guiding support selection to maximize hydrogen yield from CP.

3. Discussion

This study demonstrates a novel approach that highlights the essential role of machine learning in overcoming the limitations of traditional trial-and-error optimization of reaction variables for catalytic pyrolysis. A strong ML-based inverse analysis framework was developed to determine optimal operating parameters for the catalytic pyrolysis of biomass. This initiative paves the way for scalable and sustainable hydrogen production. The ML framework integrates classification and regression models, enabling the prediction of both high yields and specific, quantitative hydrogen outputs. The findings show how effective this methodology is in providing crucial insights into the complex interactions between process parameters and hydrogen yield. The models developed through this methodology show good alignment between predictions and the experimental data available in the current literature. This approach offers a data-driven method to maximize hydrogen yield from CP through effective parameter optimization.
The classification model demonstrated exceptional predictive capability and robustness in identifying good hydrogen yield. The optimized RF classifier performed consistently well across all evaluation metrics, reaffirming the proven strength of ensemble-based models for complex classification problems associated with CP [24]. Its ability to discriminate between good and poor hydrogen yields provides a solid footing for the regression stage, thereby streamlining the overall two-stage modelling framework and improving computational efficiency by guiding the regression analysis exclusively toward good hydrogen yields.
The regression model built on the RF regressor displayed good accuracy in predicting hydrogen yield. It achieved a high R2 (0.8) and low RMSE (8.1) across cross-validation folds, indicating strong reliability and consistency. The ensemble design of the model allowed it to capture the complex nonlinear nature of the CP effectively [30]. It demonstrated robustness and adaptability by providing keen insights into the intricate relationships between process parameters and hydrogen yields. The results align with previous research, which indicates that ensemble methods are particularly effective in predicting yields from hydrogen production processes [21,23,31].
The SHAP analysis revealed that nickel (Ni) loading of the catalyst and temperature (Ftemp) are key drivers of hydrogen production and process behavior in catalytic pyrolysis of biomass. The findings are chemically intuitive as both parameters directly control the reaction kinetics and thermodynamic favorability of hydrogen evolution from biomass [32]. The Ni loading had the highest mean SHAP value as shown in Figure 3, thus underscoring its pivotal role in enhancing catalytic activity by increasing the number of active sites that promote the cracking of biomass-derived hydrocarbons and accelerating the reforming reaction for hydrogen evolution [33]. The temperature is similarly significant, and its SHAP value highlights its critical role in driving the endothermic cracking, tar decomposition, and gas-phase reforming processes essential for improving hydrogen yields [7]. SHAP also revealed that variables such as hydrogen content in biomass (H-content) and catalyst support (dolomite and CNTs) play a moderate role in feedstock reactivity and the catalytic dispersion of products. These findings are consistent with established CP mechanisms, where metal loading, support, and temperature synergistically improve reaction kinetics and gas quality [28,29]. Overall, SHAP analysis provides an effective strategy for enhancing hydrogen production by optimizing specific parameters in the CP process.
The inverse analysis revealed fundamental insights into the thermochemical landscape of the CP of biomass for hydrogen production. Temperature plays a key role in driving hydrogen production from biomass. The ML framework presented in this study successfully predicts the optimal temperature range between 550 and 900 °C, directly corresponding to the activation energy requirements of key hydrogen-producing reactions [16,34]. At initial temperatures of 550–600 °C, primary pyrolysis dominates through the depolymerization and fragmentation of cellulose, hemicellulose, and lignin, as shown in Equations (1) and (2), producing volatile gases and other byproducts with hydrogen yields of about 40–50% [33]. This is exhibited in the model’s predictions, which have narrow confidence intervals, indicating predictable first-order decomposition kinetics. Hydrogen output increases at intermediate temperature ranges (600–750 °C) where secondary gas-phase reactions such as catalytic cracking and the water–gas shift (Equations (3) and (4)) become more dominant. This is shown with the model predicting an optimal temperature of 738 °C for a hydrogen yield of 80.4 vol%.
The model predicts well in this range; however, the widening standard deviations suggest the presence of competing reactions, such as tar cracking, methanation, and possible coke formation. At the highest yields, the predicted optimal temperatures converge towards 850–950 °C, reflecting thermodynamic constraints that favor endothermic reforming reactions for maximizing hydrogen yields [15,35]. The model thus captures the complexity whereby multiple temperature–catalyst combinations are required to achieve yields comparable to those of the ground truths. This inverse analysis approach provides a data-driven pathway to optimize the CP process temperature, thereby maximizing hydrogen yield and reducing both cost and time.
The optimization results highlight the complex interplay between catalyst design parameters and the performance of biomass CP. Machine learning effectively captures the nonlinear relationships impacting hydrogen production, particularly across variations in Ni loading, calcination temperature, and support selection. As shown in Figure 5, the model accurately predicts the optimal Ni loading for different hydrogen yield targets. Lower yields (42–53 vol%) are achieved with moderate Ni loadings of about 5–10 wt%, while higher yields (>70 vol%) require substantially greater loadings of 32–38 wt%. The optimization of calcination temperature (Figure 7) reveals a narrow yet critical range (450–580 °C), reflecting the delicate thermal balance needed for catalyst activation. Temperatures below this range lead to poorly crystallized, weakly bound Ni species with limited activity, while excessive calcination triggers sintering and pore collapse [17,36,37], thus reducing the active site density and accessibility, which are essential for tar cracking, methane reforming, and the water–gas shift reaction [38].
Catalyst supports comprising traditional oxides such as CaO, Al2O3, LaCoO3, MCM-41, and SiO2 generally produce moderate hydrogen yields (42–65 vol%), attributed to their surface acidity, metal–support interactions, and phase stability during calcination. In contrast, CNT and dolomite supports achieve higher yields (70–94%) through different mechanisms [39]. CNTs leverage their exceptional electron conductivity, a thermal resilience that minimizes calcination-induced structural changes, and graphitic surfaces that enhance Ni reducibility while resisting carbon deposition [26,28]. Dolomite, on the other hand, decomposes during calcination to form a bifunctional CaO–MgO matrix that not only catalyzes reforming reactions but also captures CO2, effectively regenerating active sites and shifting reaction equilibria toward hydrogen production [8,29,40].
The widening prediction intervals for higher, targeted hydrogen yields and intermediate Ni loadings indicate greater chemical sensitivity and nonlinear behavior within this range. In this context, the precise balance of calcination-induced crystallinity, optimal metal dispersion, and support basicity becomes crucial. A small variation of 20–30 °C or a 2–3 wt% change in loading can shift the balance between effective steam reforming and harmful coke formation [17,41,42]. These findings explain why certain parameter combinations reach breakthrough performance while others plateau. Overall, this study demonstrates how machine learning can uncover hidden relationships between structure and performance, providing valuable insights to guide the design of catalysts for sustainable hydrogen production.
The hydrogen (H) content in biomass plays a vital role in the catalytic cracking of volatiles to produce a more hydrogen-rich (H2) syngas. The model predicted an optimal H-content range of 10–11% with a relatively small standard deviation, as shown in Figure 6. Biomass with this H-content has the hydrogen-to-carbon (H/C) ratio needed to promote C–C bond scission over undesirable condensation reactions [21,43]. A higher H-content is essential for stabilizing free radicals produced during the primary pyrolysis reaction and helps direct the carbon flow toward gas formation instead of tar and char [35,44]. The contrast between the predicted and ground-truth H-content reveals that naturally occurring biomass is predominantly hydrogen-deficient and requires augmentation with other process parameters to improve overall hydrogen production. These findings provide fundamental insights into the need for engineered energy crops and the optimization of complementary parameters for sustainable hydrogen production.
The results of this study provide a solid foundation for improving the hydrogen yield through parameter optimization utilizing an AI-driven approach. The minor variations from the predicted optimal parameters to the ground truths present the opportunity for future research to refine the model through expansion of the dataset and process parameters for CP. This inverse modelling approach provides an excellent framework to advance the scalability of CP for sustainable hydrogen production.

4. Materials and Methods

4.1. Data Collection

Experimental data were collected and curated from central databases, including Web of Science, Scopus, and ScienceDirect, through specific keyword searches as shown in Figure 8. A dataset of 306 data points was compiled from the literature published between 2007 and 2025. The dataset comprised solely experimental data extracted from research published during this period. The variables required for developing the ML model were divided into two dimensions: input features and output target. The dataset consisted of 14 input features divided into three (3) categories: biomass composition (proximate and ultimate analyses), catalyst properties (nickel loading, calcination temperature, and support), and reaction conditions (temperature and time). There was one output target, which was the hydrogen yield.

Data Description

The curated dataset comprises 14 input features, each describing a distinct physicochemical property of the CP of biomass, as detailed in Appendix A. Biomass composition is described by two sets of features: proximate analysis, which consists of moisture content (MC), volatile matter (VM), ash content (AC), and fixed carbon (FC), and ultimate analysis, which consists of carbon (C), hydrogen (H), oxygen (O), and any residual nitrogen (N). The catalyst attributes included nickel loading (5–38 wt%), calcination temperature (400–800 °C), and support type (e.g., Al2O3, CaO, carbon nanotubes, dolomite, MCM-41, and SiO2). The reaction conditions chosen for this study included the reaction temperature, divided into starting temperature (Itemp, 250–600 °C) and final temperature (Ftemp, 500–900 °C), and the residence time, which ranged from 30 to 180 min in the dataset. This multivariate dataset provided the necessary breadth to develop the two-stage framework while preserving the intricate, nonlinear interdependencies governing hydrogen yield in CP.
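The feature set described above can be written down as a simple range table for sanity-checking records before training. The following is a minimal sketch, assuming the ranges reported in Table A1; the function name `out_of_range`, the support label spellings, and the example record are hypothetical and are not taken from the authors' code. The support list mirrors the "e.g." list in the text and may not be exhaustive.

```python
# Numeric feature ranges as reported in Table A1 (13 numeric features).
FEATURE_RANGES = {
    "MC": (0.0, 23.1),          # moisture content, wt%
    "VM": (0.0, 93.37),         # volatile matter, wt%
    "FC": (0.0, 22.0),          # fixed carbon, wt%
    "AC": (0.0, 18.0),          # ash content, wt%
    "C": (29.86, 61.33),        # carbon, wt%
    "H": (5.2, 9.95),           # hydrogen, wt%
    "O": (32.16, 59.73),        # oxygen, wt%
    "N": (0.1, 2.26),           # nitrogen, wt%
    "Itemp": (250.0, 600.0),    # starting temperature, deg C
    "Ftemp": (500.0, 900.0),    # final temperature, deg C
    "t": (30.0, 180.0),         # residence time, min
    "Ni_loading": (5.0, 38.0),  # nickel loading, wt%
    "Cal_Temp": (400.0, 800.0), # calcination temperature, deg C
}

# Categorical feature (assumed label spellings; list may be incomplete).
SUPPORTS = ("Al2O3", "CaO", "CNT", "dolomite", "MCM-41", "SiO2")


def out_of_range(record):
    """Return the names of features that are missing or outside the ranges."""
    problems = []
    for name, (low, high) in FEATURE_RANGES.items():
        value = record.get(name)
        if value is None or not (low <= value <= high):
            problems.append(name)
    if record.get("Support") not in SUPPORTS:
        problems.append("Support")
    return problems


# Hypothetical record, for illustration only (not a row from the dataset).
example = {
    "MC": 8.0, "VM": 75.0, "FC": 15.0, "AC": 2.0,
    "C": 48.0, "H": 6.1, "O": 45.0, "N": 0.5,
    "Itemp": 300.0, "Ftemp": 750.0, "t": 60.0,
    "Ni_loading": 10.0, "Cal_Temp": 500.0, "Support": "CaO",
}
print(out_of_range(example))
```

A check of this kind is mainly useful during inverse analysis, where a suggested parameter set should stay inside the region covered by the training data.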

4.2. Data Preprocessing

The dataset included multidimensional features, comprising both numerical variables (such as biomass properties and reaction conditions) and categorical variables (catalyst support), all of which affected the ML model’s performance. To make the categorical data compatible with the regression algorithm, one-hot encoding (OHE) was applied. This technique transforms each category into a binary vector, where a ‘1’ indicates the presence of that category and a ‘0’ indicates absence, maintaining mutual exclusivity without implying any order [45]. Numerical features remained continuous but were normalized with min/max normalization to reduce outlier effects and ensure all features contributed equally during training. The following equation was used to calculate the normalized data.
$$x_{\mathrm{norm}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where
$x$: Original value of the feature.
$x_{\min}$: Minimum value of the feature.
$x_{\max}$: Maximum value of the feature.
$x_{\mathrm{norm}}$: Normalized value of the feature.
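The two preprocessing steps described above can be illustrated in a few lines. This is a minimal sketch using only the Python standard library, not the authors' implementation; the function names and the toy values are hypothetical, and mapping a constant column to 0 is an assumption made here to avoid division by zero.

```python
def min_max_normalize(values):
    """Scale numbers to [0, 1] using (x - x_min) / (x_max - x_min)."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        # Assumption: a constant column carries no information, map it to 0.
        return [0.0 for _ in values]
    return [(x - x_min) / span for x in values]


def one_hot_encode(labels):
    """Return (categories, rows); each row is a 0/1 vector over the categories."""
    categories = sorted(set(labels))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for label in labels:
        row = [0] * len(categories)
        row[index[label]] = 1  # exactly one '1' per row: mutually exclusive
        rows.append(row)
    return categories, rows


# Toy values for illustration only.
ftemp = [500.0, 700.0, 900.0]
supports = ["CaO", "dolomite", "CaO"]

ftemp_norm = min_max_normalize(ftemp)
categories, support_ohe = one_hot_encode(supports)
print(ftemp_norm, categories, support_ohe)
```

In practice, the minimum and maximum would be computed on the training split only and reused for the validation split, so that no information leaks from the held-out data.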

4.3. Model Development

A two-stage ML framework was developed to determine optimal parameters for enhancing hydrogen yield. The two stages consisted of a classification model followed by a forward-regression model trained on the known dataset to predict hydrogen yield. This was followed by an inverse analysis using Bayesian optimization to explore the parameter space and identify the optimal parameter combination for H2 yield. At each stage of model development, SHAP analysis was used to interpret the contributions of each feature.

4.3.1. Stage 1—Classification

In the classification stage of the framework, ML models focused on determining whether a good hydrogen yield was achieved based on the input parameters. This stage provided the fundamental foundation for further analysis in predicting hydrogen yield in the regression stage.
A comprehensive ML approach was employed to determine the best-performing model for predicting H2 yield from CP. Classification models such as random forest classifier (RFC), support vector machine (SVM), AdaBoost (AB), logistic regression (LR), K-nearest neighbor (KNN), decision tree (DT), gradient boosting (GB), and naïve Bayes (NB) were evaluated using metrics such as accuracy, area under the curve (AUC), recall, precision, F1-score, and Matthews correlation coefficient (MCC). The dataset, which includes process parameters and their corresponding hydrogen yields, was converted into a binary classification problem. Hydrogen yields below 20% were labeled as “poor” (or 0 for binary), and those with yields of 20% or higher were labeled as “good” (or 1 for binary). This reflected the chemical and operational thresholds for effective catalytic conversion, enabling the model to focus on achieving a good hydrogen yield over specific H2 values. The dataset was split into training (80%) and validation (20%) sets, with the models optimized using accuracy, precision, and recall to minimize misclassification. Hyperparameters for each model were optimized through 10-fold cross-validation, with the 20% validation set used to assess the models’ ability to predict hydrogen yield under unseen conditions.
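The labeling rule and several of the listed metrics can be sketched directly from their textbook definitions. The example below is illustrative only: the yields and predicted labels are made up, the function names are hypothetical, and AUC is omitted because it requires predicted probabilities rather than hard labels.

```python
import math

GOOD_THRESHOLD = 20.0  # H2 yield (%) at or above which a run is labeled "good"


def label_yield(h2_yield):
    """Binary label: 1 ("good") if yield >= 20%, else 0 ("poor")."""
    return 1 if h2_yield >= GOOD_THRESHOLD else 0


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1, and MCC from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Made-up yields and predictions, for illustration only.
yields = [12.0, 20.0, 45.5, 18.9, 63.0]
labels = [label_yield(y) for y in yields]
predicted = [0, 1, 1, 1, 1]  # hypothetical classifier output
metrics = classification_metrics(labels, predicted)
print(labels, metrics)
```

MCC is worth reporting alongside accuracy when the "good" and "poor" classes are imbalanced, since it accounts for all four cells of the confusion matrix.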

4.3.2. Stage 2—Regression

In the second stage of this study, the ML framework focused on predicting specific H2 yield values when CP was classified as good. This regression step expanded on the initial classification findings, providing a more detailed analysis of how process parameters affected the precise H2 yield, which is vital for process optimization.
Regression models, including random forest regressor (RFR), decision tree (DT), support vector regression (SVR), extra trees (ET), ridge, lasso, Bayesian ridge, AdaBoost, artificial neural network (ANN), and k-neighbors, were systematically evaluated and ranked using performance metrics such as R2, RMSE, and mean absolute error (MAE), shown in Equations (8)–(11). Model training utilized the same dataset as the classification stage but focused specifically on cases where good hydrogen production (≥20%) was predicted, with H2 yield as the continuous target feature. The data were split into training and validation sets (80% and 20%, respectively), and K-fold (10-fold) cross-validation was employed to fine-tune the hyperparameters and mitigate overfitting. The model’s predictive accuracy was assessed on the validation set, establishing a robust basis for determining the optimal parameters in the inverse analysis stage.
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i^{exp} - y_i^{pred}\right)^2}{\sum_{i=1}^{n}\left(y_i^{exp} - \bar{y}_{avg}^{exp}\right)^2}$$
$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|y_i^{exp} - y_i^{pred}\right|$$
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i^{exp} - y_i^{pred}\right)^2}$$
where $y_i^{exp}$ represents the experimentally reported hydrogen yield and $y_i^{pred}$ represents the model’s prediction.
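The three regression metrics defined above translate directly into code. The following is a small sketch with made-up yields, not data from this study; the function name is hypothetical.

```python
import math


def regression_metrics(y_exp, y_pred):
    """Compute R2, MAE, and RMSE from experimental and predicted yields."""
    n = len(y_exp)
    mean_exp = sum(y_exp) / n
    ss_res = sum((e - p) ** 2 for e, p in zip(y_exp, y_pred))
    ss_tot = sum((e - mean_exp) ** 2 for e in y_exp)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all experimental values are equal")
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": sum(abs(e - p) for e, p in zip(y_exp, y_pred)) / n,
        "RMSE": math.sqrt(ss_res / n),
    }


# Made-up hydrogen yields (vol%), for illustration only.
y_exp = [40.0, 50.0, 60.0, 70.0]
y_pred = [42.0, 48.0, 63.0, 67.0]
scores = regression_metrics(y_exp, y_pred)
print(scores)
```

Because RMSE squares the residuals before averaging, it penalizes a few large errors more heavily than MAE does, so reporting both gives a fuller picture of the error distribution.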

4.4. Forward Modelling Approach (FMA)

The FMA in this study integrated both stages of the ML framework to predict hydrogen yield for CP using input process parameters. In the first stage, the classification model determined whether good H2 production was likely under specific process conditions, including catalyst type, nickel loading, reaction temperature, biomass composition, and residence time. If a good yield was predicted, the model advanced to the second stage, where the regression model estimated the exact hydrogen yield. This sequential method ensured computational efficiency by applying the regression model only to relevant cases, and further, combining the two ML models enabled more accurate, targeted predictions.
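The sequential logic of the FMA can be sketched as a gate followed by a regressor. In the example below, `forward_predict` and both toy models are hypothetical stand-ins written for illustration; they are not the trained random forest models of this study, and their coefficients have no physical meaning.

```python
from typing import Callable, Mapping, Optional

Features = Mapping[str, float]


def forward_predict(
    features: Features,
    classifier: Callable[[Features], int],
    regressor: Callable[[Features], float],
) -> Optional[float]:
    """Return a predicted H2 yield only if stage 1 classifies the run as good."""
    if classifier(features) != 1:
        return None  # "poor" yield: the regression stage is skipped
    return regressor(features)


def toy_classifier(features: Features) -> int:
    """Toy stage 1 rule (assumption): good if hot enough with some Ni present."""
    return 1 if features["Ftemp"] >= 550 and features["Ni_loading"] >= 5 else 0


def toy_regressor(features: Features) -> float:
    """Toy stage 2 model (assumption): arbitrary linear response."""
    return features["Ftemp"] / 10 + features["Ni_loading"] / 2


hot_run = {"Ftemp": 750.0, "Ni_loading": 10.0}
cold_run = {"Ftemp": 500.0, "Ni_loading": 10.0}

print(forward_predict(hot_run, toy_classifier, toy_regressor))
print(forward_predict(cold_run, toy_classifier, toy_regressor))
```

Returning `None` for rejected cases keeps the two outcomes distinct, so a downstream optimizer can treat "not a good-yield region" differently from a low numeric prediction.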

4.5. Inverse Analysis Through Bayesian Optimization

Bayesian optimization is a powerful method for optimizing complex, black-box functions that are costly to assess, making it ideal for inverse analysis of CP process parameters. The main objective of this study was to identify the optimal combination of parameters (catalyst support, nickel loading, reaction temperature, biomass composition, and residence time) to achieve a targeted H2 yield. In this approach, the objective function was defined as the difference between the predicted yield (from the ML framework) and the desired yield.
$$\mathrm{Error} = \left| H_{2,\,\mathrm{desired}} - H_{2,\,\mathrm{predicted}} \right|$$
The optimization process utilized Bayesian optimization with a Gaussian process (GP) model to efficiently explore the parameter space. It iteratively selected different combinations of process parameters that minimize the difference between the predicted and desired hydrogen yield. This approach struck a balance between exploration (searching new areas of parameter space) and exploitation (focusing on promising regions), as illustrated in Figure 9. The method provided a practical parameter-optimization framework that significantly reduced the number of required experiments, providing a more efficient alternative to conventional trial-and-error methods.
Monte Carlo simulations were incorporated into the Bayesian optimization framework to account for uncertainties in CP process parameters during H2 production. Real-world variability was modeled by adding Gaussian noise to reaction temperatures and nickel loading. For each evaluation of the objective function, multiple predictions were produced, and the mean error was calculated, ensuring that the suggested parameters were resilient to practical fluctuations. This combined approach enhanced the reliability of identifying the process parameters that consistently produced the target hydrogen yield under varying conditions.
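The Monte Carlo-averaged objective described above can be sketched as follows. To keep the example self-contained, a plain random search is used in place of GP-based Bayesian optimization, so this illustrates the noisy objective rather than the optimizer used in this study. The surrogate `toy_forward_model`, the noise standard deviations, the number of draws, and the search bounds (taken from the Ftemp and Ni loading ranges in Table A1) are all assumptions made for illustration.

```python
import random


def toy_forward_model(ftemp, ni_loading):
    """Hypothetical smooth surrogate standing in for the trained ML framework."""
    return 0.1 * (ftemp - 500.0) + ni_loading


def mc_objective(ftemp, ni_loading, target, rng,
                 n_draws=50, sd_temp=10.0, sd_ni=0.5):
    """Mean |target - predicted| under Gaussian noise on temperature and Ni loading."""
    total = 0.0
    for _ in range(n_draws):
        noisy_temp = rng.gauss(ftemp, sd_temp)
        noisy_ni = rng.gauss(ni_loading, sd_ni)
        total += abs(target - toy_forward_model(noisy_temp, noisy_ni))
    return total / n_draws


def random_search(target, n_iter=500, seed=0):
    """Stand-in optimizer: sample candidates uniformly and keep the best one."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_iter):
        ftemp = rng.uniform(500.0, 900.0)  # Ftemp range, deg C
        ni = rng.uniform(5.0, 38.0)        # Ni loading range, wt%
        error = mc_objective(ftemp, ni, target, rng)
        if best is None or error < best[0]:
            best = (error, ftemp, ni)
    return best


best_error, best_ftemp, best_ni = random_search(target=60.0)
print(best_error, best_ftemp, best_ni)
```

Averaging the error over noisy draws favors parameter sets whose predicted yield stays near the target under small perturbations, rather than sets that hit the target only at one exact point.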

5. Conclusions

Machine learning models present a transformative opportunity to accelerate data-driven experimental design by uncovering complex nonlinear relationships among process variables. This study demonstrates that a data-driven two-stage ML framework can reliably predict and optimize hydrogen production from CP of biomass. First, a random forest classifier identified operating regions that yield ‘good’ hydrogen production (≥20%) with an accuracy of 1.0, an AUC of 1.0, and an MCC of 1.0. Subsequently, the random forest regressor provided accurate quantitative prediction of hydrogen yield on unseen data (R2 = 0.80 and RMSE = 8.1). SHAP analysis identified final reaction temperature and nickel loading as the dominant drivers of both classification and regression performance, followed by biomass hydrogen content, catalyst support, and calcination temperature. By coupling the forward models with Bayesian optimization, the inverse design model generated optimal operating conditions that closely matched experimentally reported values across a wide range of target yields. For example, the model predicted a reaction temperature of 537 °C for a 60% yield versus the literature value of 550 °C, and an optimal nickel loading of 19 wt% for a 65% yield compared with the reported 14.1 wt%. Similar agreement was observed for catalyst calcination temperature, hydrogen content in biomass, and support. Importantly, Monte Carlo perturbations were embedded within the Bayesian-optimization loop to propagate realistic experimental uncertainty and produce confidence intervals for the suggested parameters, thereby enhancing the robustness of the inverse design recommendations. This study presents a robust, data-driven approach for optimizing thermochemical processes and promoting sustainability.
To our knowledge, this is the first study to apply such a two-stage inverse analysis for enhancing hydrogen production. The results of this study provide actionable design guidance that is a cost- and time-efficient alternative to conventional experimental methods. However, limited data availability remains a key constraint to building highly generalizable models. Future research should aim to expand this framework with more diverse datasets focusing on biomass nature and various catalyst formulations and properties. This will provide deeper insights into parameter interactions, thereby enhancing the framework’s prediction accuracy and guiding the rational design of scalable CP applications.
This study delivers a fundamental AI-enabled workflow that captures the complex, nonlinear interactions governing CP and offers a practical pathway for accelerating the development of sustainable high-yield green hydrogen processes from biomass.

Author Contributions

Conceptualization, V.V.P.; methodology, V.V.P. and A.H.; software, V.V.P. and A.H.; validation, M.U. and N.D.H.M.; formal analysis, V.V.P.; investigation, V.V.P. and A.H.; resources, V.V.P. and N.D.H.M.; data curation, V.V.P.; writing—original draft preparation, V.V.P.; writing—reviewing and editing, N.D.H.M., V.V.P., M.U. and A.H.; visualization, V.V.P.; supervision, N.D.H.M. and A.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data that support the findings for this study are available from the corresponding author upon reasonable request.

Acknowledgments

The authors gratefully acknowledge the technical and financial support provided by the Department of Mechanical and Materials Engineering at Florida International University (FIU). This work was also supported by the Florida International University Graduate School through Dissertation and Presidential Fellowships.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Input features for ML model development.

| Feature | Meaning | Units | Range |
|---|---|---|---|
| Biomass properties | | | |
| MC | Moisture content | wt% | 0–23.1 |
| VM | Volatile matter | wt% | 0–93.37 |
| FC | Fixed carbon | wt% | 0–22 |
| AC | Ash content | wt% | 0–18 |
| C | Carbon | wt% | 29.86–61.33 |
| H | Hydrogen | wt% | 5.2–9.95 |
| O | Oxygen | wt% | 32.16–59.73 |
| N | Nitrogen | wt% | 0.1–2.26 |
| Reaction conditions | | | |
| Itemp | Starting temperature | °C | 250–600 |
| Ftemp | Final/ending temperature | °C | 500–900 |
| t | Time | minutes | 30–180 |
| Catalyst properties | | | |
| Ni_loading | Nickel loading | wt% | 5–38 |
| Cal_Temp | Calcination temperature | °C | 400–800 |
| Support | Catalyst support | - | - |

References

  1. IEA. Recommendations of the Global Commission on People-Centred Clean Energy Transitions. Available online: https://www.iea.org/reports/recommendations-of-the-global-commission-on-people-centred-clean-energy-transitions (accessed on 8 January 2026).
  2. International Energy Agency. Global Hydrogen Review 2022. Available online: https://www.iea.org/reports/global-hydrogen-review-2022 (accessed on 8 January 2026).
  3. Maniscalco, M.P.; Longo, S.; Cellura, M.; Miccichè, G.; Ferraro, M. Critical Review of Life Cycle Assessment of Hydrogen Production Pathways. Environments 2024, 11, 108.
  4. Lopez, G.; Santamaria, L.; Lemonidou, A.; Zhang, S.; Wu, C.; Sipra, A.T.; Gao, N. Hydrogen Generation from Biomass by Pyrolysis. Nat. Rev. Methods Primers 2022, 2, 20.
  5. Vuppaladadiyam, A.K.; Vuppaladadiyam, S.S.V.; Awasthi, A.; Sahoo, A.; Rehman, S.; Pant, K.K.; Murugavelh, S.; Huang, Q.; Anthony, E.; Fennel, P.; et al. Biomass Pyrolysis: A Review on Recent Advancements and Green Hydrogen Production. Bioresour. Technol. 2022, 364, 128087.
  6. Morya, R.; Raj, T.; Lee, Y.; Kumar Pandey, A.; Kumar, D.; Rani Singhania, R.; Singh, S.; Prakash Verma, J.; Kim, S.-H. Recent Updates in Biohydrogen Production Strategies and Life–Cycle Assessment for Sustainable Future. Bioresour. Technol. 2022, 366, 128159.
  7. Blanquet, E.; Williams, P.T. Biomass Pyrolysis Coupled with Non-Thermal Plasma/Catalysis for Hydrogen Production: Influence of Biomass Components and Catalyst Properties. J. Anal. Appl. Pyrolysis 2021, 159, 105325.
  8. Yue, W.; Ma, X.; Yu, Z.; Liu, H.; Li, M.; Lu, X. Ni-CaO Bifunctional Catalyst for Biomass Catalytic Pyrolysis to Produce Hydrogen-Rich Gas. J. Anal. Appl. Pyrolysis 2023, 169, 105872.
  9. Chen, W.-H.; Farooq, W.; Shahbaz, M.; Naqvi, S.R.; Ali, I.; Al-Ansari, T.; Saidina Amin, N.A. Current Status of Biohydrogen Production from Lignocellulosic Biomass, Technical Challenges and Commercial Potential through Pyrolysis Process. Energy 2021, 226, 120433.
  10. Huang, F.; Baird, R.; Yi, W.; Sanna, A. Hydrogen Production by Sorption Enhanced Catalytic Pyrolysis of Lignin Waste in Presence of Novel Potassium Stannate. Int. J. Hydrogen Energy 2025, 103, 255–267.
  11. Magoua Mbeugang, C.F.; Mahmood, F.; Ali, M.; Tang, J.; Li, B. H2-Rich Syngas Production and Tar Removal over Biochar-Supported Ni-Fe Bimetallic Catalysts during Catalytic Pyrolysis-Gasification of Biomass. Renew. Energy 2025, 243, 122547.
  12. Xie, Y.; Zhang, Y.; He, L.; Jia, C.Q.; Yao, Q.; Sun, M.; Ma, X. Anti-Deactivation of Zeolite Catalysts for Residue Fluid Catalytic Cracking. Appl. Catal. A Gen. 2023, 657, 119159.
  13. Efika, C.E.; Wu, C.; Williams, P.T. Syngas Production from Pyrolysis-Catalytic Steam Reforming of Waste Biomass in a Continuous Screw Kiln Reactor. J. Anal. Appl. Pyrolysis 2012, 95, 87–94.
  14. Compagnoni, M.; Tripodi, A.; Di Michele, A.; Sassi, P.; Signoretto, M.; Rossetti, I. Low Temperature Ethanol Steam Reforming for Process Intensification: New Ni/MxO–ZrO2 Active and Stable Catalysts Prepared by Flame Spray Pyrolysis. Int. J. Hydrogen Energy 2017, 42, 28193–28213.
  15. Tomczyk, A.; Sokołowska, Z.; Boguta, P. Biochar Physicochemical Properties: Pyrolysis Temperature and Feedstock Kind Effects. Rev. Environ. Sci. Bio.-Technol. 2020, 19, 191–215.
  16. Waheed, Q.M.K.; Williams, P.T. Hydrogen Production from High Temperature Pyrolysis/Steam Reforming of Waste Biomass: Rice Husk, Sugar Cane Bagasse, and Wheat Straw. Energy Fuels 2013, 27, 6695–6704.
  17. Ochoa, A.; Bilbao, J.; Gayubo, A.G.; Castaño, P. Coke Formation and Deactivation during Catalytic Reforming of Biomass and Waste Pyrolysis Products: A Review. Renew. Sustain. Energy Rev. 2020, 119, 109600.
  18. Liu, H.; Tang, Y.; Ma, X.; Yue, W.; Chen, W. Catalytic Pyrolysis of Corncob with Ni/CaO Dual Functional Catalysts for Hydrogen-Rich Gas. J. Taiwan Inst. Chem. Eng. 2023, 150, 105059.
  19. Chen, F.; Wu, C.; Dong, L.; Vassallo, A.; Williams, P.T.; Huang, J. Characteristics and Catalytic Properties of Ni/CaAlOx Catalyst for Hydrogen-Enriched Syngas Production from Pyrolysis-Steam Reforming of Biomass Sawdust. Appl. Catal. B 2016, 183, 168–175.
  20. Ehinmowo, A.B.; Nwaneri, B.I.; Olaide, J.O. Predictive Modeling of Hydrogen Production and Methane Conversion from Biomass-Derived Methane Using Machine Learning and Optimisation Techniques. Next Energy 2025, 7, 100229.
  21. Balsora, H.K.; Kartik, A.; Dua, V.; Joshi, J.B.; Kataria, G.; Sharma, A.; Chakinala, A.G. Machine Learning Approach for the Prediction of Biomass Pyrolysis Kinetics from Preliminary Analysis. J. Environ. Chem. Eng. 2022, 10, 108025.
  22. Tang, Q.; Chen, Y.; Yang, H.; Liu, M.; Xiao, H.; Wang, S.; Chen, H.; Raza Naqvi, S. Machine Learning Prediction of Pyrolytic Gas Yield and Compositions with Feature Reduction Methods: Effects of Pyrolysis Conditions and Biomass Characteristics. Bioresour. Technol. 2021, 339, 125581.
  23. Ding, C.; Zhang, Y.; Lu, B.; Feng, Y.; Li, W.; Peng, J.; Huang, H.; Cheng, Z.; Li, L.; Li, Y.; et al. AI Data-Driven Based In-Depth Interpretation and Inverse Design for Hydrogen Yield from Biogas Direct Reforming. ACS Sustain. Resour. Manag. 2024, 1, 2384–2393.
  24. Hamrani, A.; Medarametla, A.; John, D.; Agarwal, A. Machine-Learning-Driven Optimization of Cold Spray Process Parameters: Robust Inverse Analysis for Higher Deposition Efficiency. Coatings 2024, 15, 12.
  25. Salehi, E.; Azad, F.S.; Harding, T.; Abedi, J. Production of Hydrogen by Steam Reforming of Bio-Oil over Ni/Al2O3 Catalysts: Effect of Addition of Promoter and Preparation Procedure. Fuel Process. Technol. 2011, 92, 2203–2210.
  26. Wu, C.; Wang, Z.; Huang, J.; Williams, P.T. Pyrolysis/Gasification of Cellulose, Hemicellulose and Lignin for Hydrogen Production in the Presence of Various Nickel-Based Catalysts. Fuel 2013, 106, 697–706.
  27. Hu, Y.; Yu, Z.; Yue, W.; You, Z.; Ma, X. NiO-LaCoO3 Catalysts for Biomass Pyrolysis to Hydrogen-Rich Gas. J. Ind. Eng. Chem. 2024, 143, 382–391.
  28. Hou, T.; Yuan, L.; Ye, T.; Gong, L.; Tu, J.; Yamamoto, M.; Torimoto, Y.; Li, Q. Hydrogen Production by Low-Temperature Reforming of Organic Compounds in Bio-Oil over a CNT-Promoting Ni Catalyst. Int. J. Hydrogen Energy 2009, 34, 9095–9107.
  29. Fabrice Magoua Mbeugang, C.; Li, B.; Xie, X.; Wei, J.; Isa, Y.M.; Kozlov, A.; Penzik, M. Catalysis/Sorption Enhanced Pyrolysis-Gasification of Biomass for H2-Rich Gas Production: Effects of Various Nickel-Based Catalysts Addition and the Combination with Calcined Dolomite. Fuel 2024, 372, 132195.
  30. Hamrani, A.; Agarwal, A.; Allouhi, A.; McDaniel, D. Applying Machine Learning to Wire Arc Additive Manufacturing: A Systematic Data-Driven Literature Review. J. Intell. Manuf. 2024, 35, 2407–2439.
  31. Bilgiç, G.; Bendeş, E.; Öztürk, B.; Atasever, S. Recent Advances in Artificial Neural Network Research for Modeling Hydrogen Production Processes. Int. J. Hydrogen Energy 2023, 48, 18947–18977.
  32. Liu, H.; Tang, Y.; Ma, X.; Yue, W. Catalytic Pyrolysis of Corncob with Ni/CaO Catalysts for Hydrogen-Rich Gas: Synthesis Modes and Catalyst/Biomass Ratios. J. Ind. Eng. Chem. 2023, 123, 51–61.
  33. Akubo, K.; Nahil, M.A.; Williams, P.T. Pyrolysis-Catalytic Steam Reforming of Agricultural Biomass Wastes and Biomass Components for Production of Hydrogen/Syngas. J. Energy Inst. 2019, 92, 1987–1996.
  34. Qinglan, H.; Chang, W.; Dingqiang, L.; Yao, W.; Dan, L.; Guiju, L. Production of Hydrogen-Rich Gas from Plant Biomass by Catalytic Pyrolysis at Low Temperature. Int. J. Hydrogen Energy 2010, 35, 8884–8890.
  35. Ren, J.; Cao, J.-P.; Zhao, X.-Y.; Liu, Y.-L. Fundamentals and Applications of Char in Biomass Tar Reforming. Fuel Process. Technol. 2021, 216, 106782.
  36. Yang, S.; Chen, L.; Sun, L.; Xie, X.; Zhao, B.; Si, H.; Zhang, X.; Hua, D. Novel Ni–Al Nanosheet Catalyst with Homogeneously Embedded Nickel Nanoparticles for Hydrogen-Rich Syngas Production from Biomass Pyrolysis. Int. J. Hydrogen Energy 2021, 46, 1762–1776.
  37. Ochoa, A.; Barbarias, I.; Artetxe, M.; Gayubo, A.G.; Olazar, M.; Bilbao, J.; Castaño, P. Deactivation Dynamics of a Ni Supported Catalyst during the Steam Reforming of Volatiles from Waste Polyethylene Pyrolysis. Appl. Catal. B 2017, 209, 554–565.
  38. García-Gómez, N.; Valecillos, J.; Remiro, A.; Valle, B.; Bilbao, J.; Gayubo, A.G. Effect of Reaction Conditions on the Deactivation by Coke of a NiAl2O4 Spinel Derived Catalyst in the Steam Reforming of Bio-Oil. Appl. Catal. B 2021, 297, 120445.
  39. Ren, J.; Cao, J.-P.; Zhao, X.-Y.; Yang, F.-L.; Wei, X.-Y. Recent Advances in Syngas Production from Biomass Catalytic Gasification: A Critical Review on Reactors, Catalysts, Catalytic Mechanisms and Mathematical Models. Renew. Sustain. Energy Rev. 2019, 116, 109426.
  40. Yue, W.; Ma, X.; Yu, Z.; Liu, H.; Li, W.; Li, C. CaMoO4-Enhanced Ni-CaO Bifunctional Catalyst for Biomass Pyrolysis to Produce Hydrogen-Rich Gas. Fuel Process. Technol. 2023, 250, 107900.
  41. Li, P.; Wang, B.; Hu, J.; Zhang, Y.; Chen, W.; Chang, C.; Pang, S. Research on the Kinetics of Catalyst Coke Formation during Biomass Catalytic Pyrolysis: A Mini Review. J. Energy Inst. 2023, 110, 101315.
  42. Fernandez, E.; Santamaria, L.; García, I.; Amutio, M.; Artetxe, M.; Lopez, G.; Bilbao, J.; Olazar, M. Elucidating Coke Formation and Evolution in the Catalytic Steam Reforming of Biomass Pyrolysis Volatiles at Different Fixed Bed Locations. Chin. J. Catal. 2023, 48, 101–116.
  43. Pecha, M.B.; Arbelaez, J.I.M.; Garcia-Perez, M.; Chejne, F.; Ciesielski, P.N. Progress in Understanding the Four Dominant Intra-Particle Phenomena of Lignocellulose Pyrolysis: Chemical Reactions, Heat Transfer, Mass Transfer, and Phase Change. Green Chem. 2019, 21, 2868–2898.
  44. Saravana Sathiya Prabhahar, R.; Jeyasubramanian, K.; Nagaraj, P.; Sakthivel, A. Catalytic Pyrolysis of Rice Husk with Nickel Oxide Nano Particles: Kinetic Studies, Pyrolytic Products Characterization and Application in Composite Plates. Biomass Convers. Biorefinery 2024, 14, 2849–2866.
  45. Persaud, V.V.; Hamrani, A.; Uzzi, M.; Munroe, N.D.H. Machine Learning-Guided Optimization of Nickel-Based Catalysts for Enhanced Biohydrogen Production through Catalytic Pyrolysis of Biomass. Int. J. Hydrogen Energy 2025, 144, 1085–1094.
Figure 1. Heatmap of the classification models’ performances for this study.
Figure 2. Radar plot of the regression models’ performances in this study.
Figure 3. Global feature importance using SHAP for the (a) RF classifier and (b) RF regressor.
Figure 4. Model-predicted optimal vs. ground-truth reaction temperatures for different targeted hydrogen yields.
Figure 5. Model-predicted optimal vs. ground-truth Ni loading for different targeted hydrogen yields.
Figure 6. Model-predicted optimal vs. ground-truth hydrogen content in biomass for different targeted hydrogen yields.
Figure 7. Model-predicted optimal vs. ground-truth catalyst calcination temperature for different targeted hydrogen yields.
Figure 8. Database and model selection and construction.
Figure 9. Flowchart of the inverse analysis using Bayesian optimization.
Table 1. Hydrogen production technologies’ cost and carbon emissions.
Technology | Cost (USD/kg H2) | Carbon Footprint (kg CO2/kg H2)
Steam Methane Reforming | $1.21–$2.62 | 8–12
Natural Gas Pyrolysis | $1.50–$2.50 | <1
Coal Gasification | $1.50–$3.50 | 19–22
Renewable Energy Electrolysis | $2.00–$4.00 | 2–3
Biomass (Pyrolysis and Gasification) | $2.00–$6.00 | <1
Electrolysis | $7.00–$8.00 | 9.3
Table 2. Curated literature data and H2 yields used to validate the inverse analysis.
H2 Yield (vol%) | Ni Loading (wt%) | Support | Cal. Temp (°C) | Biomass H Content (wt%) | Temperature (°C) | Ref.
42.5 | 10 | CaO | 650 | 6.1 | 600 | [8]
51.8 | 10.7 | Al2O3 | 700 | 7.2 | 850 | [25]
55.1 | 20 | Al2O3 | 800 | 5.7 | 800 | [26]
60.1 | 24 | LaCoO3 | 800 | 6.1 | 550 | [27]
65.1 | 14 | Al2O3 | 700 | 7.2 | 850 | [25]
70.0 | 35 | CNT | 600 | 8.2 | 450 | [28]
75.8 | 15 | Dolomite | 850 | 6.1 | 650 | [29]
80.4 | 5 | CNT | 600 | 8.2 | 550 | [28]
87.1 | 15 | Dolomite | 850 | 6.1 | 650 | [29]
93.7 | 35 | CNT | 600 | 8.2 | 550 | [28]
Table 3. Predicted vs. experimental catalyst support for targeted hydrogen yields.
Case | Experimental Support | Predicted Support | H2 Yield (vol%)
1 | CaO | MCM-41 | 42.5
2 | Al2O3 | SiO2 | 51.8
3 | Al2O3 | MCM-41 | 55.1
4 | LaCoO3 | CNT | 60.1
5 | Al2O3 | Dolomite | 65.1
6 | CNT | CNT | 70.0
7 | Dolomite | CNT | 75.8
8 | CNT | CNT | 80.4
9 | Dolomite | CNT | 87.1
10 | CNT | CNT | 93.7
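The agreement between predicted and experimental supports in Table 3 can be tallied directly. The Python sketch below only transcribes the ten rows of Table 3 and counts the cases in which the two supports coincide. The names `TABLE_3` and `support_matches` are illustrative assumptions and are not part of the original study's code.

```python
# Rows transcribed from Table 3:
# (case, experimental support, predicted support, H2 yield in vol%).
TABLE_3 = [
    (1, "CaO", "MCM-41", 42.5),
    (2, "Al2O3", "SiO2", 51.8),
    (3, "Al2O3", "MCM-41", 55.1),
    (4, "LaCoO3", "CNT", 60.1),
    (5, "Al2O3", "dolomite", 65.1),
    (6, "CNT", "CNT", 70.0),
    (7, "dolomite", "CNT", 75.8),
    (8, "CNT", "CNT", 80.4),
    (9, "dolomite", "CNT", 87.1),
    (10, "CNT", "CNT", 93.7),
]


def support_matches(rows):
    """Return the case numbers whose predicted support equals the experimental one."""
    # Compare case-insensitively so "Dolomite" and "dolomite" count as the same support.
    return [case for case, exp, pred, _ in rows if exp.lower() == pred.lower()]


matches = support_matches(TABLE_3)
match_rate = len(matches) / len(TABLE_3)
print(f"Matching cases: {matches} ({match_rate:.0%} of {len(TABLE_3)})")
```

In this tally only the CNT-supported cases (6, 8 and 10) coincide, all at targeted yields of 70 vol% or higher.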
