Machine Learning Applications in Gray, Blue, and Green Hydrogen Production: A Comprehensive Review

Du, Xuejia; Gao, Shihui; Yang, Gang

doi:10.3390/gases5020009

by Xuejia Du ¹, Shihui Gao ²,* and Gang Yang ³

¹ Department of Petroleum Engineering, Cullen College of Engineering, University of Houston, Houston, TX 77204, USA
² College of Natural Sciences, The University of Texas at Austin, Austin, TX 78712, USA
³ C&C Reservoirs, Houston, TX 77040, USA
* Author to whom correspondence should be addressed.

Gases 2025, 5(2), 9; https://doi.org/10.3390/gases5020009

Submission received: 31 March 2025 / Revised: 3 May 2025 / Accepted: 15 May 2025 / Published: 17 May 2025

Abstract

Hydrogen is increasingly recognized as a key contributor to a low-carbon energy future, and machine learning (ML) is emerging as a valuable tool to optimize hydrogen production processes. This review presents a comprehensive analysis of ML applications across various hydrogen production pathways, including gray, blue, and green hydrogen, with additional insights into pink, turquoise, white, and black/brown hydrogen. A total of 51 peer-reviewed studies published between 2012 and 2025 were systematically reviewed. Among these, green hydrogen—particularly via water electrolysis and biomass gasification—received the most attention, reflecting its central role in decarbonization strategies. ML algorithms such as artificial neural networks (ANNs), random forest (RF), and gradient boosting regression (GBR) have been widely applied to predict hydrogen yield, optimize operational conditions, reduce emissions, and improve process efficiency. Despite promising results, real-world deployment remains limited due to data sparsity, model integration challenges, and economic barriers. Nonetheless, this review identifies significant opportunities for ML to accelerate innovation across the hydrogen value chain. By highlighting trends, key methodologies, and current gaps, this study offers strategic guidance for future research and development in intelligent hydrogen systems aimed at achieving sustainable and cost-effective energy solutions.

Keywords: machine learning; green hydrogen; blue hydrogen; gray hydrogen; comprehensive review

1. Introduction

1.1. Background of Hydrogen Production

Sustainable development and a high quality of life rely on clean, safe, and reliable energy supplies. However, meeting this growing demand—driven by population and economic growth—places increasing pressure on fossil fuels, which remain dominant but contribute significantly to greenhouse gas emissions and resource depletion. These challenges underscore the urgent need to transition to renewable energy sources. According to the IEA (International Energy Agency) report, the share of fossil fuels in the global energy mix has gradually decreased over the last decade, from 82% in 2013 to 80% in 2023. Energy demand has increased by 15% over this period, and 40% of this growth (Figure 1) has been met by clean energy, i.e., renewables in the power and end-use sectors, nuclear energy, and low-emission fuels, including carbon capture, utilization, and storage (CCUS) [1].

Hydrogen is gaining global recognition as a versatile energy carrier, extending beyond its traditional applications. Unlike synthetic carbon-based fuels, it offers the potential to be truly carbon-neutral, or even carbon-negative, throughout its entire life cycle. One of its primary advantages lies in its versatility, as it can be used across multiple sectors, including transportation, industrial heating, power generation, and energy storage [2]. Additionally, hydrogen exhibits high energy density, making it a suitable alternative to conventional fossil fuels, particularly in hard-to-abate industries such as steelmaking, cement production, and long-haul transportation. When produced using renewable energy sources, such as electrolysis powered by wind or solar, hydrogen generates no direct carbon emissions, contributing to global decarbonization efforts and supporting net-zero targets [3]. Furthermore, hydrogen can be stored for extended periods and transported via pipelines, liquid carriers, or ammonia conversion, enhancing energy security and grid flexibility by compensating for the intermittency of renewable power sources. In fuel cell applications, hydrogen produces only water and heat as byproducts, making it an environmentally sustainable solution for both stationary and mobile energy systems [4]. Moreover, emerging pathways such as blue and turquoise hydrogen offer transitional solutions by reducing carbon emissions through CCUS or methane pyrolysis [5]. As advancements in production, storage, and distribution technologies continue, hydrogen plays a crucial role in sustainable energy systems, industrial transformation, and global efforts to combat climate change.

Global hydrogen production reached 97 Mt in 2023, an increase of almost 2.5% compared to 2022 [6], and is expected to rise to at least 105 Mt by 2030 (Figure 2). Hydrogen is used across a wide range of industries, including steel and fertilizer production. In 2023, China was the largest consumer, accounting for 29% of total usage. North America and the Middle East followed, representing 16% and 14% of global hydrogen consumption, respectively. Other regions, including India (9%), Europe (8%), and the rest of the world (24%), accounted for the remaining share, reflecting the global diversification of hydrogen demand [6].

Hydrogen production can be categorized into several types based on the feedstock and carbon emissions associated with each process, as shown in Figure 3 [7]. Gray hydrogen, the most common form, is produced from natural gas through steam methane reforming (SMR) or autothermal reforming (ATR), releasing significant amounts of CO₂ into the atmosphere. Blue hydrogen follows a similar production pathway but incorporates CCUS to reduce emissions, making it a lower-carbon alternative. Black or brown hydrogen, derived from coal gasification, is among the most carbon-intensive forms due to the high emissions generated during the process. In contrast, green hydrogen is produced via water electrolysis using renewable electricity (e.g., wind, solar, hydro), emitting no CO₂ and producing only oxygen and water as byproducts, making it the most sustainable option. Pink hydrogen, another low-emission alternative, is also generated through electrolysis but powered by nuclear energy, ensuring minimal carbon emissions. Turquoise hydrogen is obtained through methane pyrolysis, which splits methane into solid carbon and hydrogen, resulting in lower emissions than blue hydrogen while offering potential carbon storage in solid form. Lastly, white hydrogen refers to naturally occurring hydrogen found in geological formations, which, if extracted properly, can be a zero-emission resource [8]. The selection of hydrogen production methods significantly influences its viability as a clean energy carrier and its role in decarbonizing industrial processes, transportation, and energy systems. Among these methods, three routes are of interest: gray, blue, and green hydrogen.

Figure 4 illustrates the carbon intensity of various hydrogen production methods, expressed in kg CO₂ equivalent per kg of H₂ produced. Green hydrogen pathways, including wind-, hydro-, and solar-powered electrolysis, exhibit the lowest carbon intensities, ranging from 0.4 to 1.6 kg CO₂/kg H₂, with biomass gasification potentially achieving negative emissions. Blue hydrogen, produced via fossil fuel reforming with carbon capture (e.g., ATR and SMR), shows moderate emissions (2.8–7.0 kg CO₂/kg H₂), while coal-based routes remain significantly higher at 11.8 kg CO₂/kg H₂. Gray, brown, and black hydrogen represent the most carbon-intensive methods, exceeding 9 kg CO₂/kg H₂. In contrast, nuclear-derived pink hydrogen has low emissions (0.4 kg CO₂/kg H₂), and turquoise hydrogen, produced via methane pyrolysis, falls between 1.9 and 4.8 kg CO₂/kg H₂. The value for white hydrogen is shown as 0.0 kg CO₂/kg H₂ and marked as assumed, reflecting the current lack of standardized data for naturally occurring H₂.

1.2. Incorporating Machine Learning with Hydrogen Production

Machine learning (ML) is a branch of artificial intelligence that enables computers to analyze vast datasets, identify patterns, and make data-driven predictions or decisions with minimal human intervention. One of the key benefits of ML is its ability to optimize complex systems, improve efficiency, and reduce operational costs by continuously learning from data and adjusting processes in real time [10,11]. As a result, ML has been widely applied in various industries, including healthcare, finance, manufacturing, and renewable energy [11], to enhance productivity, predictive maintenance, and decision-making. In the field of hydrogen production, ML offers significant advantages by improving process efficiency, reducing energy consumption, and optimizing system performance. For example, Shahin and Simjoo [12] demonstrated the practical capabilities of ChatGPT-4 through four detailed case studies, each addressing a critical aspect of hydrogen energy development. Kwon et al. [13] used ML methods to predict hydrogen demand from 2020 to 2030 with an R² value of 0.9936. An increasing number of papers have been published that focus on applying ML in hydrogen production. Figure 5 shows the most frequently used keywords for ML applications in hydrogen production (based on co-occurrence analysis with “all keywords” as a unit of analysis using the VOSviewer tool, version 1.6.20). ML can be employed in various aspects of hydrogen production. In electrolysis, ML algorithms can be used to enhance reaction efficiency, electrode material selection, and power usage, leading to increased hydrogen yield with minimal energy loss. In SMR and coal gasification, ML can help optimize reaction conditions, reduce carbon emissions, and predict equipment failures, ensuring more sustainable and cost-effective hydrogen production. 
Additionally, ML-driven models can facilitate the integration of green hydrogen into the energy grid by forecasting renewable energy availability and dynamically adjusting electrolysis operations. By utilizing advanced data analytics and automation, ML plays a crucial role in advancing hydrogen production technologies, reducing costs, and accelerating the transition to a cleaner energy future.

1.3. Motivation of This Review

To the best of our knowledge, in-depth technical reviews of ML across blue, green, and gray hydrogen production are limited. Most existing work reviews the production processes themselves [2,3,4,5,14]. Davies et al. [7] reviewed ML applications for blue hydrogen only. Alagumalai et al. [15] and Sharma et al. [16] summarized ML applications for biohydrogen, which is only one aspect of green hydrogen. Bassey et al. [17] reviewed recent ML applications for green hydrogen with a main focus on water electrolysis. Allal et al. [18] provided comprehensive coverage of ML applications in hydrogen energy systems but did not differentiate how these techniques are applied across the various hydrogen classifications (e.g., green, blue, gray), which limits insights into color-specific optimization strategies. With hydrogen identified as a key energy carrier for achieving a low-carbon economy, it is essential to further develop and optimize novel low-carbon hydrogen technologies and to explore how ML can be utilized in their development and deployment. This work provides a comprehensive technical review of the literature on ML within hydrogen production, focusing particularly on blue, green, and gray hydrogen, together with a general overview of pink, turquoise, black, and white hydrogen. Other hydrogen production methods, such as solar-driven hydrogen production with biomass as a feedstock [19], conversion of plastic waste into clean hydrogen via gasification [20], and municipal sludge gasification-based hydrogen production [21,22], are not included here. This review aims to offer insights into how ML can address common limitations of conventional process modeling techniques and support its integration into intelligent monitoring and control systems.

This review adopts a narrative and technical synthesis approach to evaluate ML applications in hydrogen production. A structured literature search was conducted using the Scopus database (primary) and Google Scholar (supplementary), covering the period from 2012 to 2025. The following Boolean search query was used: (“machine learning”) AND (“hydrogen production”) AND (“gray” OR “blue” OR “green” OR “electrolysis” OR “gasification”).

A total of 172 articles were initially identified. After the removal of duplicates and screening for relevance, 51 peer-reviewed studies were included based on the following criteria: (1) application of ML to hydrogen production processes, (2) focus on specific hydrogen types (gray, blue, green, etc.), and (3) availability of methodological detail. Review articles, non-ML studies, and inaccessible or off-topic papers were excluded.

The remainder of this paper is organized as follows: Section 2 reviews common ML methods and algorithms relevant to hydrogen production. Sections 3 through 6 systematically explore the role of ML in blue, gray, and green hydrogen production, as well as in other hydrogen types, including pink, turquoise, white, and black. Section 7 highlights the benefits and limitations of ML techniques in hydrogen production and discusses emerging challenges and opportunities. Finally, Section 8 concludes this review and offers perspectives on future research directions in ML-driven hydrogen energy systems.

2. Overview of Machine Learning

Machine learning (ML) enables computers to learn patterns from data and make decisions or predictions without explicit programming. Unlike traditional rule-based programming, where outcomes depend on predefined instructions, ML models analyze large datasets, identify correlations, and improve their performance over time. This data-driven approach has led to breakthroughs in automation, data analysis, and decision-making across various fields. ML is particularly valuable in complex problem-solving, where traditional methods struggle due to large-scale data, nonlinear relationships, and the need for real-time adaptation.

2.1. Brief History of ML

The roots of ML can be traced back to the mid-20th century, with the development of foundational mathematical theories and early computational models. In 1950, Alan Turing introduced the Turing Test, which laid the groundwork for AI research. The first ML algorithm, the Perceptron, was developed in 1958 by Frank Rosenblatt, marking the birth of artificial neural networks (ANNs) [23,24]. During the 1980s and 1990s, advances in computational power and algorithmic improvements, such as the backpropagation algorithm, enabled the resurgence of neural networks. By the 2000s, the rise of big data and deep learning models—particularly with architectures like convolutional neural networks (CNNs) and recurrent neural networks (RNNs)—revolutionized applications in image recognition, natural language processing, and autonomous systems. Today, ML is an essential part of scientific research, industry automation, and technological innovation, with continuous advancements in quantum computing, reinforcement learning, and generative AI models.

2.2. Categories of ML

ML algorithms are broadly categorized into three main types, each distinguished by how the model learns from data [25,26]:

Supervised learning: In this approach, models are trained on labeled data, meaning each input is paired with the correct output. The algorithm learns by minimizing errors and improving accuracy through iterative training. Examples include linear regression, decision trees, and neural networks, which are commonly used in applications such as fraud detection, medical diagnosis, and stock price prediction.
Unsupervised learning: This method deals with unlabeled data, where the algorithm identifies hidden patterns or structures without explicit output labels. Clustering and association rule learning are common techniques with applications in customer segmentation, anomaly detection, and market analysis. Examples include k-means clustering and principal component analysis (PCA).
Reinforcement learning: Unlike the previous categories, reinforcement learning (RL) is based on reward-based learning, where an agent interacts with an environment and learns through trial and error to maximize cumulative rewards. RL is widely used in robotics, gaming (e.g., AlphaGo), and autonomous systems. Algorithms such as Q-learning and deep Q-networks (DQNs) power advanced decision-making systems.
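The difference between the first two categories can be illustrated with a minimal sketch. The example below is not taken from any of the reviewed studies: the data are invented, and the function names `fit_line` and `kmeans_1d` are hypothetical. The supervised routine learns from labeled input-output pairs, while the unsupervised routine receives only unlabeled values and groups them.

```python
# Toy illustration only: all numbers are invented, not taken from any reviewed study.


def fit_line(xs, ys):
    """Supervised learning: least-squares fit of y = slope * x + intercept from labeled pairs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def kmeans_1d(values, k=2, iterations=20):
    """Unsupervised learning: group unlabeled 1-D values into k >= 2 clusters."""
    ordered = sorted(values)
    # Spread the initial centers evenly across the sorted data.
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            clusters[nearest].append(v)
        # Keep the previous center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)]
    return centers


# Labeled data: each input x comes with its known output y.
slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])

# Unlabeled data: only the values themselves are available.
centers = kmeans_1d([1.0, 1.2, 0.8, 5.0, 5.2, 4.8], k=2)

print("fitted line:", slope, intercept)
print("cluster centers:", centers)
```

Reinforcement learning is omitted here because it additionally requires an environment to interact with, which does not fit in a few lines.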

2.3. Common ML Algorithms

Table 1 summarizes common ML techniques currently in use, along with their typical applications, advantages, and limitations; each algorithm is suited to different types of tasks [11,26,27]. The success of these techniques across diverse applications has motivated their use in hydrogen production modeling, simulation, and optimization.

ML has demonstrated remarkable success across various domains, such as finance, biology, geosciences, healthcare analytics, materials science, and engineering [28,29,30,31,32], due to its capability to model complex, nonlinear relationships within large datasets. Its ability to uncover hidden patterns and optimize predictive accuracy makes it a powerful tool for addressing intricate problems. Given these advantages, the application of ML in forecasting hydrogen production is expected to gain increasing attention, emerging as a pivotal trend in the pursuit of efficient and sustainable energy solutions.

3. Blue Hydrogen Production and ML Applications

3.1. Blue Hydrogen Production

Blue hydrogen has gained significant attention as a viable alternative for large-scale hydrogen production, particularly in industrial manufacturing, transportation, and power generation sectors. It is a low-carbon hydrogen production method that relies on fossil fuels while incorporating CCUS to reduce greenhouse gas emissions. Blue hydrogen is primarily produced through SMR or ATR of natural gas, as well as coal gasification, with carbon capture systems preventing most of the CO₂ emissions from entering the atmosphere. The details of each process are introduced below.

3.1.1. Steam Methane Reforming (SMR)

SMR was first industrially implemented in 1936 at the Billingham site in the United Kingdom, following collaborative efforts and technological advancements in the early 20th century [33]. The process was developed to meet industrial demand for hydrogen, for example in ammonia production for fertilizers, using natural gas and high-temperature steam to produce hydrogen and carbon dioxide. The “blue” variant emerged later, in the 2000s, when growing climate concerns led engineers to pair SMR with carbon capture and storage (CCS) technology.

Nowadays, SMR is the most widely used process for large-scale hydrogen production, particularly in industrial applications. It involves the reaction of methane (CH₄), the primary component of natural gas, with steam (H₂O) at high temperatures, typically ranging between 700 and 1000 °C [34], in the presence of a nickel-based catalyst. The reaction takes place in a steam reformer, which consists of catalyst-filled tubes heated externally by burning natural gas. Figure 6 shows a simplified process of SMR.

The primary reaction converts methane and steam into hydrogen and carbon monoxide through the following endothermic reaction:

CH₄ + H₂O → CO + 3H₂    (1)

Since this reaction requires heat input, it is conducted in high-temperature reactors where the necessary heat is supplied externally. The produced carbon monoxide undergoes a subsequent process called the water–gas shift reaction (WGSR), where it reacts with additional steam to produce more hydrogen and CO₂:

CO + H₂O → CO₂ + H₂    (2)

This two-step reaction process—methane reforming followed by the water–gas shift reaction—maximizes the hydrogen yield while generating CO₂ as a byproduct. In traditional gray hydrogen production, this CO₂ is released into the atmosphere, significantly contributing to greenhouse gas emissions. However, in blue hydrogen production, up to 90% of the CO₂ emissions are captured, preventing their release and making the hydrogen production process significantly cleaner [35].
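Adding Equations (1) and (2) gives the overall reaction CH₄ + 2H₂O → CO₂ + 4H₂, from which an ideal mass balance follows directly. The sketch below is illustrative only: the function name `smr_stoichiometry` and its `capture_fraction` parameter are hypothetical, complete conversion is assumed, and the CO₂ from the fuel burned to heat the reformer is deliberately left out, so real plants emit more than this lower bound.

```python
# Molar masses in g/mol (standard atomic weights, rounded).
M_CH4 = 16.043
M_H2 = 2.016
M_CO2 = 44.009


def smr_stoichiometry(capture_fraction=0.0):
    """Ideal mass balance for CH4 + 2 H2O -> CO2 + 4 H2 (Equations (1) + (2)).

    Returns (kg H2 per kg CH4, kg process CO2 emitted per kg H2).
    Assumes complete conversion and ignores CO2 from reformer heating fuel.
    """
    if not 0.0 <= capture_fraction <= 1.0:
        raise ValueError("capture_fraction must be between 0 and 1")
    h2_per_ch4 = 4 * M_H2 / M_CH4
    co2_per_h2 = M_CO2 / (4 * M_H2)
    return h2_per_ch4, co2_per_h2 * (1.0 - capture_fraction)


# Gray case (no capture) and blue case (the 90% capture figure quoted in the text).
h2_yield, co2_gray = smr_stoichiometry(0.0)
_, co2_blue = smr_stoichiometry(0.9)

print(f"H2 yield: {h2_yield:.3f} kg H2 per kg CH4")
print(f"Process CO2 without capture: {co2_gray:.2f} kg CO2 per kg H2")
print(f"Process CO2 with 90% capture: {co2_blue:.2f} kg CO2 per kg H2")
```

Under these assumptions, each kilogram of methane yields about half a kilogram of hydrogen, and roughly 5.5 kg of process CO₂ accompanies each kilogram of hydrogen before any capture.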

3.1.2. Autothermal Reforming (ATR)

Autothermal reforming (ATR) is a hydrogen production process that combines elements of SMR and partial oxidation (POX) to convert natural gas (methane) into hydrogen, carbon monoxide, and carbon dioxide. Unlike SMR, which requires an external heat source, ATR is self-sustaining, generating the necessary heat through a controlled oxidation reaction [36]. This makes it particularly suitable for large-scale hydrogen production with CCUS, positioning ATR as a key technology for blue hydrogen development.

ATR technology has evolved alongside other hydrogen and syngas production methods. The concept of POX of hydrocarbons dates back to the early 20th century, with significant industrial developments occurring in the 1950s and 1960s. Early ATR designs were primarily used for syngas (CO + H₂) production in the chemical and petrochemical industries, particularly for ammonia and methanol synthesis. In the 1990s and 2000s, the increasing demand for low-carbon hydrogen and the advancement of carbon capture technologies led to ATR being considered as a viable alternative to SMR for blue hydrogen production. Today, ATR is gaining traction in projects where integrated CCUS solutions are prioritized, as it offers a more efficient carbon capture process compared to SMR.

ATR operates in a single high-pressure reactor where methane reacts with oxygen or air and steam (H₂O) to produce hydrogen-rich syngas. Figure 7 exhibits a simplified process of ATR.

The process consists of three main steps:

1. POX Reaction

Methane is partially oxidized using pure oxygen or air, producing carbon monoxide and hydrogen while releasing heat:

CH₄ + 1/2O₂ → CO + 2H₂    (3)

This reaction is exothermic, meaning it generates heat, allowing the process to be self-sustaining.

2. Steam Reforming Reaction

The remaining methane reacts with steam at high temperatures (900–1100 °C) in the presence of a nickel-based catalyst, producing more hydrogen and carbon monoxide:

CH₄ + H₂O → CO + 3H₂    (4)

This is endothermic, meaning it absorbs heat, balancing the overall energy requirement of the ATR reactor.

3. Water–Gas Shift Reaction (WGSR)

The carbon monoxide from the first two reactions undergoes a water–gas shift reaction, where it reacts with steam to produce carbon dioxide and additional hydrogen:

CO + H₂O → CO₂ + H₂    (5)

The resulting CO₂ is captured using CCUS, and the remaining hydrogen is purified for industrial applications.
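The heat balance between the exothermic and endothermic steps can be estimated from standard enthalpies of formation. The sketch below is a rough illustration, not a reactor model: it uses textbook gas-phase values at 298 K, the helper names are hypothetical, and it ignores feed preheating, heat losses, and the heat released by the shift reaction. Under those assumptions it computes the fraction of methane that would have to follow Equation (3) for its heat to offset Equation (4).

```python
# Standard enthalpies of formation at 298 K in kJ/mol (textbook gas-phase values).
# H2 and O2 are elements in their reference state, so their value is zero.
DELTA_HF = {"CH4": -74.8, "H2O": -241.8, "CO": -110.5, "CO2": -393.5, "H2": 0.0, "O2": 0.0}


def side_enthalpy(side):
    """Total formation enthalpy of one side of a reaction ({species: moles})."""
    return sum(moles * DELTA_HF[species] for species, moles in side.items())


def reaction_enthalpy(reactants, products):
    """Reaction enthalpy in kJ per mole of reaction: products minus reactants."""
    return side_enthalpy(products) - side_enthalpy(reactants)


dh_pox = reaction_enthalpy({"CH4": 1, "O2": 0.5}, {"CO": 1, "H2": 2})   # Equation (3)
dh_smr = reaction_enthalpy({"CH4": 1, "H2O": 1}, {"CO": 1, "H2": 3})    # Equation (4)
dh_wgs = reaction_enthalpy({"CO": 1, "H2O": 1}, {"CO2": 1, "H2": 1})    # Equation (5)


def thermoneutral_pox_fraction():
    """Fraction x of methane through POX such that x*dh_pox + (1 - x)*dh_smr = 0."""
    return dh_smr / (dh_smr - dh_pox)


print(f"POX: {dh_pox:+.1f} kJ/mol (exothermic)")
print(f"SMR: {dh_smr:+.1f} kJ/mol (endothermic)")
print(f"WGS: {dh_wgs:+.1f} kJ/mol (mildly exothermic)")
print(f"Thermoneutral POX fraction: {thermoneutral_pox_fraction():.2f}")
```

Because partial oxidation releases far less heat per mole than steam reforming absorbs, most of the methane must be oxidized in this idealized balance; actual ATR designs set the oxygen and steam feeds from detailed process models.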

ATR offers several advantages that make it a promising method for blue hydrogen production, particularly in applications where CCUS is integrated. One of its key benefits is its higher carbon capture efficiency, as ATR operates at higher pressures than SMR, allowing for easier and more cost-effective CO₂ separation, with capture rates reaching up to 90% [37]. Additionally, ATR is a self-sustaining process, as the POX of methane generates the necessary heat, eliminating the need for external fuel combustion, thereby reducing overall CO₂ emissions. The technology is also highly scalable, making it suitable for large-scale hydrogen production in refineries and industrial applications where a consistent hydrogen supply is required. However, despite these advantages, ATR has some limitations that impact its widespread adoption. A major challenge is the requirement for pure oxygen instead of air, which necessitates using energy-intensive air separation units (ASUs), resulting in increased capital and operational costs. Furthermore, ATR systems operate at higher pressures and temperatures, requiring specialized equipment and reactor designs, which adds to the complexity and cost of implementation. Additionally, compared to SMR, ATR is less established in existing hydrogen infrastructure, meaning that industries may face higher initial investments and technical challenges when adopting this technology [38]. Despite these hurdles, ATR remains a viable and efficient pathway for low-carbon hydrogen production, particularly in projects where high CO₂ capture rates and large-scale hydrogen output are essential. Table 2 provides a brief comparison of ATR, SMR, and coal gasification.

3.2. ML Application for Blue Hydrogen

In blue hydrogen production, key technical challenges include optimizing carbon capture efficiency, minimizing energy consumption during SMR or ATR, and enhancing the purity and recovery of hydrogen in gas separation processes. ML models have been instrumental in predicting performance under varying operating conditions and optimizing parameters such as purge-to-feed ratios, adsorption pressures, and membrane configurations. ML-driven surrogate models enable the rapid evaluation of complex pressure swing adsorption (PSA) and sorption-enhanced SMR (SE-SMR) systems, reducing computational time and improving design optimization. Moreover, ML helps identify optimal catalyst formulations and operating strategies that enhance methane conversion and CO₂ capture, thereby supporting more efficient and economically viable blue hydrogen production.

This section explores the application of ML across various aspects of blue hydrogen production, from material screening and development to full-scale plant optimization. Through this analysis, we highlight current implementations of ML in blue hydrogen production and identify opportunities for broader adoption. Table 3 summarizes the relevant studies, listing the ML algorithms used, input parameters, predicted outputs, and key findings, which are discussed in greater detail in the following sections. The majority of the reviewed studies focus on SMR, reflecting its industrial maturity and wide adoption. In contrast, relatively few studies target ATR. This observation is consistent with the work of Howarth and Jacobson [40], who argue that blue hydrogen may not yet offer a truly low-emission alternative to gray hydrogen due to its unresolved technical and environmental challenges.

In terms of algorithm usage, ANNs are by far the most frequently adopted technique, appearing in over 70% of the studies. Variants of ANNs (such as FFBPNN, DNN, and ASNN) and hybrid models like ANN-GA or ANN-DE demonstrate the method’s versatility and adaptability to complex nonlinear systems. While ANNs dominate the reviewed studies, their black-box nature limits interpretability, which is critical in safety-focused industries like hydrogen production. Some studies (e.g., Vo et al. [41], Yu et al. [42]) reported high R² values (>0.99), but these validations relied mainly on a single train–test split, without an additional independent test set or cross-validation, raising concerns about overfitting. Dataset sizes were generally small (<500 samples), largely due to experimental constraints, which limits model generalizability. For example, the studies by Tong et al. [43] and Streb and Mazzotti [45] optimized PSA processes but did not evaluate robustness across different feed gas compositions or operating ranges. Moreover, while MSE and R² were reported, their practical meaning (e.g., the impact of a 2% error on H₂ purity for industrial feasibility) was rarely discussed.
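The validation concern raised above can be made concrete with a minimal k-fold cross-validation sketch. It is not the procedure of any cited study: the data are synthetic, the model is a one-variable linear fit chosen for brevity, and the function names are hypothetical. The point is that every sample is held out exactly once, so the spread of the per-fold R² values indicates how much a single train–test split could mislead on a small dataset.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (needs non-constant y_true)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_r2(xs, ys, k=5, seed=0):
    """Return the held-out R^2 for each of k folds."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint index groups
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [slope * xs[i] + intercept for i in held_out]
        scores.append(r_squared([ys[i] for i in held_out], preds))
    return scores


# Synthetic stand-in data (invented): a noisy linear trend with 40 samples.
rng = random.Random(42)
xs = [i / 4 for i in range(40)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.5) for x in xs]

scores = k_fold_r2(xs, ys, k=5)
print("per-fold R^2:", [round(s, 3) for s in scores])
print("mean R^2:", round(sum(scores) / len(scores), 3))
```

Reporting the per-fold range alongside the mean gives readers a sense of the variability that a single split hides.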

Common input parameters used across studies fall into three main categories: (1) operating conditions—such as adsorption pressure, steam-to-carbon ratio, temperature, and purge-to-feed ratio; (2) material properties—including molecular descriptors, catalyst composition, and concentrations of amine solvents; (3) process variables—such as feed flow rates, membrane area, and reactor parameters. Outputs typically focus on hydrogen purity, hydrogen recovery/yield, CO₂ capture efficiency, energy consumption, and economic metrics like H₂ production cost.

4. Gray Hydrogen Production and ML Applications

4.1. Gray Hydrogen Production Process

Gray hydrogen represents the most prevalent form of hydrogen production in the current global energy landscape. It is primarily derived from fossil fuels, most notably natural gas, through processes that emit substantial quantities of CO₂ into the atmosphere. Based on the latest data from the IEA [6], global hydrogen demand reached 97 Mt in 2023, with low-emission hydrogen production accounting for less than 1% of this total. More specifically, fossil fuels still account for almost all hydrogen production (83%). Gray hydrogen supplies 62%, followed by a combination of brown and black hydrogen (19%), blue hydrogen (0.7%), and green hydrogen (only 0.04%) [6]. The rest was produced as a byproduct in the chemical industry. The widespread adoption of gray hydrogen can be attributed to its relatively low production cost and the well-established infrastructure supporting its generation and distribution [35]. However, despite its economic advantages, the environmental consequences of gray hydrogen production pose significant challenges to global decarbonization goals. It is estimated to be responsible for about 2% of global CO₂ emissions, representing around 830 Mt of CO₂ yearly [52]. As the world shifts toward a more sustainable energy future, there is increasing pressure to decarbonize gray hydrogen production, given its dominant role and high environmental impact.

4.2. ML Application for Gray Hydrogen

Gray hydrogen production, primarily from coal gasification or SMR without carbon capture, faces major challenges such as controlling syngas composition, minimizing carbon emissions, and maintaining catalyst stability. Given the significant share of gray hydrogen in global production and its considerable environmental footprint, improving the efficiency and sustainability of this process is critical for near-term decarbonization. However, optimizing gray hydrogen production is inherently complex due to the interplay of numerous operational parameters, such as feedstock quality, reaction kinetics, temperature, pressure, and catalyst performance. ML applications in gray hydrogen production have focused on modeling and predicting syngas outputs (e.g., H₂, CO, CH₄, CO₂) based on variable feedstock properties and operating conditions. Techniques such as ANNs, Gaussian process regression (GPR), and ensemble models have been used to optimize gasification parameters, maximize hydrogen yield, and reduce undesirable byproducts. In addition, ML facilitates real-time monitoring and anomaly detection in gasifier operations, enhancing system efficiency and minimizing environmental impacts. Table 4 reviews recent advancements in applying ML techniques to gray hydrogen production, including the ML models, input variables, target outputs, and their impact on process enhancement.

As in the blue hydrogen studies, ANNs are the most widely used algorithm, appearing in over 60% of the studies. Variants such as ANN-GA, ANN-MLP, and DNN, and hybrid models like DNN-PSO, reflect the flexibility of neural architectures in capturing the nonlinear dynamics of hydrogen production systems. Other frequently employed algorithms include support vector regression (SVR), decision trees (DT), GPR, and ensemble methods like RF and GBR. Regarding input features, studies generally use 6–12 variables that can be grouped into three categories: (1) feedstock and fuel properties, e.g., fixed carbon, volatile matter, elemental composition (C, H, O, N, S), moisture, and ash content; (2) reaction and process conditions, e.g., temperature, steam-to-carbon ratio, oxygen/air flow rate, pressure, and gasifier bed temperature; (3) economic or system-level inputs, such as compressor costs, catalyst configurations, and energy inputs. The most commonly predicted output variables include hydrogen yield, CO₂ emission rate, syngas composition (H₂, CO, CH₄, CO₂), heating value, and carbon conversion efficiency. Many models also evaluate optimization trade-offs, such as improving H₂ yield while minimizing emissions or catalyst degradation.

5. Green Hydrogen Production and ML Applications

Green hydrogen is produced using renewable energy sources with zero direct CO₂ emissions. Unlike gray hydrogen, which is generated from fossil fuels and emits significant amounts of CO₂, green hydrogen is entirely carbon-free, making it a critical component in the transition toward sustainable energy systems. The fundamental principle behind green hydrogen production lies in the electrolysis of water, which can be achieved through different electrolyzer technologies, each offering distinct advantages in terms of efficiency, operational conditions, and scalability.

5.1. Green Hydrogen Production Process

This section discusses the primary methods of green hydrogen production, including water electrolysis, biomass gasification, photoelectrochemical (PEC) water splitting, and biological hydrogen production.

5.1.1. Water Electrolysis

The primary method for producing green hydrogen is water electrolysis, which splits water into hydrogen and oxygen using electricity. The reaction occurs in an electrolyzer and is given by the following:

2H₂O(l)→2H₂(g) + O₂(g)

(6)
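
As a worked illustration of Equation (6), the stoichiometric mass balance can be sketched in a few lines of Python. The molar masses are rounded standard values, and the function name is illustrative rather than taken from any surveyed study. The result is the theoretical minimum water demand; practical systems are generally expected to need more because of purification and auxiliary losses.

```python
# Molar masses in g/mol (rounded standard atomic weights).
M_H2 = 2.016
M_O2 = 31.998
M_H2O = 18.015


def electrolysis_mass_balance(h2_kg: float) -> dict:
    """Water consumed and oxygen co-produced for a given hydrogen mass.

    Follows 2 H2O -> 2 H2 + O2: one mole of water per mole of H2
    and half a mole of O2 per mole of H2.
    """
    h2_kmol = h2_kg / M_H2
    return {
        "water_kg": h2_kmol * M_H2O,
        "oxygen_kg": h2_kmol * 0.5 * M_O2,
    }


balance = electrolysis_mass_balance(1.0)
print(f"water: {balance['water_kg']:.2f} kg, oxygen: {balance['oxygen_kg']:.2f} kg")
```

Per kilogram of hydrogen, the stoichiometry gives roughly 9 kg of water consumed and 8 kg of oxygen co-produced.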

Three primary electrolyzer technologies are used for green hydrogen production: alkaline electrolysis (AEL), proton exchange membrane (PEM) electrolysis, and solid oxide electrolysis cells (SOECs). These technologies differ in operating temperature, efficiency, and material requirements. AEL is the most commercially available and widely used technology, benefiting from lower costs due to its reliance on non-precious-metal catalysts. However, it suffers from lower efficiency and slower response times. PEM electrolyzers, in contrast, provide higher efficiency and faster dynamic response, making them suitable for fluctuating renewable energy sources such as wind and solar power. SOECs, operating at high temperatures, offer the highest efficiency by utilizing thermal energy to reduce electrical input requirements, but they face challenges in material degradation and high capital costs [66,67]. Table 5 provides a comparison among different electrolyzers.

Water electrolysis for green hydrogen production relies on electricity generated from renewable sources such as solar photovoltaic (PV) systems, wind turbines, or hydropower plants. These clean energy inputs ensure that the hydrogen produced is free from carbon emissions.

5.1.2. Biomass Gasification

Beyond water electrolysis, green hydrogen can be produced from biomass-based waste and wet organic materials (such as agricultural, forestry, or animal residues) through thermochemical and biochemical processes that convert carbon-rich feedstocks into hydrogen-rich gases [68]. The main thermochemical route is biomass gasification. It typically occurs at temperatures exceeding 500 °C and involves reacting organic or fossil-based carbonaceous materials with a controlled amount of O₂ and/or steam to produce CO, H₂, and CO₂ [69]. The overall reaction can be written schematically (unbalanced) as follows:

C_xH_yO_z + H₂O→H₂ + CO₂ + CO + CH₄

(7)

Biomass can also be combined with other feedstocks. For example, Pan et al. [70] present an ML-driven framework for optimizing biomass–coal co-gasification, targeting green hydrogen-rich syngas and liquid fuel production.

5.1.3. PEC Water Splitting

PEC water splitting is a method that directly converts solar energy into H₂ and O₂ using a semiconductor-based system. The PEC system mimics natural photosynthesis but uses artificial photoelectrodes. It has a photoanode (usually an n-type semiconductor) that absorbs light and oxidizes water to generate O₂ and a photocathode (usually a p-type semiconductor) or a metal electrode that collects electrons to reduce protons (H⁺) into H₂. The success of PEC water splitting is critically dependent on the properties of the photoelectrode materials, particularly their band gap, band edge alignment, light absorption, charge carrier mobility, and chemical stability in aqueous environments [71]. A wide range of semiconductor materials—such as TiO₂, BiVO₄, Fe₂O₃, WO₃, and Cu₂O—have been investigated, with various strategies employed to overcome inherent limitations, including low solar-to-hydrogen efficiency and photoelectrode degradation. ML offers a transformative opportunity for accelerating the discovery and optimization of metal oxide photoelectrodes, enhancing efficiency, and reducing costs in PEC water splitting [72].

5.1.4. Biohydrogen Production

Biohydrogen production is a method that utilizes microorganisms to produce hydrogen gas, offering a potentially sustainable and environmentally friendly approach to green hydrogen production. This process utilizes various microorganisms, including green algae, cyanobacteria, photosynthetic bacteria, and dark fermentative bacteria, to produce H₂ from organic substrates or water. The common methods for biohydrogen production are microbial electrolysis, photobiological hydrogen production, dark fermentation, and photofermentative hydrogen production [73]. ML applications in biohydrogen production involve modeling and optimizing the process by identifying patterns in complex biological data, predicting process performance, and selecting optimal conditions to maximize hydrogen yield. It enables data-driven control of fermentation, microbial behavior, and feedstock utilization, accelerating the design of cost-effective and scalable biohydrogen systems [15].

5.2. ML Application for Green Hydrogen

ML also plays an increasingly critical role in advancing green hydrogen production. In water electrolysis, ML algorithms help predict hydrogen yield, optimize operational parameters (e.g., current density, temperature, electrode material), and detect system degradation or faults. Beyond general process optimization, ML also addresses technology-specific challenges in different types of electrolyzers. For instance, in PEM electrolysis, ML models have been developed to predict membrane degradation and design operational strategies that extend membrane lifetime, which is essential due to the high cost and sensitivity of PEM membranes. In AEL systems, where catalyst deactivation and scaling are major concerns, ML techniques are used to model catalyst aging and recommend optimal operating conditions to maintain performance. In SOEC technologies, which operate at elevated temperatures, ML assists in predicting material degradation and thermal stress, enabling the development of improved operational protocols to enhance durability.

ML applications are also rapidly expanding in other green hydrogen production routes. In biomass gasification, ML models are employed to predict syngas composition, identify optimal gasification parameters, and classify biomass feedstocks based on hydrogen production potential. In PEC water splitting, ML aids in the discovery and optimization of semiconductor materials by uncovering complex relationships between material properties and photocurrent density. For biohydrogen production, ML models facilitate hydrogen yield prediction from processes like dark fermentation and microbial electrolysis, assist in reactor control, and reveal key influencing variables such as pH, substrate concentration, and microbial community behavior. Table 6 provides selected examples of ML applications in each category of green hydrogen production.

Among the 21 studies surveyed, ANNs and their variants (e.g., MLP, BPNN, RBF) were the most commonly used algorithms (applied in 11 studies), followed closely by ensemble methods such as RF and GBR and by kernel methods such as SVR. Inputs varied widely across studies but were generally categorized into operational conditions (e.g., temperature, pressure, flow rates), feedstock properties (e.g., elemental composition, moisture, ash content), and environmental or time-series data (e.g., solar irradiance, humidity, timestamps). Most studies targeted hydrogen yield or production rate as primary outputs, with several also predicting associated variables such as syngas composition, CO₂ yield, photocurrent density, and gasification efficiency. Time-series models like LSTM and hybrid models (e.g., ANN-GA, LSTM-CNN) demonstrated strong performance in forecasting and dynamic control tasks. Overall, the findings underscore ML’s versatility in capturing the complex, nonlinear relationships inherent to green hydrogen processes and its potential to optimize system performance, guide catalyst/material selection, and reduce experimental costs across diverse production pathways.

6. Other Hydrogen Production Pathways and Their ML Applications

While green, blue, and gray hydrogen represent the dominant and most widely studied production pathways, several other methods—such as pink, turquoise, white, and black/brown hydrogen—have also emerged in recent years. Although these technologies are either still in the early stages of development or currently lack widespread adoption, they offer unique advantages and potential for low- or zero-carbon hydrogen production. For the sake of completeness and to provide a comprehensive overview, this section includes a brief discussion of these alternative methods, along with their associated ML applications.

6.1. Pink Hydrogen Production

Pink hydrogen refers to hydrogen produced through water electrolysis powered by nuclear energy. It emits no direct CO₂ during hydrogen generation, making it an attractive option for countries with established nuclear infrastructure. The process is similar to that of green hydrogen, where water is split into hydrogen and oxygen using electricity. However, in the case of pink hydrogen, the electricity is generated from nuclear power plants rather than renewable sources such as wind or solar [52].

Despite its potential as a clean and reliable hydrogen source, pink hydrogen remains in the early stages of commercial deployment. One reason for this is the ongoing debate over the sustainability and public perception of nuclear energy. While some concerns exist regarding radioactive waste and nuclear safety, it is important to note that modern nuclear reactors utilize only small amounts of radioactive fuel and have benefited from significant advances in safety protocols, reactor design, and waste management technologies [8].

Recent developments highlight the growing interest in this hydrogen pathway. For example, the U.S. Department of Energy and Constellation Energy Group have launched the nation’s first pink hydrogen demonstration system at the Nine Mile Point Nuclear Plant in New York (Figure 8). This pilot project produces approximately 560 kg of hydrogen per day using just 1.25 MW of the plant’s 1907 MW nuclear output [93]. Moreover, Constellation has announced plans to scale up commercial hydrogen production by 2026, signaling increased momentum for pink hydrogen deployment.
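
The pilot figures quoted above imply a specific energy consumption that can be checked with simple arithmetic. The sketch below assumes, purely for illustration, that the 1.25 MW is drawn continuously for 24 h per day and goes entirely to hydrogen production; neither assumption is stated in the source. The lower heating value of hydrogen (about 33.33 kWh/kg, i.e., roughly 120 MJ/kg) is a standard reference value, and the function name is hypothetical.

```python
H2_LHV_KWH_PER_KG = 33.33  # lower heating value of hydrogen (about 120 MJ/kg)


def specific_energy_kwh_per_kg(power_mw: float, h2_kg_per_day: float,
                               hours_per_day: float = 24.0) -> float:
    """Electrical energy used per kg of hydrogen, assuming constant power draw."""
    return power_mw * 1000.0 * hours_per_day / h2_kg_per_day


# Figures quoted in the text: 1.25 MW input, about 560 kg H2 per day, 1907 MW plant.
sec = specific_energy_kwh_per_kg(1.25, 560.0)
efficiency_lhv = H2_LHV_KWH_PER_KG / sec  # fraction of input recovered as H2 (LHV basis)
plant_share = 1.25 / 1907.0               # fraction of plant output used

print(f"specific energy: {sec:.1f} kWh/kg")
print(f"LHV efficiency:  {efficiency_lhv:.0%}")
print(f"share of plant output: {plant_share:.3%}")
```

Under these assumptions the pilot works out to roughly 54 kWh per kilogram of hydrogen while drawing well under 0.1% of the plant's output.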

On the academic front, Fernández-Arias et al. [94] conducted a bibliometric analysis of 550 research papers over a 13.6-year period and found that scientific interest in pink hydrogen is steadily increasing, with an annual growth rate of 5.58%. These findings suggest a rising awareness of nuclear-powered hydrogen as a viable decarbonization strategy, especially in countries that already rely on nuclear power for electricity generation. As global efforts to diversify clean hydrogen sources intensify, pink hydrogen could play a complementary role alongside green and blue hydrogen.

6.2. Turquoise Hydrogen Production

Turquoise hydrogen is produced through a process called methane pyrolysis, where natural gas (methane) is thermally decomposed into hydrogen gas and solid carbon in the absence of oxygen:

CH₄→C (solid) + 2H₂

(8)
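
To give a sense of scale for Equation (8), the sketch below computes the methane feed and solid carbon associated with each kilogram of hydrogen. The molar masses are rounded standard values. The conversion parameter and the 80% example value are hypothetical, and unconverted methane is assumed to leave the reactor unreacted, which is a simplification.

```python
# Molar masses in g/mol (rounded standard atomic weights).
M_CH4 = 16.043
M_C = 12.011
M_H2 = 2.016


def pyrolysis_per_kg_h2(methane_conversion: float = 1.0) -> dict:
    """Methane feed and solid carbon per kg of H2 for CH4 -> C + 2 H2.

    methane_conversion is the fraction of fed methane that decomposes
    (1.0 means complete conversion).
    """
    if not 0.0 < methane_conversion <= 1.0:
        raise ValueError("methane_conversion must be in (0, 1]")
    kmol_h2 = 1.0 / M_H2
    kmol_ch4_reacted = kmol_h2 / 2.0  # two moles of H2 per mole of CH4
    return {
        "methane_feed_kg": kmol_ch4_reacted * M_CH4 / methane_conversion,
        "solid_carbon_kg": kmol_ch4_reacted * M_C,
    }


full = pyrolysis_per_kg_h2()
partial = pyrolysis_per_kg_h2(methane_conversion=0.8)  # hypothetical value
print(f"complete conversion: {full['methane_feed_kg']:.2f} kg CH4, "
      f"{full['solid_carbon_kg']:.2f} kg C per kg H2")
print(f"80% conversion:      {partial['methane_feed_kg']:.2f} kg CH4 per kg H2")
```

The stoichiometry yields about 3 kg of solid carbon per kilogram of hydrogen, so carbon handling and utilization scale directly with hydrogen output.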

Unlike SMR, which produces CO₂ as a byproduct, turquoise hydrogen avoids direct carbon emissions by generating solid carbon, which can be stored or utilized in various industries (e.g., carbon black, battery materials, construction additives). This makes it a potentially low-carbon or even carbon-neutral pathway, depending on the energy source used for pyrolysis [95].

Methane pyrolysis processes described in the literature can be classified into three categories: catalytic, thermal, and plasma decomposition. In catalytic pyrolysis using nickel, methane conversion begins at around 500 °C [96]. Without a suitable catalyst, thermal decomposition starts above 700 °C [97]. To achieve technically relevant reaction and methane conversion rates, higher temperatures are required: typically above 800 °C for catalytic processes, above 1000 °C for thermal processes, and up to 2000 °C when using plasma torches [98]. The need for advanced reactor designs and the immature commercial infrastructure for handling and monetizing solid carbon pose challenges to the widespread application of turquoise hydrogen production. Moreover, catalyst deactivation and reactor fouling due to carbon buildup remain key technical issues.

6.3. White Hydrogen Production

White hydrogen refers to naturally occurring hydrogen found in underground deposits, such as in geological formations, in volcanic systems, or along fault zones. Unlike other types of hydrogen, white hydrogen is not produced through industrial processes—it is naturally formed through geochemical reactions like serpentinization (water reacting with iron-rich rocks) or radiolysis (water molecules split by natural radiation). Because it exists in nature without carbon emissions, white hydrogen is considered a clean and renewable energy source, if it can be extracted economically [8]. However, it remains largely untapped and underexplored, and ongoing research is focused on identifying viable reservoirs, assessing environmental impacts, and developing technologies for efficient extraction.

6.4. Black/Brown Hydrogen Production

Black/brown hydrogen refers to hydrogen produced through the gasification of coal. The key difference between brown and black hydrogen lies in the type of coal used: brown hydrogen is produced from lignite (low-grade, moisture-rich coal), while black hydrogen is derived from bituminous or anthracite coal, which has higher carbon content and energy density [8]. During this process, coal reacts with oxygen and steam at high temperatures to generate syngas containing hydrogen and CO (Figure 9), which is then further processed to yield hydrogen and CO₂. Black hydrogen is often contrasted with gray hydrogen, which is produced from natural gas via SMR. Although both methods emit substantial CO₂, gray hydrogen typically has a lower carbon footprint than black hydrogen, as natural gas has a higher hydrogen-to-carbon ratio and burns more cleanly than coal. Coal gasification emits approximately 19 kg of CO₂ per kilogram of hydrogen [6], and black hydrogen is therefore considered one of the least environmentally friendly hydrogen production routes in the absence of CCS.

The concept of coal gasification dates back to the 19th century when it was used to produce town gas for lighting and heating in urban areas. Early gasification plants operated using coal-derived syngas, which contained a mixture of H₂, CO, CH₄, and other gases. By the 20th century, gasification technology had advanced, enabling the large-scale production of synthetic fuels and chemicals. This advancement was particularly notable during World War II, when Germany utilized coal gasification to produce liquid fuels in the absence of crude oil. In the 1950s and 1960s, further advancements in catalysts and high-pressure gasifiers improved hydrogen production efficiency, making coal gasification attractive for the chemical and fertilizer industries [99]. With the rise of climate concerns in the 21st century, the focus shifted toward low-emission coal utilization. Countries like China, the United States, and Australia continue to invest in clean coal technologies, exploring coal gasification as a means to produce hydrogen while reducing CO₂ emissions through capture and storage [38].

Coal gasification presents both opportunities and challenges, particularly in regions with abundant coal reserves. One of its primary advantages is its ability to utilize widely available coal resources, providing an alternative to natural-gas-based hydrogen production [100]. Moreover, coal has the largest reserves of any fossil fuel in the world, and coal gasification is widely used, especially in China, where it generates a substantial quantity of hydrogen [101]. Additionally, when coupled with CCUS, coal gasification can significantly reduce CO₂ emissions, making it a more sustainable approach compared to traditional coal combustion. The process also allows for high hydrogen yields, as syngas can be further processed to maximize hydrogen production through the water–gas shift reaction. However, despite these benefits, coal gasification remains energy-intensive, requiring high temperatures and pressures, which increase operational costs. The process also generates significant amounts of solid waste, such as slag and ash, which require proper disposal and environmental management. Additionally, capital costs are high due to the complexity of gasification reactors and the need for extensive gas purification systems to remove sulfur, nitrogen compounds, and particulates [102]. While CCUS can mitigate CO₂ emissions, it further adds to the cost and infrastructure requirements. As a result, while coal gasification remains a viable option, its long-term feasibility depends on advancements in carbon capture technology, regulatory policies, and economic incentives for low-carbon hydrogen production.

6.5. ML Application Summary

ML application for pink, turquoise, white, and black/brown hydrogen faces diverse technical challenges, largely due to the immaturity of these processes. For example, in turquoise hydrogen production via methane pyrolysis, maintaining plasma stability and preventing catalyst deactivation are major concerns; ML models are applied to predict plasma behavior from emission spectra and to optimize catalyst formulations for sustained high methane conversion. In pink hydrogen, ML assists in forecasting hydrogen production costs under varying operational and regulatory scenarios. For white hydrogen, ML models help predict subsurface thermodynamic behavior and phase stability to guide resource development. In black/brown hydrogen from coal gasification, ML supports optimizing gasification parameters to improve syngas quality and reduce pollutant emissions.

Although the application of ML in these emerging hydrogen pathways remains at an early stage, initial studies demonstrate its potential to accelerate experimental discovery, optimize process conditions, improve model accuracy, and enhance feasibility assessments under complex and uncertain conditions. Table 7 highlights selected examples of ML applications across these lesser-explored hydrogen production technologies, underscoring the growing interest in expanding ML’s role throughout the hydrogen value chain.

7. Key Challenges, Opportunities, and Future Work

As ML continues to gain traction in hydrogen production research and applications, it is important to assess both the limitations and opportunities associated with its deployment. This section summarizes the key challenges that hinder the widespread implementation of ML in real-world hydrogen systems, as well as the emerging opportunities that highlight ML’s potential to enhance efficiency, reduce costs, and accelerate innovation across the hydrogen value chain.

7.1. Key Challenges

Despite rapid advancements in hydrogen production and the increasing integration of ML across various hydrogen pathways, several key challenges must be addressed to enable large-scale deployment and economic viability. These challenges span technical, economic, and operational domains, while also presenting opportunities for innovation and strategic growth.

One of the primary challenges is the high cost of hydrogen production, particularly for low-carbon pathways. As of 2025, gray hydrogen remains the least expensive, with production costs ranging from 0.7 to 2.3 USD/kg H₂, owing to mature infrastructure and low natural gas prices. Blue hydrogen, which incorporates carbon capture, is moderately more expensive at 1.4–3.2 USD/kg H₂, depending on the capture efficiency and technology used (e.g., SMR vs. ATR). Green hydrogen is the most variable and costly, with prices spanning 1.9–8.2 USD/kg H₂, influenced by factors such as electrolyzer type, electricity source, and scale [9]. These economic disparities pose a major barrier to market competitiveness, especially in regions where fossil-fuel-based hydrogen remains dominant.

In addition to cost, technical complexity remains a significant issue. Each production method presents unique optimization challenges—from managing catalyst degradation in turquoise and gray hydrogen processes to dealing with intermittent renewable input in green hydrogen electrolysis. Moreover, emerging methods like white and pink hydrogen require further exploration into geological extraction and nuclear integration, respectively, which are currently limited by data availability, infrastructure readiness, and public acceptance.

Deployment of ML in hydrogen production environments introduces several domain-specific challenges. Sensor reliability is a concern due to the high-temperature, high-pressure, and corrosive conditions typical of reformers and electrolysis units, often leading to degraded or missing data that can impair model performance. Additionally, concept drift caused by catalyst aging, membrane fouling, and feedstock variability can reduce accuracy over time, requiring adaptive or online learning methods. Integration with industrial control systems (e.g., SCADA, DCS) also presents hurdles, as these systems are not readily compatible with modern ML tools. To ensure safe and effective deployment, ML models must be interpretable, auditable, and capable of functioning within hybrid frameworks alongside traditional physics-based controls.
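
One way to operationalize the concept-drift concern is to monitor a deployed model's prediction errors and raise an alarm when their mean shifts upward. The sketch below uses the Page-Hinkley test, a standard change-detection method chosen here for illustration; it is not a technique attributed to any surveyed study. The error stream, `delta`, and `threshold` values are synthetic and hypothetical.

```python
class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a data stream."""

    def __init__(self, delta: float = 0.005, threshold: float = 3.0):
        self.delta = delta          # tolerated drift magnitude per sample
        self.threshold = threshold  # alarm level for the test statistic
        self.n = 0
        self.mean = 0.0
        self.cumulative = 0.0
        self.minimum = 0.0

    def update(self, value: float) -> bool:
        """Add one observation; return True if drift is flagged."""
        self.n += 1
        self.mean += (value - self.mean) / self.n  # running mean
        self.cumulative += value - self.mean - self.delta
        self.minimum = min(self.minimum, self.cumulative)
        return (self.cumulative - self.minimum) > self.threshold


# Synthetic absolute prediction errors: stable at first, then degraded,
# e.g., after catalyst aging shifts the process away from the training data.
errors = [0.1] * 50 + [1.0] * 50

detector = PageHinkley(delta=0.005, threshold=3.0)
alarm_index = None
for i, err in enumerate(errors):
    if detector.update(err):
        alarm_index = i
        break

print("drift flagged at sample", alarm_index)
```

In practice, an alarm of this kind would typically trigger model retraining or a fallback to physics-based control rather than an automatic parameter change.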

ML offers promising solutions to many of these challenges, yet its implementation in industrial settings is still nascent. While models such as ANNs, RF, and XGBoost have shown impressive accuracy in laboratory studies (often with R² > 0.95), real-world deployment is limited by data sparsity, a lack of standardization, and integration challenges with legacy systems. Additionally, model interpretability and trustworthiness are critical issues, especially in safety-critical applications like reactor control or carbon capture optimization.
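
Because R² is the headline metric in many of these studies, it helps to recall how it is computed and why a high laboratory value does not automatically transfer to plant data. The sketch below implements the standard definition. Both data sets are hypothetical and were chosen so that the absolute prediction error is identical (1 unit per sample) while the spread of the true values differs.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical laboratory sweep: wide range of operating points.
lab_true = [10, 20, 30, 40, 50]
lab_pred = [11, 19, 31, 39, 51]

# Hypothetical plant data: narrow operating window, same 1-unit errors.
plant_true = [28, 30, 32]
plant_pred = [29, 29, 33]

r2_lab = r_squared(lab_true, lab_pred)
r2_plant = r_squared(plant_true, plant_pred)
print(f"lab R2 = {r2_lab:.3f}, plant R2 = {r2_plant:.3f}")
```

With the same per-sample error, the score falls sharply on the narrow-range data, which is one reason R² values reported from wide laboratory sweeps should be read alongside absolute error metrics.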

Another challenge identified in this review is the lack of explicit criteria for ML algorithm selection in many surveyed studies. While ANNs dominate much of the literature—likely due to their ability to model complex nonlinear relationships typical of hydrogen production processes—the reasons for selecting particular models are often not clearly stated. In some cases, the prevalence of certain algorithms appears to be influenced by historical trends or researcher familiarity rather than systematic benchmarking. This lack of transparency makes it difficult to assess the true suitability of models for different hydrogen production contexts.

Additionally, certain patterns were observed across different hydrogen production pathways. In blue hydrogen (e.g., SMR and ATR) and gray hydrogen (e.g., coal gasification), the underlying chemical and thermodynamic processes involve complex, highly nonlinear interactions among multiple variables, making ANNs particularly suitable. In green hydrogen production, especially in water electrolysis and biomass gasification, time-dependent factors such as renewable energy variability and biomass heterogeneity further favor the use of flexible, nonlinear models like ANNs and ensemble methods such as RF. For emerging pathways like pink, turquoise, and white hydrogen, where datasets are often small and experimental conditions vary widely, simpler or ensemble models are sometimes preferred to enhance robustness and avoid overfitting.

7.2. Opportunities

On the opportunity side, the application of ML across the hydrogen value chain opens doors for optimization, cost reduction, and accelerated innovation. One major opportunity lies in predictive maintenance and fault detection. Hydrogen production facilities often experience costly downtimes due to equipment degradation (e.g., electrolyzer membrane wear, catalyst deactivation). ML can forecast failures based on sensor trends and historical patterns, enabling proactive interventions and minimizing unplanned shutdowns.

ML also enhances process optimization and real-time control, particularly for systems with fluctuating inputs—such as green hydrogen production linked to variable renewable energy. By dynamically adjusting operating conditions (e.g., voltage, temperature, flow rates), ML algorithms can maximize hydrogen yield, energy efficiency, and system lifetime. In multi-step processes like SMR or gasification, ML can support end-to-end optimization across heat exchangers, reactors, and gas separation units.

In materials discovery and system design, ML accelerates the identification of optimal catalysts, membranes, or sorbents by learning from experimental and simulation data. This reduces the need for exhaustive lab testing and guides researchers toward promising formulations faster. For example, ML-assisted screening has been applied to metal oxide photoelectrodes in PEC water splitting and to novel catalysts in methane pyrolysis.

7.3. Future Work

Future research should prioritize the development of robust and adaptive ML models capable of operating under realistic, time-varying conditions across diverse hydrogen production pathways. For green hydrogen, ML frameworks that dynamically respond to intermittent renewable energy inputs will be critical for optimizing electrolyzer performance and reducing operational variability. In blue and gray hydrogen systems, integrating long-term effects such as catalyst degradation, the low efficiency of carbon capture, and process instability remains a key challenge. Additionally, the lack of standardized, high-quality datasets, particularly for emerging methods such as turquoise, pink, and white hydrogen, continues to hinder generalizable model development. Coupling ML with physics-informed modeling may enhance interpretability and reliability, which are essential for industrial uptake. Moreover, future research should explore the integration of machine learning with techno-economic analysis (TEA) and life-cycle assessment (LCA) frameworks to better quantify the cost impacts of ML-driven optimization. Currently, only a few studies, such as that of Kim et al. (2022) [103] on pink hydrogen, have attempted to estimate hydrogen production costs using ML. Expanding this intersection could enable more accurate predictions of CAPEX, OPEX, and LCOH under optimized operating scenarios, thereby supporting more informed investment and policy decisions. Finally, ensuring model interpretability, uncertainty quantification, and integration with industrial control systems will be vital for bridging the gap between algorithm development and large-scale, real-world deployment.
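
To make the link between ML outputs and techno-economic metrics concrete, the sketch below uses a simplified annualized-cost form of the levelized cost of hydrogen (LCOH): annualized CAPEX via a capital recovery factor, plus fixed OPEX and electricity cost, divided by annual hydrogen output. This is a common textbook simplification offered as an assumption, not the method of any surveyed study. All input values are hypothetical placeholders, and the specific energy term is where an ML-predicted operating efficiency could, in principle, be substituted.

```python
def capital_recovery_factor(rate: float, years: int) -> float:
    """Annuity factor that spreads an upfront cost over `years` at `rate`."""
    growth = (1.0 + rate) ** years
    return rate * growth / (growth - 1.0)


def lcoh_usd_per_kg(capex_usd: float, fixed_opex_usd_per_year: float,
                    electricity_usd_per_kwh: float,
                    specific_energy_kwh_per_kg: float,
                    h2_kg_per_year: float, rate: float, years: int) -> float:
    """Simplified LCOH: annualized costs divided by annual hydrogen output."""
    annualized_capex = capex_usd * capital_recovery_factor(rate, years)
    electricity_cost = (electricity_usd_per_kwh * specific_energy_kwh_per_kg
                        * h2_kg_per_year)
    total = annualized_capex + fixed_opex_usd_per_year + electricity_cost
    return total / h2_kg_per_year


# Hypothetical inputs for illustration only.
base = dict(capex_usd=1_000_000, fixed_opex_usd_per_year=30_000,
            electricity_usd_per_kwh=0.04, h2_kg_per_year=100_000,
            rate=0.08, years=20)

lcoh_baseline = lcoh_usd_per_kg(specific_energy_kwh_per_kg=55.0, **base)
lcoh_optimized = lcoh_usd_per_kg(specific_energy_kwh_per_kg=52.0, **base)
print(f"baseline:  {lcoh_baseline:.2f} USD/kg")
print(f"optimized: {lcoh_optimized:.2f} USD/kg")
```

Comparing the two calls shows how an efficiency gain would propagate to cost under these assumptions; a full TEA would also account for stack replacement, utilization, and degradation.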

In summary, while hydrogen production poses technical and economic challenges, the synergy between hydrogen production and ML presents a powerful opportunity to reshape the global energy landscape. Strategic investment in research, infrastructure, and policy—alongside continued ML innovation—will be critical to unlocking the full potential of a low-carbon hydrogen economy.

8. Conclusions

This review systematically explored the role of ML in hydrogen production, focusing on a wide range of production pathways, including green, blue, and gray hydrogen, as well as emerging methods such as pink, turquoise, white, and black/brown hydrogen. This paper highlights how ML techniques have been employed to improve process efficiency, predict hydrogen yield, optimize operational parameters, and reduce environmental impact. The findings underscore the growing intersection between data-driven methods and hydrogen technologies, offering insights into current trends, prevailing challenges, and future directions. The key conclusions are as follows:

A total of 51 peer-reviewed papers from 2012 to 2025 were analyzed, covering ML applications across multiple hydrogen production pathways.
Green hydrogen received the most ML attention, especially in water electrolysis and biomass gasification, driven by the global shift toward carbon-neutral energy systems.
ANNs and their variants (e.g., MLP, BPNN, RBF) were the most frequently used models, applied in over 60% of the studies.
Ensemble learning methods like RF, GBR, and XGBoost demonstrated high predictive accuracy and are increasingly used in catalyst screening, syngas modeling, and multi-variable optimization.
Time-series models (e.g., LSTM, Bi-LSTM) were effectively employed in forecasting applications, such as renewable-energy-driven electrolysis and biohydrogen production.
Common input variables included process parameters (temperature, pressure, flow rates), feedstock properties (elemental composition, ash, moisture), and environmental conditions (solar irradiance, weather data).
Key predicted outputs included hydrogen yield, CO₂ capture or emission rates, syngas composition, and economic metrics such as production cost.
Major challenges include limited real-world deployment, data availability, and a lack of model interpretability, especially in safety-critical systems.
Future work should focus on developing robust, generalizable ML models supported by high-quality real-time datasets, with emphasis on industrial integration, cost analysis through techno-economic and life-cycle assessments, and closing current gaps in model validation and interpretability.

Author Contributions

Conceptualization, X.D. and G.Y.; methodology, X.D.; investigation, S.G.; resources, S.G.; writing—original draft preparation, X.D.; writing—review and editing, S.G. and G.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

ABR/ADA	AdaBoost regression
AdB	Adaptive boosting regression
AMT	Alternating model tree
ANFIS	Adaptive neuro-fuzzy inference system
ANN	Artificial neural network
ARM	Association rule mining
ASNN	Associative neural network
Bi-LSTM	Bidirectional LSTM
BP	Backpropagation
BR	Bayesian regularization
CNN	Convolutional neural network
DE	Differential evolution
DNN	Deep neural network
DRM	Dry reforming of methane
DT	Decision tree
ELM	Extreme learning machine
ENR	Elastic net regression
ETR	Ensemble tree regression
FFBPNN	Feed-forward backpropagation neural network
GA	Genetic algorithm
GBDT	Gradient boosting decision tree
GBM	Gradient boosting machine
GBR	Gradient boosting regression
GP	Genetic programming
GPR	Gaussian process regression
KNN	K-nearest neighbor
KR	Kernel ridge
LGB	LightGBM
LM	Levenberg–Marquardt
LSTM	Long short-term memory
LSSVM	Least squares support vector machine
LTS	Low temperature shift
MDEA	Methyl diethanolamine
MLP	Multilayer perceptron
MLR-RR	Multi-linear regression with ridge regularization
MOGA	Multi-objective genetic algorithm
MSE	Mean squared error
MTL	Multitask learning
MVR	Multivariate regression
NARX	Nonlinear autoregressive model with exogenous inputs neural network
NNs	Neural networks
NSGA-II	Non-dominated sorting genetic algorithm II
PLS	Partial least squares
PSA	Pressure swing adsorption
PSO	Particle swarm optimization
PZ	Piperazine
QSPR	Quantitative structure–property relationship
RBFNN	Radial basis function neural network
ResNet	Residual convolutional neural network
RF	Random forest
RR	Ridge regression
SCG	Scaled conjugate gradient
SE-SMR	Sorption-enhanced steam methane reforming
SMOreg	Sequential minimal optimization regression
SMR	Steam methane reforming
SMR	Small modular reactor
SVD	Singular value decomposition
SVM	Support vector machine
SVR	Support vector regression
TINN	Thermodynamics-informed neural network

References

IEA. World Energy Outlook 2024. Available online: https://www.iea.org/reports/world-energy-outlook-2024 (accessed on 11 February 2025).
Acar, C.; Dincer, I. Review and Evaluation of Hydrogen Production Options for Better Environment. J. Clean. Prod. 2019, 218, 835–849. [Google Scholar] [CrossRef]
Dawood, F.; Anda, M.; Shafiullah, G.M. Hydrogen Production for Energy: An Overview. Int. J. Hydrogen Energy 2020, 45, 3847–3869. [Google Scholar] [CrossRef]
Nikolaidis, P.; Poullikkas, A. A Comparative Overview of Hydrogen Production Processes. Renew. Sustain. Energy Rev. 2017, 67, 597–611. [Google Scholar] [CrossRef]
Holladay, J.D.; Hu, J.; King, D.L.; Wang, Y. An Overview of Hydrogen Production Technologies. Catal. Today 2009, 139, 244–260. [Google Scholar] [CrossRef]
IEA. Global Hydrogen Review 2024. Available online: https://www.iea.org/reports/global-hydrogen-review-2024 (accessed on 11 February 2025).
George Davies, W.; Babamohammadi, S.; Yang, Y.; Masoudi Soltani, S. The Rise of the Machines: A State-of-the-Art Technical Review on Process Modelling and Machine Learning within Hydrogen Production with Carbon Capture. Gas Sci. Eng. 2023, 118, 205104. [Google Scholar] [CrossRef]
Incer-Valverde, J.; Korayem, A.; Tsatsaronis, G.; Morosuk, T. “Colors” of Hydrogen: Definitions and Carbon Intensity. Energy Convers. Manag. 2023, 291, 117294. [Google Scholar] [CrossRef]
United Nations Economic Commission for Europe (UNECE). Hydrogen: Technology Brief. 2022. Available online: https://unece.org/hydrogen (accessed on 20 February 2025).
Jordan, M.I.; Mitchell, T.M. Machine Learning: Trends, Perspectives, and Prospects. Science 2015, 349, 255–260. [Google Scholar] [CrossRef]
Sarker, I.H. Machine Learning: Algorithms, Real-World Applications and Research Directions. SN Comput. Sci. 2021, 2, 160. [Google Scholar] [CrossRef]
Shahin, M.; Simjoo, M. Potential Applications of Innovative AI-Based Tools in Hydrogen Energy Development: Leveraging Large Language Model Technologies. Int. J. Hydrogen Energy 2025, 102, 918–936. [Google Scholar] [CrossRef]
Kwon, H.; Park, J.; Shin, J.E.; Koo, B. Optimal Investment Strategy Analysis of On-Site Hydrogen Production Based on the Hydrogen Demand Prediction Using Machine Learning. Int. J. Energy Res. 2024, 2024. [Google Scholar] [CrossRef]
Dash, S.K.; Chakraborty, S.; Elangovan, D. A Brief Review of Hydrogen Production Methods and Their Challenges. Energies 2023, 16, 1141. [Google Scholar] [CrossRef]
Alagumalai, A.; Devarajan, B.; Song, H.; Wongwises, S.; Ledesma-Amaro, R.; Mahian, O.; Sheremet, M.; Lichtfouse, E. Machine Learning in Biohydrogen Production: A Review. Biofuel Res. J. 2023, 10, 1844–1858. [Google Scholar] [CrossRef]
Kumar Sharma, A.; Kumar Ghodke, P.; Goyal, N.; Nethaji, S.; Chen, W.-H. Machine Learning Technology in Biohydrogen Production from Agriculture Waste: Recent Advances and Future Perspectives. Bioresour. Technol. 2022, 364, 128076. [Google Scholar] [CrossRef]
Bassey, K.E.; Ibegbulam, C. Machine Learning for Green Hydrogen Production. Comput. Sci. IT Res. J. 2023, 4, 368–385. [Google Scholar] [CrossRef]
Allal, Z.; Noura, H.N.; Salman, O.; Vernier, F.; Chahine, K. A Review on Machine Learning Applications in Hydrogen Energy Systems. Int. J. Thermofluids 2025, 26, 101119. [Google Scholar] [CrossRef]
Takeda, S.; Nam, H.; Chapman, A. Low-Carbon Energy Transition with the Sun and Forest: Solar-Driven Hydrogen Production from Biomass. Int. J. Hydrogen Energy 2022, 47, 24651–24668. [Google Scholar] [CrossRef]
Devasahayam, S. Deep Learning Models in Python for Predicting Hydrogen Production: A Comparative Study. Energy 2023, 280, 128088. [Google Scholar] [CrossRef]
Qi, H.; Cui, P.; Liu, Z.; Xu, Z.; Yao, D.; Wang, Y.; Zhu, Z.; Yang, S. Conceptual Design and Comprehensive Analysis for Novel Municipal Sludge Gasification-Based Hydrogen Production via Plasma Gasifier. Energy Convers. Manag. 2021, 245, 114635. [Google Scholar] [CrossRef]
Haq, Z.U.; Ullah, H.; Khan, M.N.A.; Naqvi, S.R.; Ahsan, M. Hydrogen Production Optimization from Sewage Sludge Supercritical Gasification Process Using Machine Learning Methods Integrated with Genetic Algorithm. Chem. Eng. Res. Des. 2022, 184, 614–626. [Google Scholar] [CrossRef]
Kononenko, I. Machine Learning for Medical Diagnosis: History, State of the Art and Perspective. Artif. Intell. Med. 2001, 23, 89–109. [Google Scholar] [CrossRef]
Fradkov, A.L. Early History of Machine Learning. IFAC-Pap. 2020, 53, 1385–1390. [Google Scholar] [CrossRef]
Zhu, X.; Goldberg, A.B. Introduction to Semi-Supervised Learning; Springer International Publishing: Cham, Switzerland, 2009; ISBN 978-3-031-00420-9. [Google Scholar]
Zhou, Z.-H. Machine Learning; Springer Singapore: Singapore, 2021; ISBN 978-981-15-1966-6. [Google Scholar]
Mahesh, B. Machine Learning Algorithms—A Review. Int. J. Sci. Res. (IJSR) 2020, 9, 381–386. [Google Scholar] [CrossRef]
Rundo, F.; Trenta, F.; di Stallo, A.L.; Battiato, S. Machine Learning for Quantitative Finance Applications: A Survey. Appl. Sci. 2019, 9, 5574. [Google Scholar] [CrossRef]
Zitnik, M.; Nguyen, F.; Wang, B.; Leskovec, J.; Goldenberg, A.; Hoffman, M.M. Machine Learning for Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities. Inf. Fusion 2019, 50, 71–91. [Google Scholar] [CrossRef] [PubMed]
Lary, D.J.; Alavi, A.H.; Gandomi, A.H.; Walker, A.L. Machine Learning in Geosciences and Remote Sensing. Geosci. Front. 2016, 7, 3–10. [Google Scholar] [CrossRef]
Morgan, D.; Jacobs, R. Opportunities and Challenges for Machine Learning in Materials Science. Annu. Rev. Mater. Res. 2020, 50, 71–103. [Google Scholar] [CrossRef]
Schweidtmann, A.M.; Esche, E.; Fischer, A.; Kloft, M.; Repke, J.; Sager, S.; Mitsos, A. Machine Learning in Chemical Engineering: A Perspective. Chem. Ing. Tech. 2021, 93, 2029–2039. [Google Scholar] [CrossRef]
Murkin, C.; Brightling, J. Eighty Years of Steam Reforming. Johnson Matthey Technol. Rev. 2016, 60, 263–269. [Google Scholar] [CrossRef]
Simpson, A.; Lutz, A. Exergy Analysis of Hydrogen Production via Steam Methane Reforming. Int. J. Hydrogen Energy 2007, 32, 4811–4820. [Google Scholar] [CrossRef]
Saha, P.; Akash, F.A.; Shovon, S.M.; Monir, M.U.; Ahmed, M.T.; Khan, M.F.H.; Sarkar, S.M.; Islam, M.K.; Hasan, M.M.; Vo, D.-V.N.; et al. Grey, Blue, and Green Hydrogen: A Comprehensive Review of Production Methods and Prospects for Zero-Emission Energy. Int. J. Green Energy 2024, 21, 1383–1397. [Google Scholar] [CrossRef]
Oni, A.O.; Anaya, K.; Giwa, T.; Di Lullo, G.; Kumar, A. Comparative Assessment of Blue Hydrogen from Steam Methane Reforming, Autothermal Reforming, and Natural Gas Decomposition Technologies for Natural Gas-Producing Regions. Energy Convers. Manag. 2022, 254, 115245. [Google Scholar] [CrossRef]
Bauer, C.; Treyer, K.; Antonini, C.; Bergerson, J.; Gazzani, M.; Gencer, E.; Gibbins, J.; Mazzotti, M.; McCoy, S.T.; McKenna, R.; et al. On the Climate Impacts of Blue Hydrogen Production. Sustain. Energy Fuels 2022, 6, 66–75. [Google Scholar] [CrossRef]
Van Cappellen, L.; Croezen, H.; Rooijers, F. Feasibility Study into Blue Hydrogen Technical, Economic & Sustainability Analysis. 2018. Available online: https://www.cedelft.eu/en/publications/2149/ (accessed on 20 March 2025).
AlHumaidan, F.S.; Absi Halabi, M.; Rana, M.S.; Vinoba, M. Blue Hydrogen: Current Status and Future Technologies. Energy Convers. Manag. 2023, 283, 116840. [Google Scholar] [CrossRef]
Howarth, R.W.; Jacobson, M.Z. How Green Is Blue Hydrogen? Energy Sci. Eng. 2021, 9, 1676–1687. [Google Scholar] [CrossRef]
Vo, N.D.; Oh, D.H.; Kang, J.-H.; Oh, M.; Lee, C.-H. Dynamic-Model-Based Artificial Neural Network for H₂ Recovery and CO₂ Capture from Hydrogen Tail Gas. Appl. Energy 2020, 273, 115263. [Google Scholar] [CrossRef]
Yu, X.; Shen, Y.; Guan, Z.; Zhang, D.; Tang, Z.; Li, W. Multi-Objective Optimization of ANN-Based PSA Model for Hydrogen Purification from Steam-Methane Reforming Gas. Int. J. Hydrogen Energy 2021, 46, 11740–11755. [Google Scholar] [CrossRef]
Tong, L.; Bénard, P.; Zong, Y.; Chahine, R.; Liu, K.; Xiao, J. Artificial Neural Network Based Optimization of a Six-Step Two-Bed Pressure Swing Adsorption System for Hydrogen Purification. Energy AI 2021, 5, 100075. [Google Scholar] [CrossRef]
Streb, A.; Mazzotti, M. Performance Limits of Neural Networks for Optimizing an Adsorption Process for Hydrogen Purification and CO₂ Capture. Comput. Chem. Eng. 2022, 166, 107974. [Google Scholar] [CrossRef]
Nkulikiyinka, P.; Wagland, S.T.; Manovic, V.; Clough, P.T. Prediction of Combined Sorbent and Catalyst Materials for SE-SMR, Using QSPR and Multitask Learning. Ind. Eng. Chem. Res. 2022, 61, 9218–9233. [Google Scholar] [CrossRef]
Vo, N.D.; Kang, J.-H.; Oh, D.-H.; Jung, M.Y.; Chung, K.; Lee, C.-H. Sensitivity Analysis and Artificial Neural Network-Based Optimization for Low-Carbon H₂ Production via a Sorption-Enhanced Steam Methane Reforming (SESMR) Process Integrated with Separation Process. Int. J. Hydrogen Energy 2022, 47, 820–847. [Google Scholar] [CrossRef]
Oh, H.-T.; Kum, J.; Park, J.; Dat Vo, N.; Kang, J.-H.; Lee, C.-H. Pre-Combustion CO₂ Capture Using Amine-Based Absorption Process for Blue H₂ Production from Steam Methane Reformer. Energy Convers. Manag. 2022, 262, 115632. [Google Scholar] [CrossRef]
Pizoń, Z.; Kimijima, S.; Brus, G. Enhancing a Deep Learning Model for the Steam Reforming Process Using Data Augmentation Techniques. Energies 2024, 17, 2413. [Google Scholar] [CrossRef]
Wang, Y.; Cui, X.; Peters, D.; Çıtmacı, B.; Alnajdi, A.; Morales-Guio, C.G.; Christofides, P.D. Machine Learning-Based Predictive Control of an Electrically-Heated Steam Methane Reforming Process. Digit. Chem. Eng. 2024, 12, 100173. [Google Scholar] [CrossRef]
Cherif, A.; Lee, J.-S.; Nebbali, R.; Lee, C.-J. Novel Design and Multi-Objective Optimization of Autothermal Steam Methane Reformer to Enhance Hydrogen Production and Thermal Matching. Appl. Therm. Eng. 2022, 217, 119140. [Google Scholar] [CrossRef]
Gul, H.; Arshad, M.Y.; Tahir, M.W. Production of H₂ via Sorption Enhanced Auto-Thermal Reforming for Small Scale Applications-A Process Modeling and Machine Learning Study. Int. J. Hydrogen Energy 2023, 48, 12622–12635. [Google Scholar] [CrossRef]
Newborough, M.; Cooley, G. Developments in the Global Hydrogen Market: The Spectrum of Hydrogen Colours. Fuel Cells Bull. 2020, 2020, 16–22. [Google Scholar] [CrossRef]
Chavan, P.D.; Sharma, T.; Mall, B.K.; Rajurkar, B.D.; Tambe, S.S.; Sharma, B.K.; Kulkarni, B.D. Development of Data-Driven Models for Fluidized-Bed Coal Gasification Process. Fuel 2012, 93, 44–51. [Google Scholar] [CrossRef]
Patil-Shinde, V.; Kulkarni, T.; Kulkarni, R.; Chavan, P.D.; Sharma, T.; Sharma, B.K.; Tambe, S.S.; Kulkarni, B.D. Artificial Intelligence-Based Modeling of High Ash Coal Gasification in a Pilot Plant Scale Fluidized Bed Gasifier. Ind. Eng. Chem. Res. 2014, 53, 18678–18689. [Google Scholar] [CrossRef]
Azzam, M.; Aramouni, N.A.K.; Ahmad, M.N.; Awad, M.; Kwapinski, W.; Zeaiter, J. Dynamic Optimization of Dry Reformer under Catalyst Sintering Using Neural Networks. Energy Convers. Manag. 2018, 157, 146–156. [Google Scholar] [CrossRef]
Alsaffar, M.A.; Mageed, A.K.; Abdel Ghany, M.A.R.; Ayodele, B.V.; Mustapa, S.I. Elucidating the Non-Linear Effect of Process Parameters on Hydrogen Production by Catalytic Methane Reforming: An Artificial Intelligence Approach. IOP Conf. Ser. Mater. Sci. Eng. 2020, 991, 012078. [Google Scholar] [CrossRef]
Le, V.T.; Dragoi, E.-N.; Almomani, F.; Vasseghian, Y. Artificial Neural Networks for Predicting Hydrogen Production in Catalytic Dry Reforming: A Systematic Review. Energies 2021, 14, 2894. [Google Scholar] [CrossRef]
Byun, M.; Lee, H.; Choe, C.; Cheon, S.; Lim, H. Machine Learning Based Predictive Model for Methanol Steam Reforming with Technical, Environmental, and Economic Perspectives. Chem. Eng. J. 2021, 426, 131639. [Google Scholar] [CrossRef]
Ayodele, B.V.; Mustapa, S.I.; Kanthasamy, R.; Zwawi, M.; Cheng, C.K. Modeling the Prediction of Hydrogen Production by Co-gasification of Plastic and Rubber Wastes Using Machine Learning Algorithms. Int. J. Energy Res. 2021, 45, 9580–9594. [Google Scholar] [CrossRef]
Ayodele, B.V.; Alsaffar, M.A.; Mustapa, S.I.; Adesina, A.; Kanthasamy, R.; Witoon, T.; Abdullah, S. Process Intensification of Hydrogen Production by Catalytic Steam Methane Reforming: Performance Analysis of Multilayer Perceptron-Artificial Neural Networks and Nonlinear Response Surface Techniques. Process Saf. Environ. Prot. 2021, 156, 315–329. [Google Scholar] [CrossRef]
Hong, S.; Lee, J.; Cho, H.; Kim, M.; Moon, I.; Kim, J. Multi-Objective Optimization of CO₂ Emission and Thermal Efficiency for on-Site Steam Methane Reforming Hydrogen Production Process Using Machine Learning. J. Clean. Prod. 2022, 359, 132133. [Google Scholar] [CrossRef]
Chen, W.; Chen, Z.; Hsu, S.; Park, Y.; Juan, J.C. Reactor Design of Methanol Steam Reforming by Evolutionary Computation and Hydrogen Production Maximization by Machine Learning. Int. J. Energy Res. 2022, 46, 20685–20703. [Google Scholar] [CrossRef]
Kim, C.; Won, W.; Kim, J. Early-Stage Evaluation of Catalyst Using Machine Learning Based Modeling and Simulation of Catalytic Systems: Hydrogen Production via Water–Gas Shift over Pt Catalysts. ACS Sustain. Chem. Eng. 2022, 10, 14417–14432. [Google Scholar] [CrossRef]
Liu, S.; Yang, Y.; Yu, L.; Zhu, F.; Cao, Y.; Liu, X.; Yao, A.; Cao, Y. Predicting Gas Production by Supercritical Water Gasification of Coal Using Machine Learning. Fuel 2022, 329, 125478. [Google Scholar] [CrossRef]
Huang, J.; Liang, Z.; Liu, Y. Smart Reforming for Hydrogen Production via Machine Learning. Chem. Eng. Sci. 2025, 304, 120959. [Google Scholar] [CrossRef]
Chi, J.; Yu, H. Water Electrolysis Based on Renewable Energy for Hydrogen Production. Chin. J. Catal. 2018, 39, 390–394. [Google Scholar] [CrossRef]
El-Shafie, M. Hydrogen Production by Water Electrolysis Technologies: A Review. Results Eng. 2023, 20, 101426. [Google Scholar] [CrossRef]
Alamiery, A. Advancements in Materials for Hydrogen Production: A Review of Cutting-Edge Technologies. ChemPhysMater 2023. [Google Scholar] [CrossRef]
Valizadeh, S.; Hakimian, H.; Farooq, A.; Jeon, B.-H.; Chen, W.-H.; Hoon Lee, S.; Jung, S.-C.; Won Seo, M.; Park, Y.-K. Valorization of Biomass through Gasification for Green Hydrogen Generation: A Comprehensive Review. Bioresour. Technol. 2022, 365, 128143. [Google Scholar] [CrossRef]
Pan, J.; Shahbeik, H.; Shafizadeh, A.; Rafiee, S.; Golvirdizadeh, M.; Ghafarian Nia, S.A.; Mobli, H.; Yang, Y.; Zhang, G.; Tabatabaei, M.; et al. Machine Learning Optimization for Enhanced Biomass-Coal Co-Gasification. Renew. Energy 2024, 229, 120772. [Google Scholar] [CrossRef]
Kumar, M.; Meena, B.; Subramanyam, P.; Suryakala, D.; Subrahmanyam, C. Recent Trends in Photoelectrochemical Water Splitting: The Role of Cocatalysts. NPG Asia Mater. 2022, 14, 88. [Google Scholar] [CrossRef]
Mohd Raub, A.A.; Bahru, R.; Mohd Nashruddin, S.N.A.; Yunas, J. Advances of Nanostructured Metal Oxide as Photoanode in Photoelectrochemical (PEC) Water Splitting Application. Heliyon 2024, 10, e39079. [Google Scholar] [CrossRef]
Saifuddin, N.; Priatharsini, P. Developments in Bio-Hydrogen Production from Algae: A Review. Res. J. Appl. Sci. Eng. Technol. 2016, 12, 968–982. [Google Scholar] [CrossRef]
Li, J.; Pan, L.; Suvarna, M.; Wang, X. Machine Learning Aided Supercritical Water Gasification for H₂-Rich Syngas Production with Process Optimization and Catalyst Screening. Chem. Eng. J. 2021, 426, 131285. [Google Scholar] [CrossRef]
Sezer, S.; Özveren, U. Investigation of Syngas Exergy Value and Hydrogen Concentration in Syngas from Biomass Gasification in a Bubbling Fluidized Bed Gasifier by Using Machine Learning. Int. J. Hydrogen Energy 2021, 46, 20377–20396. [Google Scholar] [CrossRef]
Saadetnejad, D.; Oral, B.; Can, E.; Yıldırım, R. Machine Learning Analysis of Gas Phase Photocatalytic CO₂ Reduction for Hydrogen Production. Int. J. Hydrogen Energy 2022, 47, 19655–19668. [Google Scholar] [CrossRef]
Cheng, G.; Luo, E.; Zhao, Y.; Yang, Y.; Chen, B.; Cai, Y.; Wang, X.; Dong, C. Analysis and Prediction of Green Hydrogen Production Potential by Photovoltaic-Powered Water Electrolysis Using Machine Learning in China. Energy 2023, 284, 129302. [Google Scholar] [CrossRef]
Yang, Q.; Ma, Z.; Bai, L.; Yuan, Q.; Gou, F.; Li, Y.; Du, Z.; Chen, Y.; Liu, X.; Yu, J.; et al. Machine Learning Assisted Prediction for Hydrogen Production of Advanced Photovoltaic Technologies. DeCarbon 2024, 4, 100050. [Google Scholar] [CrossRef]
Babay, M.-A.; Adar, M.; Chebak, A.; Mabrouki, M. Forecasting Green Hydrogen Production: An Assessment of Renewable Energy Systems Using Deep Learning and Statistical Methods. Fuel 2025, 381, 133496. [Google Scholar] [CrossRef]
Salah, A.; Hanel, L.; Beirow, M.; Scheffknecht, G. Modelling SER Biomass Gasification Using Dynamic Neural Networks. In Computer Aided Chemical Engineering; Elsevier: Amsterdam, The Netherlands, 2016; Volume 38, pp. 19–24. [Google Scholar]
Krzywanski, J.; Fan, H.; Feng, Y.; Shaikh, A.R.; Fang, M.; Wang, Q. Genetic Algorithms and Neural Networks in Optimization of Sorbent Enhanced H₂ Production in FB and CFB Gasifiers. Energy Convers. Manag. 2018, 171, 1651–1661. [Google Scholar] [CrossRef]
Ozbas, E.E.; Aksu, D.; Ongen, A.; Aydin, M.A.; Ozcan, H.K. Hydrogen Production via Biomass Gasification, and Modeling by Supervised Machine Learning Algorithms. Int. J. Hydrogen Energy 2019, 44, 17260–17268. [Google Scholar] [CrossRef]
Torky, M.; Dahy, G.; Hassanein, A.E. GH₂_MobileNet: Deep Learning Approach for Predicting Green Hydrogen Production from Organic Waste Mixtures. Appl. Soft Comput. 2023, 138, 110215. [Google Scholar] [CrossRef]
Gil, M.V.; Jablonka, K.M.; Garcia, S.; Pevida, C.; Smit, B. Biomass to Energy: A Machine Learning Model for Optimum Gasification Pathways. Digit. Discov. 2023, 2, 929–940. [Google Scholar] [CrossRef]
Meena, M.; Kumar, H.; Chaturvedi, N.D.; Kovalev, A.A.; Bolshev, V.; Kovalev, D.A.; Sarangi, P.K.; Chawade, A.; Rajput, M.S.; Vivekanand, V.; et al. Biomass Gasification and Applied Intelligent Retrieval in Modeling. Energies 2023, 16, 6524. [Google Scholar] [CrossRef]
Oral, B.; Can, E.; Yildirim, R. Analysis of Photoelectrochemical Water Splitting Using Machine Learning. Int. J. Hydrogen Energy 2022, 47, 19633–19654. [Google Scholar] [CrossRef]
Tajima, M.; Nagai, Y.; Chen, S.; Pan, Z.; Katayama, K. A Robust Methodology for PEC Performance Analysis of Photoanodes Using Machine Learning and Analytical Data. Analyst 2024, 149, 4193–4207. [Google Scholar] [CrossRef]
Sahu, N.; Azad, C.; Kumar, U. Construction of Hybrid Models Based on Cascade Technique Using Basic Machine Learning Models: An Application as Photocurrent Density Predictor of the Photoelectrode in PEC Cell. Mater. Today Commun. 2024, 41, 110643. [Google Scholar] [CrossRef]
Mishra, S.; Kumar, P.; Dey, S.; Pattanayak, P.; Singh, T. Design of Ternary Metal Oxides for Photoelectrochemical Water Splitting Using Machine Learning Techniques. J. Environ. Chem. Eng. 2025, 13, 115260. [Google Scholar] [CrossRef]
Taheri, E.; Amin, M.M.; Fatehizadeh, A.; Rezakazemi, M.; Aminabhavi, T.M. Artificial Intelligence Modeling to Predict Transmembrane Pressure in Anaerobic Membrane Bioreactor-Sequencing Batch Reactor during Biohydrogen Production. J. Environ. Manag. 2021, 292, 112759. [Google Scholar] [CrossRef]
Hosseinzadeh, A.; Zhou, J.L.; Altaee, A.; Li, D. Machine Learning Modeling and Analysis of Biohydrogen Production from Wastewater by Dark Fermentation Process. Bioresour. Technol. 2022, 343, 126111. [Google Scholar] [CrossRef] [PubMed]
Venkatesh, P.; Chowdhury, M.R.; Rajasekhar, N.; Radhakrishnan, T.K.; Samsudeen, N. Deep Learning Based Modelling and Control of a Microbial Electrolysis Cell for Enhanced Bio Hydrogen Production. Int. J. Hydrogen Energy 2024. [Google Scholar] [CrossRef]
Office of Nuclear Energy. Nine Mile Point Begins Clean Hydrogen Production. Available online: https://www.energy.gov/ne/articles/nine-mile-point-begins-clean-hydrogen-production (accessed on 23 March 2025).
Fernández-Arias, P.; Antón-Sancho, Á.; Lampropoulos, G.; Vergara, D. Emerging Trends and Challenges in Pink Hydrogen Research. Energies 2024, 17, 2291. [Google Scholar] [CrossRef]
Diab, J.; Fulcheri, L.; Hessel, V.; Rohani, V.; Frenklach, M. Why Turquoise Hydrogen Will Be a Game Changer for the Energy Transition. Int. J. Hydrogen Energy 2022, 47, 25831–25848. [Google Scholar] [CrossRef]
Muradov, N.; Veziroğlu, T. From Hydrocarbon to Hydrogen–Carbon to Hydrogen Economy. Int. J. Hydrogen Energy 2005, 30, 225–237. [Google Scholar] [CrossRef]
Steinberg, M. Fossil Fuel Decarbonization Technology for Mitigating Global Warming. Int. J. Hydrogen Energy 1999, 24, 771–777. [Google Scholar] [CrossRef]
Dagle, R.A.; Dagle, V.; Bearden, M.D.; Holladay, J.D.; Krause, T.R.; Ahmed, S. An Overview of Natural Gas Conversion Technologies for Co-Production of Hydrogen and Value-Added Solid Carbon Products; Richland, WA, USA, 2017. [Google Scholar] [CrossRef]
Bhutto, A.W.; Bazmi, A.A.; Zahedi, G. Underground Coal Gasification: From Fundamentals to Applications. Prog. Energy Combust. Sci. 2013, 39, 189–214. [Google Scholar] [CrossRef]
Jiang, L.; Xue, D.; Wei, Z.; Chen, Z.; Mirzayev, M.; Chen, Y.; Chen, S. Coal Decarbonization: A State-of-the-Art Review of Enhanced Hydrogen Production in Underground Coal Gasification. Energy Rev. 2022, 1, 100004. [Google Scholar] [CrossRef]
Schneider, S.; Bajohr, S.; Graf, F.; Kolb, T. State of the Art of Hydrogen Production via Pyrolysis of Natural Gas. ChemBioEng Rev. 2020, 7, 150–158. [Google Scholar] [CrossRef]
Hermesmann, M.; Müller, T.E. Green, Turquoise, Blue, or Grey? Environmentally Friendly Hydrogen Production in Transforming Energy Systems. Prog. Energy Combust. Sci. 2022, 90, 100996. [Google Scholar] [CrossRef]
Kim, J.; Rweyemamu, M.; Purevsuren, B. Machine Learning-Based Approach for Hydrogen Economic Evaluation of Small Modular Reactors. Sci. Technol. Nucl. Install. 2022, 2022, 1–9. [Google Scholar] [CrossRef]
Salimian, A.; Grisan, E. Deep Learning Analysis of Plasma Emissions: A Potential System for Monitoring Methane and Hydrogen in the Pyrolysis Processes. Int. J. Hydrogen Energy 2024, 58, 1030–1043. [Google Scholar] [CrossRef]
Wen, Y.; Wang, S.; Wu, L.; Hondo, E.; Tang, C.; Jiang, J.; Ho, G.W.; Kawi, S.; Wang, C.-H. Exploring the Role of Process Control and Catalyst Design in Methane Catalytic Decomposition: A Machine Learning Perspective. Int. J. Hydrogen Energy 2024, 72, 601–613. [Google Scholar] [CrossRef]
Zhang, T.; Zhang, Y.; Katterbauer, K.; Al Shehri, A.; Sun, S.; Hoteit, I. Deep Learning–Assisted Phase Equilibrium Analysis for Producing Natural Hydrogen. Int. J. Hydrogen Energy 2024, 50, 473–486. [Google Scholar] [CrossRef]
Zhao, Y.; Wang, J.; Yi, Q. Bridging Uncertainty Gaps with Artificial Intelligence-Assisted Syngas Precise Prediction in Coal Gasification. Chem. Eng. Sci. 2025, 301, 120734. [Google Scholar] [CrossRef]
Ceylan, Z.; Ceylan, S. Application of Machine Learning Algorithms to Predict the Performance of Coal Gasification Process. In Applications of Artificial Intelligence in Process Systems Engineering; Elsevier: Amsterdam, The Netherlands, 2021; pp. 165–186. [Google Scholar] [CrossRef]

Figure 1. Global energy mix by scenario to 2050 [1]. STEPS = Stated Policies Scenario; APS = Announced Pledges Scenario; NZE = Net Zero Emissions by 2050 Scenario.

Figure 2. Hydrogen demand by sector and by region, historical and in the Net Zero Emissions by 2050 Scenario, 2019–2030 [6].

Figure 3. Classification of hydrogen production methods.

Figure 4. Carbon intensity for various H₂ production methods in 2022 (data obtained from UNECE [9]).

Figure 5. Most frequently used keywords of ML applications in hydrogen production according to the Scopus database (developed by VOSviewer, version 1.6.20).

Figure 6. Simplified process flow diagram of SMR.

Figure 7. Simplified process flow diagram of ATR.

Figure 8. Nine Mile Point Nuclear Station (photo: Constellation Energy) [93].

Figure 9. Simplified process flow diagram of coal gasifier.

Table 1. Selected examples of common ML algorithms.

Algorithm	Concept	Sample Applications	Advantages	Limitations
LR	One of the simplest ML algorithms used for predicting continuous numerical values. It assumes a linear relationship between input features (independent variables) and the target variable (dependent variable). The algorithm fits a straight line that best represents the relationship between the input and output.	Predicting house prices based on size, location, and other features; forecasting sales trends in retail and e-commerce; stock price prediction in financial markets.	Simple and easy to interpret; works well when the relationship between variables is approximately linear.	Fails for nonlinear relationships; sensitive to outliers, which can distort predictions.
DT	A tree-like structure used for both classification and regression. It splits data into branches based on conditions, forming a flowchart-like decision model. Each node represents a decision based on a feature, and branches lead to possible outcomes.	Credit risk assessment (loan approvals); medical diagnosis (classifying diseases based on symptoms); customer segmentation (targeted marketing).	Easy to interpret and visualize; handles both numerical and categorical data; works well for small to medium-sized datasets.	Prone to overfitting on complex datasets; highly sensitive to noisy data (small changes in data can lead to different tree structures).
RF	An ensemble learning algorithm that builds multiple decision trees and combines their results to make more accurate predictions. It reduces overfitting by averaging multiple trees trained on different subsets of data.	Fraud detection in banking; predicting customer churn in telecom and subscription-based businesses; medical imaging analysis (cancer detection from MRI scans).	Higher accuracy than a single decision tree; handles missing data well and works on large datasets; reduces overfitting by combining multiple trees.	Computationally expensive for large datasets; harder to interpret compared to a single decision tree.
SVM	A powerful classification algorithm that works by finding the best decision boundary (hyperplane) for separating different classes. It aims to maximize the margin between data points of different classes.	Text classification (spam email detection); image recognition (face detection); medical diagnostics (classifying tumors as benign or malignant).	Effective for high-dimensional data; works well for small datasets with clear class separation.	Computationally expensive for large datasets; sensitive to noisy data and requires careful feature scaling.
K-means	An unsupervised learning algorithm that groups similar data points into k clusters. It minimizes the distance between data points within a cluster and assigns new data to the closest cluster.	Customer segmentation in marketing; anomaly detection (fraudulent transactions); image segmentation in computer vision.	Fast and scalable for large datasets; works well when clusters are clearly defined.	Sensitive to outliers; requires the number of clusters (k) to be predefined.
ANN	A model inspired by the human brain, consisting of layers of interconnected neurons. Training uses backpropagation to adjust the connection weights and improve accuracy.	Speech recognition (Google Assistant, Siri); autonomous driving (object detection in self-driving cars); medical diagnostics (AI-driven X-ray analysis).	Handles complex problems like speech and image recognition; self-learning capabilities from vast amounts of data.	Requires large datasets for training; computationally expensive (needs GPUs).
Gradient Boosting (XGBoost, LightGBM, CatBoost)	Combines multiple weak models (decision trees) to create a strong predictive model. It corrects previous mistakes iteratively using gradient descent.	Financial modeling (credit scoring); weather forecasting; medical outcome prediction.	Handles missing data and outliers well.	Computationally expensive for big data; prone to overfitting if not carefully tuned.
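The contrast between the LR and DT rows of Table 1 can be made concrete with a minimal sketch. It uses only the Python standard library and synthetic data; the function names (fit_line, fit_stump, r_squared) and the test curves are assumptions chosen for illustration and are not drawn from any of the reviewed studies. A straight line fitted by ordinary least squares recovers a linear relationship exactly but describes a sine-shaped response poorly, whereas even a single-split regression tree (a "stump") captures more of the nonlinear trend.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_stump(xs, ys):
    """Depth-1 regression tree: one threshold, mean prediction on each side."""
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def r_squared(ys, predictions):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Synthetic inputs (illustrative only): 63 points on [0, 6.2].
xs = [i / 10 for i in range(63)]
linear_ys = [2.0 + 0.5 * x for x in xs]   # exactly linear response
curved_ys = [math.sin(x) for x in xs]     # strongly nonlinear response

a, b = fit_line(xs, linear_ys)
r2_line_on_linear = r_squared(linear_ys, [a + b * x for x in xs])

a2, b2 = fit_line(xs, curved_ys)
r2_line_on_curved = r_squared(curved_ys, [a2 + b2 * x for x in xs])

t, low, high = fit_stump(xs, curved_ys)
r2_stump_on_curved = r_squared(curved_ys, [low if x < t else high for x in xs])

print(f"line on linear data:  R2 = {r2_line_on_linear:.3f}")
print(f"line on curved data:  R2 = {r2_line_on_curved:.3f}")
print(f"stump on curved data: R2 = {r2_stump_on_curved:.3f}")
```

A full decision tree repeats the same split search recursively on each side, and RF or gradient boosting combine many such trees; in practice, a library implementation would normally be used instead of this hand-written version.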

Table 2. Comparison of ATR and SMR for blue hydrogen production [39].

Feature	SMR	ATR
Feedstock	Natural gas (CH₄)	Natural gas (CH₄)
Process complexity	Lower (relies on external heating)	Higher (requires O₂ supply)
CO₂ capture efficiency	Moderate (~75–85% with CCUS)	Higher (~95% with CCUS)
Energy requirements	Higher (requires external heat input)	Lower (self-sustaining heat generation)
CO₂ emissions	Moderate (requires CCUS)	Lower (easier CO₂ separation)
Industrial maturity	Widely used globally	Emerging but growing
Capital costs	Lower (simpler design)	Higher (complex setup)

Table 3. Research overview of ML within blue hydrogen production.

Category	No.	Reference	Algorithms	Dataset	Inputs	Output(s)	Key Findings
SMR	1	Vo et al. (2020) [41]	Dynamic-model-based ANN, SVD, FFBPNN	108 (cryogenic unit); 291 (membrane); 35 (PSA)	Membrane area, adsorption time, purge-to-feed ratio	H₂ purity, CO₂ capture rate, H₂ productivity, H₂ production cost, energy consumption for CO₂ capture	ANN models provided high accuracy (<2% error) and significant computational cost reduction.
	2	Yu et al. (2021) [42]	ANN-GA	100	Adsorption pressure, part of adsorption time, feed flow rate, length of activated carbon layer, ratio of purge to feed	Purity of hydrogen, recovery rate, productivity of the PSA process	The optimized PSA process achieved hydrogen purity above 99% while balancing recovery and productivity.
	3	Tong et al. (2021) [43]	ANN	112	Adsorption pressure, adsorption step time	H₂ purity and recovery	ANN can effectively predict and optimize PSA ¹-based hydrogen purification.
	4	Streb and Mazzotti (2022) [44]	ANN	20,000	Feed composition (mol fractions of CO₂, CO, CH₄, N₂, Ar, and H₂), adsorption time, light purge duration, evacuation pressure, recycle ratio	H₂ purity, H₂ recovery, CO₂ purity, CO₂ recovery, CO₂ specific energy consumption, productivity	ANN successfully used for multi-objective constrained optimization: H₂ purity ≥ 99–99.97%; CO₂ purity ≥ 96%; H₂ and CO₂ recovery ≥ 90%.
	5	Nkulikiyinka et al. (2022) [45]	QSPR, MTL, ASNN, DNN, LSSVM	446	Molecular descriptors of materials, CaO or Ni concentration, calcination/carbonation temperature and time, synthesis method, BET surface area, steam-to-carbon ratio	Methane conversion, CO₂ capture capacity	ASNN with GSFrag descriptors + multitask learning gave the most accurate predictions.
	6	Vo et al. (2022) [46]	ANN	402	Inlet temperature, velocity, steam-to-carbon ratio, purge-to-feed ratio, adsorption pressure	H₂ purity and recovery, CO₂ capture efficiency, H₂ production cost, energy consumption	The ANN-based SE-SMR model reduces simulation time from 2 h to 20 s, achieving 99.99% H₂ purity and 90.3% CO₂ capture efficiency.
	7	Oh et al. (2022) [47]	ANN-DE	480	MDEA concentration, PZ concentration, flash drum pressure, PZ ion flow rate in lean-amine solvent	Reboiler duty, electricity consumption, total equivalent work	ANNs and DE successfully optimize pre-combustion CO₂ capture in SMR-based blue hydrogen production.
	8	Pizoń et al. (2024) [48]	ANN	10,475	Temperature, steam-to-CH₄ ratio, N₂-to-CH₄ ratio, CH₄ flow rate, nickel catalyst mass	Concentration of H₂, CO, CO₂, and CH₄	The ANN model outperforms traditional kinetic models, achieving an MSE of 0.00022.
	9	Wang et al. (2024) [49]	RNN, LSTM	100,000	Electric current, reactor temperature, flow rate of CH₄, H₂, CO₂, and CO	Reactor temperature, flow rate of CH₄, H₂, CO₂, and CO	The LSTM-RNN model can accurately predict reactor dynamics and drive model predictive control for H₂ production.
ATR	10	Cherif et al. (2022) [50]	MOGA	N/A	Catalyst configuration (Ni/Al₂O₃ or Pt/Al₂O₃)	H₂ yield, maximum wall temperature	The optimized catalyst configuration gave a 46% increase in H₂ yield and a 27% increase in CH₄ conversion.
ATR	11	Gul et al. (2023) [51]	LM, BR, SCG	N/A	Concentration of CH₄, CO, CO₂, H₂, H₂O, and N₂, CaCO₃ and CaO (solid phase), reactor temperature	H₂ yield, CO₂ capture efficiency, H₂ purity, CH₄ conversion	Sorption-enhanced autothermal reforming (SEATR) process achieved 97% H₂ purity (compared to 66% in conventional ATR) and 94% CH₄ conversion (compared to 77% in conventional ATR).

¹ The hydrogen can be produced via coal gasification or SMR.
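
The multi-objective constrained optimization reported for Streb and Mazzotti [44] can be illustrated with a minimal sketch: candidate operating points are first screened against the purity and recovery constraints, and the non-dominated (Pareto-optimal) points are then kept. The candidate values below are made up for illustration, and the choice of energy consumption and productivity as the two competing objectives is an assumption based on the outputs listed in the table, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One hypothetical PSA operating point (fractions, not percent)."""
    h2_purity: float
    co2_purity: float
    h2_recovery: float
    co2_recovery: float
    energy: float        # specific energy consumption, to be minimized
    productivity: float  # to be maximized


def feasible(c, min_h2_purity=0.99):
    """Constraints taken from the table row; 0.99 is the lower H2 purity bound."""
    return (c.h2_purity >= min_h2_purity and c.co2_purity >= 0.96
            and c.h2_recovery >= 0.90 and c.co2_recovery >= 0.90)


def dominates(a, b):
    """True if a is no worse than b in both objectives and better in one."""
    no_worse = a.energy <= b.energy and a.productivity >= b.productivity
    better = a.energy < b.energy or a.productivity > b.productivity
    return no_worse and better


def pareto_front(candidates):
    """Keep feasible candidates that no other feasible candidate dominates."""
    ok = [c for c in candidates if feasible(c)]
    return [c for c in ok if not any(dominates(o, c) for o in ok)]


# Made-up candidates, e.g. as they might come from a surrogate ANN.
candidates = [
    Candidate(0.995, 0.970, 0.92, 0.93, energy=1.2, productivity=3.0),
    Candidate(0.992, 0.965, 0.91, 0.91, energy=1.0, productivity=2.5),
    Candidate(0.991, 0.970, 0.95, 0.92, energy=1.3, productivity=2.8),  # dominated
    Candidate(0.980, 0.970, 0.95, 0.95, energy=0.8, productivity=4.0),  # infeasible
]

for c in pareto_front(candidates):
    print(c.energy, c.productivity)
```

In practice a trained surrogate model would supply the purity, recovery, and energy values for each candidate, which is what makes evaluating many operating points affordable.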

Table 4. Examples of gray hydrogen production.

No.	Reference	Algorithms	Dataset	Inputs	Output(s)	Key Findings
1	Chavan et al. (2012) [53]	MVR, ANN	106	Fixed carbon, volatile matter, mineral matter/ash content, air feed per kg of coal, steam feed per kg of coal, bed temperature	Gas production rate, heating value of the product gas	ANN models outperformed MVR models. The air feed rate was the most influential factor for both gas production and heating value.
2	Patil-Shinde et al. (2014) [54]	GP, ANN, PCA	36	Fuel ratio, ash content, specific surface area of coal, activation energy of gasification, coal feed rate, gasifier bed temperature, ash discharge rate, air/coal ratio	CO + H₂ generation rate, syngas production rate, carbon conversion, heating value of syngas	Both GP and ANN models performed well, with R² between 0.920 and 0.996. Air/coal ratio, temperature, ash discharge rate, and coal feed rate were the most influential inputs.
3	Azzam et al. (2018) [55]	ANN-GA	2000	Reaction temperature, pressure, catalyst diameter	CH₄ conversion, CO₂ conversion, H₂/CO ratio, molar percentage of solid carbon	ANN and GA provide accurate and efficient optimization; high temperatures favor DRM performance but increase catalyst degradation.
4	Alsaffar et al. (2020) [56]	ANN-MLP	30	Gas hourly space velocity, O₂ concentration in the feed, reaction temperature, CH₄/CO₂ ratio	H₂ yield, CH₄ conversion	The best-performing ANN architecture was 4-9-2, achieving a sum of squares error of 0.076 and R² > 0.9.
5	Le et al. (2021) [57]	ANN-DE	100	Hydrocarbon type, catalyst composition, reaction temperature, support material properties, process conditions	Hydrocarbon conversion, H₂ yield, catalyst stability	Hydrocarbon type affects H₂ yield. The best ANN model had MSE < 0.05 and relative error < 3.36%.
6	Byun et al. (2021) [58]	SVR, DT, GPR	10,000	Number of reactors, temperature, H₂ permeance, membrane area, sweep gas flow rate, steam-to-carbon ratio, compressor capital cost, labor cost, natural gas cost, electricity cost	H₂ production rate, CO₂ emission rate, unit H₂ production cost	Reactor count and operating temperature have the strongest influence on hydrogen production. The GPR model outperformed SVR and DT.
7	Ayodele et al. (2021) [59]	ANN (RBFNN and MLP)	30	Gasification temperature, rubber seed shell particle size, high-density polyethylene particle size, amount of plastic in the mixture	H₂ production	One-layer MLP showed the best performance with an R² of 0.990 and the lowest sum of squares error.
8	Ayodele et al. (2021) [60]	ANN-MLP	17	Methane partial pressure, steam partial pressure, reaction temperature	H₂ yield and CH₄ conversion	ANN with 3–17–15–2 structure provides the best prediction for H₂ yield (R² = 0.997) and CH₄ conversion (R² = 0.996).
9	Hong et al. (2022) [61]	DNN-PSO	10,514 for operation, 10,000 for simulation	Natural gas feed flow rate, demineralized water flow rate, air flow rate, natural gas fuel flow rate, PSA recovery rate, system pressure, off-gas pressure, SMR reactor inlet/outlet temperature, LTS reactor inlet temperature, air-to-fuel ratio	H₂ production, H₂ purity, thermal efficiency, CO₂ emission, SMR conversion efficiency	The hybrid DNN model achieves an R² score of 0.94; higher thermal efficiency comes at the cost of higher CO₂ emissions.
10	Chen et al. (2022) [62]	NNs	60	Inlet temperature, steam-to-carbon ratio, Reynolds number	H₂ yield, methanol conversion	Steam-to-carbon ratio has the most significant impact on H₂ yield; NNs achieve high prediction accuracy.
11	Kim et al. (2022) [63]	ANN	419	A total of 34 input features (catalyst composition, operating conditions, catalyst preparation conditions)	CO conversion	The ANN model predicts one-pass CO conversion with high accuracy (R² = 0.997). The best-performing catalysts include Pt/Co(10 wt%)/Al₂O₃, Pt/Co(20 wt%)/Al₂O₃, and Pt/Ce(5 wt%)/TiO₂.
12	Liu et al. (2022) [64]	GBR, RF, SVR, DT, ANN, ABR	3536	Elements (C, H, O, N, S), moisture, ash, volatile, fixed carbon, temperature, concentration ratio, equivalence ratio, residence time	H₂, CO, CH₄, CO₂ gas yields	GBR was the most accurate model. Operating conditions (especially temperature and residence time) contributed 88.55% to gas yield predictions.
13	Huang et al. (2025) [65]	LR, RR, Lasso, ENR, DT, RF, GBR, ETR, XGBoost, KNN, MLP	1386	Temperature, steam-to-carbon ratio, oxygen-to-carbon ratio, pressure	H₂ yield, CO₂ yield, heat duty	XGBoost outperformed all other models. Temperature was the most influential factor for H₂ yield.
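
Architecture labels such as 4-9-2 (Alsaffar et al. [56]) and 3–17–15–2 (Ayodele et al. [60]) list the number of neurons per layer, from inputs to outputs. As a rough aid for reading these entries, the sketch below counts the trainable parameters such a network would have, assuming fully connected layers with one bias per neuron; the table does not state these details, so the counts are indicative only.

```python
def mlp_parameter_count(layer_widths):
    """Count weights and biases of a fully connected feed-forward network.

    layer_widths lists neurons per layer, inputs first, e.g. [4, 9, 2].
    Assumption: dense layers with one bias per non-input neuron.
    """
    total = 0
    for fan_in, fan_out in zip(layer_widths, layer_widths[1:]):
        total += fan_in * fan_out + fan_out  # weights plus biases
    return total


architectures = {
    "4-9-2 (Alsaffar et al. [56])": [4, 9, 2],
    "3-17-15-2 (Ayodele et al. [60])": [3, 17, 15, 2],
}

for label, widths in architectures.items():
    print(f"{label}: {mlp_parameter_count(widths)} parameters")
```

Comparing such counts with the dataset sizes in the table gives a quick sense of how much data is available per fitted parameter.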

Table 5. Comparison of electrolyzer technologies for green hydrogen production.

Electrolyzer Type	Electrolyte	Temperature (°C)	Efficiency (%)	Advantages	Challenges
AEL	KOH or NaOH solution	60–80	65–75	Low-cost, mature technology	Low current density
PEM	Solid polymer membrane	50–80	75–80	Fast response, compact	Uses expensive catalysts (Pt, Ir)
SOEC	Ceramic oxide	700–1000	80–85	High efficiency, uses heat	High degradation, expensive materials
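
The efficiency ranges in Table 5 can be converted into an approximate energy input per kilogram of hydrogen. The sketch below assumes the efficiencies are expressed relative to the higher heating value of hydrogen (about 39.4 kWh/kg); the table does not state the basis, and a lower-heating-value basis (about 33.3 kWh/kg) would give proportionally smaller numbers.

```python
# Assumed basis: higher heating value of hydrogen, in kWh per kg.
HHV_H2_KWH_PER_KG = 39.4

# Efficiency ranges copied from Table 5, expressed as fractions.
EFFICIENCY_RANGES = {
    "AEL": (0.65, 0.75),
    "PEM": (0.75, 0.80),
    "SOEC": (0.80, 0.85),
}


def specific_energy(efficiency, heating_value=HHV_H2_KWH_PER_KG):
    """Energy input (kWh) needed per kg of H2 at a given efficiency."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be a fraction in (0, 1]")
    return heating_value / efficiency


for name, (low, high) in EFFICIENCY_RANGES.items():
    best = specific_energy(high)   # higher efficiency, lower energy input
    worst = specific_energy(low)
    # For SOEC part of the input may be supplied as heat, so this is the
    # total energy input rather than electricity alone.
    print(f"{name}: {best:.1f} to {worst:.1f} kWh per kg H2")
```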

Table 6. Selected examples of green hydrogen production.

Category	No.	Reference	Algorithms	Dataset	Inputs	Output(s)	Key Findings
Water electrolysis	1	Li et al. (2021) [74]	NNs, RF, SVR	718	Feedstock composition, operational conditions (temperature, pressure, reaction time, solid content), catalyst properties	H₂ yield, CO₂ yield, CH₄ yield, CO yield	NNs outperformed RF and SVR in optimizing H₂ production from supercritical water gasification.
	2	Sezer and Özveren (2021) [75]	ANN-LM	370,656	Carbon content, H₂ content, O₂ content, gasifier temperature, steam flow rate, fuel (biomass) flow rate	H₂ mole fraction in syngas, total exergy value of syngas	The ANN model achieved high accuracy (R² = 0.9999 for training and test data sets).
	3	Haq et al. (2022) [22]	GPR, ETR, ANN, SVM, GA	125	Proximate analysis of sewage sludge, ultimate analysis of sewage sludge, supercritical water gas operation conditions	H₂ yield, CO₂ yield, CH₄ yield, CO yield	The GPR model is the best for predicting H₂ yield; temperature is the most influential factor for H₂ production.
	4	Saadetnejad et al. (2022) [76]	RF, DT	549	Photocatalyst properties (semiconductor material, band gap energy, co-catalyst type and loading) and reaction conditions (temperature, pressure, CO₂/H₂O molar ratio)	Band gap energy of the photocatalyst, total gas production rate	Best semiconductors for gas-phase CO₂ reduction are CeO₂, SrTiO₃, ZnS, ZrO₂.
	5	Cheng et al. (2023) [77]	SVM, Prophet	9840	Temperature, atmospheric pressure, relative humidity, cloud cover, precipitation, fixed month index, full timestamp and time-series structure (for Prophet only)	Hydrogen production	ML is effective for regional hydrogen production forecasting, especially when integrated with local climate data. SVM outperformed Prophet.
	6	Yang et al. (2024) [78]	ELM, RF, SVM, GA, LSTM, RBF, BPNN	1095	Solar irradiance, temperature, sunshine hours	Photovoltaic (PV) power generation, H₂ production	LSTM performed best with R² = 0.8402. HJT PV technology produced the most H₂ at the lowest cost.
	7	Babay et al. (2025) [79]	SVR, RF, MLP, LSTM-CNN	N/A	Solar irradiance, ambient temperature, photovoltaic (PV) panel temperature, panel type, seasonal data	H₂ production	Polycrystalline panels showed higher H₂ output than monocrystalline and amorphous silicon. RF gave the best accuracy.
Biomass or organic waste	8	Salah et al. (2016) [80]	NARX	N/A	Mass flow of fuel and steam into gasifier, fuel mass and air and O₂ flow into regenerator, continuous and discontinuous mass flow from generator	Product gas flow rate, temperature and pressure of gasifier, temperature and pressure of regenerator	The model achieved low prediction errors, demonstrated real-time adaptability, and helped find the key operating parameters.
	9	Krzywanski et al. (2018) [81]	ANN-GA	25	Reactor type, CaO/C mole ratio, H₂O/C mole ratio, reaction temperature	Volumetric H₂ concentration in syngas	The developed [4–3–3–1] ANN-GA model predicted H₂ concentrations with high accuracy, with relative errors within ±8%.
	10	Ozbas et al. (2019) [82]	LR, KNN, SVR, DT	2036	Time, temperature, concentration of CO, CO₂, CH₄, and O₂, higher heating value of syngas	Hydrogen concentration in syngas	LR has the highest accuracy with R² = 0.99. The highest H₂ concentration in syngas reached 35% vol.
	11	Torky et al. (2023) [83]	MobileNet-CNN, Xception-CNN, DNN, Mask-RCNN	23,628	Image data, waste characteristics (material type, waste category, physical properties, environmental conditions), estimated weight parameters (volume, density)	Waste classification (recyclable, organic, or harmful), estimated waste weight (dry or wet), H₂ production	MobileNet-CNN achieved 93% accuracy for waste classification and 98% accuracy in distinguishing dry vs. wet organic waste.
	12	Gil et al. (2023) [84]	GPR	30	Process parameters (temperature, steam-to-air ratio, stoichiometric ratio, steam-to-biomass ratio), biomass properties (C%, H%, O%, ash content)	H₂ vol%, CO vol%, CH₄ vol%, gas yield, combustible gas concentration	The GPR model achieved high predictive accuracy, with R² values ranging from 0.82 to 0.98 for different gasification parameters.
	13	Meena et al. (2023) [85]	ANN, SVM, DT, RF, GB	N/A	Process parameters (temperature, equivalence ratio, steam-to-biomass ratio, pressure), biomass properties (C%, H%, O%, ash content, volatile matter), type of gasifying agent, catalyst type, time	H₂ vol%, CO vol%, CH₄ vol%, CO₂ vol%, syngas heating value, syngas yield, tar content, gasification efficiency	RF and GB models showed the highest accuracy, with R² values exceeding 0.95 for predicting H₂ yield and syngas composition.
	14	Pan et al. (2024) [70]	GBR, RF, DT, KR, GA	458	Feedstock composition (C, H₂, N₂, O₂, S, volatile matter, fixed carbon, ash, moisture %), temperature, biomass/coal blending ratio, equivalence ratio, gasifying agent type	Syngas yield, H₂, CO, CH₄, and CO₂ content, syngas lower heating value	GBR showed the highest accuracy (R² up to 0.99) for predicting syngas composition and heating value.
PEC	15	Oral et al. (2022) [86]	ARM, RF, DT	10,560	Electrode materials, synthesis methods, doping elements, co-catalyst, second-layer materials, calcination conditions, electrolyte type and pH, irradiation conditions, applied bias voltage	Band gap energy, photocurrent density	ML successfully identified patterns and optimized conditions. RF performed well in predicting band gap energy. ARM and DT helped identify key parameters for enhancing PEC efficiency.
	16	Tajima et al. (2024) [87]	SVR, GPR, DT	75 (Fe₂O₃), 32 (BiVO₄), 58 (WO₃/BiVO₄)	X-ray diffraction, Raman spectroscopy, UV/Vis absorbance, photoelectrochemical impedance spectroscopy	Photocurrent density	GPR achieved the highest prediction accuracy across all tested photoanode materials (hematite, BiVO₄, and WO₃/BiVO₄), with R² between 0.85 and 0.99, even for small datasets (30–70 samples).
	17	Sahu et al. (2024) [88]	MLP, ABR, RR, ENR	2593	Band gap of photoelectrode material, working electrode area, light intensity, power of light source, pH, filter condition, molarity	Photocurrent density	The hybrid model ABR + MLP performs best with R² = 0.9686.
	18	Mishra et al. (2025) [89]	KNN, RF, AdB, GBR, XGBoost	85	Material properties (Shannon ionic radius, density, electronegativity, etc.), experimental conditions (light density, applied bias voltage, preparation method)	Band gap energy, photocurrent density	The XGBoost model performed best for both band gap and photocurrent density prediction.
Biohydrogen	19	Taheri et al. (2021) [90]	ANN, ANFIS	119	Organic loading rate, effluent pH, mixed liquor suspended solids, mixed liquor volatile suspended solids	Transmembrane pressure	ANFIS slightly outperformed ANN (R² = 0.93 vs. R² = 0.88).
	20	Hosseinzadeh et al. (2022) [91]	GBM, SVR, RF, AdaBoost	210	Acetate (A), butyrate (B), A/B ratio, ethanol, iron, nickel, pH, biomass proportion, hydraulic retention, chemical oxygen demand	H₂ production (yield or rate) during dark fermentation	All four ML models showed high accuracy (R² > 0.88), with RF having the highest (R² = 0.902).
	21	Venkatesh et al. (2024) [92]	LSTM, Bi-LSTM	5600	Applied voltage, sequential input data	Current density (directly correlates to H₂ production rate)	Bi-LSTM outperforms LSTM in modeling and controlling biohydrogen production in a microbial electrolysis cell.
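
Most entries in these tables compare models by the coefficient of determination (R²), the mean squared error (MSE), or the sum of squares error (SSE). For reference, a minimal sketch of these metrics in plain Python follows. The data are made up, and individual studies may use slightly different definitions (for example, R² as the squared correlation coefficient), so this is only the common textbook form.

```python
def sse(y_true, y_pred):
    """Sum of squared errors between measured and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))


def mse(y_true, y_pred):
    """Mean squared error: SSE divided by the number of samples."""
    return sse(y_true, y_pred) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - sse(y_true, y_pred) / ss_tot


# Made-up measured and predicted H2 yields, for illustration only.
measured = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

print("SSE:", sse(measured, predicted))
print("MSE:", mse(measured, predicted))
print("R2: ", r_squared(measured, predicted))
```

Because MSE and SSE depend on the units and scale of the target variable, values from different studies are not directly comparable, whereas R² is dimensionless.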

Table 7. Selected examples of pink, turquoise, white, and black/brown hydrogen production.

Category	No.	Reference	Algorithms	Dataset	Inputs	Output(s)	Key Findings
Pink hydrogen	1	Kim et al. (2022) [103]	CART implemented in Minitab software	N/A	61 inputs (heat consumption at H₂ generation plant, electricity rating of SMRs, heat supplied to plants, operating years, tax rate, inflation, etc.)	H₂ production cost (USD/kg)	ML can identify key economic drivers in nuclear H₂ production. Heat consumption is the most important factor.
Turquoise hydrogen	2	Salimian and Grisan (2024) [104]	ResNet-50	4975	Plasma emission spectra (200–1100 nm) reshaped as 32 × 32 tensors	H₂ and CH₄ concentration	The model performed well in predicting CH₄ concentration but was less accurate for low H₂ concentrations.
Turquoise hydrogen	3	Wen et al. (2024) [105]	RF, XGBoost, DT, ADA, GBDT, LGB, KNN, SVR, Lasso, RR, ENR, MLP, MLR	2733	wt% of Fe, Ni, Cu, Co, Al₂O₃, SiO₂, TiO₂, MgO; calcination temperature, CH₄ concentration, gas hourly space velocity, reaction temperature and time	CH₄ conversion and H₂ yield	RF and XGBoost achieved the highest accuracy with R² = 0.9999 for CH₄ conversion and R² = 0.9996 for the H₂ yield model.
White hydrogen	4	Zhang et al. (2024) [106]	TINN	5041	Thermodynamic properties (critical temperature and pressure, acentric factor, mole fraction), process conditions (temperature, total molar density, pore radius)	Number of equilibrium phases, compositional mole fractions of gas and liquid phase	The TINN model achieves ~20× speedup in phase equilibrium computation compared to traditional iterative flash calculation methods.
Black hydrogen	5	Zhao et al. (2025) [107]	BP-MLP, SVR, MLR-RR, DT, RF, XGBoost, GPR	750	Coal composition (C, H, N, O, S, Cl, volatile matter, fixed carbon, ash content, moisture content), temperature, pressure, steam-to-coal ratio, oxygen-to-coal ratio	Syngas component proportions: H₂, CO, CO₂, CH₄, N₂, and others, hydrogen-to-carbon ratio	The BP-MLP showed the best performance. The steam-to-coal ratio, moisture content, and Cl content were the most influential features for H₂ prediction.
Black hydrogen	6	Ceylan and Ceylan (2021) [108]	SMOreg, GPR, Lazy K-Star, Lazy IBk, AMT, RF	106	Mineral matter, fixed carbon, volatile matter, air feed, steam feed, bed temperature	Gas yield, heating value	RF performed best with R² = 0.9998 for gas yield and R² = 0.9730 for heating value.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

