1. Introduction
In recent years, the development of hydrogen-powered internal combustion engines (H2ICEs) has often been overshadowed by the growing dominance of battery–electric vehicles (BEVs). However, challenges related to battery range, charging infrastructure, and energy demand present significant obstacles to the widespread adoption of BEVs. In contrast, hydrogen-fueled internal combustion engines emerge as a promising alternative, capable of replacing fossil fuels while producing minimal greenhouse gas emissions. Given the continuous depletion of fossil fuel reserves and the rising costs of conventional fuels, the urgency to find alternative energy sources has never been greater [
1].
Hydrogen-fueled internal combustion engines are not a recent innovation, with their origins tracing back over two centuries. The concept was pioneered by François Isaac de Rivaz, who designed the De Rivaz engine in 1806. Despite this long history, modern advancements are still required to develop an efficient and reliable hydrogen combustion engine that can rival conventional fossil fuel-based internal combustion engines in terms of performance and cost-effectiveness [2]. An example of a hydrogen-fueled engine is presented in Figure 1.
Hydrogen can be utilized as an energy source in two primary ways: direct combustion in an internal combustion engine or as fuel in hydrogen fuel cells. The latter relies on hydrogen oxidation to generate electricity, with water as the sole byproduct [
4]. When used as a combustion fuel, hydrogen has several advantages, including a high flame speed and wide flammability limits. However, challenges such as pre-ignition, backfiring, and the high reactivity of hydrogen necessitate specialized engine modifications to ensure stable and efficient combustion [
2]. Additionally, hydrogen storage presents logistical challenges, as it must be kept under high pressure or in liquid form at cryogenic temperatures, increasing the complexity and weight of onboard storage systems.
Despite these challenges, hydrogen remains a highly attractive fuel due to its abundance and potential for carbon-free energy production. It can be synthesized through various processes, such as electrolysis, thermochemical conversion of biomass, and other renewable methods [
5]. The environmental impact of hydrogen production depends on the energy sources used, with the possibility of achieving zero-emission hydrogen production through renewable energy integration [
6]. Furthermore, hydrogen’s ability to be stored and transported provides a significant advantage over battery electric technology, enabling rapid refueling and reducing dependency on extensive electric grid infrastructure. Unlike other alternative fuels such as biodiesel or vegetable oils, hydrogen does not contain carbon, making it a superior choice in terms of reducing emissions [
1].
Hydrogen has gained considerable attention as an additive to conventional fuels in internal combustion engines (ICEs) due to its favorable combustion properties, which include high diffusivity, rapid flame speed, and the absence of carbon emissions [
7]. When blended with traditional fossil fuels such as gasoline and diesel, hydrogen can improve combustion efficiency, reduce pollutant emissions, and mitigate the long-standing trade-off between engine performance and environmental compliance.
The addition of hydrogen to ICE fuels has demonstrated significant potential in reducing harmful emissions, particularly carbon dioxide (CO2), particulate matter (PM), and nitrogen oxides (NOx) [8]. Since hydrogen combustion does not produce CO2, its inclusion in hydrocarbon fuels directly reduces carbon-based emissions [9]. Additionally, hydrogen-enriched combustion enhances oxidation reactions, leading to more complete combustion and a decrease in carbon monoxide (CO) and unburned hydrocarbons (UHCs) [10].
However, while hydrogen reduces CO2 and PM emissions, its impact on NOx formation is more complex. On one hand, lean-burn operation facilitated by hydrogen reduces in-cylinder temperatures, which can lower NOx emissions [11]. On the other hand, the higher combustion temperature of hydrogen compared to diesel or gasoline can lead to localized NOx formation under certain conditions, necessitating the use of exhaust gas recirculation (EGR) and advanced after-treatment systems to mitigate this effect [12,13,14]. Several experimental studies have confirmed that, with suitable engine control strategies, such as variable ignition timing and optimized injection strategies, NOx emissions can be effectively controlled while maintaining the benefits of hydrogen-enriched combustion [15].
The optimal hydrogen-blending ratio varies depending on engine type, operating conditions, and fuel characteristics. Studies have explored hydrogen enrichment levels typically ranging from 5% to 30% by volume, demonstrating that moderate hydrogen fractions yield the best balance between efficiency improvements and emissions reductions [16,17]. Excessive hydrogen concentrations, however, can lead to premature ignition, increased pressure rise rates, and knocking, particularly in spark-ignition (SI) engines, highlighting the need for precise fuel metering and real-time engine control systems [18,19].
Advanced hydrogen injection techniques, including port fuel injection (PFI) and direct injection (DI), have been investigated to optimize mixture formation and combustion phasing [
20]. Dual-fuel combustion strategies, where hydrogen is co-fueled with diesel in compression–ignition engines, have also shown promise in reducing particulate emissions while maintaining high efficiency [
21]. Recent advancements in machine learning algorithms have enabled real-time optimization of hydrogen blending and injection strategies, paving the way for further enhancements in hydrogen-assisted combustion [
22].
One of the key challenges in ICE optimization is the complexity associated with understanding the interactions among numerous input variables. The presence of nonlinear relationships between engine parameters, such as fuel mixture composition, ignition timing, and operating conditions, complicates the prediction of optimal performance metrics. Moreover, data acquisition systems in dynamometer test rigs often introduce noise, further hindering accurate correlation analysis. Traditional modeling techniques struggle with these uncertainties, necessitating the adoption of more robust predictive tools, such as machine learning algorithms [23,24,25,26].
Milojevic et al. have highlighted the importance of reducing emissions in internal combustion engines through various optimization techniques, including tribological advancements and variable compression ratio systems [14]. The use of hydrogen as a fuel or additive presents a further avenue for minimizing emissions of toxic and harmful combustion byproducts. Hydrogen-enriched combustion has been shown to enhance the efficiency of fuel utilization while significantly reducing particulate matter (PM) and nitrogen oxides (NOx), thereby aiding compliance with increasingly stringent emission regulations. The possibility of producing hydrogen from renewable energy sources, as mentioned in that article, is particularly relevant in the context of alternative fuels. When sourced sustainably, hydrogen can serve as a key element in achieving cleaner combustion in diesel engines, especially during the transition toward electrification. This aligns with the broader objective of mitigating air pollution while maintaining the operational advantages of internal combustion engines in sectors where they remain indispensable.
Hydrogen-fueled internal combustion engines (H2ICEs) present several challenges impacting their development and integration into vehicles. One significant issue is hydrogen embrittlement, where hydrogen atoms diffuse into engine materials, leading to reduced ductility and potential component failure. This phenomenon is particularly concerning for high-strength steels and certain alloys used in engine construction. Studies have shown that materials like cast iron, which contain graphite, can accommodate diffused hydrogen and delay embrittlement effects [27]. Another challenge is the storage and handling of hydrogen fuel. Hydrogen’s low volumetric energy density necessitates high-pressure storage systems, which can be prone to leakage due to hydrogen’s small molecular size. This leakage not only poses safety risks but also contributes to material degradation over time [28]. Combustion-related issues also arise with H2ICEs. Hydrogen’s wide flammability range increases the risk of pre-ignition and backfiring, which can damage engine components. Additionally, while hydrogen combustion does not produce CO2, it can lead to significant NOx emissions due to high combustion temperatures, necessitating advanced emission control technologies [29]. System integration poses further difficulties. The unique properties of hydrogen require modifications to fuel injection systems, cooling mechanisms, and ignition controls. For instance, hydrogen’s low lubricity can cause increased wear in fuel pumps, as traditional lubricants may contaminate the fuel [30]. Economic and infrastructural factors also limit the adoption of H2ICEs. The production of green hydrogen is currently expensive, and the refueling infrastructure is underdeveloped, posing significant barriers to widespread implementation [31].
Machine learning (ML) offers a significant advantage in modeling engine combustion behavior: it maps input parameters such as speed, load, and spark timing (ST) to key engine performance indicators [32]. As a result, an ML model can serve as a surrogate that is combined with sensitivity analysis and global optimization techniques to identify optimal engine operating conditions [33,34].
Numerous studies have explored the application of ML models for predicting engine-related parameters. Three commonly employed techniques, artificial neural networks (ANNs), Random Forest (RF), and Support Vector Regression (SVR), have demonstrated strong predictive capabilities [35,36]. These models effectively capture the nonlinear relationships between input variables and engine performance metrics.
Furthermore, the applicability of different ML models varies across different engine-operating conditions, emphasizing the need for comparative analyses to assess their predictive effectiveness [
37,
38]. Despite the growing use of ML in internal combustion engine (ICE) research, there is limited literature comparing the predictive accuracy of multiple ML models for engine combustion modeling [
39,
40].
Beyond internal combustion engine optimization, machine learning has also proven instrumental in enhancing the performance and sustainability of electric vehicles (EVs). By leveraging digital twin technology and predictive analytics, ML facilitates real-time monitoring and optimization of EV components such as battery management systems, charging infrastructure, and energy efficiency strategies. These advancements enable predictive maintenance, extending battery lifespan and reducing waste while optimizing power consumption for improved range and reduced environmental impact. Additionally, ML-driven traffic monitoring and autonomous driving technologies contribute to safer and more energy-efficient transportation systems, further supporting the transition to cleaner mobility solutions. Integrating ML into both hydrogen-enriched internal combustion engines and electric powertrains highlights its crucial role in advancing sustainable automotive technologies and reducing global carbon emissions [
41].
2. Literature Review
Banerjee et al. [
42] presented a comprehensive investigation into the integration of artificial intelligence-based meta-modeling strategies with optimization algorithms for enhancing the performance and emissions characteristics of hydrogen–diesel dual-fuel internal combustion engines. The research focuses on leveraging artificial neural networks (ANNs) for system identification and employing the Multi-Objective Particle Swarm Optimization (MOPSO) algorithm to achieve superior trade-offs between performance and emissions, particularly in compliance with the stringent EPA Tier-4 emission mandates. The study establishes the relevance of AI-driven meta-modeling techniques in addressing multi-objective calibration challenges associated with contemporary diesel engines. It underscores the significance of systematic optimization in tackling the emission-performance trade-off, a key constraint in dual-fuel combustion strategies. The research is structured around an experimental case study where a diesel engine was modified for hydrogen–diesel dual-fuel operation with Exhaust Gas Recirculation (EGR). The methodology integrates ANN-based system identification with MOPSO for calibrating engine parameters in a computationally efficient manner. A notable aspect of the study is its introduction of an Adaptive Merit Function (AMF), which serves as a constraint to enhance the efficacy of the optimization process. By imposing trade-off domains that surpass the best observed trade-off values during experimental testing, the study demonstrates a structured and systematic approach to optimizing hydrogen-enriched dual-fuel combustion. The optimization framework is validated through a series of experimental observations, comparing the obtained optimal solutions with empirical data. The results indicate significant improvements in emission reduction and performance enhancement, with reductions of 10.2%, 30.6%, 25.4%, and 9.4% in the emission-performance trade-off footprint at various load conditions. 
The study utilizes ANN meta-modeling as the core predictive framework for system identification and response surface approximation. The ANN models are structured using a Multi-Input Single-Output (MISO) topology, where separate neural network architectures are developed to predict key response variables, including NO
x-HC emissions (NHC), brake-specific fuel consumption equivalent (BSFCeq), and hydrogen energy share (HES). The ANN-training process employs a Levenberg–Marquardt backpropagation learning algorithm, ensuring accurate function approximation and generalization of input–output relationships. The optimization routine integrates these trained ANN models into the MOPSO framework to explore the optimal solution space efficiently. The MOPSO algorithm is used for multi-objective optimization by guiding the particle search process within the ANN-predicted design space, ensuring convergence toward an optimal emission-performance trade-off. The algorithm incorporates Pareto-dominance principles to maintain diversity in the solution space and employs sigma-based selection methods for leader particle assignment. Additionally, an unsupervised partitive clustering technique, based on the K-means algorithm, is implemented to categorize and evaluate the robustness of the obtained Pareto-optimal solutions. This clustering approach helps in systematically selecting the most desirable operating points while maintaining computational efficiency.
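The Pareto-dominance principle that MOPSO relies on can be illustrated with a short sketch. The code below is not the authors' implementation; it only shows how non-dominated operating points are separated from dominated ones when two objectives are minimized. The candidate values and the function names `dominates` and `pareto_front` are hypothetical and chosen purely for illustration.

```python
from typing import List, Tuple

# Each point is (NHC, BSFCeq); both objectives are assumed to be minimized.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]


# Hypothetical candidate calibrations (illustrative numbers, not study data).
candidates = [(4.0, 260.0), (3.5, 275.0), (5.0, 250.0), (4.5, 270.0), (3.5, 280.0)]
front = pareto_front(candidates)
print(front)
```

In a full MOPSO loop, a front of this kind would be stored in an external archive from which leader particles are selected; that part is omitted here.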
Bai et al. [43] optimized the emission characteristics of a hydrogen-induced internal combustion engine operating with low-carbon biofuels using machine learning algorithms. The research integrates hydrogen induction with lemon peel oil (LPO) and camphor oil (CMO) as alternative fuels while employing a zeolite-based after-treatment system to mitigate emissions. The core objective is to predict key engine emissions, including carbon dioxide (CO2), nitrogen oxides (NOx), smoke, and hydrocarbons (HCs), as well as brake thermal efficiency (BTE), using advanced ensemble learning models. The significant role of hydrogen in reducing carbon-based emissions is underscored. The experimental results demonstrate that hydrogen induction leads to a reduction in CO2 emissions by up to 47% and smoke emissions by approximately 49%. These improvements are attributed to the superior combustion properties of hydrogen, which enhance flame speed and combustion completeness. However, the research also highlights the inevitable increase in NOx emissions, a known drawback of hydrogen combustion due to its higher flame temperature. To counteract this effect, the study incorporates a zeolite-based after-treatment system, which effectively reduces NOx emissions. This combination of hydrogen with biofuels and an after-treatment system presents a viable pathway for cleaner and more efficient internal combustion engines. Machine learning methodologies are implemented to predict engine performance and emission characteristics. The authors employ ensemble learning models, including Extreme Gradient Boosting (XGBoost), Light Gradient Boosted Machine (LGBM), CatBoost, and Random Forest (RF), using Python libraries such as scikit-learn [44], to establish predictive models based on experimental data. Among these models, CatBoost demonstrates the highest accuracy in emission prediction, followed closely by XGBoost and Random Forest (Figure 2). The study also highlights the limitations of LGBM, particularly its tendency to produce less reliable predictions when trained on smaller datasets. The evaluation of these models is conducted using standard metrics such as R-squared (R²), mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean square error (RMSE), which confirm the robustness of the CatBoost and XGBoost models in predicting engine emissions with high accuracy.
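For readers less familiar with the four evaluation metrics named above, the following sketch computes them from their standard definitions using only the Python standard library. The measured and predicted values are hypothetical placeholders, not data from the cited study, and `regression_metrics` is an illustrative helper name.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Return R2, MAE, MAPE (in percent), and RMSE for paired observations."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)    # total sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": sum(abs(e) for e in errors) / n,
        # MAPE is undefined when a measured value is zero.
        "MAPE": 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
        "RMSE": math.sqrt(ss_res / n),
    }


# Hypothetical emission readings (e.g., in ppm) and model predictions.
measured = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
metrics = regression_metrics(measured, predicted)
print(metrics)
```

MAE and RMSE carry the units of the predicted quantity, whereas R² and MAPE are dimensionless, which is why studies of this kind usually report several of them together.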
Uludamar et al. [45] investigate the role of artificial intelligence in predicting noise and vibration levels in a hydrogen-enriched diesel engine running on biodiesel and diesel fuel blends. The research employs three distinct AI models, Radial Basis Function Neural Network (RBNN), Adaptive Neuro-Fuzzy Inference System (ANFIS), and Least-Squares Boosting (LSBoost), to forecast engine behavior under different hydrogen flow rates and biodiesel proportions. The study presents compelling evidence that AI-driven models, particularly RBNN and LSBoost, demonstrate high predictive accuracy, with R² values exceeding 0.99 for both noise and vibration prediction, as presented in Figure 3.
Hydrogen injection through the inlet manifold appears to influence the acoustic and vibrational characteristics of the engine, which are critical parameters for assessing engine longevity and operational efficiency. The results suggest that hydrogen addition, when properly optimized, contributes to the reduction in noise and vibration, aligning with the broader goal of developing cleaner and more efficient internal combustion engines. However, while the study provides valuable insights into the mechanical response of the engine, it does not extensively discuss the direct impact of hydrogen on combustion thermodynamics or overall fuel efficiency, leaving room for further research. Artificial intelligence plays a crucial role in optimizing engine performance by enabling accurate and efficient modeling of complex nonlinear systems. The study demonstrates that RBNN and LSBoost outperform ANFIS in predicting engine noise and vibration, which can be attributed to their superior ability to capture intricate relationships within the dataset. The optimal configurations identified in the study, such as the spread parameter for RBNN and the number of learners in LSBoost, highlight the importance of hyperparameter tuning in achieving reliable predictions. The findings reinforce the viability of machine learning models in automotive applications, particularly for real-time monitoring, predictive maintenance, and system optimization. Given the high accuracy achieved, these AI techniques could be extended to broader applications, including combustion analysis, emission prediction, and fuel economy optimization.
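The role of the number of learners in least-squares boosting can be made concrete with a minimal sketch. In this scheme each new weak learner is fitted to the residuals of the current ensemble. The code below assumes one-split regression stumps as weak learners and a learning rate of 0.5; both are illustrative choices, not the configuration reported in the study, and the flow-rate and noise values are hypothetical.

```python
from typing import List, Tuple

Stump = Tuple[float, float, float]  # (threshold, value if x <= threshold, value otherwise)


def fit_stump(x: List[float], r: List[float]) -> Stump:
    """Find the single split that minimizes squared error on the residuals r."""
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        thr = 0.5 * (lo + hi)
        left = [ri for xi, ri in zip(x, r) if xi <= thr]
        right = [ri for xi, ri in zip(x, r) if xi > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - lm) ** 2 for ri in left) + sum((ri - rm) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, (thr, lm, rm))
    return best[1]


def predict_stump(s: Stump, xi: float) -> float:
    thr, lm, rm = s
    return lm if xi <= thr else rm


def ls_boost(x: List[float], y: List[float], n_learners: int, learning_rate: float = 0.5):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_learners):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        s = fit_stump(x, residuals)
        stumps.append(s)
        pred = [pi + learning_rate * predict_stump(s, xi) for xi, pi in zip(x, pred)]
    return base, stumps, pred


def mse(y: List[float], pred: List[float]) -> float:
    return sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y)


# Hypothetical hydrogen flow rates and noise levels (illustrative only).
flow = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
noise = [92.0, 91.2, 90.1, 89.5, 90.3, 91.0]
for n in (0, 2, 20):
    _, _, fitted = ls_boost(flow, noise, n_learners=n)
    print(n, round(mse(noise, fitted), 4))
```

The training error falls as learners are added, but a very large ensemble can overfit a small dataset, which is one reason the learner count is treated as a tuning parameter and checked against held-out data.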
The study conducted by Javed et al. [
46] focuses on the noise emission characteristics of a dual-fuel compression ignition engine (
Figure 4) using Jatropha methyl ester (JME) biodiesel, zinc oxide (ZnO) nanoparticles, and hydrogen (H
2) as fuel additives. The research integrates artificial intelligence (AI) techniques, specifically an artificial neural network (ANN), to model and predict noise emissions based on varying operational parameters. A simple ANN architecture is created, consisting of three layers: an input layer, a hidden layer, and an output layer. The findings highlight the synergistic role of hydrogen and nanoparticles in optimizing combustion characteristics while also addressing the environmental concerns of conventional diesel engines. The use of hydrogen as a fuel additive has been widely explored in internal combustion engine (ICE) research due to its high energy density and clean combustion characteristics. In the present study, the integration of hydrogen in dual-fuel mode resulted in significant modifications to the combustion process. The high flame speed of hydrogen contributed to improved combustion efficiency, reducing ignition delay and ensuring a more complete fuel burn. However, the study notes that excessive hydrogen flow rates may lead to increased peak combustion pressure, which correlates with elevated noise emissions. The results indicate that an optimal hydrogen flow rate of 1.5 L/min provided the best trade-off between noise attenuation and combustion efficiency. One of the novel aspects of this study is the application of ANN modeling to predict noise emissions under varying fuel compositions and operational conditions. The ANN model demonstrated exceptional predictive accuracy, with a regression coefficient of 0.9992, indicating a near-perfect correlation between experimental and predicted values. The use of the trainlm algorithm with tansig–logsig transfer functions enabled effective learning of noise emission patterns, significantly reducing the need for extensive experimental trials. 
This application of AI highlights its potential in optimizing fuel blends, reducing experimental costs, and improving engine performance by identifying optimal parameter combinations. The study provides valuable insights into the role of ZnO nanoparticles in noise attenuation. The inclusion of ZnO in the JME biodiesel blends improved combustion stability and reduced engine noise. The results show that B20JME40 (a 20% JME blend with 40 nm ZnO particles) exhibited the most effective noise reduction characteristics, particularly when combined with hydrogen. This reduction is attributed to enhanced atomization and catalytic effects of ZnO nanoparticles, which promote better fuel–air mixing and more efficient combustion. Moreover, the increase in the biodiesel percentage contributed to noise attenuation due to improved lubricity and damping characteristics. However, noise emissions were observed to increase with higher engine loads, likely due to increased peak pressures and mechanical forces within the engine.
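The tansig and logsig transfer functions mentioned above are the hyperbolic tangent sigmoid and the logistic sigmoid. The sketch below writes them out and passes one input vector through a single hidden layer. It assumes tansig in the hidden layer and logsig at the output, which the summary does not state explicitly; the weights, layer size, and input meanings are hypothetical.

```python
import math
from typing import List


def tansig(n: float) -> float:
    """Hyperbolic tangent sigmoid: maps any real input into (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0


def logsig(n: float) -> float:
    """Logistic sigmoid: maps any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-n))


def forward(inputs: List[float], w_hidden: List[List[float]], b_hidden: List[float],
            w_out: List[float], b_out: float) -> float:
    """One hidden layer with tansig units followed by a single logsig output."""
    hidden = [tansig(sum(w * x for w, x in zip(row, inputs)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return logsig(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


# Hypothetical, untrained weights for 3 normalized inputs and 2 hidden units.
w_hidden = [[0.4, -0.2, 0.1], [-0.3, 0.5, 0.2]]
b_hidden = [0.0, 0.1]
w_out = [0.7, -0.6]
b_out = 0.05

# Inputs could stand for normalized load, hydrogen flow rate, and biodiesel share.
y = forward([0.5, 0.2, 0.8], w_hidden, b_hidden, w_out, b_out)
print(y)
```

Because a logsig output is bounded between 0 and 1, the target (here, noise level) would have to be scaled into that range before training and rescaled afterwards.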
Bai et al. [47] present an in-depth investigation into the application of machine learning algorithms for predicting the emission and performance characteristics of a wheat germ oil (WGO) and hydrogen dual-fuel internal combustion engine. Hydrogen’s impact on the combustion process in a WGO-fueled internal combustion engine is thoroughly explored in the article. The authors establish that hydrogen’s high flame speed, high calorific value, and broad flammability limits improve the combustion efficiency of the dual-fuel engine. The experimental results indicate that with a 15% hydrogen energy share, the peak in-cylinder pressure increases to 70.8 bar, demonstrating a clear enhancement over WGO alone. This is attributed to the rapid and more complete combustion of the hydrogen-enriched mixture, which leads to an increase in peak heat release and improved power output. Additionally, the brake thermal efficiency (BTE) of the engine shows marked improvement with hydrogen induction, with a peak efficiency of 29.87% at full load for a 15% hydrogen energy share. This increase in efficiency is crucial for making alternative biofuels like WGO viable, given their generally poor atomization and mixing characteristics compared to conventional diesel fuels. On the emissions front, the study highlights a complex trade-off associated with hydrogen induction. The inclusion of hydrogen significantly reduces hydrocarbon (HC), carbon monoxide (CO), and smoke emissions. Specifically, at full load, HC emissions drop from 257 ppm for neat WGO to 156 ppm with a 15% hydrogen energy share. Similarly, smoke opacity is reduced by 15% compared to WGO alone. These reductions are attributed to hydrogen’s cleaner combustion characteristics, which reduce particulate formation and incomplete oxidation of fuel molecules. However, nitrogen oxide (NOx) emissions increase due to the elevated combustion temperature, reaching 1089 ppm at a 15% hydrogen energy share, a 33% increase compared to WGO alone. This trade-off remains a major challenge in hydrogen-assisted combustion and highlights the need for NOx mitigation strategies such as exhaust gas recirculation (EGR) or selective catalytic reduction (SCR). The application of machine learning in this study adds a novel dimension to engine performance prediction. The authors employ four different machine learning models (multiple linear regression (MLR), Decision Tree (DT), Random Forest (RF), and Support Vector Machine (SVM)), all available in public libraries [44], to predict the emission and performance characteristics based on experimental data. Their methodology involves training and testing the models using engine parameters such as brake power and brake-specific fuel consumption (BSFC) as independent variables. The findings suggest that MLR provides the most accurate predictions, with the highest R² value among the tested models. The validation process shows that MLR predictions closely align with the experimental results, outperforming SVM, RF, and DT in terms of accuracy (Figure 5). This result is consistent with the observation that the experimental dataset exhibits strong linear correlations, making MLR a suitable predictive tool.
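Why a linear model can suffice when the data are strongly linear can be shown with a small sketch of multiple linear regression solved through the normal equations. The code is a plain-Python illustration rather than the library routine used in the study; the feature values are hypothetical, and the targets are generated from an exactly linear rule so that the fit can be checked by inspection.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_mlr(features: List[List[float]], targets: List[float]) -> List[float]:
    """Return [intercept, coef_1, ..., coef_k] from (X^T X) beta = X^T y."""
    X = [[1.0] + row for row in features]  # prepend a column of ones for the intercept
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(X, targets)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical operating points: [brake power (kW), BSFC (kg/kWh)].
features = [[1.0, 0.50], [2.0, 0.40], [3.0, 0.35], [4.0, 0.30], [5.0, 0.32]]
# Synthetic target built from an exactly linear rule (illustration only).
targets = [200.0 + 150.0 * bp - 100.0 * bsfc for bp, bsfc in features]

beta = fit_mlr(features, targets)
print([round(b, 6) for b in beta])
```

On data like this the regression recovers the generating coefficients, so a more flexible model has nothing further to learn and, on a small dataset, may mainly fit noise.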
The performance and emission characteristics of hydrogen-enriched compressed natural gas (HCNG) engines are investigated in [48] through experimental analysis and artificial neural network (ANN) modeling (
Figure 6). The incorporation of hydrogen into the HCNG fuel blend results in various performance and emissions-related modifications. The study identifies hydrogen as a key factor in extending lean burn limits, reducing brake-specific fuel consumption (BSFC) under lean conditions and influencing emission characteristics. Notably, the increase in hydrogen content facilitates a faster combustion rate due to hydrogen’s high diffusivity and flame speed, contributing to improved combustion stability. From the experimental findings, it is evident that hydrogen enrichment significantly alters combustion thermodynamics. At lean conditions, BSFC decreases as a result of enhanced flame propagation and superior mixing capabilities of hydrogen. However, at higher excess air ratios, this effect diminishes, indicating the necessity for optimized excess air ratios and ignition timing to fully exploit hydrogen’s benefits. Additionally, while torque output generally declines with hydrogen addition due to its lower energy density compared to conventional natural gas, the study reveals that at optimal ignition timing and λ values, hydrogen supplementation compensates for this loss through improved combustion efficiency. A key challenge associated with hydrogen enrichment is the increase in nitrogen oxides (NO
x) emissions, attributed to higher in-cylinder temperatures. The study suggests that this can be mitigated by adjusting excess air ratios and ignition timing. By employing retarded ignition timing, NO
x formation is reduced, which aligns with the broader goal of balancing efficiency improvements with emission control. A major contribution of the study lies in its successful implementation of ANN for predicting HCNG engine performance and emissions. The ANN model incorporates excess air ratio, manifold absolute pressure (MAP), ignition timing, and hydrogen blend percentage as input parameters, while output variables include BSFC, torque, NO
x, carbon monoxide (CO), total hydrocarbons (THCs), and methane (CH
4) emissions. The hidden layer consists of eight nodes. The use of ANN in this context is highly relevant, as traditional engine modeling approaches, such as zero-dimensional and quasi-dimensional combustion models, often require extensive computational resources and empirical calibration. The ANN-based approach offers a data-driven alternative that effectively captures complex nonlinear relationships between engine operating parameters and performance outcomes. The study employs a feed-forward backpropagation ANN architecture, with hyperbolic tangent activation functions and the Levenberg–Marquardt learning algorithm. The training process optimizes the number of neurons in the hidden layer to minimize mean square error (MSE) and maximize the correlation coefficient (R). The results indicate that the ANN model achieves high prediction accuracy, with R values close to 1.00 across all parameters, demonstrating the robustness of the ANN approach in ICE performance modeling. The study provides crucial insights into how ANN can be leveraged to fine-tune combustion and emission characteristics. For instance, by systematically varying the input parameters in the ANN model, optimal operating conditions can be identified that achieve the lowest BSFC while maintaining acceptable emission levels. This approach is particularly advantageous for hydrogen-enriched fuels, where complex trade-offs exist between efficiency gains and NO
x emissions. Moreover, the ANN model facilitates real-time performance prediction, which can be integrated into electronic control units (ECUs) for adaptive engine control strategies. The ability to dynamically adjust ignition timing and air–fuel ratios based on ANN predictions can lead to real-world efficiency improvements and emission reductions in HCNG engines.
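The idea of sweeping the inputs of a trained ANN to locate a favorable operating point can be sketched as follows. The network below mirrors the topology described above (four inputs, eight hyperbolic tangent hidden nodes, six outputs), but it is an assumption that a single multi-output network with a linear output layer was used. The weights are random and untrained, so the numbers it produces carry no physical meaning; only the procedure is illustrated.

```python
import math
import random
from typing import List, Tuple

N_INPUTS, N_HIDDEN, N_OUTPUTS = 4, 8, 6
OUTPUT_NAMES = ["BSFC", "torque", "NOx", "CO", "THC", "CH4"]
Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)


def init_layer(n_out: int, n_in: int, rng: random.Random) -> Layer:
    """Random placeholder weights; a real model would obtain these by training."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x: List[float], hidden: Layer, output: Layer):
    """tanh hidden layer followed by a linear output layer."""
    (w1, b1), (w2, b2) = hidden, output
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    y = [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(w2, b2)]
    return h, y


rng = random.Random(0)
hidden = init_layer(N_HIDDEN, N_INPUTS, rng)
output = init_layer(N_OUTPUTS, N_HIDDEN, rng)

# Hypothetical normalized inputs: excess air ratio, MAP, ignition timing, hydrogen fraction.
x = [0.6, 0.4, 0.5, 0.2]
h, y = forward(x, hidden, output)
prediction = dict(zip(OUTPUT_NAMES, y))


def predicted_bsfc(timing: float) -> float:
    """Surrogate BSFC for a given normalized ignition timing, other inputs held fixed."""
    return forward([0.6, 0.4, timing, 0.2], hidden, output)[1][0]


# Sweep ignition timing over a coarse grid and keep the lowest predicted BSFC.
timings = [i / 10 for i in range(11)]
best_timing = min(timings, key=predicted_bsfc)
print(prediction, best_timing)
```

A practical search would also reject grid points whose predicted NOx or other emissions exceed a chosen limit before selecting the lowest BSFC.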
The study by Reddy et al. [
49] presents an in-depth experimental and computational analysis of hydrogen-enriched diesel combustion in a single-cylinder, direct-injection compression ignition (DI-CI) engine (
Figure 7). The research aims to optimize engine load and performance while reducing emissions by introducing hydrogen at varying proportions (5%, 10%, and 15%) into diesel fuel blends. The study further integrates artificial neural networks (ANNs) to model and predict engine behavior, demonstrating a high correlation between experimental and simulated data. An ANN is composed of three layers: the input layer, hidden layer, and output layer, with neurons facilitating interactions between them. In this experiment, the Levenberg–Marquardt method was integrated with the backpropagation algorithm. The significance of this work lies in the dual benefits offered by hydrogen supplementation—improved combustion efficiency and reduced emissions—coupled with the predictive power of artificial intelligence (AI). The integration of ANN allows for accurate forecasting of engine performance, facilitating optimization without extensive physical trials. One of the primary findings of the study is the substantial improvement in brake thermal efficiency (BTE) with the increase in hydrogen proportion. The highest efficiency gain is observed with DH15 (diesel + 15% hydrogen), reaching 31.8%, a 13.5% improvement over pure diesel (D100). This enhancement is attributed to the superior combustion characteristics of hydrogen, including higher flame speed and complete combustion, which result in increased power output and thermal efficiency. Another critical aspect of hydrogen supplementation is the reduction in brake-specific fuel consumption (BSFC). The study reports that DH15 achieves a 31.6% reduction in fuel consumption compared to pure diesel. The underlying reason for this efficiency gain is the higher calorific value of hydrogen, which allows more energy to be extracted from a smaller fuel quantity. Cylinder pressure and temperature also exhibit significant improvements with hydrogen addition. 
The peak cylinder pressure for DH15 increases by 21% compared to D100, reflecting enhanced combustion dynamics. This pressure rise is likely due to the rapid burning rate of hydrogen, which accelerates the overall combustion process.
The ANN component is a notable feature of this research, enabling accurate prediction and optimization of engine performance. The ANN model demonstrates an excellent fit (R² > 0.95) for all engine parameters, including BTE, BSFC, NOx, and smoke emissions. Such predictive capabilities offer significant advantages in engine design and tuning, reducing the dependency on labor-intensive experimental trials. The ANN model’s ability to learn complex relationships between independent variables (such as hydrogen proportion and engine load) and dependent performance parameters underscores the potential of AI-driven approaches in automotive engineering. The results demonstrate that ANNs can be effectively utilized for forecasting engine behavior and optimizing fuel blends to achieve a balance between efficiency and emissions.
The study highlights that hydrogen supplementation leads to a significant reduction in pollutant emissions, making it a cleaner alternative for CI engine operation. Smoke opacity, carbon monoxide (CO), and hydrocarbon (HC) emissions decrease as the proportion of hydrogen increases. This trend is primarily attributed to the absence of carbon in hydrogen fuel, which leads to more complete combustion and reduced carbonaceous emissions. For the DH15 blend, the study reports a 10.5% reduction in smoke opacity, along with notable reductions in CO and HC emissions. This cleaner combustion profile underscores hydrogen’s potential as a sustainable fuel additive that can help mitigate air pollution from diesel engines. However, the study also observes an increase in NOx emissions, a well-documented consequence of hydrogen combustion due to higher in-cylinder temperatures. NOx emissions are reported to rise by 8.5% with DH15, which is a key challenge in hydrogen–diesel dual-fuel applications. The authors suggest that further optimization strategies, such as exhaust gas recirculation (EGR) or water injection, could help counteract this drawback.
This literature review highlights the effectiveness of machine learning (ML) in optimizing hydrogen-enriched internal combustion engines (H2ICEs) for improved performance and reduced emissions. Various ML models, including artificial neural networks (ANNs), XGBoost, and Random Forest, have been applied to predict and optimize parameters such as brake-specific fuel consumption (BSFC), brake thermal efficiency (BTE), and emissions. Studies demonstrate efficiency gains of up to 30% and significant reductions in CO2, HC, and particulate emissions, though NOx emissions tend to increase with higher hydrogen content. Solutions such as exhaust gas recirculation (EGR), zeolite-based catalysts, and adaptive ECU integration have been explored to mitigate this issue. ML-driven strategies enable real-time optimization, making hydrogen a viable alternative fuel, with future research needed to refine emission control and hybrid fuel applications.
3. Results and Discussion
The development of machine learning models for optimizing hydrogen-fueled internal combustion engines (H2ICEs) follows a standardized procedure observed in the reviewed studies, presented in Figure 8. This workflow begins with data acquisition from experimental engine tests or numerical simulations, capturing key parameters such as combustion efficiency, emissions, and operating conditions. The collected data undergoes preprocessing, including noise reduction, normalization, and feature selection, to enhance model accuracy. Various machine learning techniques, such as artificial neural networks (ANNs), XGBoost, Random Forest (RF), and Support Vector Regression (SVR), are then selected based on predictive performance. The chosen models are trained and validated using cross-validation techniques and optimized through hyperparameter tuning to prevent overfitting. Performance is assessed through key metrics like R², mean squared error (MSE), and mean absolute error (MAE), ensuring robustness. The validated models are subsequently integrated into engine control strategies, such as electronic control units (ECUs), for real-time optimization of fuel injection, ignition timing, and air–fuel ratio adjustments. Several studies further emphasize continuous learning and adaptive tuning, where models are periodically updated with new operational data to improve predictive accuracy and adaptability in real-world applications.
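The split, normalize, tune, and evaluate steps of this workflow can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic (not taken from any reviewed study), the model is a simple k-nearest-neighbour regressor standing in for the ANN or tree-based models, and the names `knn_predict`, `scale`, and `candidates` are hypothetical.

```python
import random

def knn_predict(train_x, train_y, query, k):
    """Average the targets of the k nearest training points (squared Euclidean distance)."""
    order = sorted(range(len(train_x)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)))
    return sum(train_y[i] for i in order[:k]) / k

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Synthetic stand-in data (assumption): features are hydrogen share (%) and
# engine load (%); the target is a made-up efficiency-like value with noise.
rng = random.Random(0)
X = [[rng.uniform(0, 15), rng.uniform(20, 100)] for _ in range(120)]
y = [25 + 0.4 * h2 + 0.05 * load + rng.gauss(0, 0.5) for h2, load in X]

# 1) Split first, so the validation rows never influence any fitted quantity.
idx = list(range(len(X)))
rng.shuffle(idx)
train_idx, val_idx = idx[:90], idx[90:]

# 2) Min-max normalization with bounds taken from the training rows only.
lo = [min(X[i][j] for i in train_idx) for j in range(2)]
hi = [max(X[i][j] for i in train_idx) for j in range(2)]

def scale(row):
    return [(v - l) / (h - l) for v, l, h in zip(row, lo, hi)]

X_train = [scale(X[i]) for i in train_idx]
y_train = [y[i] for i in train_idx]
X_val = [scale(X[i]) for i in val_idx]
y_val = [y[i] for i in val_idx]

# 3) Hyperparameter tuning: pick the k with the lowest validation MSE.
candidates = (1, 3, 5, 9, 15)
val_scores = {k: mse(y_val, [knn_predict(X_train, y_train, q, k) for q in X_val])
              for k in candidates}
best_k = min(val_scores, key=val_scores.get)
print(best_k, val_scores[best_k])
```

A final, untouched test set would still be needed to report the performance of the selected configuration, since the validation score was used for selection.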
Table 1 presents a systematic comparison of various studies on hydrogen-fueled internal combustion engines (H2ICEs), focusing on key technical aspects, including fuel composition, engine geometry, application range, noise characteristics, engine performance, and dataset parameters. The comparison highlights the diverse approaches in hydrogen integration, ranging from dual-fuel hydrogen–diesel engines with exhaust gas recirculation (EGR) [42] to hydrogen–diesel blends [43] and hydrogen-enriched compressed natural gas (HCNG) engines [48]. The studies also vary in their engine configurations, with some focusing on conventional compression ignition (CI) engines, while others explore modifications for hybrid hydrogen applications.
Noise characteristics have been analyzed in a limited number of studies. While Uludamar et al. [45] and Javed et al. [46] investigated the impact of hydrogen on noise and vibration, most other studies primarily concentrated on emissions and performance. Javed et al. [46] found that the addition of hydrogen with ZnO nanoparticles reduced engine noise under certain conditions, though higher hydrogen flow rates led to increased combustion noise. Similarly, Uludamar et al. [45] reported that hydrogen injection influenced vibration levels, suggesting potential applications in reducing engine noise.
The performance evaluation across studies reveals that hydrogen induction generally enhances brake thermal efficiency (BTE) and reduces carbon-based emissions. For instance, Bai et al. reported a 47% reduction in CO2 emissions and a 49% decrease in smoke emissions when hydrogen was blended with low-carbon biofuels. However, increased NOx emissions remain a recurring issue, with studies such as Reddy et al. [49] and Bai et al. [47] highlighting that higher hydrogen proportions lead to elevated NOx levels due to increased combustion temperatures.
Overall, this comparison underscores the potential of hydrogen-enriched internal combustion engines while highlighting critical challenges such as increased NOx formation, storage limitations, and integration complexity. Future studies should focus on addressing these challenges through advanced emission control strategies, improved fuel injection techniques, and real-time AI optimization to enhance hydrogen combustion efficiency.
Table 2 presents a summary of the best-performing models together with the R-squared performance measure [51], whose value is calculated as shown in Equation (1). Unfortunately, the authors of the reviewed studies do not report statistics of their training datasets.
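For reference, the conventional definition of the coefficient of determination is reproduced here. It is assumed, not confirmed by this text, that Equation (1) follows the same form; the symbols are introduced here for illustration, with $y_i$ the measured value, $\hat{y}_i$ the model prediction, $\bar{y}$ the mean of the measured values, and $n$ the number of observations.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}
```

Under this definition, a value of 1 corresponds to a perfect fit, while 0 corresponds to a model that does no better than always predicting $\bar{y}$.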
The presented findings demonstrate that various machine learning models, including XGBoost, artificial neural networks (ANNs), and multiple linear regression (MLR), exhibit exceptionally high R-squared values ranging from 0.98 to 0.99. While such values indicate a strong fit to the data and suggest that the models effectively capture variance, it is crucial to determine whether these results genuinely reflect predictive performance or if they are influenced by methodological limitations. Overfitting is a primary concern, particularly for complex models such as ANN and XGBoost, which may capture noise rather than meaningful underlying patterns if not appropriately regularized. When models are excessively complex and training data are limited, they can memorize the dataset instead of learning generalizable trends, leading to inflated performance metrics [52].
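The memorization effect can be made concrete with a deliberately extreme sketch. The assumptions are: synthetic noisy data, and a one-nearest-neighbour predictor used as a stand-in for an over-flexible model (it is not a model from the reviewed studies). Because it simply returns the stored target of the closest training point, its in-sample R-squared is perfect, while its R-squared on fresh data from the same process is lower.

```python
import random

def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def one_nn(train_x, train_y, x):
    """Return the target of the single closest training point (pure memorization)."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]

rng = random.Random(1)

def noisy_sample(n):
    """Synthetic data (assumption): linear trend plus substantial Gaussian noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0, 3.0) for x in xs]
    return xs, ys

x_train, y_train = noisy_sample(30)
x_test, y_test = noisy_sample(200)

r2_train = r2(y_train, [one_nn(x_train, y_train, x) for x in x_train])
r2_test = r2(y_test, [one_nn(x_train, y_train, x) for x in x_test])
print(f"train R2 = {r2_train:.3f}, test R2 = {r2_test:.3f}")
```

The gap between the two scores, rather than the training score itself, is the quantity that signals overfitting.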
Another critical issue is dataset bias, where the training data fails to represent real-world distributions, resulting in misleadingly high predictive accuracy. A small sample size exacerbates this problem, as limited observations may fail to encapsulate the true complexity of the underlying relationships, creating an illusion of model effectiveness. Additionally, data leakage poses a substantial risk, occurring when information from the target variable inadvertently influences the predictor variables. This can lead to an unrealistic increase in performance metrics and must be rigorously checked during model evaluation. The fact that all models yield nearly identical R-squared values also raises concerns regarding the problem’s complexity. Either the dataset is inherently simple, allowing all models to perform similarly well, or there are methodological inconsistencies, such as improper data partitioning or target contamination, which distort the results [53].
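One form of improper partitioning that is easy to overlook with engine data is placing repeated measurements of the same operating point in both the training and the test subset. Whether any reviewed study did this is not known; the following is only a sketch of a safeguard, with a hypothetical record layout (`point_id`, `repeat`, `bte`) and a hypothetical helper `grouped_split` that keeps each operating point entirely on one side of the split.

```python
import random

def grouped_split(records, group_key, test_fraction=0.25, seed=0):
    """Split records so every group lands entirely in train or entirely in test."""
    groups = sorted({r[group_key] for r in records})
    random.Random(seed).shuffle(groups)
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [r for r in records if r[group_key] not in test_groups]
    test = [r for r in records if r[group_key] in test_groups]
    return train, test

# Hypothetical records: 8 operating points with 5 repeated measurements each.
records = [{"point_id": p, "repeat": k, "bte": 28.0 + 0.3 * p + 0.01 * k}
           for p in range(8) for k in range(5)]

train, test = grouped_split(records, "point_id")
shared = {r["point_id"] for r in train} & {r["point_id"] for r in test}
print(len(train), len(test), shared)
```

With a purely random row-wise split, near-duplicate repeats would appear on both sides and the test score would mostly measure how well the model recalls points it has already seen.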
To ensure the validity of these findings, it is imperative to assess model performance on an independent test dataset that has not been used for training or parameter tuning. Implementing cross-validation techniques can further help evaluate model robustness and prevent overfitting. A thorough dataset audit is also necessary to detect potential data leakage, ensuring that predictors do not contain unintended information about the target variable. If the dataset is small, expanding the sample size or applying data augmentation techniques could enhance the reliability of evaluations. Furthermore, relying solely on R-squared as a performance metric is insufficient. Complementary metrics such as mean squared error, mean absolute error, and adjusted R-squared should be incorporated to provide a more comprehensive assessment of model accuracy and generalizability. Without these additional validations, the high R-squared values observed in this study may not be indicative of true predictive power but rather an artifact of overfitting, biased data, or methodological flaws [54].
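The complementary metrics named above are inexpensive to compute alongside R-squared. The sketch below uses their standard textbook definitions; the function name `regression_report` and the toy numbers are illustrative assumptions, not values from the reviewed studies.

```python
def regression_report(y_true, y_pred, n_features):
    """Return MSE, MAE, R-squared, and adjusted R-squared for one set of predictions."""
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R-squared penalizes the number of predictors relative to sample size.
    adjusted = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)
    return {
        "mse": ss_res / n,
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        "r2": r2,
        "adjusted_r2": adjusted,
    }

# Toy values (assumption): five observations, a model with two predictors.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.0, 2.0, 3.0, 4.0, 6.0]
report = regression_report(y_true, y_pred, n_features=2)
print(report)
```

In this toy case R-squared is 0.9 but the adjusted value drops to 0.8, which shows how strongly the adjustment bites when the sample is small relative to the number of predictors.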
Several factors can contribute to high R-squared values, some of which reflect strong model performance, while others suggest systematic errors in the modeling process. Overfitting is a prevalent issue in complex models like ANN and XGBoost, where the model not only learns the true underlying patterns but also captures random fluctuations within the training set, resulting in excellent in-sample performance but poor generalization to unseen data. Data leakage is another critical factor, as any unintentional inclusion of target-related information in predictor variables can artificially enhance model accuracy, rendering performance metrics unreliable [55].
A high R-squared value may also arise from a dataset that lacks sufficient variability. When the range of target values is limited, or the input features are highly correlated with the dependent variable, models can achieve near-perfect fits without necessarily being robust. This issue is further exacerbated by small sample sizes, as models trained on limited data are prone to artificially inflated performance, failing to generalize beyond the observed data points. If the dataset lacks diversity or is not representative of real-world distributions, models may yield high R-squared values without actually being useful in practical applications [56].
Another explanation for high R-squared values is the inherent nature of the problem itself. If the relationship between the predictors and the target variable is highly deterministic with minimal stochastic variation, even relatively simple models like multiple linear regression can achieve excellent predictive accuracy. However, in real-world applications, data are rarely free from noise or external influences, necessitating rigorous validation procedures to ensure that model performance is not being misrepresented [53].
To determine whether a high R-squared value is meaningful, it is essential to employ additional evaluation techniques beyond standard goodness-of-fit measures. Cross-validation, independent test set evaluation, and alternative performance metrics, such as mean squared error and mean absolute error, should be used to verify the robustness of model predictions. By incorporating these validation strategies, it becomes possible to differentiate between truly effective predictive models and those whose high R-squared values stem from overfitting, data leakage, or sampling biases [57].
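A minimal sketch of k-fold cross-validation follows. It assumes a single predictor and an ordinary least-squares line as the model, chosen only to keep the example short; the data are synthetic, and the names `fit_line` and `k_fold_r2` are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def k_fold_r2(xs, ys, k=5, seed=0):
    """Fit on k-1 folds, score on the held-out fold, and repeat for every fold."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(r2([ys[i] for i in fold], [a + b * xs[i] for i in fold]))
    return scores

# Synthetic data (assumption): an efficiency-like value rising with hydrogen share.
rng = random.Random(2)
xs = [rng.uniform(0, 15) for _ in range(50)]
ys = [28.0 + 0.25 * x + rng.gauss(0, 0.4) for x in xs]

scores = k_fold_r2(xs, ys)
print([round(s, 3) for s in scores])
```

Reporting the spread of the per-fold scores, not only their mean, indicates how sensitive the result is to which observations happen to be held out.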
Designing an experiment and gathering data for machine learning model training require a systematic approach to ensure high-quality, representative, and unbiased datasets. The foundation of an effective dataset begins with clearly defining the research objective, specifying the key variables, and understanding the relationships that need to be captured. In the context of hydrogen-enriched internal combustion engines, the experiment should be structured to cover a wide range of operating conditions, fuel compositions, and engine configurations [58].
The selection of input variables is critical, as they directly influence the predictive power of machine learning models. Key engine parameters such as ignition timing, fuel injection pressure, air–fuel ratio, engine speed, and load conditions should be recorded with high precision. Additionally, emission outputs, thermal efficiency, brake-specific fuel consumption, and combustion characteristics must be systematically measured. Ensuring diversity in the dataset by collecting data across different environmental conditions, fuel compositions, and engine loads enhances model generalizability [52].
Effective selection of input variables must be complemented by a robust data acquisition strategy to ensure accuracy and reliability in machine learning models. The precision of recorded parameters depends not only on selecting relevant variables but also on the quality of measurement techniques and data collection processes. Inconsistent or noisy data can undermine model performance, making it essential to implement systematic acquisition methods that maintain data integrity. By integrating high-resolution sensors and standardized recording protocols, the dataset can accurately reflect real-world engine behavior, ultimately enhancing model predictive capability [59].
Data acquisition must be carried out using high-resolution sensors and advanced diagnostic tools to minimize noise and measurement errors. Standardized protocols for data recording should be established, ensuring consistency across multiple experimental trials. It is essential to conduct multiple repetitions of each test condition to account for variability and ensure statistical robustness. The dataset should be sufficiently large to prevent overfitting, allowing the machine learning model to learn complex patterns rather than memorizing noise [54].
Preprocessing plays a vital role in improving data quality. Outlier detection and removal should be conducted to eliminate erroneous readings that could skew the model. Normalization or standardization of input features ensures that all variables contribute proportionately to the learning process. Handling missing values appropriately, whether through imputation or discarding incomplete records, is crucial for maintaining dataset integrity. Feature selection techniques should be employed to identify the most relevant predictors, reducing dimensionality while preserving critical information [53].
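Three of these preprocessing steps can be sketched for a single sensor channel. The specific choices here are assumptions made for illustration, not methods attributed to the reviewed studies: median imputation for missing readings, the 1.5 × IQR rule for outliers, and z-score standardization. The readings and helper names are hypothetical.

```python
import statistics

def impute_median(values):
    """Replace missing readings (None) with the median of the observed ones."""
    median = statistics.median(v for v in values if v is not None)
    return [median if v is None else v for v in values]

def iqr_mask(values, factor=1.5):
    """True for readings inside [Q1 - factor * IQR, Q3 + factor * IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    lo = q1 - factor * (q3 - q1)
    hi = q3 + factor * (q3 - q1)
    return [lo <= v <= hi for v in values]

def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

# Hypothetical channel with one missing reading and one obvious sensor glitch.
raw = [30.1, 29.8, None, 30.4, 95.0, 30.0, 29.9, 30.2]

filled = impute_median(raw)
kept = [v for v, ok in zip(filled, iqr_mask(filled)) if ok]
scaled = standardize(kept)
print(kept)
```

When the data are later split for training and testing, the median, quartiles, mean, and standard deviation should be computed from the training subset only and then applied to the other subsets.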
Balancing the dataset is necessary to prevent bias in model predictions. If certain engine operating conditions are overrepresented, the model may fail to generalize well to underrepresented scenarios. Techniques such as stratified sampling or data augmentation can help achieve a well-distributed dataset. Splitting the dataset into training, validation, and test subsets is essential for evaluating model performance objectively. The training set should be sufficiently large to enable learning, while the validation set assists in hyperparameter tuning, and the test set provides an independent performance assessment [55].
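A stratified three-way split can be sketched as follows. The 70/15/15 proportions, the `load` stratum label, and the function name `stratified_split` are assumptions chosen for illustration; the point is only that each operating regime contributes to every subset in the same proportion.

```python
import random
from collections import defaultdict

def stratified_split(records, stratum_key, fractions=(0.7, 0.15, 0.15), seed=0):
    """Split into train/validation/test while preserving the share of each stratum."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for record in records:
        by_stratum[record[stratum_key]].append(record)
    train, val, test = [], [], []
    for _, members in sorted(by_stratum.items()):
        rng.shuffle(members)
        n_train = round(len(members) * fractions[0])
        n_val = round(len(members) * fractions[1])
        train += members[:n_train]
        val += members[n_train:n_train + n_val]
        test += members[n_train + n_val:]  # remainder goes to the test subset
    return train, val, test

# Hypothetical records: three load regimes with 20 runs each.
records = [{"load": load, "run": i}
           for load in ("low", "mid", "high") for i in range(20)]

train, val, test = stratified_split(records, "load")
print(len(train), len(val), len(test))
```

If repeated measurements of the same operating point exist, the stratification should be applied to operating points rather than to individual rows, so that repeats do not straddle subsets.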
Real-time data collection strategies, such as implementing online monitoring systems, can enhance dataset quality by continuously updating the training data with new observations. This approach allows models to adapt to changing conditions and maintain predictive accuracy over time. Incorporating domain expertise in data labeling and validation ensures that the recorded values reflect realistic and meaningful engine behavior [54].
A well-designed dataset should be subjected to exploratory data analysis (EDA) to uncover potential correlations, trends, and inconsistencies. Visualization techniques such as scatter plots, histograms, and principal component analysis (PCA) provide insights into data distributions and relationships between variables. Statistical methods, including regression analysis and hypothesis testing, can further validate dataset consistency before it is introduced to the machine learning model.
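As a small example of such an exploratory check, a pairwise Pearson correlation table can reveal which inputs move together with which outputs before any model is trained. The column names and values below are invented for illustration and are not measurements from the reviewed studies.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(sxx * syy)

def correlation_table(columns):
    """Map every ordered pair of column names to its Pearson coefficient."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

# Invented example columns (assumption): hydrogen share, efficiency, smoke opacity.
columns = {
    "h2_share": [0.0, 5.0, 10.0, 15.0],
    "bte": [27.5, 28.4, 29.6, 30.3],
    "smoke": [42.0, 40.1, 38.9, 36.5],
}

table = correlation_table(columns)
for (a, b), value in table.items():
    if a < b:  # print each unordered pair once
        print(f"{a:>8} vs {b:<8} r = {value:+.3f}")
```

Coefficients very close to ±1 between an input and the target are worth a second look, since they may indicate either a nearly deterministic relationship or a predictor that was derived from the target.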
Ensuring reproducibility and transparency in the experimental design is fundamental for scientific rigor. Documenting all data collection procedures, sensor calibration processes, and preprocessing steps allows other researchers to replicate the study and validate the results. Open-access datasets and standardized benchmarks facilitate broader adoption of machine learning models in engine performance optimization [53].
By adhering to these best practices, the resulting dataset will be comprehensive, high-quality, and well-structured, enabling the development of machine learning models that offer accurate, reliable, and interpretable predictions. Such an approach ensures that models are not only effective but also applicable in real-world scenarios, ultimately advancing the field of hydrogen-enriched internal combustion engines.
4. Conclusions
The integration of hydrogen as a fuel in internal combustion engines represents a promising avenue for achieving sustainable and low-emission transportation. The research findings analyzed in this study underscore hydrogen’s potential to significantly reduce carbon-based emissions while enhancing combustion efficiency. However, challenges such as NOx emissions, hydrogen storage complexities, and the need for specialized engine modifications remain critical barriers to widespread adoption. These issues necessitate the implementation of advanced control strategies and after-treatment solutions to mitigate adverse environmental impacts while maximizing efficiency gains.
The application of machine learning algorithms in optimizing hydrogen-enriched combustion has demonstrated substantial improvements in predictive modeling and engine performance tuning. Various machine learning techniques, including artificial neural networks (ANNs), XGBoost, and multiple linear regression (MLR), have been successfully employed to model complex nonlinear relationships between engine parameters and emissions. The consistently high R-squared values reported in multiple studies indicate the strong predictive capability of these models. However, concerns regarding potential overfitting, data leakage, and dataset representativeness must be addressed to ensure the robustness and generalizability of these models.
In terms of research findings, machine learning-driven optimization strategies have proven highly effective in refining hydrogen-assisted internal combustion engines by enabling real-time adjustments of key operating parameters. Experimental and computational studies validate the ability of these models to enhance fuel efficiency, reduce harmful emissions, and optimize ignition timing. Nonetheless, further improvements are necessary to address real-world challenges, such as fuel variability, transient engine conditions, and sensor inaccuracies that impact model accuracy.
Future research directions should focus on several key areas:
Integration of machine learning with electronic control units (ECUs): AI-driven adaptive control strategies should be further explored to dynamically adjust combustion parameters based on real-time sensor inputs, improving engine adaptability under varying conditions.
Hybrid approaches with renewable fuels: Investigating hydrogen blending with biofuels, synthetic fuels, or other renewable energy carriers could help improve sustainability while addressing storage and infrastructure challenges.
Advanced emission mitigation techniques: More research is needed on after-treatment technologies, such as exhaust gas recirculation (EGR), selective catalytic reduction (SCR), and water injection, to counteract the increase in NOx emissions associated with hydrogen combustion.
Large-scale implementation feasibility: Studies should assess the economic and logistical viability of hydrogen-powered internal combustion engines, including lifecycle analysis, supply chain optimization, and policy recommendations to encourage adoption.
Robust machine learning methodologies: Expanding dataset diversity, incorporating additional performance metrics, and validating predictive models through cross-validation and real-world testing will ensure AI-based strategies remain reliable and scalable.
Despite advancements in hydrogen-enriched combustion and AI-based optimization, practical implementation challenges persist. Hydrogen production, storage, and distribution infrastructure require significant improvements to support large-scale adoption. Furthermore, regulatory frameworks and policy incentives will play a crucial role in accelerating the transition toward hydrogen-powered internal combustion engines.
In conclusion, hydrogen-enriched internal combustion engines, complemented by advanced machine learning models, offer a promising solution to the challenges associated with fossil fuel dependence and environmental degradation. While significant progress has been made, continued interdisciplinary research is essential to overcome existing technical and logistical barriers. The synergy between AI-driven optimization and hydrogen fuel utilization holds great potential for revolutionizing internal combustion engine technology, contributing to a cleaner and more efficient future for transportation and energy systems.