Artificial Intelligence- and Machine Learning-Driven Strategies for Catalyst Design and Sustainable Chemical Processes

Bratovčić, Amra; Tomašić, Vesna

doi:10.3390/pr14121866

Open Access Review

by Amra Bratovčić ¹,* and Vesna Tomašić ²,*

¹ Department of Physical Chemistry and Electrochemistry, Faculty of Technology, University of Tuzla, Urfeta Vejzagića 8, 75000 Tuzla, Bosnia and Herzegovina

² Faculty of Chemical Engineering and Technology, University of Zagreb, Marulićev trg 19, 10000 Zagreb, Croatia

* Authors to whom correspondence should be addressed.

Processes 2026, 14(12), 1866; https://doi.org/10.3390/pr14121866

Submission received: 2 April 2026 / Revised: 2 June 2026 / Accepted: 4 June 2026 / Published: 9 June 2026

(This article belongs to the Special Issue Feature Review Papers in Section "Chemical Processes and Systems")

Abstract

The integration of artificial intelligence (AI), machine learning (ML), and computational modeling with experimental catalysis is reshaping materials design and chemical process development. Tailored heterogeneous catalysts, including supported metals, zeolites, defect-engineered materials, and multi-element systems, exhibit enhanced activity, selectivity, and stability through engineered active sites and porosity. AI and ML approaches enable predictive modeling, high-throughput screening, mechanistic insight, and rational catalyst design by linking synthesis conditions, structural features, and performance metrics across scales. Applications span CO₂ conversion, methane reforming, hydrogen production, polymer recycling, and photocatalysis, with platforms such as PHOTOREAC, QMOF, and PhotoCatDB facilitating the translation from laboratory experiments to reactor-scale processes. Hybrid strategies that combine mechanistic understanding with data-driven models improve interpretability, predictive accuracy, and process optimization. These advances underscore a paradigm shift toward data-driven catalysis, accelerating discovery, supporting sustainable chemical technologies, and emphasizing the role of human expertise in guiding responsible AI deployment.

Keywords:

catalysis; artificial intelligence; machine learning; heterogeneous catalysts; photocatalysis; CO₂ conversion; high-throughput screening; materials design; mechanistic modeling; sustainable chemistry

1. Introduction

The global energy crisis, accelerating climate change, and severe environmental pollution demand urgent solutions to reduce petroleum use, produce alternative chemicals and fuels, and develop sustainable chemical processes aimed at achieving carbon neutrality. Catalytic processes are the foundation of the chemical process industry and are exceptionally important for the development of modern society. It is estimated that catalysts and catalytic processes are used in more than 90% of modern chemical industrial processes. Key knowledge in chemistry, particularly organic synthesis and analytical chemistry, along with fundamental knowledge in chemical engineering, enables the continuous development, optimization, and improvement of chemical processes. The chemical engineering discipline has evolved continuously since its inception in the late 19th century, undergoing several transformations driven mainly by technological advances and societal needs. Today, we are witnessing further evolution driven by the application of artificial intelligence technologies. Given the scope of the problem and the broad application of chemical engineering in developing sustainable technologies, this work focuses on artificial intelligence (AI)- and machine learning (ML)-driven strategies for catalyst design and the development of sustainable chemical processes. Artificial intelligence is a broad field that includes various subfields such as machine learning and deep learning (DL). Although these concepts are related, they are sometimes mistakenly considered synonymous, even though they are based on different principles (Figure 1). Understanding these distinctions helps clarify the rapidly evolving AI landscape.

This review is organized into several sections. The introduction outlines the challenges in catalyst design and the development of sustainable chemical processes in response to increasing demands caused by the energy crisis, climate change, and environmental pollution. The next section discusses the integration of AI and ML in catalytic research, with particular emphasis on their transformative role in, and impact on, catalyst design, reaction optimization, and multi-scale process engineering. After that, achievements in applying artificial intelligence and machine learning are presented, focusing on photocatalysis and the design of catalysts for CO₂ conversion. The following section addresses ethical issues, safety risks, and the responsible use of generative AI. In the final part of the review, challenges and guidelines for future research are discussed, and key conclusions are drawn.

1.1. Advances and Challenges in Modern Catalysis

Heterogeneous catalysis continues to advance through the design of multifunctional materials, such as supported metals, molecular sieves, and oxides, with growing emphasis on activity, selectivity, and control of reaction environments. Tailoring active sites and modifying porosity remain essential for processes such as dehydrogenation, hydroisomerization, epoxidation, and hydrogenation. Zeolites and bifunctional catalysts remain central due to their tunable acidity and ability to regulate supported metal properties [1]. These developments illustrate how improvements in material design and control of catalytic environments continue to refine the performance of heterogeneous catalysts. At the same time, they highlight a broader point: catalysis underpins a wide range of chemical technologies, and the design of effective catalytic systems requires navigating complex relationships between synthesis, structure, and function. This broader perspective motivates a closer examination of the central role of catalysis and the multifaceted challenges inherent in catalyst design.

1.2. The Central Role of Catalysis and the Complexity of Catalyst Design

Catalysis research, including biocatalysis, homogeneous catalysis, and heterogeneous catalysis, is central to advances in sustainable energy, materials, and pharmaceuticals. This field is inherently interdisciplinary, requiring collaboration among chemists, physicists, engineers, and computational scientists. Achieving efficient and sustainable catalytic processes requires a combination of experimental techniques, theoretical models, and advanced instrumentation.

To develop effective catalysts, it is essential to understand how the preparation method influences behavior in a specific reaction. However, this relationship is complex: a catalyst’s performance depends not only on its chemical composition but also on many closely related factors involved in its preparation. The ability to measure and map how formulation affects performance is crucial for designing improved catalysts [2].

Despite the potential of supervised learning, applying it to catalyst development raises questions specific to this field. To build meaningful models, it is necessary to understand how the catalytic data itself affects the results. This is important because catalytic datasets have several constraints: they are typically small due to the high cost of experiments, the choice and range of variables must be defined by the researcher, measurement errors are unavoidable, and many experimental variables are strongly correlated. These correlations can be simple or highly nonlinear, depending on the underlying catalytic processes [3]. Taken together, these considerations emphasize that understanding catalytic performance requires more than identifying active materials. It requires a systematic ability to relate synthesis decisions to the structural and physicochemical features that govern reactivity. However, the multidimensional nature of catalyst formulation makes these relationships difficult to disentangle through experimentation alone. This challenge has created an opportunity for data-driven methods, particularly supervised machine learning, which can help clarify how specific synthesis parameters shape catalyst properties and, ultimately, catalytic behavior.
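One practical consequence of the constraints listed above is that strongly correlated variables should be identified before a model is fitted. The following minimal sketch, which is an illustration and not a method taken from the cited work, flags linearly correlated variable pairs using the Pearson coefficient. The variable names, the numerical values, and the 0.9 threshold are assumptions chosen only for demonstration, and a linear coefficient will not detect the nonlinear dependencies mentioned above.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlated_pairs(table, threshold=0.9):
    """Return (name_a, name_b, r) for variable pairs with |r| >= threshold."""
    names = list(table)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(table[a], table[b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs

# Hypothetical synthesis records (made-up numbers, for illustration only).
synthesis = {
    "calcination_temp_C": [400, 450, 500, 550, 600, 650],
    "surface_area_m2_g": [210, 190, 172, 150, 131, 108],
    "metal_loading_wt": [1.0, 5.0, 2.0, 4.0, 1.5, 3.0],
}

# Pairs flagged here carry overlapping information and may need to be
# merged, dropped, or interpreted jointly before model fitting.
flagged = correlated_pairs(synthesis)
print(flagged)
```

In this toy table, only the calcination temperature and the surface area are flagged, which mirrors the common situation in which a synthesis setting and a resulting structural property cannot be treated as independent inputs.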

1.3. Catalysis for Sustainable and Renewable Technologies

Sustainable catalytic applications are gaining momentum, with progress in carbon dioxide utilization, methane reforming, hydrogen production, and polymer recycling, among others. Examples include copper-doped tin oxide for carbon dioxide reduction, nickel–ceria (Ni-CeOx) bifunctional catalysts for methane reforming, and mixed ionic and electronic conducting reactors for hydrogen generation [1]. Advances in catalytic depolymerization, glycolysis, and other recycling strategies show how kinetic modeling and experimental validation can support environmentally responsible and industrially viable technologies.

Photocatalysis has recently attracted particular attention. It enables solar-driven chemical production, environmental remediation, and carbon-neutral strategies, but traditional design approaches face challenges such as low efficiency and high discovery costs. Integrating computational materials science with machine learning helps address these barriers by clarifying microstructure–performance relationships and by enabling the exploration of novel materials, microstructure optimization, and mechanistic understanding, thereby improving photocatalyst performance and expanding practical applications. Advanced computational methods, including density functional theory (DFT) and molecular dynamics (MD), provide detailed insights into electronic structures, band gaps, and surface reactions that are difficult to capture experimentally, while high-throughput screening (HTS) enables rapid evaluation of large candidate libraries, prioritizing promising materials for experimental validation [4].

High-throughput screening has accelerated the identification of promising photocatalysts, as demonstrated in studies exploring ternary organic heterojunction photocatalysts for hydrogen evolution [5], two-dimensional Janus heterostructures for solar energy applications [6], and tens of thousands of photocathode candidates for carbon dioxide reduction [7]. Deep learning further enhances the capabilities of HTS by predicting key material properties, such as band gaps, charge separation efficiency, and light absorption, from large experimental and computational datasets. Neural network-based models, combined with feature selection and regression techniques, enable rapid identification of high-performance photocatalysts and reduce reliance on trial-and-error approaches. Notable applications include the design of perovskite oxides for photocatalytic water splitting, with models successfully predicting hydrogen production rates and optimal band gaps. Integrating deep learning with computational modeling and HTS establishes a dynamic, data-driven framework that accelerates the discovery, optimization, and mechanistic understanding of photocatalysts, supporting the development of next-generation materials for renewable energy and environmental applications.

1.4. Computational, Digital, and Data Infrastructure for Modern Catalysis

Modern catalysis and reactor design rely on computational models, digital platforms and well-structured data systems to connect experiments, simulations, and optimization in a faster, more reliable workflow [8,9]. In catalyst and reactor design, AI and ML can link atomistic chemistry, microkinetics, CFD, and process-scale models into one workflow and then use optimization algorithms to search for improved catalysts and operating conditions. More extensive information on challenges and opportunities for ML in multi-scale computational modeling can be found in the literature [10].

Supervised machine learning can directly identify which synthesis variables are most important, making it a valuable tool for guiding catalyst design. In practice, researchers often include intermediate properties that link formulation to performance. When these properties are clearly related to catalytic behavior, they are called catalyst descriptors. As pointed out by several researchers [11,12,13,14], catalyst descriptors are numerical representations of catalyst, reactant, and reaction properties that enable the prediction of catalytic performance (activity, selectivity, durability) using data-driven models. Generally, catalyst descriptors can be categorized by the kind of information they encode: geometric structural descriptors [15], electronic descriptors [16], thermodynamic/kinetic descriptors [17], composition and materials descriptors, reaction or environment descriptors, data-driven or learned descriptors [18], and emerging spectroscopic descriptors [16]. With the aid of ML, descriptors play a central role in optimizing catalyst performance, elucidating the essence of catalytic activity, and predicting more efficient catalysts [19]. However, selecting the optimal input variables for ML is a very challenging task in developing accurate and interpretable models [20,21]. Additionally, descriptor choice must be mechanism-specific and application-specific. The most reliable design insights come from descriptors linked to the elementary step, the active sites, or the stability constraint that matters most in the target catalytic system [22,23]. A schematic representation of multidimensional spaces during ML-driven rational catalyst design, showing the relationship between the features of catalytic materials, their quantifiable descriptors, and performance metrics, is given in Figure 2 [24].
For new catalytic systems, computational screening is usually the most efficient way to identify these descriptors, while for well-established reactions, the key properties are already known.

Computational methods, including DFT and multi-scale modeling, deepen understanding of adsorption, reaction energetics, surface intermediates, and transport phenomena, supporting reactor design and process optimization [1]. Machine learning and AI have further accelerated catalyst design by enabling predictive modeling, rapid screening of materials, and analysis of complex reaction systems, bridging experimental observations with theoretical insights. These approaches are increasingly applied to model gas–solid flows, reaction kinetics, and mesoscale phenomena, supporting scale-up and process optimization [1]. The digitalization of research complements these computational advances by facilitating collaboration and data sharing among scientists. Online platforms and databases enhance transparency, reproducibility, and access to knowledge, fostering a global, interconnected community of catalysis researchers and accelerating discovery. In Germany, the National Research Data Infrastructure (NFDI) initiative [25] provides a standardized framework for research data management. Within this initiative, NFDI4Cat [26], the catalysis-focused consortium, promotes the adoption of FAIR (Findable, Accessible, Interoperable, and Reusable) principles [27]. Through partnerships with organizations such as Chemistry Europe, NFDI4Cat supports the integration of FAIR practices and fosters a culture of digitalization and robust data stewardship [28]. The consortium employs advanced algorithms and standards, including the Resource Description Framework (RDF), to ensure machine-readability and data quality. Its infrastructure includes a central repository, automated curation tools, and visualization interfaces, collectively improving interoperability and accelerating catalysis research [29]. Despite progress, establishing a standardized FAIR-compliant representation of catalysis data remains a significant challenge.
Common data and metadata standards such as DataCite [30], PREMIS [31], CodeMeta [32], ExptML, and EngMeta provide useful foundations but often fail to capture the complexity of chemical workflows. XML-based schemas are similarly restrictive due to the diversity of methods and instrumentation used in chemistry. The Resource Description Framework (RDF) offers a more flexible alternative. RDF structures information as triples—subject, predicate, object—allowing rich, machine-interpretable descriptions [33]. In catalysis, the subject might represent a research step, measurement, substance, mixture, sample, or method; objects can be other resources or literal data. Predicates such as “has part”, “has numerical value”, or “is used in” express relationships, enabling interoperable data models suitable for complex chemical research.
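The triple structure described above can be illustrated with a minimal sketch in plain Python. It is not an implementation of the RDF standard or of any NFDI4Cat tool: all identifiers (such as "sample:S1") and the helper function `match` are hypothetical placeholders, and only the three predicate strings are taken from the text.

```python
# Each statement is a (subject, predicate, object) triple.
# All identifiers are hypothetical placeholders, not terms from any
# published ontology or vocabulary.
triples = [
    ("sample:S1", "is used in", "measurement:M1"),
    ("sample:S1", "has part", "substance:support"),
    ("sample:S1", "has part", "substance:active_phase"),
    ("measurement:M1", "has numerical value", 0.42),  # literal object
]

def match(store, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

# Which components does sample S1 consist of?
parts = [obj for _, _, obj in match(triples, s="sample:S1", p="has part")]
print(parts)
```

Because every statement has the same three-part shape, new kinds of relationships can be added without changing a schema, which is the flexibility that distinguishes this model from fixed XML layouts.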

High-quality, diverse datasets are essential for training robust machine learning models. Understanding dataset diversity, novelty, and redundancy guides efficient model development [34]. In low-data regimes, carefully designed descriptors remain practical, while multi-level learning, delta learning, and physics-inspired inductive biases help reduce data demands. Sampling methods, such as entropic sampling and self-learning population annealing, enable efficient exploration of chemical space [35]. Accessibility is equally important, as databases must be usable by non-experts to broaden participation and impact. For example, while the QM9 dataset contains over 100,000 molecules with computed energies, deriving meaningful chemical properties requires domain knowledge, which presents barriers for AI practitioners. Intuitive web interfaces can lower these barriers [36]. Reliable metadata and provenance tracking are also essential. Tools such as AiiDA [37] and NOMAD [38] ensure reproducibility in materials simulations. Differentiating small high-accuracy datasets from large benchmark datasets helps prevent overfitting and promotes practical model deployment. Community-driven curation further enhances both data quality and reliability, ensuring that datasets remain valuable resources for developing accurate and generalizable machine learning models.
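The idea behind delta learning mentioned above can be sketched as follows: instead of learning an expensive high-level property directly, a model learns only the difference between a cheap low-level estimate and the high-level value, which is often a smoother and easier target. The example below is a deliberately simplified illustration with a single descriptor and a linear correction. The data are made up, and the function names `fit_line` and `predict_high` are assumptions introduced here.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical training set (made-up numbers): one descriptor per structure,
# a cheap low-level energy, and an expensive high-level energy.
descriptor = [0.0, 1.0, 2.0, 3.0]
e_low = [-1.00, -1.40, -1.90, -2.30]
e_high = [-1.10, -1.60, -2.20, -2.70]

# Learn only the correction between the two levels of theory.
delta = [h - l for h, l in zip(e_high, e_low)]
slope, intercept = fit_line(descriptor, delta)

def predict_high(x, low_energy):
    """High-level estimate = cheap calculation + learned correction."""
    return low_energy + slope * x + intercept

# A new structure needs only the cheap calculation plus the correction.
print(predict_high(4.0, -2.80))
```

In realistic settings the linear fit would be replaced by a more flexible regressor, but the division of labor stays the same: the cheap method supplies most of the physics, and the model supplies the remainder.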

2. Artificial Intelligence in Chemical Discovery and Engineering

The integration of AI and ML into catalysis research has fundamentally transformed catalyst discovery and process optimization. Traditionally, catalyst development relied on empirical, trial-and-error methods, which were time-consuming and limited in exploring vast chemical design spaces. Recent advances in ML have enabled the extraction of complex structure–activity relationships, supporting predictive modeling of catalytic performance and the rational design of novel materials [39,40]. At the same time, the increasing focus on sustainability has driven the use of AI not only in catalyst discovery but also in optimizing chemical processes, including energy efficiency, emission reduction, and resource utilization [41]. Interested readers can find more detailed information on this topic in an excellent overview of several applications of AI and ML in analyzing catalytic performance, characterizing structures through spectroscopic data, developing kinetic and mechanistic models, and addressing transport limitations, as reported by Günay and Yıldırım [41]. However, despite rapid progress, the field remains fragmented, with significant challenges in data availability, model generalizability, and experimental validation.

Artificial intelligence is increasingly addressing challenges in retrosynthetic planning, catalyst design, reaction optimization, and autonomous experimentation. In retrosynthetic analysis, ML models, particularly transformer-based systems and generative models, have demonstrated the ability to predict reaction outcomes and propose synthetic routes with high accuracy [41]. In the domain of catalyst design, AI methods have been applied to uncover complex structure–property relationships and accelerate the discovery of new catalytic materials. Reaction optimization has similarly benefited from AI approaches, where techniques such as Bayesian optimization efficiently navigate multidimensional parameter spaces to identify optimal reaction conditions. Finally, the emergence of self-driving laboratories, integrating robotics with machine learning, has enabled fully autonomous experimental workflows, exemplified by systems such as ChemOS [42], which iteratively design, execute, and analyze experiments without human intervention. These tools help chemists navigate complex chemical spaces, refine reaction conditions, and discover new reactivity with unprecedented speed [43]. As pointed out by Li et al. [44], the future of the laboratory is envisioned as the integration of testing modules into automated workflows that provide real-time feedback on experimental results in significantly less time, while simultaneously improving the accuracy and efficiency of the studied processes.
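To make the Bayesian optimization loop mentioned above more concrete, the following minimal sketch optimizes a single scaled reaction condition. It is an illustration under stated assumptions, not the procedure of any cited system: the surrogate is a Gaussian process with a squared-exponential kernel, the acquisition rule is an upper confidence bound, and `run_experiment` is a hypothetical stand-in for a real measurement. The kernel length scale, noise term, exploration weight, and number of iterations are arbitrary choices.

```python
import math

def rbf(a, b, length=0.2):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def posterior(xs, ys, x_new, noise=1e-4):
    """Gaussian-process posterior mean and standard deviation (zero prior mean)."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    alpha = solve(K, ys)
    v = solve(K, k_star)
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = max(rbf(x_new, x_new) - sum(k * w for k, w in zip(k_star, v)), 0.0)
    return mean, math.sqrt(var)

def run_experiment(x):
    """Hypothetical stand-in for a measured yield at scaled condition x."""
    return math.exp(-((x - 0.63) ** 2) / 0.02)

grid = [i / 100 for i in range(101)]   # candidate conditions in [0, 1]
xs = [0.0, 0.5, 1.0]                   # initial experiments
ys = [run_experiment(x) for x in xs]

def ucb(x, kappa=2.0):
    """Upper confidence bound: favors high predicted yield or high uncertainty."""
    mean, std = posterior(xs, ys, x)
    return mean + kappa * std

for _ in range(8):
    x_next = max((x for x in grid if x not in xs), key=ucb)
    xs.append(x_next)
    ys.append(run_experiment(x_next))

best_x = xs[ys.index(max(ys))]
print(best_x, max(ys))
```

The design choice that matters is the acquisition function: by scoring candidates on predicted value plus uncertainty, each new experiment is placed either where the model expects a good result or where it knows least, which is what allows such loops to use far fewer experiments than a full grid.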

AI now supports route planning for simple molecules and provides insights into complex natural products. Virtual library screening is promising but remains limited by scaffold diversity. AI has enhanced reaction optimization and is beginning to support autonomous experimentation using flow chemistry and robotics. Remaining challenges include limited domain knowledge in current models, high automation costs, and the need for chemical expertise to interpret outcomes. Future progress will depend on high-quality databases, intuitive tools, and strong collaboration between chemistry, computer science, and engineering.

Recent advances have produced chemistry-specific AI agents that enhance large language models with domain tools and structured workflows. Kangyong Ma [45] developed a chemical intelligent assistant using eight fine-tuned open-source large language models (LLMs) trained on 1.7 million chemistry instructions, achieving strong performance with Mistral NeMo. The system integrates molecular visualization, SMILES processing, and literature retrieval, and improves through continuous feedback.

Other specialized agents extend these capabilities. ChemCrow, developed by Andres M. Bran and colleagues [46], integrates GPT-4 with expert chemical tools for organic synthesis, drug discovery, and materials design. Coscientist, created by Daniil A. Boiko and collaborators [47], autonomously plans and performs scientific experiments through web search, retrieval, code execution, and robotic automation. ChemCrow and Google’s AI co-scientist have shown promising results but limited reliability: ChemCrow has had tool-integration problems and brittle behavior in some settings, while the AI co-scientist is framed more as a hypothesis-generation aid than as a fully autonomous, experimentally validated system [48]. CACTUS (Chemistry Agent Connecting Tool Usage to Science), introduced by Andrew D. McNaughton and his team [49], enhances reasoning and discovery by integrating LLMs with cheminformatics resources. Together, these systems demonstrate how tailored AI agents expand the usefulness of large language models in chemistry, broadening their practical applications and enabling more sophisticated problem-solving and molecular discovery.

Artificial intelligence is increasingly shaping chemical engineering in areas such as monitoring, process control, catalyst discovery, and product design. Hybrid models that combine data-driven and physics-based elements improve accuracy and interpretability. Recent developments include reinforcement learning for process optimization, Bayesian tuning strategies, transformer-based reactor control, and multimodal data fusion techniques [50].

Responsible use of generative AI in chemical engineering must follow established engineering ethics, including honesty, integrity, respect for life, public safety, and environmental protection [51]. The inherent risks of chemical processes demand rigorous oversight, and while large language models include basic safeguards, they require additional frameworks to ensure transparency, reliability, and safe use. Generative AI offers opportunities to automate flowsheet design, advance material and electrode development, and accelerate green technologies such as fuel cells and batteries. Challenges include limited machine-readable datasets, insufficient integration of chemical knowledge, and risks of generating unsafe or impractical designs. Human expertise and regulatory structures remain essential to ensure responsible adoption.

2.1. Advances in AI and ML for Catalyst Design, Reaction Optimization, and Multi-Scale Process Engineering

PHOTOREAC, a MATLAB-based tool introduced by Acosta-Herazo et al. (2020), provides an accessible platform for modeling slurry solar photocatalytic reactors, offering radiation-field calculations and kinetic modeling capabilities [52]. Although simplified, it helps bridge the gap between laboratory experiments and engineering-scale design.

Other material-specific advances include defect-engineered iron oxide catalysts for volatile organic compound oxidation, machine learning-guided discovery of multi-element reverse water–gas shift catalysts, artificial intelligence methods in geoscience, and data-driven design of single-atom catalysts [53]. Large datasets such as the oxidative coupling of methane database compiled by Mine et al. (2021) further demonstrate how machine learning can uncover key descriptors and enable extrapolative catalyst discovery [54].

In recent years, ML has moved beyond specialized fields to become increasingly integrated into everyday applications and scientific research, including chemistry and physics. Its potential in catalysis is particularly promising, given the complexity of catalytic systems, which span from atomic-level active sites to large-scale industrial reactors and underpin the production of over 90 percent of industrial chemicals. Traditional approaches, including mechanistic studies, empirical exploration, and first-principles modeling, have provided valuable insights but often struggle with the high dimensionality, nonlinearity, and multi-scale interactions of real-world systems. Machine learning offers a complementary strategy, enabling robust predictions without complete mechanistic knowledge and facilitating tasks, such as estimating adsorption energies and reaction barriers, optimizing operating conditions, and designing reactors. Recent developments include surrogate models trained on density functional theory (DFT) data for catalyst screening, graph-based learning for reaction network exploration, reinforcement learning for process optimization, and physics-informed machine learning and neural networks that embed fundamental scientific laws into model architectures. These hybrid approaches allow for reliable, physically consistent predictions, accelerating catalyst discovery and process optimization while generating interpretable insights that bridge data-driven methods with chemical theory [55].
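A very simple way to see what it means to embed a scientific law in a model is to fix the functional form in advance and learn only its parameters. The sketch below does this for the Arrhenius equation, ln k = ln A − Ea/(RT), which becomes a straight-line fit of ln k against 1/T. It is far simpler than the physics-informed neural networks referred to above and is offered only as an illustration of the principle. The data are synthetic, generated from assumed parameters, and the function names are introduced here.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1

def fit_arrhenius(temps_K, rate_constants):
    """Fit ln k = ln A - Ea / (R T) by least squares; return (A, Ea in J/mol)."""
    x = [1.0 / T for T in temps_K]
    y = [math.log(k) for k in rate_constants]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return math.exp(intercept), -slope * R

# Synthetic data generated from assumed parameters (A = 1e7, Ea = 80 kJ/mol).
temps = [500.0, 550.0, 600.0, 650.0]
rates = [1e7 * math.exp(-80000.0 / (R * T)) for T in temps]

A_fit, Ea_fit = fit_arrhenius(temps, rates)

def predict_rate(T):
    """Extrapolate with the fitted law rather than an unconstrained curve."""
    return A_fit * math.exp(-Ea_fit / (R * T))

print(A_fit, Ea_fit, predict_rate(700.0))
```

Because the model can only express Arrhenius-type behavior, its predictions outside the measured temperature range remain physically plausible, and its two parameters have a direct chemical interpretation. Both properties are what hybrid approaches aim to retain at larger scale.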

Machine learning is a multidisciplinary and rapidly evolving field that develops algorithms capable of learning from data without explicit programming, drawing on expertise from computer science, statistics, mathematics, engineering, physics, chemistry, and neuroscience. Based on how algorithms learn from data, it is broadly divided into supervised, unsupervised, and reinforcement learning (Figure 3). Supervised learning, in which models are trained on labeled datasets (input–output pairs), is the most widely used category in catalysis. Unsupervised learning operates on unlabeled data to identify hidden patterns, structures, or clusters. Reinforcement learning (RL) is an approach in which an agent learns to make decisions by interacting with its environment and receiving feedback on its actions. In catalyst development, ML enables researchers to analyze large datasets, identify key descriptors (such as material composition, structure, synthesis methods, and physical properties), and predict target properties such as activity, selectivity, and stability. By revealing complex relationships between features and catalytic performance, ML supports more efficient catalyst design and material screening. However, careful attention to best practices and potential pitfalls is essential to ensure reliable and accurate predictions [56].

A summary of the key advantages, disadvantages, and critical perspectives of the major AI/ML techniques is provided in Table 1.

The field of chemical engineering has continually evolved, driven by technological advances and societal needs, progressing from traditional unit operations to molecular simulation, nanotechnology, catalysis, and sustainability. Currently, artificial intelligence and machine learning (AI/ML) are driving another major transformation, reshaping how chemical engineers approach complex problems [50]. AI/ML applications are already providing tangible benefits, including enhanced process monitoring and control, accelerated drug and catalyst design, optimization of industrial processes, and the development of products with tailored properties. Despite these advances, challenges remain due to limited and noisy process data, the potential for model errors with safety and regulatory implications, and the need for interpretable models that integrate domain knowledge. Contributions in the Special Issue [50] highlight how AI/ML is being applied across multiple scales, from atomic and molecular systems to process and systems engineering, enabling fundamentally new approaches to both long-standing and emerging challenges in chemical engineering.

Key themes emerging from these studies include the integration of mechanistic insights into AI/ML models and the development of hybrid modeling strategies that combine first-principles understanding with data-driven discovery. Examples include convolutional neural networks enhanced with mechanistic knowledge for predicting gas adsorption in metal–organic frameworks, physics-informed transfer learning to reduce data requirements in process control, and symbolic regression methods for generating interpretable mathematical models. Other advances demonstrate the use of AI/ML for process design, optimization, and control, such as reinforcement learning for bilevel optimization, Bayesian optimization for autotuning controllers, and transformer-based models for reactor operation. Additionally, multimodal and multigranularity data fusion techniques allow models to integrate diverse datasets of varying quality effectively. Overall, these works show that AI/ML is not only enhancing predictive capabilities, but also providing deeper scientific insights and enabling more efficient, sustainable, and innovative solutions in chemical engineering as well as in the chemical process industry [50].

In many domains, AI/ML methods can significantly outperform classical methods. As illustrated in Table 2, reported improvements include catalyst screening that is 100 to 1000 times faster, a substantial reduction in required experiments and resources, higher prediction accuracy, shorter experimentation time, and improved yield and/or selectivity [58].

Despite these advantages, the lack of transparency and interpretability of “black-box” models can lead to serious mistakes and dangerous decisions and can undermine regulatory trust [14,59]. Such models suffer from serious limitations in interpretability, causality, reproducibility, and generalization, and they offer insufficient mechanistic understanding. Promising strategies to address these problems include explainable AI (XAI), physics-informed machine learning, hybrid quantum chemistry–ML approaches, symbolic regression, and uncertainty quantification, among others [60]. The most scientifically valuable future direction will probably be a balanced integration of AI prediction, mechanistic chemistry, physical theory, and experimental validation. The validation of AI-based models deserves particular emphasis. Myllyaho et al. [61] presented a systematic literature review, based on 90 primary studies, of validation methods used to ensure the dependability and trustworthiness of practical AI systems. They concluded that AI-based models are generally validated in a limited but useful way: the models often perform well on benchmark data, simulations, or control trials, but fewer studies report rigorous validation in realistic industrial settings with experiments, external datasets, or continuous post-deployment monitoring. The fit between AI models and experiments is also often good at the level of statistical prediction but weak when judged against causal, mechanistic, or operational experimental evidence. In short, AI-based models are predictively validated much more often than they are experimentally validated.

Table 3 summarizes methods for photocatalyst discovery, optimization, and reactor modeling. It highlights tools such as PHOTOREAC, derivative-free sparse identification (DF-SINDy), and ML frameworks applied to photocatalytic CO₂ reduction, multicomponent reactions, and catalyst screening. The main findings focus on accelerated prediction, mechanistic insights, and experimental efficiency. Key challenges include limited datasets, generalizability, model interpretability, and assumptions in reactor modeling.

Table 4 focuses on catalyst development and reaction optimization for applications such as volatile organic compound (VOC) oxidation, oxidative coupling of methane (OCM), the reverse water–gas shift reaction, CO₂ reduction and H₂ evolution, single-atom catalysts (SACs), and hydrodesulfurization. Methods include ML models (version 0.23.1), such as XGBoost (version 1.2.1) and Random Forest, as well as ML potentials, structural engineering, and high-throughput AI experiments. The main findings show that AI/ML can identify new catalysts, predict activity, and reveal structure–property relationships. Limitations include dataset quality, transferability, stability, and integration with sustainability metrics.

Table 5 covers materials discovery and process optimization using databases, ML, LLMs, and generative AI. Applications include MOF electronic property prediction, CO₂ capture materials, MOF synthesis optimization, and chemical process design. As an example of the successful application of ML in the development of advanced materials, Huang et al. [70] propose a data-driven MOF design strategy that links adsorption conditions, pore structure, site chemistry, and phosphate speciation to guide efficient phosphate removal and resource recovery. The findings demonstrate that AI/ML enables accelerated materials discovery, property prediction, inverse design, and process automation. Challenges include dataset diversity, interpretability, integration with experiments, and safety concerns in generative outputs.

Acosta-Herazo and colleagues (2020) [52] present PHOTOREAC, a MATLAB-based graphical application developed to model and simulate large-scale slurry solar photocatalytic reactors for water-treatment applications. The software integrates modules for two core functions: (i) a photon absorption-scattering module that computes the radiation field (using a six-flux model and a variant coupled with the Henyey–Greenstein scattering phase function) and (ii) a kinetic modeling module that fits experimental photodegradation data with multiple kinetic expressions. The application comes pre-loaded with a database of 26 experimental datasets (different pollutants, catalyst concentrations, and reactor types) and allows users to import their own data. Through three example cases, the authors demonstrate how PHOTOREAC can estimate radiation-independent kinetic constants, compare different kinetic models, and analyze the influence of key operational parameters (such as reactor geometry, catalyst loading, and incident radiation). The authors argue that PHOTOREAC lowers the barrier for non-expert researchers to engage in photoreactor design and scale-up by offering a more accessible alternative to full computational fluid dynamics (CFD) simulations. They also note several limitations, mainly the current restriction to a single photocatalyst (i.e., titanium dioxide, TiO₂ P25) and the reliance on simplified assumptions such as a well-mixed system and the absence of mass transport limitations. The authors further outline possible directions for future improvements. In the broader context of environmental photocatalysis, the tool addresses a recognized gap: while process and reactor modeling are vital for reactor design and optimization, many researchers focused on experimental work have limited access to dedicated simulation platforms. Thus, PHOTOREAC contributes to closing the gap between experimental photoreactor testing and engineering-scale design by offering a flexible, simplified, and user-friendly modeling approach [52].
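As a minimal illustration of what fitting photodegradation data with a kinetic expression involves, the following Python sketch estimates an apparent pseudo-first-order rate constant from concentration–time data. It is not PHOTOREAC code (PHOTOREAC is a MATLAB application) and it does not reproduce the coupling to the radiation field that yields radiation-independent constants. The function name and the synthetic data are assumptions introduced only for this example.

```python
import math


def fit_pseudo_first_order(times_min, concentrations):
    """Least-squares slope of ln(C0/C) versus t, with the line forced through the origin.

    Assumes pseudo-first-order decay, C(t) = C0 * exp(-k_app * t), and that
    the first concentration value is the initial concentration C0.
    """
    c0 = concentrations[0]
    y = [math.log(c0 / c) for c in concentrations]
    numerator = sum(t * yi for t, yi in zip(times_min, y))
    denominator = sum(t * t for t in times_min)
    return numerator / denominator


# Synthetic data generated with k = 0.05 1/min (illustrative, not taken from [52]).
times = [0, 10, 20, 30, 40]
conc = [10.0 * math.exp(-0.05 * t) for t in times]

k_app = fit_pseudo_first_order(times, conc)
half_life_min = math.log(2) / k_app  # time for the concentration to halve
```

Comparing several kinetic expressions, as described for PHOTOREAC, would amount to repeating such a fit for each candidate rate law and comparing the residuals.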

While PHOTOREAC provides a practical and accessible platform for modeling and optimizing photocatalytic reactors, advancing catalyst performance itself requires a complementary focus on material design and mechanistic understanding. Zhang et al. (2024) [53] present a comprehensive analysis of Fe₂O₃-based catalysts developed for the catalytic oxidation of toluene, emphasizing how structural engineering and defect modulation can significantly enhance activity and selectivity. The review identifies oxygen vacancies as the key active sites that facilitate lattice oxygen mobility and accelerate redox cycling in the Mars–van Krevelen mechanism, which dominates toluene oxidation over iron oxides. The authors discuss how strategies such as morphology control, heteroatom doping, and the construction of Fe₂O₃-based composites effectively increase the concentration of surface defects and improve electron transfer efficiency. Incorporating secondary metal oxides or supports was shown to enhance oxygen activation and lower the reaction temperature required for complete toluene conversion. Additionally, the paper highlights the potential of using waste-derived iron materials as precursors for sustainable catalyst synthesis. Despite these advances, challenges remain in achieving high stability and activity under low-temperature and complex-gas conditions. The authors advocate integrating advanced characterization and data-driven modeling approaches to guide the rational design of next-generation Fe₂O₃-based catalysts for efficient VOC abatement [53].

The study by Mine et al. (2021) [54] presents an updated dataset comprising 4759 experimental data points on the oxidative coupling of methane (OCM), compiled from the literature up to 2019. Using machine learning (ML) techniques, such as Extra Trees Regressor (ETR), eXtreme Gradient Boosting (XGBoost), and Random Forest Regression, the authors analyzed the dataset to identify key features influencing C₂ hydrocarbon yields. The ML models successfully predicted catalyst compositions that were not previously represented in the dataset, demonstrating the potential of ML for extrapolative catalyst discovery. The study also highlighted the significance of elemental features over direct catalyst compositions in model predictions, offering insights into the design of more effective OCM catalysts [54]. The integration of machine learning techniques has further expanded the toolkit for catalyst development, enabling predictive modeling, feature identification, and extrapolative discovery in systems such as oxidative coupling of methane [54] and multi-element catalysts for CO₂ conversion.

Beyond conventional machine learning, artificial intelligence approaches, such as neural networks and large language models (LLMs), are increasingly applied to the design, screening, and optimization of complex catalytic systems, including single-atom catalysts, offering mechanistic insights, high-throughput predictions, and guidance for rational catalyst design.

Wang et al. (2023) [67] introduced an extrapolative machine learning framework to accelerate the discovery of multi-element catalysts for the reverse water–gas shift reaction, a key process for converting carbon dioxide into carbon monoxide. By integrating data-driven prediction with iterative experimental validation, the authors evaluated approximately 300 catalyst compositions through 44 learning cycles and identified more than one hundred highly active candidates. A notable achievement of this approach was the discovery of an efficient platinum–rubidium–barium–molybdenum–niobium catalyst supported on titanium dioxide, in which niobium, a previously untested element, played a critical role, demonstrating the model’s capacity to extrapolate beyond known chemical spaces. The study used a sorted weighted elemental descriptor to represent catalysts based on fundamental elemental properties, enabling the model to predict high-performance combinations from limited training data. Feature-importance analysis further revealed that catalytic activity strongly correlates with parameters related to electronic configuration and oxygen affinity. Overall, this work exemplifies how machine learning can transcend traditional trial-and-error approaches in heterogeneous catalysis, offering a powerful route for the rapid identification and rational design of complex multi-element systems for carbon dioxide utilization [67].
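The idea of representing a catalyst by elemental properties rather than by element identities can be sketched in Python as follows. This is a simplified illustration of a sorted, composition-weighted descriptor, not the exact descriptor defined in [67]. The choice of properties (atomic number and Pauling electronegativity), the number of slots, and the example loadings are assumptions made for this example.

```python
# Atomic numbers (Z) and Pauling electronegativities (chi), standard tabulated values.
ELEMENT_PROPS = {
    "Pt": {"Z": 78, "chi": 2.28},
    "Rb": {"Z": 37, "chi": 0.82},
    "Ba": {"Z": 56, "chi": 0.89},
    "Mo": {"Z": 42, "chi": 2.16},
    "Nb": {"Z": 41, "chi": 1.60},
}


def sorted_weighted_descriptor(composition, n_slots=5, props=("Z", "chi")):
    """Build a fixed-length vector of composition-weighted elemental properties.

    Elements are ranked by their fraction (largest first, ties broken by symbol),
    so the vector does not depend on the order in which elements are listed.
    Unused slots are padded with zeros.
    """
    total = sum(composition.values())
    ranked = sorted(composition.items(), key=lambda kv: (-kv[1], kv[0]))
    vector = []
    for element, amount in ranked[:n_slots]:
        weight = amount / total
        vector.extend(weight * ELEMENT_PROPS[element][p] for p in props)
    vector.extend([0.0] * (n_slots * len(props) - len(vector)))
    return vector


# Hypothetical relative loadings, for illustration only.
catalyst = {"Pt": 3.0, "Rb": 1.0, "Ba": 1.0, "Mo": 1.0, "Nb": 1.0}
descriptor = sorted_weighted_descriptor(catalyst)
```

Because the model sees only property values, an element absent from the training data (such as niobium in [67]) still maps to a point in the same feature space, which is what makes extrapolation to untested elements possible in principle.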

Yu et al. (2025) [68] review the emerging role of artificial intelligence in the design, optimization, and application of single-atom catalysts. Single-atom catalysts are highly promising in electrocatalysis due to their atom-level dispersion, which enhances activity, selectivity, and stability, but their design and optimization are complex. Artificial intelligence, particularly machine learning and neural networks, has emerged as a powerful tool to accelerate SAC development by enabling high-throughput simulations, identifying key performance features, screening structural models, and predicting novel catalyst structures. AI-driven approaches extend the capabilities of DFT to larger and more complex systems, integrate experimental and computational data to construct predictive models, and simulate reactions under realistic conditions, including temperature, pressure, and solvent effects. By combining data-driven modeling with physical and chemical principles, AI not only interprets existing experimental results, but also guides the rational design of high-performance SACs, opening new avenues for innovation in energy conversion and electrocatalysis [68].

Building on the emerging applications of AI in single-atom catalyst design, broader studies have begun to systematically categorize the diverse AI methodologies available for catalyst discovery. While Yu et al. [68] focus on the role of AI in accelerating SAC development, Xu et al. [62] survey both classical machine learning techniques and advanced approaches across different types of catalytic systems. They categorize existing methods into four main groups: classical machine learning, generative and reinforcement learning, graph neural networks, and LLMs, providing a unified framework for both homogeneous and heterogeneous catalyst discovery. The survey highlights the strengths and limitations of each approach, discusses challenges in data representation and model scalability, and identifies future research directions, such as real-time catalyst discovery, integration of physical principles into machine learning models, and the development of comprehensive, continuously updated datasets. By providing a clear overview and roadmap, the authors aim to assist researchers in computational chemistry and computer science in navigating the evolving landscape of AI-driven catalyst discovery [62].

Beyond catalyst discovery, AI methods are also transforming our understanding of catalytic reaction mechanisms. By integrating experimental and computational data, machine learning enables accurate modeling of reaction kinetics, identification of key mechanistic features, and optimization of reaction conditions, addressing many limitations of traditional phenomenological approaches.

A deep understanding of catalytic reaction mechanisms is crucial for advancing chemical kinetics, but traditional phenomenological models have limitations, such as convergence to local minima, reliance on difficult-to-measure parameters, and high computational costs, particularly for complex catalyst structures or feedstocks. In recent years, machine learning has emerged as a powerful alternative, enabling data-driven modeling of catalytic systems with applications ranging from material and condition screening to mechanism classification and reaction rate analysis. ML approaches facilitate accurate kinetic parameter estimation, extraction of complex patterns from experimental data, and integration with molecular dynamics and optimization methods, overcoming many constraints of conventional techniques. While deep learning often requires large, high-dimensional datasets, simpler ML models can provide valuable insights with smaller, high-quality datasets. Key considerations in applying ML include model interpretability, generalizability, computational efficiency, and data quality. Establishing standardized benchmarks for dataset size and quality remains an important direction for future research. Overall, machine learning offers transformative potential in catalysis by enhancing the design, optimization, and mechanistic understanding of catalytic systems [73].

Taking this a step further, AI is not only a tool for modeling and prediction but can also actively collaborate with human researchers. Co-intelligence (CoI), as described by Ethan Mollick [74] and exemplified by large language models, represents a new paradigm in which AI engages directly with human intelligence (HI) to assist in research design, problem-solving, and optimization, complementing both conventional ML and mechanistic modeling. More generally, CoI arises from the collaboration of multiple individuals sharing diverse knowledge, and in scientific research it can manifest through experimental–theoretical partnerships or human–robot interactions. LLMs leverage natural language processing and generative AI to answer questions, perform optimization tasks, and deliver reasoning, with applications spanning catalysis, data mining, molecular and materials design, chemical space exploration, organic synthesis, property optimization, and education. Despite their potential, LLMs face limitations, including hallucinated outputs, difficulty with specialized scientific language, and dual-use concerns. In a recent study, an LLM was used to codesign a computational catalysis project with machine learning under minimal prompt engineering, demonstrating its ability to contribute meaningfully to high-level project design, workflow development, and evaluation, while also highlighting persistent flaws. This work illustrates the potential of LLMs to enhance research productivity and innovation in complex scientific domains, though further refinement and domain-specific integration are needed [63].

Extending beyond individual projects, AI and ML are increasingly applied across multiple scales in process systems engineering, from molecular-level reactions to full plant and supply chain operations. These multi-scale applications highlight the versatility of AI in integrating physical principles, human–AI collaboration, and generative modeling to optimize complex systems holistically. Srinivasan et al. (2025) [71] provide a comprehensive review of artificial intelligence and machine learning applications across multiple scales in process systems engineering, ranging from molecular and reaction levels to materials, processes, plants, and supply chains. The authors examine the utility of AI and ML at both the design and operational stages, emphasizing the distinct representational frameworks employed at different scales and the physical principles they capture, including equivariance, additivity, injectivity, connectivity, hierarchy, and heterogeneity. They highlight key AI techniques, including hybrid AI modeling, human–AI collaboration, and generative AI methods, and stress the importance of hyperparameter tuning in hybrid models, particularly with physics-informed regularization. The review also discusses human–AI interactions, distinguishing between human-complements-AI and AI-complements-human systems, and emphasizes model explainability through rule-based explanations, example-based reasoning, simplification, visualization, and feature relevance. Additionally, generative AI approaches, such as generative adversarial networks, graph neural networks, large language models, and transformers, are highlighted for their ability to leverage non-traditional process data, including images, audio, and text, in situations where high-quality labeled data are limited. Overall, the work underscores how AI and ML can enhance process design, optimization, and automation while addressing challenges related to data representation, model interpretability, and multi-scale system complexity [71].

At the material and reaction scale, AI and machine learning offer powerful strategies to accelerate experimental optimization, enabling more efficient discovery and performance tuning of catalysts and functional materials. Li et al. (2025) [64] present a dynamic machine learning-driven approach to optimize the microwave-assisted synthesis of photocatalysts for enhanced hydrogen peroxide production. Their methodology combines iterative cycles of machine learning analysis and experimental validation, allowing efficient optimization without large datasets. Applied to quercetin-based photocatalysts, the approach achieved optimal performance after only three iterations, resulting in significantly improved hydrogen peroxide production rates. The study demonstrates the effectiveness of few-shot machine learning strategies in guiding catalyst synthesis, highlighting a sustainable and efficient pathway for accelerating the development of high-performance photocatalytic materials [64].
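The iterative cycle of model analysis and experimental validation described above can be sketched in Python as a simple loop. This is not the method of [64]. The single tuned variable (microwave power), the response function standing in for a real experiment, the inverse-distance surrogate, and the exploration weight are all hypothetical choices made to keep the example self-contained.

```python
def hidden_response(power_w):
    """Stand-in for a real experiment (hypothetical response with a peak at 700 W)."""
    return 100.0 - ((power_w - 700.0) / 50.0) ** 2


def predict(x, measured):
    """Inverse-distance-weighted surrogate built from the measured (x, y) pairs."""
    weights = [(1.0 / (abs(x - xm) + 1e-9), ym) for xm, ym in measured.items()]
    norm = sum(w for w, _ in weights)
    return sum(w * ym for w, ym in weights) / norm


def propose(candidates, measured, explore=0.05):
    """Choose the untested candidate with the best prediction plus a distance bonus."""
    def score(x):
        nearest = min(abs(x - xm) for xm in measured)
        return predict(x, measured) + explore * nearest

    untested = [x for x in candidates if x not in measured]
    return max(untested, key=score)


candidates = list(range(200, 1001, 100))  # candidate powers in W
measured = {x: hidden_response(x) for x in (200, 1000)}  # two initial experiments

for _ in range(3):  # three model/experiment iterations
    x_next = propose(candidates, measured)
    measured[x_next] = hidden_response(x_next)  # "run" the proposed experiment

best_power = max(measured, key=measured.get)
```

The distance bonus keeps the loop from repeatedly sampling next to the current best point. In a real workflow this role is usually played by a model-based uncertainty estimate.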

Beyond optimizing synthesis conditions, machine learning can also establish direct, interpretable links between experimental measurements and catalytic performance, enabling predictive evaluation of materials from limited data. In the study reported in [65], machine learning (ML) was employed to link infrared (IR) spectral signals of adsorbed species with macroscopic catalytic performance, providing a direct, interpretable, and transferable approach for catalyst screening. Using the photocatalytic NO oxidation reaction as a model system, the ML framework accurately predicted nitrate formation solely from IR signals of NO adsorption, and its generalizability was further demonstrated with a CaCO₃-decorated g-C₃N₄ catalyst. The model’s predictions aligned with mechanistic understanding, confirming the physical plausibility of the approach, and enabled a quantitative assessment of catalytic activity. Notably, the ML-driven method reduced experimental time by a factor of approximately 3.5, highlighting its efficiency and potential to extend traditional spectroscopic techniques for predictive catalyst evaluation.

2.2. AI in Catalyst Design and High-Throughput Platforms

Recent advances in artificial intelligence (AI) are transforming the landscape of catalyst design and synthesis. Machine learning (ML) methods, in particular, are revolutionizing traditional approaches by enabling rapid identification of catalytic materials, optimization of synthesis conditions, and automation of experimental workflows [65]. By efficiently processing large datasets, AI can uncover complex structure–property relationships and predict catalyst performance with remarkable accuracy. This capability allows researchers to explore extensive chemical spaces and accelerate the development of novel catalysts for applications ranging from energy conversion to environmental remediation. The integration of AI with high-throughput experimental platforms further enhances this process. Computational predictions can guide synthesis, while experimental results iteratively refine AI models, creating a closed-loop system that improves both reliability and scalability in catalyst discovery. The authors also highlight key challenges, including the need for robust models that generalize across diverse catalytic systems and the importance of incorporating sustainability metrics into catalyst evaluation. Addressing these challenges promises to usher in a new era of catalyst research characterized by accelerated discovery, optimized performance, and sustainable practices. Complementing experimental efforts, machine learning potentials (MLPs) provide near ab initio accuracy with significantly lower computational cost, enabling atomistic simulations of catalytic systems at previously inaccessible scales.

It should be pointed out that small data, bias, and heterogeneity are three of the biggest limits on AI-driven catalyst design, because they can make a model look accurate while still being unreliable for discovery or scale-up [75,76,77]. The consequence of small data is weak generalization to new catalyst compositions, supports, reaction conditions, or metrics that were not represented in the training set. Small-data settings also make model comparison unstable, as a few additional data points can substantially change rankings or feature importance. This is why catalyst AI often relies on transfer learning, active learning, or foundation-model-style pretraining to stretch limited data farther. Bias is particularly problematic in catalyst design because negative or failed experiments are often underreported, while successful results are more readily found in the literature. This means the model may misread the space of possibilities, inflate expected performance, and underestimate uncertainty for underexplored catalyst classes. Data heterogeneity implies that an AI model trained across heterogeneous sources may achieve high average performance but still fail badly in specific subdomains. In catalyst design, this is important because catalytic activity and selectivity are often extremely sensitive to context, so a model that ignores domain shift may recommend candidates that are not reproducible in a new laboratory or process.
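One practical way to expose the domain-shift problem described above is to hold out one data source at a time and report the error per source instead of a single average. The Python sketch below illustrates this with a deliberately trivial baseline model (the training-set mean). The record layout, the source labels, and the yield values are invented for illustration and do not come from any cited dataset.

```python
from collections import defaultdict


def leave_one_group_out(records, group_key="source"):
    """Yield (held_out_group, train_indices, test_indices), one split per group."""
    groups = defaultdict(list)
    for index, record in enumerate(records):
        groups[record[group_key]].append(index)
    for held_out, test_idx in sorted(groups.items()):
        train_idx = [i for g, idx in groups.items() if g != held_out for i in idx]
        yield held_out, train_idx, test_idx


def per_group_errors(records, target="yield"):
    """Mean absolute error per held-out group for a mean-of-training-data baseline."""
    errors = {}
    for held_out, train_idx, test_idx in leave_one_group_out(records):
        train_mean = sum(records[i][target] for i in train_idx) / len(train_idx)
        abs_errors = [abs(records[i][target] - train_mean) for i in test_idx]
        errors[held_out] = sum(abs_errors) / len(abs_errors)
    return errors


# Invented example: lab_C reports systematically higher yields than the others.
records = [
    {"source": "lab_A", "yield": 10.0},
    {"source": "lab_A", "yield": 12.0},
    {"source": "lab_B", "yield": 11.0},
    {"source": "lab_B", "yield": 13.0},
    {"source": "lab_C", "yield": 30.0},
    {"source": "lab_C", "yield": 34.0},
]
errors = per_group_errors(records)
worst_source = max(errors, key=errors.get)
```

A random train/test split of the same records would mix all three sources in both sets and hide the fact that the model transfers poorly to the source that differs most from the rest.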

2.3. Machine Learning and Vibrational Spectroscopy for Mechanistic Insights

The analysis also provided mechanistic insights into the role of reaction intermediates in nitrate formation. Specifically, the study revealed that NO/N₂O₂ promotes nitrate accumulation, whereas N₂O₄/NO₂ inhibits the process, likely due to site-blocking effects at high NO₂ concentrations [64]. These findings suggest that interactions between intermediates and free radicals are key to generating nitrate, while excessive NO₂ can impede activity. Overall, this work demonstrates that combining ML with vibrational spectroscopy not only accelerates experimental workflows but also allows the extraction of physically meaningful insights, offering a promising tool for understanding and optimizing catalytic reactions at the molecular level.

Beyond mechanistic insights from spectroscopic data, machine learning can be leveraged for high-throughput screening and prediction of catalytic activity in complex materials, linking structural descriptors to performance in a cost-effective and interpretable manner.

2.4. ML for Nitrogen Reduction Reaction (NRR) Dual-Atom Catalysts

Machine learning has proven effective in handling large datasets generated by DFT calculations, but linking fundamental descriptors to catalytic performance remains a challenge. In the study reported in [78], the authors developed a cost-effective, high-throughput, and interpretable machine learning framework to identify the key factors governing NRR activity in M₁M₂@TiO₂ catalysts. Screening 378 catalysts produced 33 promising candidates, and XGBoost models were used to predict the free energy changes of essential reaction intermediates. Analysis with Shapley Additive Explanations (SHAP) revealed that the M₁–N–N bond angle and the M₂–N bond length are critical features that strongly influence catalytic activity. Four catalysts with energy changes below 0.3 eV in the potential-determining step were identified and confirmed by electronic structure calculations to have high intrinsic activity.
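To make the attribution idea behind SHAP concrete, the following Python sketch computes exact Shapley values for a small model by enumerating all feature subsets, with absent features replaced by baseline values. It is not the workflow of [78], which applies SHAP to XGBoost models. The toy model, its coefficients, the feature names, and the numerical inputs are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley attribution of model(x) - model(baseline) to each feature.

    Features outside a coalition take their baseline value. The cost grows
    exponentially with the number of features, so this suits small examples only.
    """
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


def toy_model(features):
    """Hypothetical surrogate: bond angle (deg), bond length (angstrom), d-electron count."""
    angle, bond_length, d_electrons = features
    return 0.01 * angle - 1.0 * bond_length + 0.2 * bond_length * d_electrons


x = [150.0, 2.0, 3.0]         # catalyst to be explained (illustrative values)
baseline = [120.0, 2.2, 2.0]  # reference catalyst (illustrative values)
phi = shapley_values(toy_model, x, baseline)
```

The attributions sum to the difference between the prediction for the catalyst and the prediction for the baseline, which is the property that allows per-feature contributions to be read as shares of the predicted change.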

Importantly, interpretable ML methods such as SHAP allow researchers to connect structural features directly with catalytic performance, providing actionable insights for design and optimization.

2.5. SHAP Feature Analysis and Workflow for Dual-Atom Catalysts

Further, the study showed that XGBoost regression models accurately predicted stability metrics, including Eb, Ec, and ΔE, while classification models effectively distinguished active catalysts from inactive ones [78]. Shapley Additive Explanations-based feature interpretations highlighted the importance of fundamental structural properties in modulating active sites and achieving high nitrogen reduction performance. The proposed workflow provides a valuable reference dataset for experimental studies and establishes a general approach for using interpretable machine learning to guide high-throughput screening, optimize catalyst features, and accelerate the design of highly active dual-atom catalysts.

Beyond specific catalyst studies, AI and ML have been increasingly applied in heterogeneous catalysis more broadly, providing predictive models for adsorption, surface properties, and process optimization across industrially relevant systems.

Bokhimi (2021) [69] reviews the application of AI, particularly supervised learning, in heterogeneous catalysis, with an emphasis on hydrodesulfurization (HDS) processes. The study highlights the importance of defining input and output variables, which are essential for building predictive models and ensuring accurate and interpretable results. By identifying these variables, researchers and industrial practitioners can effectively apply AI techniques to new catalytic systems, facilitating the understanding of complex relationships between reaction conditions, catalyst properties, and performance outcomes. The paper presents several applications of machine learning in catalysis, including predictions of adsorption energies, surface areas, adsorption isotherms, novel catalyst design, and sulfur content in hydrodesulfurization products. Models trained on interactions of CO or hydrogen with metals and alloys achieved adsorption energy predictions with mean absolute errors of 0.15 eV. Other models calculated surface areas of metal–organic frameworks more accurately than the Brunauer–Emmett–Teller (BET) model and predicted adsorption isotherms of thousands of zeolite structures more rapidly than Monte Carlo simulations. Additionally, AI models can suggest new catalysts based on atomic substitutions and forecast sulfur content based on experimental parameters such as temperature, pressure, hydrogen dosage, and initial sulfur concentration. These findings demonstrate that supervised learning can significantly accelerate catalyst design, process optimization, and mechanistic understanding in heterogeneous catalysis.

The integration of ML with large materials databases has further expanded capabilities, enabling efficient exploration of vast chemical spaces such as metal–organic frameworks (MOFs) and facilitating prediction of quantum and electronic properties for targeted applications.

2.6. MLPs for Heterogeneous Catalysis

Recent advances in heterogeneous catalysis increasingly rely on a combination of experimental observations and computational simulations to understand reaction mechanisms at the atomic level. Traditional ab initio molecular dynamics (AIMD) simulations provide detailed atomistic insights, but are limited by high computational costs, restricting system sizes and time scales. Machine learning potentials (MLPs) have emerged as a transformative solution, enabling simulations with ab initio accuracy at a fraction of the computational expense. By accurately learning reactive atomic interactions, MLPs make it possible to study catalytic systems under more realistic conditions, bridging the so-called complexity, materials, and pressure gaps in catalysis modeling. These methods enable simulations involving thousands of atoms over nanosecond timescales, capturing essential phenomena such as proton transfer, surface reconstructions, and nanoconfinement that are critical for understanding catalytic activity [79].

MLP-driven atomistic simulations have provided significant insights into catalytic processes at gas–solid and liquid–solid interfaces. They allow researchers to explore reaction mechanisms, evaluate solvent effects, and investigate the influence of defects and interfaces on catalytic performance under operando conditions. The dynamic behaviors captured by MLPs, including temperature effects and atomic rearrangements, are crucial for predicting realistic catalytic activity and guiding the rational design of new catalysts. Despite their advantages, careful selection of training data, validation of transferability, and appropriate electronic structure references remain essential for reliable predictions. The increasing accessibility of user-friendly MLP packages is expected to accelerate their adoption in catalysis research, enabling a shift from static models to dynamic, high-fidelity simulations that can inform the design, optimization, and development of more efficient and robust heterogeneous catalysts [79].

Similarly, photocatalysis benefits from data-driven approaches that accelerate material discovery and reaction optimization, enabling rapid exploration of complex chemical spaces while reducing experimental trial-and-error.

2.7. ML in Photocatalysis

Photocatalysis has emerged as a transformative approach with broad applications in environmental remediation, energy conversion, and chemical synthesis. However, optimizing photocatalysts is challenging because of the complex interactions among material composition, light absorption, and surface reactivity. Traditional trial-and-error experimentation is time-consuming and resource-intensive, limiting the speed of catalyst development. Machine learning (ML) has recently shown great potential to accelerate photocatalyst discovery by analyzing large datasets to predict material properties, identify optimal reaction conditions, and reduce the need for exhaustive experimental testing. This data-driven approach enables rapid exploration of complex chemical spaces and reaction environments, allowing researchers to design photocatalysts with targeted performance for specific environmental and energy applications [80].

Despite these advancements, several challenges remain in integrating ML with photocatalysis. One major issue is data quality and generalizability, as photocatalytic performance data often come from diverse sources with varying experimental conditions, leading to inconsistent or non-transferable models. Model interpretability is another concern, especially for deep learning architectures that often operate as “black boxes”, limiting mechanistic understanding [14,56]. Recent studies have attempted to address these challenges: for example, small datasets have been supplemented with computational estimates of properties, such as band gaps and lattice thermal conductivity, improving prediction accuracy through the introduction of additional descriptors. Techniques such as transfer learning, meta-learning, and the creation of standardized, open access datasets are also promising strategies to enhance model robustness and generalizability [80].

Recent applications of ML in photocatalysis highlight its ability to complement traditional computational methods such as DFT. Models such as MolNexTR enable accurate image-to-graph conversion for molecular representations, while ML interatomic potentials combined with Monte Carlo simulations have been used to investigate CO₂ reduction on CuPt/TiO₂ catalysts [81]. Furthermore, ML has been successfully applied to predict excited-state properties, such as redox potentials and 0-0 transition energies, minimizing the need for extensive DFT calculations. Overall, the integration of ML with experimental and computational techniques has accelerated catalyst design, provided mechanistic insights, and enabled rapid screening of photocatalysts, establishing ML as an essential tool in the development of next-generation materials for environmental and energy applications [80].

Beyond photocatalysis, LLMs and ML approaches are transforming materials discovery, particularly in metal–organic frameworks, where they accelerate structure–property prediction, synthesis optimization, and high-throughput design.

2.8. QMOF Database for MOFs

The modular and tunable nature of metal–organic frameworks (MOFs) offers significant control over their physical and chemical properties, yet identifying the optimal MOFs for specific applications remains challenging due to the vast chemical space. To address this, the Quantum MOF (QMOF) database was developed, providing computed quantum-chemical properties for over 14,000 experimentally synthesized MOFs [72]. This publicly available resource includes DFT-computed geometries, energies, band gaps, densities of states, partial charges, spin densities, bond orders, and other electronic structure properties. The study demonstrates how machine learning models trained on the QMOF database can predict electronic properties, such as band gaps, more efficiently than traditional high-cost DFT calculations. In particular, a crystal graph convolutional neural network achieved high predictive performance, while unsupervised dimensionality reduction with SOAP and composition-based features revealed subtle structure–property relationships. The work highlights several MOFs predicted to have low band gaps, which is notable given the typically insulating nature of most MOFs, suggesting potential candidates for applications requiring electrical conductivity.

Beyond its immediate application to band gap prediction, the QMOF database serves as a versatile platform for a wide range of research directions in MOF discovery. It enables the development of improved machine learning representations specifically tailored for MOF structures, supports transfer learning and multitask learning approaches to enhance prediction accuracy, and assists in benchmarking semi-empirical methods or force fields for MOFs. The database is intended as a living resource, with future updates and expansions planned, allowing subsets or modifications to meet the evolving needs of the MOF research community. By providing a comprehensive, high-quality dataset of quantum-chemical properties, the QMOF database has the potential to accelerate computational materials design and discovery, with a focus on experimentally realized MOFs [72]. In addition to MOFs, machine learning has been applied to a range of materials for CO₂ capture, leveraging featurization and descriptor-based modeling to predict performance and guide the rational design of adsorbents and porous materials.

2.9. ML and LLMs in MOF Development

Machine learning (ML) and large language models (LLMs) are revolutionizing research in metal–organic frameworks (MOFs) by offering new strategies to accelerate discovery, optimization, and materials design. LLMs can automate labor-intensive tasks, such as literature reviews, data extraction, and trend analysis, allowing researchers to rapidly gather insights from vast numbers of publications. These models also enable the prediction of experimental outcomes, optimization of synthesis conditions, and the design of novel MOF structures with targeted properties. By integrating LLMs with existing databases and ML workflows, researchers can perform predictive modeling of key properties, such as gas adsorption, band gaps, and stability, shortening the discovery-to-deployment cycle for high-performance MOFs in energy, environmental, and chemical applications. Tools like ChatMOF exemplify how LLMs can link structural information with property predictions, guiding experimental synthesis efficiently and effectively [66].

In addition to predictive modeling, generative AI methods, including Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs), facilitate inverse design, where desired properties dictate the generated MOF structures. These approaches complement high-throughput computational screening (HTCS) and molecular simulations by reducing computational costs and narrowing down candidates for experimental validation. Reinforcement learning (RL) has also been applied to optimize synthesis parameters, such as temperature, solvents, and reaction times, enabling adaptive experimentation that minimizes trial-and-error. For example, AI-assisted optimization of UiO-66 synthesis significantly increased space-time yield, while halving operational costs, illustrating both the practical and economic benefits of integrating AI into MOF research workflows [66].

Machine learning potentials (MLPs) further expand the frontiers of MOF research by combining quantum-mechanical accuracy with computational efficiency. Unlike classical interatomic potentials, MLPs are trained on high-quality quantum mechanical data to capture complex atomic interactions, including bond formation and breaking, allowing large-scale simulations of MOFs that were previously infeasible. These simulations enable the study of phase transitions, adsorption behavior, and mechanical properties at the atomic level, while bridging different hierarchical scales of material modeling. As MLPs continue to evolve alongside LLMs and autonomous experimental systems, MOF research is entering a new era of accelerated, data-driven discovery, where AI not only predicts material properties, but also guides the rational design, synthesis, and deployment of materials tailored for specific applications. The transformative potential of these AI-driven approaches was underscored by the 2024 Nobel Prizes in Physics (John Hopfield and Geoffrey Hinton) and Chemistry (David Baker, Demis Hassabis, and John Jumper), highlighting how computational and data-driven methods are reshaping the study of complex materials. Moving forward, democratizing access to high-quality datasets, developing user-friendly tools, and fostering cross-disciplinary collaboration will be essential to fully harness AI as a driver for next-generation MOF research [66].

3. Artificial Intelligence and Machine Learning in Photocatalysis and Catalyst Design for CO₂ Conversion

3.1. AI-Driven Databases and Reaction Prediction in Photocatalysis

Recent advancements in photocatalysis have increasingly leveraged AI to accelerate reaction discovery and optimization, particularly in multicomponent reactions. A notable development in this area is PhotoCatDB, a curated database of 6523 multicomponent photocatalytic reactions, which includes critical reaction conditions such as photocatalysts, solvents, acids, bases, and additives. By addressing the limitations of existing reaction datasets, PhotoCatDB provides a robust foundation for data-driven modeling [82]. Using this database, the transformer-based deep learning model PhotoCat was developed, pretrained on USPTO data and fine-tuned on PhotoCatDB, achieving a Top-1 accuracy of 82.25% in predicting photocatalytic reactions. Its attention-based architecture allows interpretability, aligning the model’s focus with chemical intuition. PhotoCat has demonstrated practical utility by predicting five previously unreported photocatalytic reactions, subsequently validated in the laboratory, highlighting the synergy between curated databases and deep learning in accelerating green chemistry and sustainable reaction discovery [83].
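To make the Top-1 metric concrete, the following minimal Python sketch shows how Top-k accuracy is commonly computed for reaction prediction: a prediction counts as correct when the reference product appears among the k highest-ranked candidates. The function name `top_k_accuracy` and the toy SMILES strings are illustrative assumptions and are not taken from PhotoCat or PhotoCatDB; a real evaluation would also canonicalize the SMILES strings before comparing them.

```python
def top_k_accuracy(ranked_predictions, references, k=1):
    """Fraction of reactions whose reference product is among the top-k predictions."""
    if len(ranked_predictions) != len(references):
        raise ValueError("one ranked prediction list is required per reference")
    hits = sum(ref in preds[:k] for preds, ref in zip(ranked_predictions, references))
    return hits / len(references)


# Toy example (hypothetical strings, assumed to be already canonicalized).
ranked_predictions = [
    ["CCO", "CC=O"],           # reference ranked first
    ["c1ccccc1", "C1CCCCC1"],  # reference ranked second
    ["CC(=O)O", "CCO"],        # reference not predicted at all
]
references = ["CCO", "C1CCCCC1", "CCN"]

top1 = top_k_accuracy(ranked_predictions, references, k=1)
top2 = top_k_accuracy(ranked_predictions, references, k=2)
```

In this toy case one of three reactions is correct at rank 1 and two of three within the first two ranks, which illustrates why Top-1 is the strictest of the commonly reported Top-k values.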

3.2. Data-Driven Kinetic Modeling and Mechanistic Insights

Complementing reaction prediction, data-driven approaches enable the construction of kinetic models without relying solely on first-principles methods. The derivative-free sparse identification technique (DF-SINDy) incorporates domain knowledge, including mass balance and chemical reaction constraints, to accurately recover governing equations from experimental or synthetic data under varying noise levels and conditions. By reducing optimization complexity and improving interpretability, DF-SINDy supports robust and tractable analysis of multiphase and dynamically evolving catalytic systems, facilitating mechanistic understanding and the identification of reaction pathways [84].
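The idea of avoiding numerical derivatives can be illustrated with a generic integral-form sparse regression. The sketch below is not the DF-SINDy implementation of [84] and omits its mass-balance and reaction constraints; it only assumes that a concentration profile c(t) obeys c(t) − c(0) = Σ θ_j ∫ f_j(c) dt for a small library of candidate rate terms f_j, and it prunes small coefficients by sequential thresholding. All function names, the candidate library, the threshold, and the synthetic first-order decay data are hypothetical choices made for illustration.

```python
import math


def cumulative_trapezoid(values, times):
    """Running trapezoidal integral of `values` over `times`, starting at zero."""
    out = [0.0]
    for i in range(1, len(times)):
        step = 0.5 * (values[i] + values[i - 1]) * (times[i] - times[i - 1])
        out.append(out[-1] + step)
    return out


def least_squares(columns, target):
    """Solve the normal equations by Gaussian elimination (adequate for tiny libraries)."""
    n = len(columns)
    # Augmented matrix [A^T A | A^T y].
    m = [
        [sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(n)]
        + [sum(a * b for a, b in zip(columns[i], target))]
        for i in range(n)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def integral_sindy(times, conc, library, threshold=0.05):
    """Fit c(t) - c(0) = sum_j coef_j * integral_0^t f_j(c) dt with sparse coefficients."""
    columns = [cumulative_trapezoid([f(c) for c in conc], times) for f in library]
    target = [c - conc[0] for c in conc]
    coef = [0.0] * len(library)
    active = list(range(len(library)))
    while active:
        fit = least_squares([columns[j] for j in active], target)
        kept = [j for j, value in zip(active, fit) if abs(value) >= threshold]
        if kept == active:  # every surviving term is significant, so stop
            for j, value in zip(active, fit):
                coef[j] = value
            break
        active = kept  # drop the small terms and refit
    return coef


# Synthetic first-order decay A -> B with k = 0.8 (arbitrary units, noise-free).
times = [0.025 * i for i in range(201)]
conc_a = [math.exp(-0.8 * t) for t in times]
library = [lambda c: c, lambda c: c * c, lambda c: 1.0]  # candidates: c, c^2, constant
coef = integral_sindy(times, conc_a, library)
```

For this noise-free profile the only surviving term is the first-order one, with a coefficient close to −0.8. With noisy experimental data the threshold and the candidate library would need to be chosen with care, which is where the domain constraints described above become valuable.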

3.3. Machine Learning for Photocatalytic CO₂ Reduction

The increasing concentration of CO₂ due to industrialization underscores the importance of sustainable conversion technologies. Therefore, the development of efficient materials for CO₂ capture and its conversion into high-value chemicals has become an urgent necessity [85]. Photocatalytic CO₂ reduction (PC-CO₂R) into value-added fuels and chemicals offers a promising approach, yet traditional trial-and-error methods are slow and costly. Machine learning enhances PC-CO₂R by analyzing experimental datasets to predict catalyst performance, optimize reaction conditions, and uncover hidden correlations [86]. Techniques ranging from linear regression to artificial neural networks enable rapid screening, material selection, and automated experimentation. ML also facilitates improved interpretability of complex photocatalytic systems. Challenges remain, including dataset quality, computational demands, and overfitting management. Future strategies involve integrating multi-scale simulations, optimizing reactor parameters, and hybridizing ML with chemical knowledge to enhance efficiency, selectivity, and sustainability in CO₂ conversion [86].

3.4. ML in CO₂ Capture

Machine learning is also playing an increasingly significant role in advancing CO₂ capture research and materials science, with a growing number of studies exploring this interdisciplinary area [87]. A key element in developing ML models for CO₂ capture is featurization, which translates materials into numerical or categorical descriptors for computational analysis. These descriptors may include operating conditions, electronic properties such as charge and orbital characteristics, thermodynamic parameters, geometric and structural features, and chemical composition. The selection and impact of these descriptors depend on the type of material, the ML algorithm employed, and the intended application. Metal–organic frameworks (MOFs), ionic liquids, and other porous materials have been widely investigated, with MOFs being particularly suitable due to their modular, repeating structures that facilitate descriptor-based analysis. Effective featurization not only supports accurate prediction of CO₂ capture performance but also provides insights into material design by identifying the features most critical to performance [87].
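As a minimal illustration of featurization, the sketch below turns one material record into a flat numeric vector by concatenating continuous descriptors with a one-hot encoding of the material class. The `CaptureSample` fields, the list of material classes, and the numerical values are hypothetical placeholders rather than data from [87]; real studies typically use far richer structural and electronic descriptors.

```python
from dataclasses import dataclass

MATERIAL_CLASSES = ("MOF", "ionic liquid", "activated carbon", "polymer")


@dataclass
class CaptureSample:
    name: str
    material_class: str             # categorical descriptor
    surface_area_m2_g: float        # structural descriptor
    pore_diameter_nm: float         # geometric descriptor
    metal_electronegativity: float  # electronic descriptor (0.0 if no metal)
    temperature_k: float            # operating condition
    pressure_bar: float             # operating condition


def featurize(sample):
    """Return numeric descriptors followed by a one-hot material-class encoding."""
    if sample.material_class not in MATERIAL_CLASSES:
        raise ValueError(f"unknown material class: {sample.material_class}")
    numeric = [
        sample.surface_area_m2_g,
        sample.pore_diameter_nm,
        sample.metal_electronegativity,
        sample.temperature_k,
        sample.pressure_bar,
    ]
    one_hot = [1.0 if sample.material_class == c else 0.0 for c in MATERIAL_CLASSES]
    return numeric + one_hot


# Made-up record used only to show the shape of the output.
sample = CaptureSample("hypothetical-MOF-1", "MOF", 1500.0, 1.2, 1.9, 298.0, 1.0)
features = featurize(sample)
```

In practice the continuous descriptors would also be scaled before training, since surface areas and electronegativities differ by orders of magnitude.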

Beyond featurization, ML models are increasingly applied to both laboratory- and industrial-scale CO₂ capture systems, where they help identify critical descriptors, guide material selection, and optimize operational conditions.

3.5. ML Applications in CO₂ Capture Across Scales

ML applications in carbon capture extend across both industrial-scale and laboratory-scale settings, including solvent-based post-combustion capture, ionic liquids, adsorbents, and membrane technologies. Analyses of the literature reveal that structural, chemical, adsorption-based, and charge-based descriptors are commonly used, especially in datasets containing multiple materials [87]. ML models can also highlight the relative importance of specific descriptors, thereby guiding the rational design of more efficient CO₂ capture materials. While MOFs are the most frequently studied class, other candidates such as activated carbons, polymers, and emerging materials like eutectic solvents are increasingly being considered. Beyond predictive modeling, ML can provide mechanistic understanding and inform design strategies that incorporate synthesis feasibility, stability, and economic considerations, including the potential for generating valuable byproducts from captured CO₂.
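One common, model-agnostic way to estimate the relative importance of descriptors is permutation importance: a single descriptor column is shuffled and the resulting increase in prediction error is recorded. The sketch below is a generic illustration and is not attributed to the studies reviewed in [87]; `toy_model` stands in for any trained predictor, and the random descriptor values are synthetic.

```python
import random


def mean_squared_error(predict, rows, targets):
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(predict, rows, targets, n_repeats=20, seed=0):
    """Average increase in MSE when one descriptor column is shuffled."""
    rng = random.Random(seed)
    baseline = mean_squared_error(predict, rows, targets)
    importances = []
    for col in range(len(rows[0])):
        increase = 0.0
        for _ in range(n_repeats):
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
            increase += mean_squared_error(predict, permuted, targets) - baseline
        importances.append(increase / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical trained model: the output depends on descriptor 0 only."""
    return 2.0 * row[0] + 0.0 * row[1]


data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random()] for _ in range(50)]
targets = [toy_model(r) for r in rows]
importances = permutation_importance(toy_model, rows, targets)
```

Here shuffling the first descriptor degrades the predictions while shuffling the second has no effect, so only the first receives a non-zero importance. Correlated descriptors can distort such rankings, so the scores are best read as a guide rather than as proof of a causal role.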

Looking forward, the role of ML in CO₂ capture is expected to expand, driven by the development of advanced featurization methods and generative AI approaches for material discovery. The integration of ML with material science and CO₂ capture technologies holds the potential to accelerate innovation, enabling the identification and optimization of high-performance materials more efficiently. The successful translation of ML-discovered materials into practical applications will depend on their synthetic accessibility, stability, and cost-effectiveness. As ML tools become more sophisticated and interpretable, they are likely to play a central role in guiding sustainable material design and enhancing the effectiveness of CO₂ capture technologies [87].

In addition to CO₂ capture, ML and AI are transforming catalyst discovery and design, enabling rapid screening of materials, optimization of synthesis conditions, and automation of experimental workflows.

3.6. AI-Enhanced Catalyst Design for Renewable Energy Applications

Artificial intelligence is increasingly applied to the design of next-generation catalysts for renewable energy, providing an alternative to traditional trial-and-error methods [86]. For example, machine learning frameworks can predict promising candidates for electrochemical CO₂ reduction and hydrogen evolution reactions by combining DFT-derived datasets with surrogate models [88].

By combining a small dataset of catalytic reactivity properties obtained from DFT with a surrogate machine learning model, such a framework can identify correlations between catalyst structure and activity. Using physically transparent features, such as atomic numbers, electronegativity, coordination numbers, and adsorption energies, the model efficiently screens intermetallic surfaces and predicts key reactivity descriptors, reducing the need for computationally expensive simulations. The workflow can be fully automated, iteratively exploring large chemical spaces and narrowing down candidate materials for experimental validation. Despite these advances, challenges remain in considering thermal and electrochemical stability, synthesis feasibility, adsorbate effects, and quantifying uncertainties from DFT and energy scaling relations. Integrating AI with experimental validation and developing more sophisticated fingerprints for active sites are essential for building versatile, robust frameworks for catalyst discovery.
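The iterative character of such a workflow can be sketched as a loop in which a cheap surrogate is refitted after every expensive calculation and then used to choose the next candidate. The example below is a deliberately simplified assumption and does not reproduce the framework of [88]: `toy_dft_energy` is a made-up stand-in for a DFT adsorption-energy calculation, the surrogate is a plain linear fit on a single hypothetical descriptor, and the target energy of −0.3 eV is arbitrary.

```python
def toy_dft_energy(x):
    """Stand-in for an expensive DFT adsorption-energy calculation (hypothetical, in eV)."""
    return -1.0 + 1.5 * x + 0.3 * x * x


def fit_line(xs, ys):
    """Closed-form simple linear regression used as the cheap surrogate."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def screen(candidates, target, n_rounds=3):
    """Iteratively label the candidate whose predicted energy is closest to `target`."""
    # Seed the surrogate with the two extreme candidates.
    labeled = {x: toy_dft_energy(x) for x in (candidates[0], candidates[-1])}
    for _ in range(n_rounds):
        slope, intercept = fit_line(list(labeled), list(labeled.values()))
        unlabeled = [x for x in candidates if x not in labeled]
        if not unlabeled:
            break
        pick = min(unlabeled, key=lambda x: abs(slope * x + intercept - target))
        labeled[pick] = toy_dft_energy(pick)  # the only "expensive" call per round
    best = min(labeled, key=lambda x: abs(labeled[x] - target))
    return best, labeled


candidates = [i / 20 for i in range(21)]  # one hypothetical descriptor on a grid
best_x, labeled = screen(candidates, target=-0.3)
```

With three rounds the loop evaluates only five of the 21 candidates and still finds the grid point closest to the target. A realistic implementation would use a more expressive surrogate together with an uncertainty estimate, so that exploration and exploitation can be balanced.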

4. Ethical Issues, Safety Risks and Responsible Application of Generative AI

The integration of generative AI in chemical engineering raises important ethical considerations. Professional ethics, emphasizing safety, integrity, and environmental stewardship, are crucial when deploying AI in experimental design and process optimization [89].

Serious ethical issues and safety risks arise because inaccurate results can directly affect workers, facilities, and the environment, not just prediction quality [89]. While AI systems can improve hazard detection and provide decision support, they also introduce new risks, such as automation bias, hidden tool interconnections, update delays, and overreliance on outputs that may seem scientifically plausible but are actually wrong. One of the main ethical issues is accountability. When an AI system recommends unsafe reaction conditions, a flawed control action, or a misleading molecular structure, the ultimate responsibility—whether it lies with model builders, implementers, or human operators—is often unclear. Another important ethical issue is transparency and explainability, as many models cannot clearly explain why they made a particular recommendation. This makes it difficult to assess whether a result is chemically significant or just a statistical artifact. Privacy and security (data protection) are also critical when chemical data, industrial process data, or proprietary synthesis routes are used to train or operate these systems. Ethical issues of generative AI can also relate to intellectual property and authorship, as well as to environmental impacts. In this context, the following question arises: Who owns AI-generated content? Furthermore, training and running large AI models require significant electricity, water for cooling, and computing infrastructure, which can lead to a “green paradox”. There are also challenges in education and human judgment, because introducing AI in education leads to the risk that students and researchers may accept AI solutions without critical thinking. As a result, there is a real danger that AI, instead of serving as a tool, becomes a shortcut that replaces deep understanding and experimental skills.

The safety risk is particularly significant when AI is used for demanding, high-responsibility tasks such as process control, hazard prediction, or autonomous planning. In these cases, incorrect output can result in the selection of the wrong temperature, pressure, catalyst, solvent, or sequence of steps, leading to uncontrolled reactions, low yields, toxic by-products, equipment damage, or exposure of workers and the environment to hazardous situations. The consequences of such failures can cascade, as chemical systems are often nonlinear and sensitive to even small changes. Thus, a single incorrect recommendation can waste time and materials, but may also lead to unsafe scale-up decisions, incorrect hazard assessments, or the choice of a synthesis route that is not feasible or reproducible. AI models can be statistically useful but may fail to capture chemically relevant causality, making experimental validation extremely important. Accordingly, AI should be used as a support tool, not as an authority in important decision-making. The results of AI systems require credibility checks, uncertainty assessments, and human verification, especially when decisions have safety, regulatory, or environmental implications. Responsible use of AI systems requires strict oversight, validated workflows, and a clear boundary between machine-generated suggestions and experimentally validated results. Finally, responsible application also requires a combination of technological improvements, user-centered design, and regulatory frameworks, ensuring AI accelerates innovation, while maintaining safety, reliability, and ethical standards. More comprehensive information on ethical issues in generative AI, including fairness, harmful content or hallucinations, privacy, safety and societal risks, cybersecurity risks, environmental impacts and other aspects, can be found in the available literature [90,91].

5. Challenges and Future Research Directions

Despite remarkable advances in AI- and ML-driven catalysis, several challenges remain that limit the full potential of data-driven catalyst design. A key issue is the quality, standardization, and availability of experimental datasets. Variations in synthesis procedures, reaction conditions, and reporting practices can significantly reduce the reliability and reproducibility of ML models. Therefore, establishing FAIR-compliant, high-quality, and standardized databases is critical to ensure interoperability across laboratories and facilitate robust, generalizable predictions.

Another major challenge lies in model interpretability and mechanistic insight. While ML models excel at identifying correlations between catalyst structure and performance, understanding the underlying chemical mechanisms remains difficult. Hybrid strategies that combine physics-informed modeling with data-driven approaches offer a pathway to ensure predictions are chemically meaningful, interpretable, and actionable, bridging the gap between computational outputs and experimental design.

Extrapolation beyond known chemical spaces also presents significant hurdles. Many ML models perform reliably within well-characterized datasets but struggle to predict novel catalysts, reaction intermediates, or operating conditions. Thus, future research should focus on developing robust extrapolative frameworks, including reinforcement learning, generative models, and active learning approaches, to explore uncharted material spaces and accelerate the discovery of innovative, high-performance catalysts.

The integration of multi-scale experimental and computational data remains another critical frontier. Bridging atomic-scale simulations with reactor- and process-scale performance is essential for realistic catalyst design and industrial application. Coordinated approaches that combine density functional theory, molecular dynamics, high-throughput experimentation, and process modeling are needed to capture complex interactions, stability challenges, and dynamic reaction environments under operational conditions.

Finally, the responsible deployment of AI in catalysis requires careful attention to safety, sustainability, and human–AI collaboration. Autonomous experimentation and optimization offer great promise, but human expertise remains essential to interpret results, guide experimental planning, and ensure ethical and reliable use of AI. Future research should also focus on multi-objective optimization, in situ monitoring, and frameworks for safe, interpretable, and sustainable AI-assisted chemical innovation. Addressing these challenges will establish a data-driven, mechanistically informed, and sustainable paradigm for next-generation catalyst development.
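Multi-objective optimization is often framed in terms of Pareto optimality: a candidate is retained if no other candidate is at least as good in every objective and strictly better in at least one. The sketch below illustrates this filter for three objectives that are all maximized; the catalyst labels and scores are invented for illustration only.

```python
def dominates(a, b):
    """True if `a` is at least as good as `b` in every objective and better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the names of non-dominated candidates (all objectives maximized)."""
    return [
        name
        for name, scores in candidates.items()
        if not any(dominates(other, scores) for other in candidates.values())
    ]


# Hypothetical catalysts scored as (activity, selectivity, stability); higher is better.
candidates = {
    "cat-A": (0.90, 0.60, 0.70),
    "cat-B": (0.70, 0.80, 0.75),
    "cat-C": (0.65, 0.75, 0.70),  # dominated by cat-B in all three objectives
    "cat-D": (0.50, 0.95, 0.60),
}
front = pareto_front(candidates)
```

The resulting front contains the trade-off candidates among which a human expert still has to choose, which is consistent with the emphasis on human–AI collaboration above.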

6. Conclusions

The integration of artificial intelligence (AI), machine learning (ML), and computational modeling with experimental catalysis is redefining the discovery and design of advanced materials. Recent work demonstrates that data-driven approaches can accelerate catalyst optimization across scales, from atomic-level active sites to reactor- and process-level performance. Hybrid strategies that combine physics-informed models with ML enable mechanistic insight, interpretable predictions, and rational design of heterogeneous catalysts, single-atom systems, defect-engineered materials, and multi-element frameworks. Tools such as PHOTOREAC, QMOF, and PhotoCatDB illustrate the power of curated datasets and simulation platforms in bridging experimental studies with predictive modeling, facilitating rapid screening and high-throughput exploration of novel catalytic materials.

A major insight is the transformative impact of AI/ML on reaction optimization and process engineering. Surrogate models, graph neural networks, reinforcement learning, and generative AI allow extrapolation beyond existing datasets, identification of key descriptors, and efficient exploration of uncharted chemical spaces. These approaches not only accelerate catalyst discovery but also enhance understanding of complex reaction mechanisms, electron transfer dynamics, and multi-step catalytic processes, providing a rational basis for tuning activity, selectivity, and stability.

Advances in multi-scale modeling and autonomous experimentation further extend the capabilities of AI-driven catalysis. By integrating atomistic simulations, vibrational spectroscopy, high-throughput experimentation, and real-time process monitoring using advanced operando methods, researchers can optimize synthesis conditions, predict long-term stability, and design reactors with improved efficiency. Large language models and generative tools enhance retrosynthetic planning and experimental design, promoting sustainable and efficient chemical processes.

Despite these advances, challenges remain in dataset standardization, model generalizability, interpretability, and the responsible deployment of AI. Addressing these issues will require continued collaboration across experimental, computational, and AI domains, ensuring that human expertise guides the application of advanced tools in a safe, reproducible, and sustainable manner.

In conclusion, the convergence of AI, ML, computational modeling, and experimental innovation is establishing a paradigm shift in catalysis. This integrated, data-driven approach accelerates discovery, enhances mechanistic understanding, and enables sustainable chemical technologies, paving the way for next-generation catalysts and processes that are both highly efficient and environmentally responsible.

Author Contributions

A.B. and V.T. contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

Funding

This research was conducted within the framework of the project HEMCAT, financed by the European Union’s NextGeneration EU fund from source 581—The recovery and resilience mechanism in the framework of programme financing of public higher education institutions and public scientific institutes.

Data Availability Statement

No new data were created or analyzed in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Baldea, M.; Broadbelt, L.J.; Ierapetritou, M.G.; Kwan, T.A.; Li, C.; Luo, Z.-H.; Ma, X.; Morbidelli, M.; Sahu, K.C.; Scurto, A.M.; et al. 2024 in retrospective: Trends in chemical engineering. Ind. Eng. Chem. Res. 2025, 64, 11615–11623.
Chakkingal, A.; Pirro, L.; Costa da Cruz, A.R.; Barrios, A.J.; Virginie, M.; Khodakov, A.Y.; Thybaut, J.W. Unravelling the influence of catalyst properties on light olefin production via Fischer–Tropsch synthesis: A descriptor space investigation using single-event microkinetics. Chem. Eng. J. 2021, 419, 129633.
Feng, J.; Lansford, J.L.; Katsoulakis, M.A.; Vlachos, D.G. Explainable and trustworthy artificial intelligence for correctable modeling in chemical sciences. Sci. Adv. 2020, 6, eabc3204.
Liu, J.; Liang, L.; Su, B.; Wu, D.; Zhang, Y.; Wu, J.; Fu, C. Transformative strategies in photocatalyst design: Merging computational methods and deep learning. J. Mater. Inform. 2024, 4, 33.
Yang, H.; Che, Y.; Cooper, A.I.; Chen, L.; Li, X. Machine learning accelerated exploration of ternary organic heterojunction photocatalysts for sacrificial hydrogen evolution. J. Am. Chem. Soc. 2023, 145, 27038–27044.
Sa, B.; Hu, R.; Zheng, Z.; Li, X.; Zhang, Y. High-throughput computational screening and machine learning modeling of Janus 2D III-VI van der Waals heterostructures for solar energy applications. Chem. Mater. 2022, 34, 6687–6701.
Singh, A.K.; Montoya, J.H.; Gregoire, J.M.; Persson, K.A. Robust and synthesizable photocatalysts for CO₂ reduction: A data-driven materials discovery. Nat. Commun. 2019, 10, 443.
de la Hidalga, N.; Goodall, A.J.; Anyika, C.; Matthews, B.; Catlow, C.R.A. Designing a data infrastructure for catalysis science aligned to FAIR data principles. Catal. Commun. 2022, 162, 106384.
Spatenka, S.; Matzopoulos, M.; Urban, Z.; Cano, A. From laboratory to industrial operation: Model-based digital design and optimization of fixed-bed catalytic reactors. Ind. Eng. Chem. Res. 2019, 58, 12571–12585.
Nguyen, P.C.H.; Choi, J.B.; Udaykumar, H.S.; Baek, S. Challenges and opportunities for machine learning in multiscale computational modeling. J. Comput. Inf. Sci. Eng. 2023, 23, 060808.
Yuan, Q.; Wang, X.; Xu, D.; Liu, H.; Zhang, H.; Yu, Q.; Bi, Y.; Li, L. Machine learning-assisted catalysts for advanced oxidation processes: Progress, challenges, and prospects. Catalysts 2025, 15, 282.
Deng, C.; Su, Y.; Li, F.; Shen, W.; Chen, Z.; Tang, Q. Understanding activity origin for the oxygen reduction reaction on bi-atom catalysts by DFT studies and machine-learning. J. Mater. Chem. A 2020, 8, 24563–24571.
Ishioka, S.; Fujiwara, A.; Nakanowatari, S.; Takahashi, L.; Taniike, T.; Takahashi, K. Designing catalyst descriptors for machine learning in oxidative coupling of methane. ACS Catal. 2022, 12, 11541–11546.
Casillo, E.; Scattolin, T.; Nolan, S.P. Catalysis meets machine learning: A guide to data-driven discovery and design. Chem. Commun. 2025, 61, 18247–18272.
Wang, Z.; Li, W.; Wang, S.; Wang, X. The future of catalysis: Applying graph neural networks for intelligent catalyst design. WIREs Comput. Mol. Sci. 2025, 15, e70010.
Wang, S.; Jiang, J. Interpretable catalysis models using machine learning with spectroscopic descriptors. ACS Catal. 2023, 13, 7428–7436.
Zhao, Z.J.; Liu, S.; Zha, S.; Cheng, D.; Studt, F.; Henkelman, G.; Gong, J. Theory-guided design of catalytic materials using scaling relationships and reactivity descriptors. Nat. Rev. Mater. 2019, 4, 792–804.
Dalmau, D.; García-Abellán, S.; Alegre-Requena, J.V. Machine learning in homogeneous catalysis: Basic concepts and best practices. ACS Catal. 2026, 16, 1–11.
Zhu, Q.; Gu, Y.; Ma, J. Digital descriptors in predicting catalysis reaction efficiency and selectivity. J. Phys. Chem. Lett. 2025, 16, 2357–2368.
Choung, S.; Park, W.; Moon, J.; Han, J.W. Rise of machine learning potentials in heterogeneous catalysis: Developments, applications, and prospects. Chem. Eng. J. 2024, 494, 152757.
Mou, T.; Pillai, H.S.; Wang, S.; Wan, M.; Han, X.; Schweitzer, N.M.; Che, F.; Xin, H. Bridging the complexity gap in computational heterogeneous catalysis with machine learning. Nat. Catal. 2023, 6, 122–136.
Doan, H.A.; Li, C.; Ward, L.; Zhou, M.; Curtiss, L.A.; Assary, R.S. Accelerating the evaluation of crucial descriptors for catalyst screening via message passing neural network. Digit. Discov. 2023, 2, 59–68.
Spotti, M.; Maineri, K.; Viñes, F.; Illas, F.; Di Liberto, G.; Pacchioni, G. Scaling relations and catalytic descriptor for the nitrogen reduction on single-atom catalysts. Electrochim. Acta 2025, 542, 147389.
Cheng, Z.; Meng, Q.; Jiang, X.; Gun, S.; Fan, L.S. Machine learning-driven predictive design of catalytic oxygen carriers for chemical looping processes. Discov. Energy 2025, 5, 20.
German Research Foundation (DFG). National Research Data Infrastructure (NFDI). Available online: https://www.dfg.de/en/research-funding/funding-initiative/nfdi (accessed on 31 March 2026).
NFDI4Cat. NFDI4Cat—NFDI for Catalysis-Related Sciences. Available online: https://nfdi4cat.org/nfdi4cat/en/ (accessed on 31 March 2026).
Wilkinson, M.D.; Dumontier, M.; Aalbersberg, I.J.; Appleton, G.; Axton, M.; Baak, A.; Blomberg, N.; Boiten, J.W.; da Silva Santos, L.B.; Bourne, P.E.; et al. The FAIR guiding principles for scientific data management and stewardship. Sci. Data 2016, 3, 160018. [Google Scholar] [CrossRef] [PubMed]
NFDI4Cat. NFDI4Cat at EuropaCat 2025. 9 September 2025. Available online: https://nfdi4cat.org/nfdi4cat/en/News/NFDI4Cat+at+EuropaCat+2025.html (accessed on 31 March 2026).
Huskova, N.; Dikova, Y.; Petrenko, T.; Bönisch, T. Improvement of data and metadata quality in catalysis research: A use case-driven methodology. Catal. Today 2025, 446, 115111. [Google Scholar] [CrossRef]
DataCite. DataCite Metadata Schema. Available online: https://schema.datacite.org/ (accessed on 31 March 2026).
Library of Congress. PREMIS Data Dictionary and Schema Revision Process. Available online: https://www.loc.gov/standards/premis/revision-process.html (accessed on 31 March 2026).
FAIR-IMPACT. RSMD Guidelines: Research Software MetaData Guidelines. Available online: https://fair-impact.github.io/RSMD-guidelines/ (accessed on 31 March 2026).
World Wide Web Consortium. Resource Description Framework (RDF). Available online: https://www.w3.org/RDF/ (accessed on 19 October 2025).
Xie, E.; Wang, X.; Siepmann, J.I.; Chen, H.; Snurr, R.Q. Generative AI for design of nanoporous materials: Review and future prospects. Digit. Discov. 2025, 4, 2336–2363.
Back, S.; Aspuru-Guzik, Á.; Ceriotti, M.; Gryn’ova, G.; Grzybowski, B.; Gu, G.H.; Hein, J.; Hippalgaonkar, K.; Hormázabal, R.; Jung, Y.; et al. Accelerated chemical science with AI. Digit. Discov. 2024, 3, 23–33.
St. John, P.C.; Guan, Y.; Kim, Y.; Kim, S.; Paton, R.S.; et al. Prediction of organic homolytic bond dissociation enthalpies at near chemical accuracy with sub-second computational cost. Nat. Commun. 2020, 11, 2328. [Google Scholar] [CrossRef]
Uhrin, M.; Huber, S.P.; Yu, J.; Marzari, N.; Pizzi, G. Workflows in AiiDA: Engineering a high-throughput, event-based engine for robust and modular computational workflows. Comput. Mater. Sci. 2021, 187, 110086. [Google Scholar] [CrossRef]
Draxl, C.; Scheffler, M. NOMAD: The FAIR concept for big data-driven materials science. MRS Bull. 2018, 43, 676–682. [Google Scholar] [CrossRef]
Li, A.; Oh, R.; Ren, Y.; Zhao, L.; Huang, X.J. Machine learning for advanced oxidation catalysis: From descriptors to inverse design. Mol. Catal. 2026, 596, 115889. [Google Scholar] [CrossRef]
Zhang, L.; Bing, Q.; Qin, H.; Yu, L.; Li, H.; Deng, D. Artificial intelligence for catalyst design and synthesis. Matter 2025, 8, 102138. [Google Scholar] [CrossRef]
Günay, M.E.; Yıldırım, R. Machine learning for catalytic reaction systems: A framework for complex chemical processes. ACS Eng. Au 2026, 6, 48–67. [Google Scholar] [CrossRef]
Roch, L.M.; Häse, F.; Kreisbeck, C.; Tamayo-Mendoza, T.; Yunker, L.P.E.; Hein, J.E.; Aspuru-Guzik, A. ChemOS: An orchestration software to democratize autonomous discovery. PLoS ONE 2020, 15, e0229862. [Google Scholar] [CrossRef] [PubMed]
Tan, Z.; Yang, Q.; Luo, S. AI molecular catalysis: Where are we now? Org. Chem. Front. 2025, 12, 2759. [Google Scholar] [CrossRef]
Li, A.; Cui, P.; Wang, X.; Fisher, A.; Li, L.; Cheng, D. The artificial intelligence-catalyst pipeline: Accelerating catalyst innovation from laboratory to industry. Front. Chem. Sci. Eng. 2025, 19, 55. [Google Scholar] [CrossRef]
Ma, K. AI agents in chemical research: GVIM—An intelligent research assistant system. Digit. Discov. 2025, 4, 355–375. [Google Scholar] [CrossRef]
Bran, A.M.; Cox, S.; Schilter, O.; Baldassari, C.; White, A.D.; Schwaller, P. Augmenting large language models with chemistry tools. Nat. Mach. Intell. 2024, 6, 525–535. [Google Scholar] [CrossRef]
Boiko, D.A.; MacKnight, R.; Kline, B.; Gomes, G. Autonomous chemical research with large language models. Nature 2023, 624, 570–578. [Google Scholar] [CrossRef]
Yu, B.; Baker, F.N.; Chen, Z.; Herb, G.; Gou, B.; Adu-Ampratwum, D.; Ning, X.; Sun, H. Tooling or not tooling? The impact of tools on language agents for chemistry problem solving. In Findings of the Association for Computational Linguistics: NAACL; Association for Computational Linguistics: Albuquerque, NM, USA, 2025; pp. 7620–7640. [Google Scholar] [CrossRef]
McNaughton, A.D.; Ramalaxmi, G.; Kruel, A.; Knutson, C.R.; Varikoti, R.A.; Kumar, N. CACTUS: Chemistry Agent Connecting Tool-Usage to Science. arXiv 2024, arXiv:2405.00972. [Google Scholar] [CrossRef]
Kumar, A.; Zavala, V.M. Editorial for the AI/ML in Chemical Engineering Special Issue. Ind. Eng. Chem. Res. 2025, 64, 9441–9442. [Google Scholar] [CrossRef]
Daniel, T.; Xuan, J. Responsible use of generative AI in chemical engineering. Digit. Chem. Eng. 2024, 12, 100168. [Google Scholar] [CrossRef]
Acosta-Herazo, R.; Cañaveral-Velásquez, B.; Pérez-Giraldo, K.; Mueses, M.A.; Pinzón-Cárdenas, M.H.; Machuca-Martínez, F. A MATLAB-Based Application for Modeling and Simulation of Solar Slurry Photocatalytic Reactors for Environmental Applications. Water 2020, 12, 2196. [Google Scholar] [CrossRef]
Zhang, R.; He, H.; Tang, Y.; Zhang, Z.; Zhou, H.; Yu, J.; Zhang, L.; Dai, B. A review on Fe₂O₃-based catalysts for toluene oxidation: Catalysts design and optimization with the formation of abundant oxygen vacancies. ChemCatChem 2024, 16, e202400396. [Google Scholar] [CrossRef]
Mine, S.; Takao, M.; Yamaguchi, T.; Toyao, T.; Maeno, Z.; Siddiki, S.; Takakusagi, S.; Shimizu, K.; Takigawa, I.; Shimizu, K. Analysis of updated literature data up to 2019 on the oxidative coupling of methane using an extrapolative machine-learning method to identify novel catalysts. ChemCatChem 2021, 13, 3636–3655. [Google Scholar] [CrossRef]
de Araujo, L.G. Catalysis, meet the machine: From models to meaning. Catal. Res. 2025, 5, 005. [Google Scholar] [CrossRef]
Abraham, B.M.; Viñes, F.; Jyothirmai, M.V.; Sinha, P.; Singh, J.K.; Illas, F. Catalysis in the digital age: Unlocking the power of data with machine learning. WIREs Comput. Mol. Sci. 2024, 14, e1730. [Google Scholar] [CrossRef]
Rekkas, V.P.; Sotiroudis, S.; Sarigiannidis, P.; Wan, S.; Karagiannidis, G.K.; Goudos, S.K. Machine Learning in Beyond 5G/6G Networks—State-of-the-Art and Future Trends. Electronics 2021, 10, 2786. [Google Scholar] [CrossRef]
Abdul Wahab, Y.; Shapril, N.N.; Johari, S.; Johan, M.R. Machine learning for mechanistic insights and optimization in CO₂ cycloaddition catalysis. Appl. Catal. A Gen. 2026, 710, 120679. [Google Scholar] [CrossRef]
Thalpage, N.S. Unlocking the black box: Explainable artificial intelligence (XAI) for trust and transparency in AI systems. J. Digit. Art. Humanit. 2023, 4, 31–36. [Google Scholar] [CrossRef]
Semnani, P.; Bogojeski, M.; Bley, F.; Zhang, Z.; Wu, Q.; Kneib, T.; Herrmann, J.; Weisser, C.; Patcas, F.; Müller, K.-R. A machine learning and explainable AI framework tailored for unbalanced experimental catalyst discovery. J. Phys. Chem. C 2024, 128, 21349–21367. [Google Scholar] [CrossRef]
Myllyaho, L.; Raatikainen, M.; Männistö, T.; Mikkonen, T.; Nurminen, J.K. Systematic literature review of validation methods for AI systems. J. Syst. Softw. 2021, 181, 111050. [Google Scholar] [CrossRef]
Xu, Y.; Wang, H.; Zhang, W.; Xie, L.; Chen, Y.; Salim, F.; Zhang, Y.; Gooding, J.; Walsh, T. AI-empowered catalyst discovery: A survey from classical machine learning approaches to large language models. arXiv 2025. [Google Scholar] [CrossRef]
Balcells, D. Co-intelligent design of catalysis research with large language models: Hype or reality? ACS Catal. 2025, 15, 16412–16420. [Google Scholar] [CrossRef]
Li, J.; Wang, J.; He, T.; Li, Z.; Xu, H.; Fan, Z.; Liao, F.; Liu, Y.; Kang, Z. Dynamic machine learning-driven optimization of microwave-synthesized photocatalysts for enhanced hydrogen peroxide production. ChemCatChem 2025, 17, e00341. [Google Scholar] [CrossRef]
Wang, Y.; Sun, Y.; Wang, H.; Li, J.; Liu, X.; Dong, F. Infrared spectra-based machine learning framework for photocatalytic reaction and performance. Adv. Intell. Syst. 2025, 5, 2500101. [Google Scholar] [CrossRef]
Ozcan, A.; Coudert, F.-X.; Rogge, S.M.J.; Heydenrych, G.; Fan, D.; Sarikas, A.P.; Keskin, S.; Maurin, G.; Froudakis, G.E.; Wuttke, S.; et al. Artificial intelligence paradigms for next-generation metal−organic framework research. J. Am. Chem. Soc. 2025, 147, 23367–23380. [Google Scholar] [CrossRef]
Wang, G.; Mine, S.; Chen, D.; Jing, Y.; Ting, K.W.; Yamaguchi, T.; Takao, M.; Maeno, Z.; Takigawa, I.; Matsushita, K.; et al. Accelerated discovery of multi-elemental reverse water-gas shift catalysts using extrapolative machine learning approach. Nat. Commun. 2023, 14, 5861. [Google Scholar] [CrossRef]
Yu, Q.; Ma, N.; Leung, C.; Liu, H.; Ren, Y.; Wei, Z. AI in single-atom catalysts: A review of design and applications. J. Mater. Inform. 2025, 5, 9. [Google Scholar] [CrossRef]
Bokhimi, X. Learning the use of artificial intelligence in heterogeneous catalysis. Front. Chem. Eng. 2021, 3, 740270. [Google Scholar] [CrossRef]
Huang, J.; Zong, Z.; Wang, P.; Zhang, Y.; Gao, D.; Wang, Y.; Li, Z. Machine learning-driven simulation and optimization of phosphate adsorption on metal-organic frameworks. Sep. Purif. Technol. 2026, 394, 137479. [Google Scholar] [CrossRef]
Srinivasan, K.; Bhakte, A.; Puliyanda, A.; Thosar, D.; Srinivasan, R.; Singh, K.; Addo, P.; Prasad, V. Artificial intelligence and machine learning at various stages and scales of process systems engineering. Can. J. Chem. Eng. 2025, 103, 22525. [Google Scholar] [CrossRef]
Rosen, A.S.; Iyer, S.M.; Ray, D.; Yao, Z.; Aspuru-Guzik, Á.; Gagliardi, L.; Notestein, J.M.; Snurr, R.Q. Machine learning the quantum-chemical properties of metal–organic frameworks for accelerated materials discovery. Matter 2021, 4, 1578–1597. [Google Scholar] [CrossRef]
de Araujo, L.G.; Vilcocq, L.; Fongarland, P.; Schuurman, Y. Recent developments in the use of machine learning in catalysis: A broad perspective with applications in kinetics. Chem. Eng. J. 2025, 508, 160872. [Google Scholar] [CrossRef]
Mollick, E. Co-Intelligence: Living and Working with AI; Portfolio/Penguin: New York, NY, USA, 2024. [Google Scholar]
Taniike, T.; Fujiwara, A.; Nakanowatari, S.; García-Escobar, F.; Takahashi, K. Automatic feature engineering for catalyst design using small data without prior knowledge of target catalysis. Commun. Chem. 2024, 7, 11. [Google Scholar] [CrossRef]
Ntoutsi, E.; Fafalios, P.; Gadiraju, U.; Iosifidis, V.; Nejdl, W.; Vidal, M.-E.; Ruggieri, S.; Turini, F.; Papadopoulos, S.; Krasanakis, E.; et al. Bias in data-driven artificial intelligence systems—An introductory survey. WIREs Data Min. Knowl. Discov. 2020, 10, e1356. [Google Scholar] [CrossRef]
Guendouzi, B.S.; Ouchani, S.; El Assaad, H.; El Zaher, M. A systematic review of federated learning: Challenges, aggregation methods, and development tools. J. Netw. Comput. Appl. 2023, 220, 103714. [Google Scholar] [CrossRef]
Liu, Q.; Wang, X.; Wei, Y.; Xu, L.; Yang, Y.; Wang, J. Interpretable machine learning-assisted high-throughput screening of highly active nitrogen fixation dual-atom catalysts. AIChE J. 2025, 71, e18866. [Google Scholar] [CrossRef]
Omranpour, A.; Elsner, J.; Lausch, K.N.; Behler, J. Machine learning potentials for heterogeneous catalysis. arXiv 2024. [Google Scholar] [CrossRef]
Tunala, S.; Zhai, S.; Wu, F.; Chen, Y.-H. Machine learning in photocatalysis: Accelerating design, understanding, and environmental applications. Sci. China Chem. 2025, 68, 3415–3428. [Google Scholar] [CrossRef]
Sumaria, V.; Rawal, T.B.; Li, Y.F.; Sommer, D.; Vikoren, J.; Bondi, R.J.; Rupp, M.; Prasad, A.; Prasad, D. Machine learning, density functional theory, and experiments to understand the photocatalytic reduction of CO₂ on CuPt/TiO₂. J. Phys. Chem. C 2024, 128, 14247–14258. [Google Scholar] [CrossRef]
Xu, J.; Zhai, S.; Huang, P.; Yu, W.; Mao, Q.; Du, K.; Su, W.; Sun, B.; Jin, C.; Su, A. An artificial intelligence-driven synthesis planning platform (PhotoCat) for photocatalysis. Commun. Chem. 2026, 9, 92. [Google Scholar] [CrossRef]
Xu, J.; Su, A.; Huang, P.; Yu, W.; Du, K.; Fan, Z.; Sun, B.; Zhong, Z.; Jin, C.; Su, W. PhotoCat: An artificial intelligence-driven synthesis planning platform for photocatalysis. ChemRxiv 2023. [Google Scholar] [CrossRef]
Prabhu, S.; Kosir, N.; Kothare, M.V.; Rangarajan, S. Derivative-free domain-informed data-driven discovery of sparse kinetic models. Ind. Eng. Chem. Res. 2025, 64, 2601–2615. [Google Scholar] [CrossRef]
Liu, X.; Wang, C.; Chen, C.; Pan, Z.; Gao, C.; Lai, W.; Zhao, J.; Tian, T.; Xiao, W. Recent advances in hierarchical porous materials for CO₂ capture and utilization. Coord. Chem. Rev. 2025, 544, 21627. [Google Scholar] [CrossRef]
Ali, M.M.; Hossen, M.A.; Abd Aziz, A. Progress in prediction of photocatalytic CO₂ reduction using machine learning approach: A mini review. Next Mater. 2025, 8, 100522. [Google Scholar] [CrossRef]
Orhan, I.B.; Zhao, Y.; Babarao, R.; Thornton, A.W.; Le, T.C. Machine learning descriptors for CO₂ capture materials. Molecules 2025, 30, 650. [Google Scholar] [CrossRef] [PubMed]
Tran, K.; Ulissi, Z.W. Active learning across intermetallics to guide discovery of electrocatalysts for CO₂ reduction and H₂ evolution. Nat. Catal. 2018, 1, 696–703. [Google Scholar] [CrossRef]
Yang, H.; Kareck, T.L.; Wang, Q. New era of AI in chemical process safety: Foundation models. ACS Chem. Health Saf. 2026, 33, 171–179. [Google Scholar] [CrossRef]
Hagendorff, T. Mapping the ethics of generative AI: A comprehensive scoping review. Minds Mach. 2024, 34, 39. [Google Scholar] [CrossRef]
Gunasekara, L.; El-Haber, N.; Nagpal, S.; Moraliyage, H.; Issadeen, Z.; Manic, M.; De Silva, D. A systematic review of responsible artificial intelligence principles and practice. Appl. Syst. Innov. 2025, 8, 97. [Google Scholar] [CrossRef]

Figure 1. Comparison of artificial intelligence (AI) vs machine learning (ML) vs deep learning (DL).

Figure 2. A schematic representation of multidimensional spaces during catalyst design. Reproduced with permission from ref. [24].

Figure 3. The key categories of machine learning in catalysis.

Table 1. A summary of the advantages, disadvantages and critical perspectives of major AI/ML techniques.

Technique	Advantages	Disadvantages	Critical Perspectives	Ref.
Supervised ML	Highly interpretable and fast prediction; reduced experimental costs; requires little tuning	Dependence on large datasets; poor with raw, unstructured data; limited mechanistic insight; data imbalance; high complexity	The most mature and practical ML strategy in catalysis. Often functions as an advanced interpolation tool rather than a true discovery engine. Published models perform well only within limited chemical spaces and fail under realistic industrial conditions. The absence of standardized negative-result databases remains a major limitation.	[57]
Unsupervised Learning	Discovery of hidden patterns; useful for descriptor generation; can reveal mechanistic groupings	Lower predictive capability; interpretation challenges; sensitive to preprocessing	Less powerful for direct prediction, but essential for understanding high-dimensional chemical spaces and reducing dataset complexity. Results can become subjective if not supported by physical chemistry principles.	[14,21]
Reinforcement Learning (RL)	Efficient optimization; adaptive experimentation; resource efficiency	Training instability; requires well-designed reward functions; computationally intensive	Attractive for self-driving laboratories, but remains limited by experimental throughput and imperfect reward definitions. Translating RL strategies from simulations to industrial chemistry is still challenging.	[34]

Table 2. The performance of AI/ML methods and the quantitative improvements they provide compared to classical methods.

Performance Metric	Typical Gain
Catalyst screening speed	100–1000 times faster
Experimental reduction	50–90% fewer experiments
Prediction accuracy	R² up to 0.99
Optimization time	From weeks to hours or days
Throughput	10–100 times more experiments per day
Yield/selectivity improvement	Often 10–30% or more
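
The prediction-accuracy entry in Table 2 is reported as a coefficient of determination, R² = 1 − SS_res/SS_tot. As a minimal sketch of how that figure is obtained from held-out measurements, the snippet below computes it in plain Python. The function name `r_squared` and the yield values are illustrative assumptions and are not taken from any study cited here.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's prediction.
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((yt - mean_true) ** 2 for yt in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out catalyst yields (%), for illustration only.
measured = [12.0, 18.5, 25.1, 31.0, 40.2]
predicted = [11.4, 19.0, 24.0, 32.5, 39.8]
score = r_squared(measured, predicted)
print(f"R^2 = {score:.3f}")
```

A model that always predicts the mean of the observations scores 0 on this metric, so a value near 0.99 is meaningful only when it is computed on data the model did not see during training.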

Table 3. Overview of studies on photocatalysis and solar photoreactors; AI-driven discovery with methods, findings, and limitations.

Key Methods	Application	Main Findings	Limitations/Challenges	Improvements/Impacts	Ref.
PHOTOREAC (MATLAB-based photon absorption + kinetic modeling)	Modeling slurry solar photocatalytic reactors	Estimates radiation-independent kinetic constants, compares kinetic models, analyzes operational parameters	Limited to TiO₂ P25, assumes well-mixed system, ignores mass transport limitations	Enhanced understanding of reaction kinetics; guided reactor optimization	[52]
PhotoCat (transformer-based deep learning) + PhotoCatDB (curated database)	Multicomponent photocatalytic reactions	Predicts reaction outcomes (Top-1 accuracy 82.25%), interpretable, experimentally validated predictions	Dataset completeness, reaction conditions variation, predictive accuracy	Accelerated reaction prediction; improved catalyst selection; reduced experimental workload	[62]
Machine learning (linear regression, decision trees, random forests, ANNs, k-NN)	Photocatalytic CO₂ reduction	Predicts catalyst performance, optimizes reaction conditions, uncovers correlations	Dataset quality, computational demands, overfitting, model bias	More efficient catalyst screening; optimized conditions; discovery of new catalyst relationships	[63]
Few-shot ML + iterative experiments	Microwave-assisted photocatalyst synthesis	Efficient optimization of H₂O₂ production; high performance in three iterations	Limited dataset availability	Reduced experimental time; faster catalyst development; resource savings	[64]
ML linked with IR spectroscopy	Catalyst screening	Predicts nitrate formation from adsorbed species; accelerates experiments; interpretable mechanistic insights	Transferability to other catalysts; data quality	Faster screening process; mechanistic understanding; targeted catalyst design	[65]
DF-SINDy (derivative-free sparse identification)	Kinetic model discovery	Recovers governing equations from experimental data, incorporates domain knowledge, interpretable	Sensitive to noise, multiphase system complexity	Better understanding of reaction mechanisms; data-driven model development	[66]
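
The last row of Table 3 refers to recovering kinetic equations without differentiating the measured data. One common way to do this is to regress on the integrated rate law, c(t) − c(0) = Σ ξ_j ∫ f_j(c) dt, and to prune small coefficients iteratively. The sketch below illustrates that general idea only. It is not the DF-SINDy implementation of ref. [66], and the function names, candidate library, threshold, and synthetic first-order data are all assumptions made for illustration.

```python
import math


def cumulative_trapezoid(values, times):
    """Running trapezoidal integral of `values` over `times`, starting at 0."""
    out = [0.0]
    for i in range(1, len(times)):
        step = times[i] - times[i - 1]
        out.append(out[-1] + 0.5 * (values[i] + values[i - 1]) * step)
    return out


def solve_least_squares(columns, target):
    """Least-squares fit via normal equations and Gaussian elimination.

    Assumes linearly independent columns (no singularity check).
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    n = len(columns)
    # Augmented matrix [A^T A | A^T b].
    aug = [[dot(columns[i], columns[j]) for j in range(n)] + [dot(columns[i], target)]
           for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def sparse_integral_fit(times, conc, library, threshold=0.05, max_iter=10):
    """Sequentially thresholded least squares on the integrated rate law."""
    integrated = [cumulative_trapezoid([f(c) for c in conc], times) for f in library]
    target = [c - conc[0] for c in conc]
    active = list(range(len(library)))
    kept = active
    for _ in range(max_iter):
        fitted = solve_least_squares([integrated[j] for j in active], target)
        # Keep only terms whose coefficient magnitude reaches the threshold.
        kept = [j for j, v in zip(active, fitted) if abs(v) >= threshold]
        if kept == active or not kept:
            break
        active = kept
    coeffs = [0.0] * len(library)
    if kept:
        # Final refit on the surviving terms only.
        for j, v in zip(kept, solve_least_squares([integrated[j] for j in kept], target)):
            coeffs[j] = v
    return coeffs


# Synthetic first-order decay, dc/dt = -k c with k = 0.7 (hypothetical data).
times = [0.025 * i for i in range(201)]
conc = [math.exp(-0.7 * t) for t in times]
library = [lambda c: 1.0, lambda c: c, lambda c: c * c]  # candidate terms: 1, c, c^2
coeffs = sparse_integral_fit(times, conc, library)
print(dict(zip(["1", "c", "c^2"], coeffs)))
```

On this noise-free example only the first-order term should survive pruning, with a coefficient close to −0.7. Real experimental data would need noise handling and a chemically motivated candidate library, which is where domain knowledge enters.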

Table 4. Heterogeneous catalysis and catalyst design.

Key Methods	Application	Main Findings	Limitations/Challenges	Improvements/Impact	Ref.
ML potentials (MLPs)	Atomistic simulation of heterogeneous catalysis	Enables ab initio accuracy at larger scales; captures dynamic behaviors	Data selection, transferability, electronic structure references	Improved predictive accuracy; enabled larger-scale simulations; accelerated catalyst design	[50]
Structural engineering + defect modulation	Fe₂O₃-based VOC oxidation	Oxygen vacancies, morphology control, and heteroatom doping improve activity	Low-temperature activity; stability under complex gas mixtures	Enhanced catalyst activity; better stability; tailored catalyst properties	[53]
Machine learning (Extra Trees Regressor, XGBoost, Random Forest)	Oxidative coupling of methane (OCM)	Predicts catalyst compositions, identifies elemental features affecting C₂ yields	Dataset quality, feature representation	Accelerated catalyst discovery; identification of key elements; optimized catalyst formulations	[54]
Extrapolative ML + iterative experiments	Discovery of multi-element catalysts for reverse water–gas shift reaction	Identified over 100 highly active multi-element catalysts, including previously untested elements	Limited training data, generalizability	Rapid identification of promising catalysts; expanded catalyst space exploration	[67]
Machine learning, neural networks, high-throughput simulations, integration with DFT	AI in single-atom catalyst (SAC) design and optimization	Guides high-throughput simulations, predicts novel structures, integrates experimental data	Complexity, interpretability, data integration	Accelerated discovery of SACs; improved predictive tools; streamlined experimental validation	[68]
ML surrogate models + DFT descriptors	CO₂ reduction and H₂ evolution	Predicts promising candidates, screens chemical space, identifies activity descriptors	Stability, synthesis feasibility, adsorption effects, DFT uncertainties	Faster screening; targeted catalyst synthesis; better understanding of activity descriptors	[64]
Supervised learning	Hydrodesulfurization/heterogeneous catalysis	Predicts adsorption energies, surface areas, adsorption isotherms, sulfur content; accelerates design	Requires accurate input/output variables, model interpretability	Faster catalyst screening; more accurate property prediction; streamlined catalyst development	[69]

Table 5. MOFs, CO₂ capture, and process systems.

Key Methods	Application	Main Findings	Limitations/Challenges	Ref.
Machine learning + descriptor-based featurization	CO₂ capture materials (MOFs, ionic liquids, membranes)	Identifies key descriptors for performance, predicts capture efficiency, guides design	Dataset diversity, synthetic feasibility, material stability	[55]
LLMs + generative AI (VAEs, GANs, RL)	MOF discovery and synthesis optimization	Automates literature review, predicts properties, enables inverse design, guides adaptive synthesis	Integration with experimental workflows, interpretability, data availability	[68]
Generative AI	Chemical engineering workflows (flowsheet/P&ID generation, process optimization)	Automates designs, optimizes chemical processes, accelerates innovation	Limited machine-readable data, poor integration with domain knowledge, risk of unsafe outputs	[71]
Quantum MOF (QMOF) database + ML (crystal graph CNN, SOAP, composition-based features)	MOF electronic property prediction	Predicts band gaps, identifies structure–property relationships, accelerates computational discovery	Vast chemical space, high DFT costs	[72]
