Molecular Dynamics and Machine Learning in Catalysts

Abstract: Given the importance of catalysts in the chemical industry, they have been extensively investigated by experimental and numerical methods. With the development of computational algorithms and computer hardware, large-scale simulations have enabled influential studies that capture more atomic detail and reflect microscopic mechanisms. This review provides a comprehensive summary of recent developments in molecular dynamics, including ab initio molecular dynamics and reactive force-field molecular dynamics. Recent research applying both approaches to catalyst calculations is reviewed, including the growth of carbon materials, dehydrogenation, hydrogenation, oxidation reactions, segregation, and restructuring, which can guide future catalyst calculations. Machine learning has attracted increasing interest in recent years, and its combination with the field of catalysts has inspired promising development approaches. Its applications in machine learning potentials, catalyst design, performance prediction, structure optimization, and classification are summarized in detail. This review aims to shed light on, and offer a perspective on, ML approaches in catalysts.


Introduction
Catalysts have attracted growing interest due to their unique effects on chemical reactions. A catalyst can increase or decrease the rate of a chemical reaction without itself being chemically changed and without shifting the chemical equilibrium. Therefore, catalysts are widely used in numerous fields, such as electroreduction [1][2][3], chemical formation [4,5], combustion [6][7][8], and environmental conservation [9][10][11]. There are many kinds of catalysts, such as metal catalysts, metal oxide catalysts, molecular sieve catalysts [12], biocatalysts [13], and nanocatalysts [14]. As the field develops, new catalysts are constantly being discovered and created.
Although new catalysts are still being discovered, a deep understanding of the catalysis mechanism in chemical reactions is still lacking and needs continuous improvement. Computational simulation and experimentation are the two main approaches to studying catalysts. Compared with experiments, computational simulations can provide atomic insights that go deeper into the microscopic mechanism [15]. First-principles calculation is a common computational approach in the catalysis field. The ab initio molecular dynamics (AIMD) method is preferred for studying the mechanisms of catalytic reactions because it addresses the difficulties of describing chemical reactions accurately, including the precise calculation of the electronic structure and the dynamic process of atomic motion [16]. The AIMD approach solves the Schrödinger equation by means of various approximations [17]. It combines quantum mechanics and molecular dynamics and can therefore accurately describe the electronic structure.

Introduction of Molecular Dynamics
Although the mechanism of a chemical reaction can be investigated by experiments, even with robotic assistance [42], studying complex systems experimentally is still difficult. Calculations make it feasible to explore complex systems and reactions. Both static calculations and molecular dynamics have been intensively used to study reaction mechanisms. However, analyzing a reaction mechanism requires accurate electronic structure calculations and real-time tracking of atomic motion. Molecular dynamics provides more information than static calculations, yielding dynamic configurations that offer kinetic and thermodynamic data as well as reaction trajectories for visual inspection. Furthermore, as reaction processes become more complex, molecular dynamics methods offer additional computational options that help bridge the time-scale gap of AIMD.

Ab initio Molecular Dynamics
Ab initio molecular dynamics, which combines molecular dynamics with forces calculated directly from the electronic structure, is a helpful method in the theoretical study of chemical reactions [43]. The electronic structure is solved directly at each step, which allows for bond breaking and formation [44,45]. Although a direct solution of the Schrödinger equation would fully describe the wave function and the exact total energy of the nuclei and electrons [44], it is impossible to solve the Schrödinger equation directly for complex systems. Therefore, several approximations are employed. One of the most important is the Born-Oppenheimer approximation [46], which assumes that the motion of the nuclei and the electrons can be separated because of the large difference between the nuclear and electronic masses. Many other approximations have been used to simplify the problem further, giving rise to several different methods, such as Hartree-Fock molecular dynamics [47], Kohn-Sham molecular dynamics [48], Car-Parrinello molecular dynamics [49,50], and path integral molecular dynamics [51]. Numerous AIMD codes are popular and widely used, including VASP [52], Quantum ESPRESSO [53], CP2K [54], and CPMD [55].
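The structure of such a simulation, in which forces are recomputed from the electronic structure at every step and the nuclei are then moved classically, can be illustrated with a minimal sketch. The function `forces_from_electronic_structure` is a hypothetical stand-in: a real AIMD code would solve the electronic problem at fixed nuclear positions, whereas here a one-dimensional harmonic bond with arbitrary constants plays that role so that the loop can be run. The velocity-Verlet integrator is an assumed choice and is not taken from the studies reviewed here.

```python
def forces_from_electronic_structure(x, k=1.0, x0=1.0):
    """Hypothetical stand-in for the electronic-structure step of AIMD.

    A real code would solve the electronic problem at fixed nuclear
    positions. Here a 1D harmonic bond (arbitrary k and x0) returns
    the energy and force so that the loop below is runnable.
    """
    energy = 0.5 * k * (x - x0) ** 2
    force = -k * (x - x0)
    return energy, force


def run_bomd(x, v, mass=1.0, dt=0.01, n_steps=1000):
    """Velocity-Verlet propagation with forces recomputed at every step."""
    _, force = forces_from_electronic_structure(x)
    trajectory = []
    for _ in range(n_steps):
        v += 0.5 * dt * force / mass      # half kick with the old force
        x += dt * v                       # move the nucleus
        potential, force = forces_from_electronic_structure(x)  # new "electronic" solve
        v += 0.5 * dt * force / mass      # half kick with the new force
        total = potential + 0.5 * mass * v ** 2
        trajectory.append((x, v, total))
    return trajectory


# Start with a slightly stretched bond and zero velocity.
traj = run_bomd(x=1.2, v=0.0)
```

The total energy stored in each trajectory entry should stay close to its initial value, which is a common sanity check for the chosen time step.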

Reactive Force Field Molecular Dynamics
Despite the wide use of the AIMD approach, its computational cost restricts the size of the systems that can be treated. For large systems, such as polymers [56,57] or systems with many active sites [58], different approaches, like ReaxFF molecular dynamics, should be employed. Unlike the AIMD approach, in which the electronic structure is solved directly, ReaxFF molecular dynamics is based on a reactive force field. ReaxFF is a bond-order-dependent force field whose total energy can be expressed as:

$$E_{\mathrm{system}} = E_{\mathrm{bond}} + E_{\mathrm{over}} + E_{\mathrm{under}} + E_{\mathrm{val}} + E_{\mathrm{pen}} + E_{\mathrm{tors}} + E_{\mathrm{conj}} + E_{\mathrm{vdW}} + E_{\mathrm{Coulomb}} \tag{1}$$

where the first term E_bond is the bond energy, and the second and third terms E_over and E_under are the over-coordination and under-coordination penalty terms, respectively. The fourth term E_val is the valence angle term. The fifth term E_pen is also a penalty term, representing the effects of over-coordination and under-coordination of the central atom. E_tors is the torsion angle term, and E_conj describes the contribution of conjugation effects to the total energy. The last two terms, E_vdW and E_Coulomb, denote the non-bonded van der Waals and Coulomb interactions, respectively. The central quantity in ReaxFF is the bond order, which is calculated directly from the interatomic distance r_ij by the following equation:

$$BO'_{ij} = \exp\left[p_{bo,1}\left(\frac{r_{ij}}{r_0^{\sigma}}\right)^{p_{bo,2}}\right] + \exp\left[p_{bo,3}\left(\frac{r_{ij}}{r_0^{\pi}}\right)^{p_{bo,4}}\right] + \exp\left[p_{bo,5}\left(\frac{r_{ij}}{r_0^{\pi\pi}}\right)^{p_{bo,6}}\right] \tag{2}$$

where the p_bo are fitted parameters and the r_0 are the corresponding equilibrium bond lengths. The three terms in Equation (2) represent the sigma bond, the first pi bond, and the second pi bond, respectively. ReaxFF initially described only hydrocarbons and was gradually extended to other materials, such as Si, SiO2 [59], MgH [60], and Al2O3 [61].
In the following subsections, different chemical processes studied by molecular dynamics are reviewed, including the growth of carbon materials, dehydrogenation, and oxidation reactions. In addition, several dynamical phenomena in catalysis that can only be modeled with molecular dynamics, such as segregation and restructuring, are also reviewed.
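The distance dependence of the bond order can be made concrete with a short sketch that evaluates three exponential contributions (sigma, first pi, second pi) of the standard uncorrected ReaxFF form. The parameter values in `ILLUSTRATIVE_PARAMS` are hypothetical numbers chosen only to give a smooth curve; they are not fitted ReaxFF parameters for any element, and the function name is likewise an assumption.

```python
import math

# Hypothetical (prefactor, exponent, r0 in angstrom) triples per bond type.
# These are NOT fitted ReaxFF parameters; they only illustrate the shape.
ILLUSTRATIVE_PARAMS = {
    "sigma": (-0.08, 6.0, 1.40),
    "pi": (-0.10, 9.0, 1.25),
    "pipi": (-0.20, 15.0, 1.15),
}


def uncorrected_bond_order(r_ij, params=ILLUSTRATIVE_PARAMS):
    """Return the total uncorrected bond order and its three contributions."""
    terms = {}
    for label, (prefactor, exponent, r0) in params.items():
        # Each term decays smoothly from 1 toward 0 as r_ij grows.
        terms[label] = math.exp(prefactor * (r_ij / r0) ** exponent)
    return sum(terms.values()), terms


for r in (1.0, 1.3, 1.6, 2.0, 3.0):
    total, parts = uncorrected_bond_order(r)
    print(f"r = {r:.1f} A  BO' = {total:.3f}  "
          f"(sigma {parts['sigma']:.3f}, pi {parts['pi']:.3f}, pipi {parts['pipi']:.3f})")
```

Because every term is a smooth function of distance, bonds can form and break continuously during a trajectory, which is what allows a force field of this kind to describe reactions.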

The Growth of the Carbon Materials
Undoubtedly, carbon materials have attracted great attention over the last two decades, especially since the discovery of carbon nanotubes and graphene [68][69][70][71]. Carbon materials have been extensively studied and prepared experimentally by different methods, yet even now the growth mechanisms of some carbon materials, such as multi-walled carbon nanotubes, are still not well understood [72]. Chen et al. [73] investigated the dynamics of the growth of graphene from amorphous carbon using AIMD. The structures generated from sp3-carbon and sp2-carbon showed significant differences when the system temperature was varied from 300 K to 1800 K under nickel catalysis; this transformation process is depicted in Figure 2. In addition, the transformation process was found to differ markedly from conventional chemical vapor deposition (CVD) growth. Fukuhara et al. [74] investigated nickel-carbon binary clusters as catalysts for the formation of carbon nanotubes. Using AIMD, the kinetic process of ethanol dehydrogenation was simulated, and the catalytic mechanism of nickel-carbon clusters was revealed at the atomic scale. Carbon atoms were observed to stay preferentially on the nickel surface, and carbon chains formed on the surface as the number of carbon atoms increased.

Figure 2. The conversion from initial configurations (a-C/Ni3C, a-C with 12 Ni, a-C, sp2-C/Ni3C, sp2-C with 12 Ni, and sp2-C) to final models at different initial temperatures [73]. Copyright 2016, Royal Society of Chemistry.
In addition, the chirality of carbon nanotubes has long been a recognized problem. The process of carbon nanotube growth, which includes the dissolution of carbon and the formation of the nanotube, has been widely studied with reactive force fields. Neyts et al. [75] employed ReaxFF molecular dynamics and Monte Carlo simulations to investigate the growth process of carbon nanotubes. The observed growth process was consistent with previous studies. Most importantly, a change of chirality during the growth process was reported for the first time, as shown in Figure 3.

Dehydrogenation and Hydrogenation
Ethylene is an essential chemical raw material, and ethane dehydrogenation is one of the main methods of producing it. Although the process is widely applied in industry, numerous challenges remain [76], such as deactivation of the catalyst. Studying the ethane dehydrogenation mechanism and finding more effective catalysts will help to meet these challenges. Using density functional theory and an ab initio microkinetic model, Jalid et al. [77] systematically investigated the reaction mechanism of ethane dehydrogenation with transition metals (Pt, Pd, Co, Ni, Rh, Ru, Re, Cu, Au, and Ag) as catalysts and CO2 as a mild oxidant. Different surface types (111 and 211) were considered as factors affecting the reaction. The simulation results show that ethane is mainly dehydrogenated directly to ethylene, and that Rh and Pt are the most efficient catalysts among the transition metals considered.
In addition, the coupling of thermodynamics and dynamics, which has proven to be a more difficult challenge, was also neglected. In contrast to the ethane dehydrogenation reaction, the reaction mechanism of ethylene hydrogenation on the δ-MoC(001) surface was investigated by Jimenez et al. [78]. A suitable structure was optimized by density functional theory, and ab initio thermodynamics and kinetics were used to estimate how the hydrogen surface coverage and the activation energy barriers relate to the reaction rate on the δ-MoC(001) surface. In this study, the activation energy barrier on the δ-MoC(001) surface was found to be lower than on the Pt(111) and Pd(111) catalyst surfaces. Figure 4 illustrates the relationship between hydrogen coverage and ethylene hydrogenation.

Figure 4. Hydrogen coverage and ethylene hydrogenation on the δ-MoC(001) surface [78]. Copyright 2020, American Chemical Society.
As for the application of ReaxFF molecular dynamics to dehydrogenation, oxidative dehydrogenation plays an important role. Chenoweth et al. [79] fitted ReaxFF parameters for oxidative dehydrogenation over vanadium oxide catalysts using a quantum mechanical approach. The structures and energies of several different vanadium oxides, such as V2O5, VO2, and V2O3, were well reproduced with the fitted ReaxFF parameters. In addition, the oxidation of methanol was simulated by molecular dynamics, and the results agreed with experiments, supporting the accuracy of the fitted parameters. The oxidation of methane was studied by Feng et al. [80] using molecular dynamics, with ReaxFF chosen as the force field to simulate the oxidative dehydrogenation process. Several catalysts were considered, e.g., functionalized graphene sheets (FGS), Pt, and Pt@FGS, and the Pt@FGS catalyst showed the best catalytic performance. The essence of the catalytic oxidation of methane is the breaking of C-H bonds and the formation of hydroxyl groups. The Pt@FGS catalyst increases the dehydrogenation rate of methane and drives the catalytic cycle, both of which contribute to a higher reaction rate. In addition, the hydroxyl groups generated by oxidation further enhance the functionalization of FGS, leading to an enhanced reaction.

Oxidation Reaction
Oxidation reactions are common and essential chemical reactions. As promising green technologies, the water and CO oxidation processes have been widely studied. The oxidation of water over a cobalt oxide catalyst was investigated at the atomic level using AIMD by Mattioli et al. [81]. The simulation results were directly compared with X-ray absorption spectroscopy results, and the calculated and measured bond distances were found to agree. Both calculations and experiments further revealed the real structure of the cobalt oxide catalyst in the water oxidation reaction and supported the conclusion that the cobalt oxide catalyst promotes the presence of low-resistance hydrogen bonds. Wang et al. [82] observed the CO oxidation reaction at the atomic scale using AIMD and further investigated the catalytic mechanism of the oxidation reaction at the Au/TiO2 interface. With the Au/TiO2 catalyst, the oxidation of CO can occur over a wide temperature range from 120 K to 700 K, with faster reaction rates at high temperatures than at low temperatures. In addition, the surface charge of gold greatly influences the oxidation process; the charge cycle diagram is shown in Figure 5.

In addition, there are still many limitations in understanding the oxidation of complex organic matter, and some mechanisms remain unclear. Because powerful tools for complicated systems are lacking, ReaxFF molecular dynamics is widely used to understand the oxidation process. Zhang et al. [83] employed ReaxFF molecular dynamics simulations to study ethanol oxidation in the presence of Al nanoparticles. The reaction temperature decreased to 324 K due to the presence of the Al nanoparticles, and more reaction pathways were found. Most importantly, as the reaction temperature increased, the Al nanoparticles changed from a solid to a liquid state, which resulted in more effective diffusion of H and O atoms in the nanoparticles. This offered more active sites and accelerated the reaction. The oxidation of methane on a palladium catalyst surface was comprehensively investigated by Mao et al. [84]. ReaxFF molecular dynamics simulations were used to model bond breakage and formation. In addition to the bare surface, oxygen-covered surfaces with different levels of oxygen coverage were considered. The reaction temperature was used as an indicator of the difficulty of the reaction. During the oxidation of methane, oxygen is more likely to occupy the active sites, while the oxygen covering the surface of the palladium catalyst hinders the dissociation and adsorption of further oxygen. However, the oxygen-covered palladium catalyst has a more substantial effect on the oxidation reaction than the bare palladium catalyst, as supported by the lower reaction temperature. The optimization of catalysts is complicated by the kinetic processes and numerous factors involved in catalytic reactions.

Segregation and Restructuring
Many studies of reaction mechanisms have combined static calculations and AIMD approaches. Gibbs energy differences and free energy barriers, which can be related to experiments, are calculated by static analysis. However, some phenomena, such as segregation, restructuring, and excitation, can only be simulated by molecular dynamics. Using AIMD methods, Hoppe et al. [85] studied the segregation behavior of Ag atoms. The DFT calculations and AIMD simulations suggested that the silver atom sits next to the chain and does not replace the gold atom.
Furthermore, AIMD methods give a more intuitive view of the dynamic process. Wittkamper et al. [43] studied the restructuring of the Rh-Ga model caused by oxidation. In this research, the simulation results were consistent with the experiments; both supported the claim that, compared to β-Ga2O3, Rh is less likely to stay in the bulk Ga solution. Barnard et al. [86] systematically investigated the role of interstitials in radiation-induced segregation (RIS). Owing to the low migration barrier, interstitial diffusion can be easily simulated by molecular dynamics, and the AIMD approach was preferred for its accuracy. In this study, a Wiedersich-type rate theory model was built for the Ni-18Cr alloy. The AIMD results confirmed the prediction that interstitial diffusion may result in the enrichment of Cr near the grain boundary. In addition, despite the errors in the lower-temperature simulations, AIMD remains a valuable method for studying RIS.

Discussion
The successful use of molecular dynamics cannot hide the shortcomings of AIMD and ReaxFF molecular dynamics. For ReaxFF, many challenges remain, including the charge description [87], parameter optimization [88], and the complexity of the bond order [41]. For example, the bond order is essential for ReaxFF molecular dynamics, but for condensed systems its description becomes more complicated. In addition, ReaxFF molecular dynamics is an empirical method based on assumptions and a preset functional form, with parameters obtained from DFT calculations and experimental data. Even with well-optimized parameters, errors arising from the limitations of the parameters and of the preset functional form persist. Thus, a method whose accuracy is close to that of DFT calculations and whose speed approaches that of ReaxFF molecular dynamics is needed. Machine learning brings new opportunities and challenges; in particular, the development of machine learning potentials provides a new direction for solving the above problems. This topic is discussed in detail in the next section.

Machine Learning in Catalysts
Improving catalyst performance and discovering new catalysts are goals pursued by chemists [89][90][91]. However, optimizing and searching for catalysts is complicated because the factors affecting catalyst performance are diverse, and sometimes a subtle structural change can cause a dramatic shift in catalyst performance [92]. Moreover, the catalytic mechanisms of some reactions are still poorly understood and need to be explored further. In addition, traditional catalyst optimization and search require keen scientific intuition and extensive experience, which poses a considerable challenge to scientists. ML has facilitated new approaches to address these issues.
Traditional problem-solving methods are based on deduction and inference, but ML methods are based on generalization and summarization [93]. With the development of big data science, ML has been extended to numerous fields, such as aiding medicinal chemical discovery [94][95][96] and material discovery [97][98][99]. Additionally, this approach can be used for catalyst discovery and optimization.
Machine learning is a broad concept that includes many methods, such as artificial neural networks [100], support vector machines [101], linear regression [102], and kernel methods [103]. The methods used in catalyst discovery and optimization are not uniform, and sometimes different methods are used simultaneously. However, the most critical issue in catalyst discovery and optimization is the choice of descriptors, which determines the model's accuracy. Descriptors are important because catalyst performance is sensitive to changes in structure and energy: even an energy difference of 1 kcal/mol can change the choice of catalyst [92]. Another reason is that predicting and extrapolating results remains an important part of the ML approach, although the correctness of extrapolated results cannot be strictly proven. An accurate descriptor helps to make reasonable predictions. Therefore, descriptors should be chosen carefully when using ML methods. In this subsection, three different types of methods, namely neural networks, random forests, and regression, are briefly introduced, and machine learning potentials, which represent a critical development, are also discussed.
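The sensitivity to an energy difference of 1 kcal/mol can be illustrated with a small worked calculation. As a simplifying assumption that is not taken from the source, two catalysts are compared through an Arrhenius-type (or transition-state-theory) rate expression with identical prefactors, so that the ratio of their rates depends only on the barrier difference, as exp(ΔE/RT). The function name `rate_ratio` is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def rate_ratio(barrier_difference_kcal, temperature_k):
    """Ratio k_low / k_high for two barriers that differ by the given amount.

    Assumes equal prefactors, so only the exponential factor remains.
    """
    return math.exp(barrier_difference_kcal / (R_KCAL * temperature_k))


for temperature in (298.15, 500.0, 800.0):
    print(f"T = {temperature:6.1f} K  rate ratio for 1 kcal/mol = "
          f"{rate_ratio(1.0, temperature):.2f}")
```

Under these assumptions, a 1 kcal/mol difference changes the rate by about a factor of five near room temperature, so an error of this size in a descriptor or a predicted energy can reorder candidate catalysts.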

Introduction of Methods
Convolutional neural networks (CNNs) have become a widely used method in image recognition because of their powerful feature-capturing ability. In recent years, CNN methods have also been applied to catalysts. Xie et al. [104] advanced the use of CNN methods in catalyst performance prediction. The main idea of this CNN method is to transform the catalyst structure into a catalyst graph. Both atomic and binding energy information are considered, and the workflow is shown in Figure 6. A significant change in this study is the data input layer, which transforms the entire structure into a planar graph. In Figure 6, each point represents a different atom, and the connections between points describe the environment of a particular atom. Only one convolutional layer, one pooling layer, and, finally, two fully connected layers are used. The convolutional layer captures the features of each atom through a nonlinear convolution function, and the pooling layer then generates the features of the crystal. The final two fully connected layers and the output layer are used to predict the target properties. The optimization problem can be expressed as:

$$\min_{W} J\big(y, f(C; W)\big) \tag{3}$$

where y is the value of the target property, f is the function that represents the target property for the graph C with network weights W, and J is the cost function. With sufficient training data, this model can achieve an accuracy close to that of density functional theory methods. In addition, this CNN method was used in the same study to classify material types, reaching an accuracy of up to about 0.95 in the identification of 9350 catalysts. Based on this CNN method, Back et al. [105] modified the technique proposed by Xie et al. [104] to improve the accuracy of adsorption energy predictions. It was demonstrated that the CNN method could be used to predict surface coverage and site activity, which can be helpful for catalyst design.
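The sequence described above (atoms as graph nodes, a convolution over each atom's neighbors, pooling to a structure-level feature, and a final readout) can be sketched in a few lines. This is a deliberately simplified illustration and not the architecture of [104]: the update rule (a tanh of the atom's own feature mixed with the mean of its neighbors' features), the scalar weights, and all function names are assumptions made for the example.

```python
import math


def graph_convolution(features, neighbors, w_self, w_neigh):
    """One convolution step: mix each atom's feature with the mean of its neighbors'."""
    updated = []
    for i, v_i in enumerate(features):
        nbrs = neighbors[i]
        mean_nbr = [
            sum(features[j][k] for j in nbrs) / len(nbrs) if nbrs else 0.0
            for k in range(len(v_i))
        ]
        updated.append(
            [math.tanh(w_self * a + w_neigh * b) for a, b in zip(v_i, mean_nbr)]
        )
    return updated


def mean_pool(features):
    """Average the atom features into one structure-level feature vector."""
    n_atoms = len(features)
    return [sum(f[k] for f in features) / n_atoms for k in range(len(features[0]))]


def predict_property(features, neighbors, w_self=0.5, w_neigh=0.5,
                     w_out=(1.0, -1.0), bias=0.0):
    """Convolution, pooling, then a linear readout of the pooled feature."""
    hidden = graph_convolution(features, neighbors, w_self, w_neigh)
    pooled = mean_pool(hidden)
    return sum(w * c for w, c in zip(w_out, pooled)) + bias


# Toy three-atom graph: atom 0 is bonded to atoms 1 and 2.
atom_features = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]   # hypothetical one-hot element codes
atom_neighbors = [[1, 2], [0], [0]]
prediction = predict_property(atom_features, atom_neighbors)
```

Because the pooling step averages over atoms, the prediction does not depend on the order in which the atoms are listed, which is one reason graph representations suit crystal and surface structures.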
The random forest method is another commonly used ML approach. The decision tree method is briefly introduced first because it is the foundation of the random forest. Generally, a decision tree contains a finite, nonempty set of nodes and a set of edges. Through a series of decisions at the child nodes, the original data set is divided into subsets with different attribute values. Information gain is usually used to select the splitting feature, and it can be expressed as:

$$\mathrm{Gain}(D, a) = \mathrm{Ent}(D) - \sum_{v=1}^{V} \frac{|D^{v}|}{|D|}\,\mathrm{Ent}(D^{v}) \qquad (4)$$

where Ent(D) is the information entropy of the data set D, and D^v is the subset of D in which attribute a takes the v-th of its V possible values. Can et al. [106] used a decision tree to study the factors leading to high hydrogen production, which provides a simple example. The random forest method combines the classifications produced by several different decision trees.
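As a small worked example of the information gain criterion, the sketch below computes the entropy of a label set and the gain obtained by splitting on one attribute. The toy data set (attribute names `metal` and `temp`, labels `high` and `low`) is hypothetical and is not taken from Can et al. [106].

```python
import math
from collections import Counter

def entropy(labels):
    # Information entropy Ent(D) of a list of class labels, in bits.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    # Gain(D, a) = Ent(D) - sum_v |D^v| / |D| * Ent(D^v)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    remainder = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

# Hypothetical samples: does the catalyst give a high or low hydrogen yield?
rows = [
    {"metal": "Ni", "temp": "high"},
    {"metal": "Ni", "temp": "low"},
    {"metal": "Cu", "temp": "high"},
    {"metal": "Cu", "temp": "low"},
]
labels = ["high", "high", "low", "low"]

gain_metal = information_gain(rows, labels, "metal")
gain_temp = information_gain(rows, labels, "temp")
```

In this toy set the `metal` attribute separates the two classes completely, so its gain equals the full entropy of the labels, whereas `temp` carries no information and has zero gain. A decision tree would therefore split on `metal` first.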
Regression methods can generally be split into linear regression, which includes ridge regression [107] and least absolute shrinkage and selection operator (LASSO) regression [108], and nonlinear regression, such as kernel ridge regression [109] and support vector regression [110]. The most straightforward linear regression is a linear combination of the input variables:

$$y(\mathbf{x}, \mathbf{w}) = w_0 + w_1 x_1 + \dots + w_D x_D$$

where x is the input variable and w is the linear parameter. This equation can also be extended with nonlinear functions:

$$y(\mathbf{x}, \mathbf{w}) = w_0 + \sum_{j=1}^{M-1} w_j \phi_j(\mathbf{x})$$

where ϕ is the nonlinear function. The parameters of this equation are obtained by minimizing the error function:

$$E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left\{ y(\mathbf{x}_n, \mathbf{w}) - t_n \right\}^2$$

where E is the error value and t_n is the target value. An example of linear regression in catalysis can be found in the study by Werth et al. [111].
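The regression scheme described here, a model that is linear in the parameters w but uses nonlinear functions of the input, with w chosen to minimize a sum-of-squares error against the targets, can be sketched with NumPy. The polynomial basis, the optional ridge term, and the function names are illustrative assumptions, not the setup used in any of the cited studies.

```python
import numpy as np

def design_matrix(x, degree):
    # Polynomial basis functions phi_j(x) = x**j for j = 0..degree.
    return np.vander(x, degree + 1, increasing=True)

def fit_least_squares(x, t, degree, ridge=0.0):
    # Minimise the sum-of-squares error; ridge > 0 adds an L2 penalty on w.
    phi = design_matrix(x, degree)
    normal_matrix = phi.T @ phi + ridge * np.eye(phi.shape[1])
    return np.linalg.solve(normal_matrix, phi.T @ t)

def sum_of_squares_error(w, x, t, degree):
    # E(w) = 1/2 * sum_n (y(x_n, w) - t_n)**2
    residual = design_matrix(x, degree) @ w - t
    return 0.5 * float(residual @ residual)

# Synthetic targets generated from a known quadratic, so the fit can be checked.
x = np.linspace(-1.0, 1.0, 9)
t = 1.0 + 2.0 * x - 3.0 * x**2
w = fit_least_squares(x, t, degree=2)
error = sum_of_squares_error(w, x, t, degree=2)
```

Setting `ridge` to a positive value gives ridge regression on the same basis; the model stays linear in w, so the fit is still a single linear solve.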

Machine Learning Potentials
The machine learning potential is one of the most important computational advances in recent years and has been intensively studied and applied in catalysis [112][113][114][115][116][117][118][119][120][121][122][123][124][125][126][127]. A machine learning potential uses a machine learning algorithm to find the underlying relationship between atomic configuration and energy [128]. It differs from empirical interatomic potentials, which are based on a presupposed mathematical formula. Hence, errors that stem from the assumed mathematical expression and from parameter optimization can be largely avoided, and the simulation accuracy is higher than that of empirical interatomic potentials. As an example, the construction process of a machine learning potential [129] is shown in Figure 7. First, a series of configurations is acquired from AIMD or other methods. Then, a sufficient number of configurations is chosen, and their energies, forces, and other critical physical quantities are calculated with the DFT method. Next, each atomic structure is converted to descriptors that serve as the input of the machine learning model, and the calculated energies and forces are designated as the target quantities. Finally, the model is trained, and the machine learning potential is obtained. Several machine learning models can be used to build machine learning potentials, such as neural networks [130,131] and Gaussian process regression [132][133][134][135][136][137][138][139]. Ulissi et al. [140] studied the active sites of bimetallic catalysts. Hundreds of possible active sites were found using DFT calculations, and neural network potentials were used to accelerate the calculation process. A nickel gallium bimetallic catalyst was used as the example.
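The four-step workflow (sample configurations, compute reference energies, convert structures to descriptors, train a model) can be sketched on a toy three-atom cluster. Several parts are stand-ins and should be read as assumptions: a Lennard-Jones energy in reduced units replaces the DFT calculation, randomly perturbed geometries replace AIMD snapshots, sorted inverse pair distances serve as the descriptor, and a Gaussian-kernel regression plays the role of the machine learning model. Forces are omitted for brevity.

```python
import numpy as np

def reference_energy(positions):
    # Stand-in for a DFT single-point energy: Lennard-Jones in reduced units.
    energy = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            r = np.linalg.norm(positions[i] - positions[j])
            energy += 4.0 * (r**-12 - r**-6)
    return energy

def descriptor(positions):
    # Sorted inverse pair distances: unchanged by translation, rotation,
    # and reordering of the atoms.
    inv = [1.0 / np.linalg.norm(positions[i] - positions[j])
           for i in range(len(positions))
           for j in range(i + 1, len(positions))]
    return np.sort(np.array(inv))

def gaussian_kernel(a, b, length=0.1):
    # Similarity between every descriptor in a and every descriptor in b.
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * length**2))

def train(train_desc, energies, reg=1e-6):
    # Kernel regression weights; reg stabilises the linear solve.
    k = gaussian_kernel(train_desc, train_desc)
    return np.linalg.solve(k + reg * np.eye(len(k)), energies)

def predict(train_desc, alpha, desc):
    return gaussian_kernel(desc, train_desc) @ alpha

# Step 1: sample configurations (random perturbations of a triangle).
rng = np.random.default_rng(0)
base = np.array([[0.0, 0.0, 0.0], [1.12, 0.0, 0.0], [0.56, 0.97, 0.0]])
configs = [base + rng.normal(scale=0.05, size=base.shape) for _ in range(40)]
# Step 2: reference energies. Step 3: descriptors.
y = np.array([reference_energy(c) for c in configs])
X = np.array([descriptor(c) for c in configs])
# Step 4: train on 30 configurations, predict the remaining 10.
alpha = train(X[:30], y[:30])
predicted = predict(X[:30], alpha, X[30:])
```

The descriptor is what lets the model treat rotated or relabeled copies of a structure as the same input. Production potentials use far richer descriptors and also fit forces.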

The Development of Descriptors
Identifying simple, common features that influence the properties of a small group of materials and using them as descriptors is a valuable approach to property prediction, such as the prediction of catalytic activity, and to materials discovery [141][142][143][144][145][146][147][148][149]. Several descriptors can be used, e.g., interatomic distance, nearest neighbor coordination number (CN), surface strain, the number of facets, and p-band center [150]. However, the accuracy of these simple descriptors is challenging to verify experimentally because catalysts have complex structures that change dynamically during the reaction. Timoshenko et al. [151] proposed an ML method that directly processes data from X-ray absorption spectroscopy (XAS), which contain structural and electronic information. This information is used directly to obtain specific features of some simple descriptors, such as the charge states and the radial distribution function. Both supervised and unsupervised ML methods, shown in Figure 8, were used to reveal the relationships hidden in the XAS data. Sinthika et al. [151] proposed a special descriptor, the π electronic structure, for nitrogen-, boron-, and co-doped graphene. In this article, several descriptors were summarized and illustrated in Table 1, such as surface energy [152,153], vacancy formation energy [154], occupancy [155,156], and d-band center [157]. Takahashi et al. [158] used the random forest method to search for new catalysts for the oxidative coupling of methane (OCM). To overcome the uncertainty in methane activation, three key factors identified from 1868 OCM catalyst data were first summarized as the descriptors that determine C2 yields. Using these descriptors and the random forest method, new catalysts that could improve the C2 yield were found.
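To make one of the simple descriptors concrete, the sketch below computes the nearest neighbor coordination number (CN) of each atom by counting neighbors within a distance cutoff. The five-atom cluster and the cutoff value are hypothetical, and periodic boundary conditions are ignored, so this applies to isolated clusters only.

```python
import math

def coordination_numbers(positions, cutoff):
    # For each atom, count the other atoms that lie within the cutoff distance.
    numbers = []
    for i, p in enumerate(positions):
        count = sum(
            1
            for j, q in enumerate(positions)
            if j != i and math.dist(p, q) <= cutoff
        )
        numbers.append(count)
    return numbers

# Hypothetical planar cluster: one central atom with four neighbours at unit distance.
cluster = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (-1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, -1.0, 0.0),
]
cn = coordination_numbers(cluster, cutoff=1.1)
```

With this cutoff the central atom has CN 4 and each outer atom has CN 1, because the outer atoms are about 1.41 apart. Low-coordinated sites such as these edge atoms are often the ones singled out when CN is used as an activity descriptor.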
Graph neural networks (GNNs) offer a new way to obtain descriptors [159] that differs from traditional approaches, in which descriptors are obtained from fixed functions. A simple GNN [160] contains a set of nodes, edges, node attributes, and edge attributes. Only the atomic structure is needed as input in the GNN approach. Nodes represent the atoms, and the neighbor information of each atom is encoded by the edges. The transformation and parameter optimization are then carried out on the graph. This approach has gained increasing attention and is widely used in materials discovery and property prediction [161][162][163][164].
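The graph input described here can be sketched as a small data structure built directly from atomic numbers and coordinates. The choice of atomic number as the node attribute, interatomic distance as the edge attribute, and a plain distance cutoff for deciding which atoms are neighbors are assumptions for illustration; the cited GNN models may encode richer attributes.

```python
import math
from dataclasses import dataclass, field

@dataclass
class AtomGraph:
    node_attrs: list                                 # one attribute per atom
    edges: list = field(default_factory=list)        # (i, j) atom index pairs
    edge_attrs: list = field(default_factory=list)   # one attribute per edge

def build_graph(atomic_numbers, positions, cutoff):
    # Nodes are atoms; an edge links every pair of atoms closer than the cutoff.
    graph = AtomGraph(node_attrs=list(atomic_numbers))
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            distance = math.dist(positions[i], positions[j])
            if distance <= cutoff:
                graph.edges.append((i, j))
                graph.edge_attrs.append(distance)
    return graph

# Hypothetical water-like molecule: O at the origin and two H atoms.
atomic_numbers = [8, 1, 1]
positions = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
graph = build_graph(atomic_numbers, positions, cutoff=1.2)
```

A GNN then updates the node attributes by passing information along these edges, so the descriptor of each atom is learned during training rather than fixed in advance.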

Discussion
Machine learning approaches have been extensively and successfully applied in numerous fields, including the discovery of new materials, the prediction of materials properties, and the acceleration of calculations. Additionally, some helpful machine learning community projects, like DeepChem [165,166] and OpenCatalysts, have been proposed to support the use of machine learning methods in materials science and chemistry [165][166][167][168][169][170][171]. However, deficiencies still exist that restrict the development of machine learning approaches and need to be overcome. (1) Descriptors have been introduced in detail, and several kinds of descriptors were listed above. However, they are all static descriptors based on fixed functions with no optimizable parameters. Graph neural networks (GNNs) offer a new way to obtain descriptors [159] that differs from the traditional approaches based on fixed functions. (2) As for machine learning potentials, training time and accuracy remain problems as the amount of training data increases. In addition, the development of universal machine learning potentials faces enormous challenges, because existing machine learning potentials are obtained for specific problems by fitting calculated data.

Conclusions and Outlook
The focus of this review is on molecular dynamics calculations of catalysts. Different types of molecular dynamics are outlined, including AIMD and ReaxFF molecular dynamics. The development of both methods in applications including growth, dehydrogenation, hydrogenation, oxidation reactions, bias, and recombination of carbon materials is discussed. Although both AIMD and ReaxFF molecular dynamics simulations have been successfully applied in mechanistic studies of different catalytic interactions, some limitations remain, such as the high computational cost of AIMD and its limitations in complex systems, as well as the parameter optimization and charge description problems of ReaxFF. In recent years, ML methods have been widely applied in various fields. An overview is given of the application of ML methods in catalysis, which can address the above limitations. Several ML algorithms, such as neural networks, random forests, and regression, are briefly described, and their applications in the search for new catalysts and in performance prediction are reported. Most importantly, one of the most significant advances, the machine learning potential, is presented. With accuracy close to that of DFT calculations but at lower computational cost, the machine learning potential has become one of the most promising directions in catalysis. In addition, the challenges of applying machine learning methods, especially the limitations of descriptors, are discussed. Finally, the GNN is discussed as a viable solution.