A Review on Artificial Intelligence Enabled Design, Synthesis, and Process Optimization of Chemical Products for Industry 4.0

With the development of Industry 4.0, artificial intelligence (AI) is gaining increasing attention for its ability to solve complex problems in industrial chemistry and chemical engineering. This review provides an overview of the application of AI techniques, in particular machine learning, to chemical design, synthesis, and process optimization in recent years. The focus is on the application of AI to structure-function relationship analysis, synthetic route planning, and automated synthesis. Finally, we discuss the challenges and future of AI in making chemical products.


Introduction
Chemists spend substantial time on repetitive experimental tasks, such as the synthesis of organic compounds, optimization of process parameters, and molecular structure identification. To some extent, these tedious tasks limit chemists' creativity. As green chemistry continues to evolve, the chemical industry has been working to discover new chemical reactions, catalysts, and equipment that reduce the use of hazardous substances and produce high-value-added chemicals through sustainable processes. However, such discoveries are expensive and time-consuming when they rely on human labor alone [1].
In the past decade, a growing body of literature and patents has documented AI-driven chemical engineering research. Baum et al. analyzed the growth and distribution of AI in chemistry-related publications over the past two decades using the CAS Content Collection (Figure 1) [2]. As shown in Figure 1, the number of published papers and patents involving AI has increased dramatically since 2015, with industrial chemistry and chemical engineering showing the second-highest increase in published papers. Meanwhile, data obtained from SciFinder show that the annual number of publications on machine learning in the chemical industry exceeded 20,000 in 2021-2022. AI spans several methodological domains, such as reasoning, knowledge representation, and solution search, with machine learning (ML) as one of its basic paradigms. In the last few years, especially since the introduction of AlphaGo, ML has developed rapidly in industrial chemistry and chemical engineering, greatly aiding the development of pharmaceuticals and fine chemicals while reducing time and cost [3][4][5]. Much of the literature has already summarized the application of ML algorithms in the chemical industry (Figure 2) [6]. As shown in Figure 2, supervised learning methods are the most widely used in the chemical industry, accounting for nearly 70% of the total, while hybrid, unsupervised, and combinatorial methods are used much less often. Almost all of these methods are used for data mining and analytics; the only exception is reinforcement learning, whose applications are currently limited to robotics, gaming, and navigation. Figure 3 shows in more detail the types of problems solved primarily with supervised methods, namely modeling, optimization, control and monitoring, design and discovery, support for sensory analysis, and reaction prediction.
Unsupervised methods are mostly used for dimensionality reduction, data visualization, and information extraction. Deep learning (DL), a subfield of ML, uses deep neural networks (DNNs). A DNN consists of layers of interconnected nodes: each node receives inputs from the previous layer, combines them in a weighted sum, applies a nonlinear activation function, and passes the result on, so that stacked layers progressively transform the raw input into the desired output. In quantitative structure-activity relationship (QSAR) modeling, deep learning models have achieved state-of-the-art results in molecular property prediction as well as in uncertainty quantification of the predicted properties. It is worth noting that the ML-based molecular design approach differs from the mathematical optimization-based approach.
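To make the node-level description concrete, the following is a minimal sketch of a forward pass through a small fully connected network in plain Python. The layer sizes, the hand-picked weights, and the choice of ReLU and sigmoid activations are illustrative assumptions, not values from any cited model.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]  # one row of incoming weights per node
Layer = Tuple[Matrix, Vector, Callable[[float], float]]


def relu(z: float) -> float:
    return max(0.0, z)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(x: Vector, weights: Matrix, biases: Vector,
                activation: Callable[[float], float]) -> Vector:
    """Each node forms a weighted sum of its inputs plus a bias, then applies a nonlinearity."""
    outputs = []
    for node_weights, bias in zip(weights, biases):
        z = sum(w * xi for w, xi in zip(node_weights, x)) + bias
        outputs.append(activation(z))
    return outputs


def forward(x: Vector, layers: Sequence[Layer]) -> Vector:
    """Feed the input through each layer in turn; the last layer's outputs are the prediction."""
    for weights, biases, activation in layers:
        x = dense_layer(x, weights, biases, activation)
    return x


# Hypothetical 3-input -> 2-hidden -> 1-output network with hand-picked weights.
network: List[Layer] = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1], relu),
    ([[1.0, -1.0]], [0.0], sigmoid),
]

descriptors = [1.0, 2.0, 0.5]  # e.g., three hypothetical molecular descriptors
print(forward(descriptors, network))
```

In a real DNN the weights are learned from data by backpropagation rather than set by hand, and the layers are usually much wider and deeper.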
Mathematical optimization-based approaches require large amounts of experimental process data, such as reaction rates, which are difficult to obtain in a consistent, benchmarked form. In contrast, ML models for molecular design require only molecular structures or simple molecular property information, which is more readily available and more reliable than experimental process data. In addition, ML models for molecular design have more trainable parameters than mathematical optimization models; in general, given sufficient data, more trainable parameters tend to yield a more accurate model. With the development of Industry 4.0, the success of AI in areas such as image recognition and text processing has also facilitated its use in drug discovery, including the design and optimization of small-molecule drugs [7][8][9]. Key to the development of computer-aided chemistry, for example in molecular design, retrosynthetic planning, reaction prediction, and optimization of reaction conditions, is the availability of large reaction datasets and high-performance computing [10][11][12][13][14][15][16][17][18][19].
This paper reviews the applications of AI in various areas of the chemical industry. First, AI can be used for molecular structure-function relationship analysis. Moreover, applications of AI to chemical reactions include retrosynthetic planning, condition recommendation, and forward reaction prediction. In addition, AI enables the automation of compound synthesis, reducing repetitive work for laboratory staff.

Molecular Property Prediction
Molecular property prediction is an important problem in computer-aided molecular design, and accurate deep-learning models for molecular property prediction can greatly accelerate experimental studies. Two main types of models are prominent in molecular property prediction: graph neural networks and sequence-based neural networks. They differ in how molecules are represented: the former require molecular graph information, whereas the latter require string representations of molecular structures (Figure 4) [20].
A widely used molecular representation, known as the molecular graph, records molecular structure information directly in matrices. Molecular graphs can be learned with graph neural networks. Lu et al. reported the prediction of molecular properties using multilevel graph convolutional neural networks (MGCN), in which different convolutional layers learn atom-level and bond-level feature information that is then processed to predict molecular properties [21]. On QM9, the MGCN model achieves a mean absolute error (MAE) of 0.0642 eV for the HOMO-LUMO gap, showing excellent predictive performance and generalization capability. Gilmer et al. applied a message passing neural network (MPNN) to the public QM9 dataset and obtained better performance than any previous model [22,23]. They reported the ratio of each MPNN model's MAE to the provided chemical-accuracy estimate, with a ratio of 1.60 for the HOMO-LUMO gap on QM9. Within the MPNN framework, designing appropriate message, update, and readout functions can effectively improve prediction performance. Yang et al. used a directed MPNN to extract molecular graph features and predict molecular properties; tested on 19 public and 16 industrial datasets, the model outperformed previous models on most tasks [24]. Compared with other studies, this work reports an MAE of 2.766 ± 0.022 for multi-task prediction on the QM9 dataset and provides a more extensive comparison of model performance.
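The message passing idea behind MPNN-style models can be illustrated with a minimal sketch: atoms are nodes with feature vectors, bonds are entries in an adjacency list, and in each round every atom aggregates (here, sums) its neighbors' features before a simple update. The molecule (the heavy-atom graph of ethanol), the one-hot features, and the sum-based message, update, and readout functions are simplifying assumptions; in an actual MPNN these functions are learned neural networks.

```python
from typing import Dict, List

Features = List[float]


def message_passing_step(h: Dict[int, Features], adj: Dict[int, List[int]]) -> Dict[int, Features]:
    """One round: each atom adds the summed features of its bonded neighbours to its own state."""
    new_h = {}
    for atom, feats in h.items():
        # Message: sum of neighbour feature vectors (a learned function in a real MPNN).
        message = [0.0] * len(feats)
        for nb in adj[atom]:
            message = [m + f for m, f in zip(message, h[nb])]
        # Update: add the message to the current state (also learned in practice).
        new_h[atom] = [f + m for f, m in zip(feats, message)]
    return new_h


def readout(h: Dict[int, Features]) -> Features:
    """Graph-level vector: sum over all atom states, so atom ordering does not matter."""
    dim = len(next(iter(h.values())))
    return [sum(v[i] for v in h.values()) for i in range(dim)]


# Heavy-atom graph of ethanol (C-C-O); features are one-hot [is_C, is_O].
atom_features = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}
bonds = {0: [1], 1: [0, 2], 2: [1]}

h = atom_features
for _ in range(2):  # two rounds of message passing
    h = message_passing_step(h, bonds)
print(readout(h))  # fixed-length vector that a regressor could map to a property
```

The sum readout is one common permutation-invariant choice; the resulting vector would typically be fed to a small feed-forward network that predicts a target such as the HOMO-LUMO gap.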
Processes 2023, 11, 330
Recording molecules as strings is another mainstream molecular representation, of which the most widely used is SMILES [25]. Deep learning models for natural language processing are well suited to processing these sequences of molecular information, and in recent years the Transformer has been the most effective model for string processing [26]. Honda et al. reported the use of the Transformer for molecular property prediction in 2019 [27]. Schwaller et al. applied the Transformer model to the prediction of reaction yields [28]. Chithrananda et al. then built several pre-trained models for chemical molecules based on BERT, which significantly reduced the training time of later Transformer-based models [29]. Su et al. used these pre-trained models in a transfer learning study to predict the energy gap of metalloporphyrins, requiring only one-third of the training time needed without transfer learning [30]. Jo et al. used an MPNN to process SMILES information, and the model obtained better results on classification tasks across multiple datasets [31]. Although molecular graph-based models and sequence-based models both perform well in molecular property prediction, each has its own advantages. Molecular graphs record significantly richer structural information than sequences, which tends to make property predictions more accurate. Sequences offer greater flexibility in recording molecular information and make it easier to reduce training cost through transfer learning. Future studies should choose between the two families of models according to the research question, or use multimodal models to combine their advantages.
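Before a sequence model such as the Transformer can process SMILES, the string has to be split into tokens. A minimal sketch of one common choice, a regular expression that keeps bracketed atoms and two-letter elements such as Cl and Br as single tokens, is shown below. The exact pattern is an assumption modeled on the tokenizers used in Transformer-based reaction work, not necessarily the one used in any cited study.

```python
import re
from typing import List

# Assumed pattern: bracket atoms first, then two-letter halogens, single atoms, bonds, ring digits.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into chemically meaningful tokens."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    # Refuse to silently drop characters that the pattern does not cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"Untokenizable SMILES: {smiles!r}")
    return tokens


print(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin
print(tokenize_smiles("Clc1cc[nH]c1"))           # 'Cl' and '[nH]' stay whole
```

Character-level splitting would break "Cl" into "C" and "l", so a chemistry-aware tokenizer like this keeps the vocabulary aligned with atoms and bonds.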

Molecular Design
Computer-aided molecular design is another important research direction in cheminformatics, and designing molecules that meet specified requirements has long been a goal of chemists [32]. As with molecular property prediction, both graph generation models and text generation models from deep learning can be used for molecular design (Figure 5).
In 2018, Gómez-Bombarelli et al. reported the design of new molecules using a variational autoencoder (VAE) that generates molecules while mapping the encoded latent chemical space to the corresponding molecular properties, allowing the model to explore chemical space more efficiently and purposefully [33]. Segler et al. applied recurrent neural networks based on long short-term memory (LSTM) to de novo drug design [34]; transfer learning and reinforcement learning were introduced into this model to improve the validity of the newly designed molecules. In the same year, Cao et al. applied a generative adversarial network (GAN) to molecule generation and likewise introduced reinforcement learning to score the generated molecules, so that molecules meeting the desired target could be generated [35]. Flam-Shepherd et al. added MPNNs to the encoder and decoder of the VAE model, which greatly improved its performance [36].
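The latent space that VAE-based generators explore rests on two ingredients that fit in a few lines: the reparameterization trick, which samples a latent vector as z = mu + sigma * eps, and the closed-form KL divergence between the encoder's diagonal Gaussian and a standard normal prior. The sketch below shows only these two pieces in plain Python; the encoder, decoder, and any property-prediction head (as in the Gómez-Bombarelli et al. approach) are omitted, and the latent values are hypothetical.

```python
import math
import random
from typing import List


def reparameterize(mu: List[float], log_var: List[float], rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, 1).

    In an autodiff framework this form keeps z differentiable with respect to mu and log_var.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: List[float], log_var: List[float]) -> float:
    """KL(N(mu, diag(sigma^2)) || N(0, I)) = -0.5 * sum(1 + log_var - mu^2 - exp(log_var))."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


# Hypothetical encoder output for one molecule in a 3-dimensional latent space.
mu = [0.2, -1.0, 0.0]
log_var = [0.0, -0.5, 0.3]

rng = random.Random(0)
z = reparameterize(mu, log_var, rng)
print(z, kl_to_standard_normal(mu, log_var))
```

During training, the KL term is added to the decoder's reconstruction loss; it pulls encodings toward the prior so that nearby points in the latent space decode to similar molecules.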
The two most difficult problems in computer-aided molecular design are generating valid chemical molecules and generating molecules with target properties or characteristics, that is, distribution learning for molecular design and goal-directed molecular optimization [32]. Comparing the performance of molecular design models is not trivial; in 2019, Brown et al. proposed the GuacaMol platform, which defines separate evaluation criteria for these two tasks [37]. Among current approaches, applying transfer learning to a generative model can increase the chance of generating valid molecules. Developing more robust molecular representations, such as SELFIES, can also be effective for distribution learning in molecular design [38]. In goal-directed molecular optimization, when the design targets can be computed quickly (e.g., LogP or TPSA), reinforcement learning can help the model find target molecules faster. When the desired property cannot be obtained by simple computation, the model's latent chemical space can instead be mapped to the corresponding property before molecules are designed.
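When the target property is cheap to compute, it can serve directly as a reward. The toy sketch below uses a softmax policy over four candidate molecules and ascends the exact gradient of the expected reward, which is the quantity that REINFORCE estimates from sampled molecules. The candidate names and their LogP values are invented placeholders, and the Gaussian-shaped reward centered on a target LogP of 2.5 is an assumed design choice; a real generator would propose molecules token by token and compute LogP with a cheminformatics toolkit.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reward(logp: float, target: float = 2.5, width: float = 1.0) -> float:
    """Equals 1.0 when the property hits the target and decays smoothly away from it."""
    return math.exp(-((logp - target) / width) ** 2)


# Hypothetical candidates with made-up LogP values.
candidates = ["cand_A", "cand_B", "cand_C", "cand_D"]
logp_values = [0.5, 2.4, 4.0, -1.0]
rewards = [reward(v) for v in logp_values]

logits = [0.0] * len(candidates)  # start from a uniform policy
learning_rate = 1.0
for _ in range(200):
    probs = softmax(logits)
    expected = sum(p * r for p, r in zip(probs, rewards))
    # For a softmax policy, d E[r] / d logit_k = p_k * (r_k - E[r]).
    logits = [logit + learning_rate * p * (r - expected)
              for logit, p, r in zip(logits, probs, rewards)]

probs = softmax(logits)
best = max(range(len(candidates)), key=lambda i: probs[i])
print(candidates[best], round(probs[best], 3))
```

Subtracting the expected reward acts as a baseline: candidates that score above average gain probability and the rest lose it, which is the same mechanism a sampled REINFORCE update relies on.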
For the design of new molecules, interpretable machine learning is an important application area of AI [39]. For example, Verkhivker et al. developed interpretable ML models for the design of tyrosine kinase inhibitors by combining the ChemVAE embedding architecture with cluster decomposition [40]. ML has recently also been integrated into computer-aided molecular design (CAMD) frameworks. Hatamleh et al. developed a CAMD framework for mosquito repellents to mitigate the drawbacks of currently used repellents [41]. In this framework, a data-driven Hyperbox-based machine learning approach was used to predict the mosquito-repellent properties of molecules in the absence of a mechanistic prediction model. Ooi et al. proposed a CAMD-based approach to designing fragrance molecules, using a Hyperbox classifier to predict fragrance properties [42]. The resulting model can be interpreted as a set of explicit decision-support rules that quantitatively relate the structural parameters of a molecule to its odor characteristics. In addition, a novel data-driven rough set-based machine learning (RSML) model was used as a predictive or diagnostic modeling tool for odor properties to design fragrance molecules [43]. RSML generates deterministic rules from existing odor databases that relate the topology of fragrant molecules to their odor characteristics, and these rules are then integrated into CAMD problems as constraints. The results show that the method can identify non-intuitive and promising fragrant molecules for various applications.
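The appeal of a hyperbox classifier for CAMD is that each box is an explicit rule: a molecule belongs to a class if every descriptor lies between the box's lower and upper bounds. The sketch below shows only this membership test and its use as a screening constraint. The descriptors (carbon count, ester-group count, molecular weight), the box bounds, and the candidate molecules are hypothetical and are not taken from the cited fragrance or repellent studies, where the bounds are learned from labeled data.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Hyperbox:
    label: str
    bounds: Dict[str, Tuple[float, float]]  # descriptor -> (lower, upper), inclusive

    def contains(self, descriptors: Dict[str, float]) -> bool:
        """A molecule is inside the box only if every bounded descriptor lies within its range."""
        return all(lo <= descriptors[name] <= hi for name, (lo, hi) in self.bounds.items())

    def as_rule(self) -> str:
        """Render the box as a human-readable IF-THEN rule."""
        terms = [f"{lo} <= {name} <= {hi}" for name, (lo, hi) in self.bounds.items()]
        return "IF " + " AND ".join(terms) + f" THEN {self.label}"


# Hypothetical box for a 'fruity' odor class.
fruity = Hyperbox("fruity", {"n_carbon": (5, 10), "n_ester": (1, 1), "mol_weight": (100.0, 180.0)})

# Hypothetical candidate molecules described by the same descriptors.
candidates: Dict[str, Dict[str, float]] = {
    "cand_1": {"n_carbon": 6, "n_ester": 1, "mol_weight": 130.2},
    "cand_2": {"n_carbon": 12, "n_ester": 1, "mol_weight": 200.3},
    "cand_3": {"n_carbon": 7, "n_ester": 0, "mol_weight": 114.2},
}

print(fruity.as_rule())
# Use the box as a CAMD constraint: keep only candidates that the rule accepts.
accepted: List[str] = [name for name, d in candidates.items() if fruity.contains(d)]
print(accepted)
```

Because the rule is just a conjunction of interval bounds, it can be written directly into an optimization-based CAMD formulation as linear constraints on the descriptors.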
Beyond molecular design, several fields are beginning to benefit from integrating ML with systems biology, including pathway identification and analysis, modeling of metabolism and growth, and 3D protein modeling [44]. AI is being used for the dynamic modeling of signaling networks, which helps to elucidate cellular pathways and facilitates drug discovery; it allows the changes in gene expression and signaling that occur when cells are exposed to various perturbations to be cataloged, building a network-based understanding of biology [45][46][47]. In metabolic engineering, for example, ML models including naive Bayes, decision trees, and logistic regression, trained on the pathway information of many organisms, were used with MetaCyc to predict whether a given metabolic pathway is present in a newly sequenced organism [44]. In general, the ML models used for pathway prediction performed better than standard mathematical and statistical methods. Nevertheless, pathway discovery still relies heavily on traditional approaches such as gene sequence similarity and network analysis. Better ML algorithms for improving dynamic and constraint-based metabolic modeling, such as flux balance analysis (FBA), are therefore needed [44].
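As an illustration of how one of the models mentioned above, naive Bayes, can predict whether a pathway is present, the sketch below trains a Bernoulli naive Bayes classifier from scratch on binary features indicating whether an organism's genome annotation contains evidence for particular enzymes. The features, organisms, and labels are hypothetical toy data rather than MetaCyc records, and Laplace smoothing with alpha = 1 is an assumed choice.

```python
import math
from typing import Dict, List, Tuple

Example = Tuple[List[int], int]  # (binary enzyme-evidence vector, 1 if pathway present else 0)


def train_bernoulli_nb(data: List[Example], alpha: float = 1.0):
    """Estimate class priors and per-feature P(feature = 1 | class) with Laplace smoothing."""
    n_features = len(data[0][0])
    priors: Dict[int, float] = {}
    likelihoods: Dict[int, List[float]] = {}
    for label in (0, 1):
        rows = [x for x, y in data if y == label]
        priors[label] = len(rows) / len(data)
        likelihoods[label] = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha) for j in range(n_features)
        ]
    return priors, likelihoods


def predict_proba(x: List[int], priors, likelihoods) -> float:
    """Return P(pathway present | x), assuming features are conditionally independent."""
    log_post = {}
    for label in (0, 1):
        lp = math.log(priors[label])
        for xj, pj in zip(x, likelihoods[label]):
            lp += math.log(pj if xj else 1.0 - pj)
        log_post[label] = lp
    # Normalize in log space for numerical stability.
    m = max(log_post.values())
    exp_post = {k: math.exp(v - m) for k, v in log_post.items()}
    return exp_post[1] / (exp_post[0] + exp_post[1])


# Hypothetical features: evidence for [enzyme_A, enzyme_B, enzyme_C, transporter_D].
training = [
    ([1, 1, 1, 0], 1), ([1, 1, 0, 1], 1), ([1, 0, 1, 1], 1),
    ([0, 0, 0, 1], 0), ([0, 1, 0, 0], 0), ([0, 0, 1, 0], 0),
]
priors, likelihoods = train_bernoulli_nb(training)
print(predict_proba([1, 1, 1, 1], priors, likelihoods))  # new organism with all four hits
```

The smoothing term keeps a single unseen feature value from driving a class probability to zero, which matters when only a few organisms with a given pathway are available.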

AI for Synthetic Route Planning
AI has been applied successfully to planning synthetic routes that are then executed in the laboratory or evaluated by chemists, including (1) retrosynthetic planning, (2) forward reaction prediction, and (3) condition recommendation. The origins of computer-assisted synthesis planning (CASP) can be traced back to Corey's translation of retrosynthetic logic into computer code in the 1960s [48]. Nevertheless, early synthetic route planning relied entirely on chemists' expertise and did not use statistical learning from large amounts of data [49][50][51]. Given limited computational resources, complex algorithms could not be widely used in synthesis planning. With the growing availability of molecular property datasets and reaction datasets, together with increased computational power, AI for synthesis planning is once again attracting widespread attention [52][53][54][55][56]. Over the last 20 years, reactivity patterns inferred by AI from published reaction data have become viable alternatives to algorithms based on expert-encoded rules. Because data extraction and training can be automated, these approaches scale easily to incorporate new reactions, easing the burden on scientists. Today, the retrosynthesis of complex molecules, high-fidelity prediction of reaction outcomes, and automation of chemical reactions remain major research areas.

Retrosynthetic Planning
Rule-based and rule-free methods are the main approaches to retrosynthesis. The rule-based method is conceptually similar to the way an organic chemist selects a known reaction type to apply to a specific synthetic target. It has been implemented in state-of-the-art synthesis planners, but building expert-encoded rules is laborious and inherently dependent on the expertise of scientists [57]. Consequently, the automatic generation of reaction rules from accessible reaction databases has attracted the attention of scientists [58]. Reaction rules are generated automatically by extracting reaction templates from the reactions in a database, clustering them, and processing them with additional molecules [59,60]. Other methods apply the templates directly to the target, often using filters, such as similarity-based methods or neural networks, so that only a chemically relevant subset of the template library is applied and the required computational power is reduced [61][62][63][64][65][66][67]. Although rule-based approaches are common in the most advanced synthesis planners, their main drawback is the huge computational cost of extracting a library of rules or templates. Moreover, the complexity of assessing new rules against all existing ones grows as the number of codified rules increases, which may ultimately make the problem intractable. In contrast, the rule-free method maps target compounds directly to potential starting materials, bypassing the need to build a library of reaction rules. It represents molecules as text, such as SMILES strings, turning prediction into a natural language processing problem [68]. Forward reaction or retrosynthetic prediction can then be achieved with various neural machine translation architectures. The Molecular Transformer architecture is currently the most popular way of treating chemistry as a language and produces valid SMILES strings more accurately [69,70].
Compared to rule-based methods, rule-free methods are more general and have lower associated computational costs.
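Similarity-based filtering of a template library can be sketched as follows: represent the target and the precedent product behind each template as sets of "on" fingerprint bits, rank templates by Tanimoto similarity to the target, and apply only the top few. The bit sets and template names are hypothetical, and plain Tanimoto ranking stands in for the learned filters (such as neural networks) used in the cited planners.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits.

    Two empty fingerprints are scored 0.0 here by convention.
    """
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def shortlist_templates(target_fp: Set[int], library: Dict[str, Set[int]],
                        k: int) -> List[Tuple[str, float]]:
    """Rank templates by similarity of their precedent product to the target; keep the top k."""
    scored = [(name, tanimoto(target_fp, fp)) for name, fp in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical fingerprints: bit indices set for each template's precedent product.
template_library = {
    "amide_coupling": {1, 4, 9, 15, 22},
    "suzuki_coupling": {2, 7, 11, 30},
    "ester_hydrolysis": {1, 4, 9, 40},
    "reductive_amination": {3, 9, 15, 50},
}
target = {1, 4, 9, 15, 23}

print(shortlist_templates(target, template_library, k=2))
```

Only the shortlisted templates would then be matched against the target, which is where the computational savings described above come from.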
Inspired by the use of the Molecular Transformer for forward reaction prediction, retrosynthetic models based on the same architecture have attracted considerable attention [15,71,72]. Zheng et al. developed SCROP, a template-free, self-correcting retrosynthesis predictor that uses a Transformer neural network [14]. By framing retrosynthesis as machine translation between linear molecular notations and adding a neural-network grammar corrector, the method achieves an accuracy of 59.0% on a standard benchmark dataset. Wang et al. proposed RetroPrime, a single-step, template-free, Transformer-based method that addresses the insufficient diversity and frequent chemical implausibility of outputs from Transformer-based retrosynthesis models [73]. Tetko et al. investigated how a text-like representation of chemical reactions (SMILES) combined with the Transformer, a natural language processing (NLP) architecture, performs in predicting retrosynthetic reactions [74]. Lin et al. used the Transformer architecture to treat each reaction prediction task as a data-driven sequence-to-sequence problem, achieving superior performance on single-step retrosynthesis tasks (Figure 6) [70]. The top-1 accuracy of the retrosynthesis methods discussed above ranged from 41% to 54% [75]. Even with increased batch size and training time, the Transformer model of Duan et al. reached a top-1 accuracy of 54.1% on the USPTO-50K dataset [76]. In contrast, RetroTRAE, developed by Cernak et al., avoids SMILES-based translation problems altogether and yields a top-1 accuracy of 58.3% on the USPTO test dataset [75]. However, because these top-1 accuracies were obtained on specific training and test sets, it remains questionable how well models trained on a particular set of chemical transformations can be applied to specific processes.
Recently, graph-enhanced Transformer and hybrid models were reported, achieving 44.9% top-1 accuracy and more diverse reactant suggestions, respectively, but without substantial improvements over previous work [77,78]. Notably, except for the work of Lin et al., all Transformer-based retrosynthesis methods are limited to a single step [69]. Moreover, reagents, catalysts, and solvents were not predicted simultaneously in these retrosynthesis plans. The Molecular Transformer-based approach introduced by Schwaller et al. incorporates a hypergraph exploration strategy for automatic retrosynthesis planning without human intervention [72,79]. Its single-step retrosynthesis model predicts the reactants, reagents, solvents, and catalysts for each retrosynthetic step, bringing retrosynthesis technology to a new level.
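The top-1 figures quoted above are instances of top-k accuracy: a prediction counts as correct if the recorded answer appears among the model's k highest-ranked outputs. A minimal sketch follows. It assumes that predictions and references are already canonicalized strings (real evaluations usually canonicalize SMILES with a cheminformatics toolkit first, and multi-component reactant sets may need order-insensitive comparison); the example SMILES are illustrative only.

```python
from typing import List, Sequence


def top_k_accuracy(ranked_predictions: Sequence[Sequence[str]],
                   references: Sequence[str], k: int) -> float:
    """Fraction of examples whose reference appears among the first k ranked predictions."""
    if len(ranked_predictions) != len(references):
        raise ValueError("one ranked list is needed per reference")
    hits = sum(ref in preds[:k] for preds, ref in zip(ranked_predictions, references))
    return hits / len(references)


# Illustrative beam-search outputs (assumed already canonical), best guess first.
predictions: List[List[str]] = [
    ["CCO", "CC=O", "CCOC"],
    ["c1ccccc1O", "c1ccccc1", "c1ccccc1N"],
    ["CC(=O)O", "CC(=O)OC", "CC(N)=O"],
    ["CCN", "CCCN", "CN"],
]
targets = ["CCO", "c1ccccc1", "CC(N)=O", "NCCO"]

for k in (1, 3):
    print(f"top-{k} accuracy: {top_k_accuracy(predictions, targets, k):.2f}")
```

Reporting several values of k is common because a chemist reviewing a short list of suggestions can still benefit when the recorded route is ranked second or third.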

Forward Reaction Prediction
AI is also widely used for forward reaction prediction. Reaction prediction, which predicts possible products from starting materials and conditions, can be used to virtually screen the proposed reactions or to validate the proposed synthesis steps.
Unlike retrosynthesis, forward reaction prediction usually has a single correct answer, which makes quantitative assessment more straightforward. Current AI-based models for reaction prediction include (1) models that apply reaction rules drawn from predefined lists of rules or templates, (2) graph convolutional neural networks that predict the atom and bond changes between starting materials and products, and (3) sequence-to-sequence models that generate the product SMILES. As in retrosynthetic planning, reliable data are essential for high-quality forward predictions; without precise data on concentration, time, and temperature, reaction prediction becomes difficult. For example, Lee et al. found that retraining a sequence-to-sequence forward prediction model on a company's own data improved its accuracy on company-specific chemistry [15].
Coley et al. combined the traditional use of reaction templates with the pattern-recognition flexibility of neural networks to develop a framework for predicting reaction outcomes [80]; in 5-fold cross-validation, the trained model performed well at forward reaction prediction. Similarly, Aspuru-Guzik et al. combined a learned predictor with SMARTS transformations to build a system that predicts reaction products [81], and they verified its usability on problems from organic chemistry textbooks. Coley et al. also used Weisfeiler-Lehman networks to model higher-order interactions between the changes occurring at nodes in a molecule, allowing them to explore the space of product molecules efficiently and predict the outcomes of organic reactions [82]; the model's accuracy was comparable to that of domain experts. Furthermore, Coley's group proposed a supervised learning method that predicts the product given the reactants, reagents, and solvent [16]. If the reactants and products are each written as text sequences, reaction prediction can be viewed as a translation problem that maps one sequence onto the other. For example, Schwaller et al. predicted complex organic chemical reactions with template-free sequence-to-sequence models [83], achieving 80.3% top-1 accuracy without relying on auxiliary knowledge such as reaction templates or explicit atomic features. More recently, Schwaller's group treated forward prediction as a machine translation problem and developed the Molecular Transformer model [69], which makes predictions by inferring correlations between chemical motifs in the reactants, reagents, and products of the dataset and achieves over 90% top-1 accuracy.
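Treating reaction prediction as translation requires splitting reaction SMILES into tokens before they are fed to a sequence-to-sequence model. The sketch below uses an atom-level regular expression of the kind popularized by the Molecular Transformer work. Treat the exact pattern as an assumption; it may differ from any particular published implementation. The pattern keeps bracket atoms such as `[NH4+]` and two-letter elements such as `Cl` and `Br` as single tokens, and `>` separates reactants from reagents. The example reaction is hypothetical.

```python
import re

# Atom-level SMILES tokenizer (pattern assumed; adapt as needed).
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize(smiles: str) -> list[str]:
    """Split a (reaction) SMILES string into model tokens."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    # Refuse to silently drop characters, so unsupported input fails loudly.
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


# Hypothetical forward-prediction pair: source is reactants>reagents,
# target is the product.
source = "CC(=O)Cl.Oc1ccccc1>c1ccncc1"
target = "CC(=O)Oc1ccccc1"
print(tokenize(source))
print(tokenize(target))
```

The source and target token lists are what a sequence-to-sequence model would learn to map onto each other.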
Another quantity of interest in forward reaction prediction is the reaction yield, which can guide chemists toward routes that maximize the overall yield and can also assist retrosynthetic planning. Yield prediction models have mainly been built on high-throughput experimentation (HTE) datasets. Perera et al. used HTE to study 15 pairs of electrophilic and nucleophilic reagents for the Suzuki-Miyaura reaction, each pair giving a different product [84]. Doyle et al. trained a random forest model on a high-throughput dataset of thousands of Buchwald-Hartwig coupling reactions and used it to accurately predict the yields of other Buchwald-Hartwig reactions across multidimensional variables [85]. Similarly, Schwaller et al. used Doyle's high-throughput dataset to predict the yields of its 3955 Buchwald-Hartwig reactions [86]. Andrzej et al. predicted yields for nickel-catalyzed Suzuki cross-coupling with 16 phosphine ligands by training a linear regression model on two larger HTE datasets (Figure 7) [87]. Schwaller's group further combined a transformer encoder with a regression layer and demonstrated excellent yield prediction performance on two high-throughput experimental reaction datasets [28]. Although HTE can screen multiple reaction variables at the nanomole scale, the resulting datasets cover only a very narrow region of chemical space. Structure-based descriptors (molecular fingerprints and molecular graphs) are fast and easy to compute for any molecule. Hirst et al. demonstrated the applicability of support vector regression (SVR) for predicting reaction yields using combined data [88]. Schwaller's group treated organic molecules as a language and fed reaction SMILES strings into the model to predict reaction yields [68,83].
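Many of the HTE yield models above start from a table in which each reaction is a combination of discrete components, such as ligand and base. The following minimal NumPy sketch one-hot encodes such combinations and fits a lightly regularized linear model. It assumes a small, made-up dataset with purely additive effects. It is a deliberately simple stand-in for the random forest, linear, and SVR models cited, not a reproduction of any of them, and the component names and yields are hypothetical.

```python
import numpy as np


def one_hot_encode(reactions, vocab=None):
    """Encode reactions (dicts of component type -> name) as 0/1 vectors."""
    if vocab is None:
        # One column per (component type, name) pair seen in the data.
        vocab = sorted({(k, v) for r in reactions for k, v in r.items()})
    index = {item: i for i, item in enumerate(vocab)}
    X = np.zeros((len(reactions), len(vocab)))
    for row, r in enumerate(reactions):
        for item in r.items():
            if item in index:  # components unseen in training stay zero
                X[row, index[item]] = 1.0
    return X, vocab


def fit_ridge(X, y, alpha=1e-3):
    """Closed-form ridge regression with an unpenalized intercept.

    The small penalty also makes the collinear one-hot system solvable.
    """
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)


def predict(weights, X):
    return weights[0] + X @ weights[1:]


# Hypothetical HTE data: yields (%) for ligand/base combinations.
train = [
    {"ligand": "L1", "base": "B1"}, {"ligand": "L1", "base": "B2"},
    {"ligand": "L2", "base": "B1"}, {"ligand": "L2", "base": "B2"},
    {"ligand": "L3", "base": "B1"},
]
yields = np.array([40.0, 55.0, 60.0, 75.0, 20.0])

X_train, vocab = one_hot_encode(train)
w = fit_ridge(X_train, yields)
X_new, _ = one_hot_encode([{"ligand": "L3", "base": "B2"}], vocab)
print(predict(w, X_new))  # close to 35, from additive ligand and base effects
```

One-hot features can only interpolate between components already seen in training. Structure-based descriptors such as fingerprints try to lift that restriction.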
Additionally, encoder-only transformers such as Bidirectional Encoder Representations from Transformers (BERT) have led to advances in reaction yield prediction. Sandfort et al. showed that representing reactions by a concatenation of multiple molecular "fingerprints" gives better yield predictions than one-hot encoding [89]. Moreover, Akinori et al. developed a Message Passing Neural Network (MPNN) model for predicting Buchwald-Hartwig cross-coupling yields [90]. Sequence-to-sequence models are useful not only for working with language tokens but also for providing high-quality descriptors to predict reaction properties such as reaction yields.
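MPNN-style models, like the one mentioned above, build representations by repeatedly passing messages between bonded atoms and then pooling the atom states into a single vector. The following bare-bones NumPy sketch illustrates that idea. It assumes random, untrained weights, sum aggregation, a tanh update, and a toy three-atom graph. A real yield model, including the cited one, would learn its weights, use richer atom and bond features, and add a regression head.

```python
import numpy as np


def message_passing(h, adjacency, w_self, w_neigh, steps=2):
    """Sum-aggregation message passing over a molecular graph.

    h: (n_atoms, d) atom features; adjacency: (n_atoms, n_atoms) 0/1 bonds.
    Each step sums neighbour states and combines them with each atom's own
    state through weight matrices and a tanh nonlinearity.
    """
    for _ in range(steps):
        messages = adjacency @ h
        h = np.tanh(h @ w_self + messages @ w_neigh)
    return h


def readout(h):
    """Sum-pool atom states into a single molecule vector."""
    return h.sum(axis=0)


rng = np.random.default_rng(0)
d = 4
w_self, w_neigh = rng.normal(size=(d, d)), rng.normal(size=(d, d))  # untrained

# Heavy-atom graph of ethanol (C-C-O); assumed one-hot columns: C, N, O, other.
x = np.array([[1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0]], dtype=float)
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
embedding = readout(message_passing(x, adj, w_self, w_neigh))

# Renumbering the atoms must not change the molecule-level vector.
perm = [2, 0, 1]
embedding_perm = readout(
    message_passing(x[perm], adj[np.ix_(perm, perm)], w_self, w_neigh)
)
print(embedding.shape, np.allclose(embedding, embedding_perm))
```

The permutation check shows why graph models suit molecules: the result does not depend on how the atoms happen to be numbered.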

Condition Recommendation
For a forward reaction to proceed smoothly, it is necessary to identify reaction conditions that achieve the desired transformation. Chemists typically screen reaction conditions on the basis of their own experience, which introduces bias. AI, by contrast, can infer suitable conditions more objectively from prior knowledge. However, many condition-recommendation models have been limited to a single reaction class [91,92]. The main reason is the lack of high-quality data, which makes models difficult to develop; the missing data mainly concern (1) quantities, volumes, or concentrations, (2) reaction times or kinetics, and (3) the order of addition of reagents and catalysts. Despite these difficulties, AI has demonstrated the ability to recommend reaction conditions for more diverse reaction sets. These models provide a strong basis for the empirical optimization of reaction conditions but still lack the full details needed for implementation. Discovering more general reaction conditions requires exploring a broad region of chemical space, formed by crossing a large substrate matrix with a high-dimensional matrix of reaction conditions. In their optimization of the Suzuki-Miyaura cross-coupling reaction, Aspuru-Guzik et al. identified the phosphine ligand as a categorical parameter critical to the reaction outcome [93]. They therefore developed a strategy based on clustering computed molecular features to reveal conditions that selectively give the desired product isomers in high yields. Furthermore, Aspuru-Guzik's group reported a simple closed-loop workflow for discovering general reaction conditions that combines data-guided matrix down-selection, uncertainty-minimizing machine learning, and robotic experiments [94].
Applied to heteroaryl Suzuki-Miyaura cross-coupling, the workflow identified conditions that doubled the average yield relative to a widely used benchmark developed with conventional methods, providing a practical roadmap for solving multidimensional chemical optimization problems with large search spaces.
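The notion of "general" conditions comes down to scoring each candidate condition across a whole substrate matrix rather than on a single substrate. The following minimal sketch uses the mean yield across substrates as that generality score. It assumes a small, hypothetical yield matrix. The cited workflow combines this kind of objective with matrix down-selection, uncertainty-minimizing machine learning, and robotic experiments, none of which are shown here.

```python
from statistics import mean

# Hypothetical yields (%) for each condition (key) on four substrates (list).
yield_matrix = {
    "cond_A": [92, 15, 30, 10],   # excellent for one substrate only
    "cond_B": [60, 55, 50, 58],   # consistently moderate
    "cond_C": [70, 40, 20, 35],
}


def generality_scores(matrix):
    """Average yield of each condition across all substrates."""
    return {cond: mean(ys) for cond, ys in matrix.items()}


def most_general(matrix):
    """Condition with the highest average yield across the substrate matrix."""
    scores = generality_scores(matrix)
    return max(scores, key=scores.get)


print(generality_scores(yield_matrix))
print(most_general(yield_matrix))  # cond_B
```

The example shows the trade-off the text describes. Condition A is best for a single substrate, but condition B is the better general choice.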

AI for Automated Synthesis
Applications of AI in chemical reactions include not only synthetic route planning but also automated synthesis. Traditionally, scientists have spent long periods on hazardous, repetitive chemical manipulations, wasting considerable resources and time [95,96]. Additionally, cost and resource constraints limit the number of experiments scientists can run to obtain the desired results. Most importantly, traditional chemical synthesis relies heavily on labor-intensive practices such as scientific training, planning, experience, observation, and interpretation. Fortunately, AI is changing the productivity of modern manufacturing, and the automation of organic chemistry operations is gradually freeing the hands and minds of organic chemists [85,97,98]. For example, automated platforms have synthesized the antiarrhythmic drug lidocaine, the antiepileptic drug rufinamide, and the PDE5 inhibitor sildenafil without human intervention [99,100]. In this way, AI relieves operators of tedious work and manual intervention.

Robotic Lab Platform
Automated chemistry builds on the modularity of common physical operations. It uses equipment such as liquid-handling robots, robotic grippers for plate or vial transfer, and computer-controlled heater/shaker blocks to help scientists reduce labor-intensive laboratory tasks [101]. Such platforms mainly consist of (1) continuous flow technology combined with process analytical technologies and robotics, (2) automated operation modes combined with hardware for traditional batch reactions, and (3) robots that replace the operator. A simple paradigm for automated chemistry is to automate the operations and sample transfers between existing laboratory hardware, as in the mobile robotic chemist of Burger et al. [102]. Mo et al. built a robotic desktop system for the high-throughput collection of TLC data, with an image analysis program that automatically calculates compound Rf values. By replacing scientists with robots for repetitive TLC sampling, this work improves the reproducibility of experiments [103]. Intelligent chemical synthesis is still at the development stage. Cronin's group developed a modular, standardized robotic platform to automate laboratory-scale chemical synthesis [99] and used it to synthesize three pharmaceutical compounds, Nytol, rufinamide, and sildenafil, without human intervention. The yields and purities of the products and intermediates were comparable to or better than those obtained manually. Furthermore, the Chemputer synthesis robot can perform many different reactions, including solid-phase synthesis and iterative cross-coupling [104]. Notably, the system reuses only 22 distinct steps across 10 unique modules, and its code can access 17 different reactions. This makes it possible to link multistep syntheses and run many different protocols and reactions on a single machine.
Although this robotic platform is encouraging, the synthesis of complex organic compounds is still largely manual. To improve experimental reproducibility, Jamison et al. developed a plug-and-play continuous flow chemical synthesis system [105]. Its flexible robotic arm performs all synthetic operations in place of the scientist and has automatically synthesized 15 drugs, including aspirin, lidocaine, diazepam, and warfarin (Figure 8) [106]. Notably, several problems remain unsolved with this system, such as process intensification (e.g., shortening reaction times) and reducing solid formation to avoid clogging. Additionally, predicting the appropriate purification method is challenging, especially for methods other than column chromatography. Moreover, the optimization of multistep reactions can be complicated by the propagation of parameters between steps. In addition to replacing scientists in labor-intensive laboratory operations, robots can help scientists with other complex tasks. For example, Cronin's group built robots that automatically read the literature and convert it into a generalized autonomous synthesis workflow [99], although manual error correction is still required. Burger et al. used a mobile robot to search for photocatalysts that split water into hydrogen [102]. Driven by a Bayesian search algorithm, the robot performed 688 experiments in a 10-variable experimental space over eight days. Jamison et al. developed robots that can vary the downstream residence time and control the addition sequence to minimize undesired reactivity [107]. Robotic reconfigurability and the flexibility of convergent synthesis play an increasingly important role in idea generation, experimental design, execution, and optimization, enhancing manual experimentation.
Robotic lab platforms are used not only in the chemical industry but also in other fields, such as the automated synthesis of peptides and materials. Peptide nucleic acid (PNA) is a synthetic DNA or RNA analog with a peptide backbone, and traditional synthetic methods require several days to produce a biologically active sequence. Bradley et al. invented a fully automated flow synthesis robot called "Tiny Tides" that enables rapid "one-pot" synthesis of PNA sequences conjugated to cell-penetrating peptides (PPNA) [108], reducing the synthesis time of PPNA from several days to just two hours. Similarly, Aspuru-Guzik's group reported an algorithm-driven, modular robotic platform for discovering thin-film materials [109]. Cronin's group developed an autonomous chemical synthesis robot that explores, discovers, and optimizes nanostructures, driven by real-time spectral feedback, theory, and machine learning algorithms [110]. Advances in robotics have also contributed to precision medicine, improving modern medicine and quality of life through the delivery of drugs, biologics, genes, and living cells, as detailed in a related review [111].
Fully automated chemical synthesis by AI-driven robots is a future trend, offering not only faster and more efficient synthesis but also access to compounds that are difficult to synthesize manually. Experimental methods based on robotic lab platforms have already been used successfully to solve high-dimensional problems in physics, chemistry, and the life sciences. Although the hardware units needed for such tasks are commercially available, cost, standardization, and efficiency issues make scaling up difficult. Furthermore, current automated multistep synthesis relies on iterative or linear processes and requires compromises in versatility and equipment usage. As a result, most machines cannot link multistep syntheses to run many different protocols and reactions [112][113][114][115]. Robotic chemists are nevertheless expected to change chemical synthesis further in the future.

Automated Synthesis
In chemistry, AI can be used to optimize chemical reaction conditions experimentally. Traditionally, chemists have devoted considerable time to evaluating reaction parameters such as substrates, catalysts, reagents, additives, solvents, concentrations, temperatures, and reactor types. Despite the prevalence of established techniques such as single-objective optimization algorithms and design of experiments (DoE), reaction optimization is still often a difficult and time-consuming process [116][117][118]. For example, single-objective optimization algorithms cannot explore the entire chemical space and may therefore yield an overall suboptimal process. DoE methods such as Latin Hypercube Sampling (LHS) typically generate samples that cover the design space as uniformly as possible to improve the accuracy of the overall metamodel; however, this is not the most efficient approach when the design goals are clearly defined. Additionally, when a complex model is required to reach a predefined design goal, purely experimental methods are inefficient. Moreover, process optimization often involves multiple factors, such as reaction yield, process cost, impurity levels, and environmental impact. Multi-objective optimization can address the multiple, often conflicting, objectives encountered in many chemical engineering applications, for example, conversion and selectivity in chemical reactions [119]. Multi-objective optimization techniques such as the parametric approach, the epsilon-constraint method, and genetic algorithms have been used as solution strategies for chemical reactions [120,121]. However, these approaches require many function evaluations and, in some cases, derivative information that is not available. They are therefore poorly suited to automated chemical reaction systems.
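The LHS idea works as follows. With n samples, each variable's range is cut into n equal strata, and every stratum is used exactly once per variable, which spreads points evenly even when only a few experiments are affordable. The following is a minimal pure-Python sketch of that scheme. The variable names and bounds in the example are hypothetical.

```python
import random


def latin_hypercube(n_samples, bounds, seed=None):
    """Latin hypercube sample.

    bounds: list of (low, high) tuples, one per variable.
    Returns n_samples points; for each variable, exactly one point falls in
    each of the n_samples equal-width strata of its range.
    """
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        strata = list(range(n_samples))
        rng.shuffle(strata)  # random pairing of strata across variables
        width = (high - low) / n_samples
        # One uniformly random point inside each assigned stratum.
        columns.append([low + (s + rng.random()) * width for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


# Hypothetical variables: temperature (deg C), residence time (min),
# catalyst loading (mol%).
design = latin_hypercube(5, [(25.0, 125.0), (1.0, 21.0), (0.5, 5.5)], seed=42)
for point in design:
    print(["%.2f" % v for v in point])
```

Each stratum of each variable is visited exactly once. The sampler has no notion of a design goal, which is why the text calls it inefficient when the goal is already clearly defined.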
By contrast, Bayesian optimization is a derivative-free, global stochastic optimization method that can automatically optimize experimental parameters against one or more objectives in chemistry, materials science, and other fields [122,123]. For example, both Doyle et al. and Jensen's group used Bayesian optimization to tune reaction parameters against single or multiple objectives [122,124]. Several multi-objective Bayesian optimization algorithms have also been developed, such as Thompson Sampling Efficient Multi-Objective (TS-EMO), Phoenics, Gryffin, and Chimera (Figure 9) [125][126][127][128][129]. Bayesian optimization is an iterative, response-surface-based global optimization algorithm that has shown excellent performance in optimizing chemical reaction parameters [130][131][132].
It aims to balance exploring regions of high uncertainty with exploiting the available information, so that high-quality configurations are found in fewer evaluations. In many cases, Bayesian optimization algorithms outperform expert practitioners and other state-of-the-art global optimization algorithms [133]. Current multi-objective Bayesian optimization algorithms, including TS-EMO, ParEGO, and Expected Hypervolume Improvement (EHI), aim to approximate the Pareto front. A study by Lapkin's group showed that TS-EMO has comparable or better data efficiency than both EHI and ParEGO [130]. TS-EMO also performs well on a set of mathematical test functions for a given evaluation budget compared with the commonly used genetic algorithm NSGA-II. Building on these advantages, Lapkin et al. achieved self-optimization of the Sonogashira reaction, the Claisen-Schmidt condensation, and an N-benzylation reaction in flow chemistry systems [130,133]. The optimal conditions corresponding to the trade-off curve (Pareto front) between environmental and economic objectives were successfully identified. Combined with flow chemistry systems, TS-EMO can identify, in a data-efficient manner, optimal reaction conditions and the trade-offs (Pareto fronts) between conflicting objectives such as yield, cost, space-time yield, and E-factor [134]. TS-EMO applies not only to classical single-step reactions but also to the optimization of multistep reaction parameters. Lapkin's group combined it with a self-optimizing platform to optimize a Claisen-Schmidt condensation followed by liquid-liquid separation, involving three objectives [133].
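TS-EMO and the other multi-objective methods above aim to approximate the Pareto front. This is the set of conditions for which no other tested condition is at least as good on every objective and strictly better on at least one. Given a finished set of experiments, extracting that set is simple. The following minimal sketch assumes two objectives (maximize yield, minimize E-factor) and hypothetical data. Approximating the front efficiently with few experiments is what TS-EMO actually contributes, and that part is not shown.

```python
def dominates(a, b):
    """True if point a dominates point b.

    Each point is (yield_percent, e_factor): higher yield and lower E-factor
    are better. a dominates b if it is at least as good on both objectives
    and strictly better on at least one.
    """
    at_least_as_good = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return at_least_as_good and strictly_better


def pareto_front(points):
    """Non-dominated subset of points (O(n^2), fine for lab-scale data)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical results from a self-optimizing flow reactor.
experiments = [(85, 12.0), (92, 20.0), (70, 6.0), (80, 15.0), (60, 8.0), (92, 25.0)]
print(pareto_front(experiments))  # [(85, 12.0), (92, 20.0), (70, 6.0)]
```

Each surviving point is a different compromise between yield and waste. The choice among them is left to the chemist or to a higher-level preference rule.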
By optimizing multi-step sequential reactions, this work showed how AI can re-evaluate optimal reaction conditions during active learning as downstream work-up specifications change. The many possible combinations of reagents, solvents, stoichiometry, and temperature make the development of new products difficult. For example, studies have shown that most catalytic reactions have over 50 million potential conditions, making a robust exploration of the parameter space impractical [135]. Recent work has shown that machine learning with molecular descriptors of a solvent or catalyst can extrapolate performance from a small number of experiments to a large library. However, deciding which machine learning strategy to apply in a particular case remains difficult [136,137]. To address this, Lapkin et al. released Summit, an open-source software package for reaction optimization based on Bayesian optimization [125]. Summit includes benchmarks for comparing the performance of different ML strategies, so that researchers can test the efficiency of each strategy through virtual experiments. The platform was used to develop a process route for an SNAr reaction and to screen the optimal catalyst and ligand for a Pd-catalyzed cross-coupling reaction. Furthermore, AI makes it possible to obtain functional molecules with high selectivity from renewable biomaterials and biowastes. Lapkin et al. reported the preparation of p-cymene from waste terpene mixtures [138], using the TS-EMO algorithm to optimize the first two reaction steps for maximum conversion and selectivity. In brief, Bayesian optimization algorithms can build accurate reaction models without prior knowledge, even with many input variables and competing objectives.
Models developed for individual steps can be used for potential process design and scale-up.
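The balance between exploration and exploitation described in this subsection can be illustrated with a single-objective Bayesian optimization loop. A Gaussian-process surrogate is refit after each experiment, and the next experiment is the candidate with the highest expected improvement. The following minimal NumPy sketch assumes one scaled input variable, a fixed RBF length scale, a noise-free toy "yield" function, and a discrete candidate grid. It is not TS-EMO, Summit, or any other cited tool, all of which add hyperparameter fitting, multiple objectives, or batching.

```python
import math

import numpy as np


def rbf_kernel(a, b, length_scale=0.1):
    """Squared-exponential kernel matrix between 1-D arrays a and b."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_query, length_scale=0.1, jitter=1e-6):
    """Zero-mean Gaussian-process posterior mean and std at x_query."""
    K = rbf_kernel(x_obs, x_obs, length_scale) + jitter * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_query, length_scale)  # (n_obs, n_query)
    mean = K_s.T @ np.linalg.solve(K, y_obs)
    # Prior variance is k(x, x) = 1 for this kernel.
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mean, np.sqrt(np.maximum(var, 1e-12))


def expected_improvement(mean, std, best):
    """Expected improvement over the best observed value (maximization)."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.array([math.erf(v / math.sqrt(2.0)) for v in z]))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


def bayesian_optimize(objective, candidates, x_init, n_iter=10):
    """Sequential BO over a discrete candidate grid; returns the best (x, y)."""
    x_obs = np.array(x_init, dtype=float)
    y_obs = np.array([objective(x) for x in x_obs])
    for _ in range(n_iter):
        mean, std = gp_posterior(x_obs, y_obs, candidates)
        ei = expected_improvement(mean, std, y_obs.max())
        x_next = candidates[int(np.argmax(ei))]      # most promising experiment
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, objective(x_next))  # "run" the experiment
    i_best = int(np.argmax(y_obs))
    return x_obs[i_best], y_obs[i_best]


def toy_yield(x):
    """Hypothetical normalized yield vs. a scaled process variable in [0, 1]."""
    return math.exp(-((x - 0.7) ** 2) / 0.02)


grid = np.linspace(0.0, 1.0, 101)
x_best, y_best = bayesian_optimize(toy_yield, grid, x_init=[0.1, 0.5, 0.9])
print(f"best x = {x_best:.2f}, yield = {y_best:.3f}")
```

Expected improvement grows with both a high predicted mean and a high predictive uncertainty. That is how the loop trades off refining known good regions against probing unexplored ones.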
Other multi-objective algorithms developed for chemical processes include Phoenics, Gryffin, and Chimera [127][128][129]. These algorithms avoid a limitation of classical Bayesian optimization, which selects parameter points one at a time. Phoenics uses Bayesian neural networks (BNNs) to construct kernel density estimates of the objective function, and its acquisition function allows batches of evaluations to be selected and run in parallel. Phoenics is suited to continuous parameters, such as temperature and concentration, and can be used to optimize chemical reaction conditions and material properties. Aspuru-Guzik's group developed Gryffin for optimizing categorical parameters such as solvent choice. The algorithm uses categorical kernel densities that can be relaxed to continuous ones. In addition, it allows expert knowledge to be supplied as descriptors for each categorical option, and it has been used successfully to optimize chemical reaction conditions. Materials science problems also typically involve multiple competing objectives. Chimera is a general multi-objective optimization method that lets users define a hierarchy of objective preferences, which are combined into a single function optimized with any chosen algorithm. Importantly, both TS-EMO and the algorithms described here can be combined with automation platforms to run experiments autonomously. For example, Aspuru-Guzik's group deployed ChemOS, together with Phoenics, Gryffin, and Chimera, for the autonomous optimization of manufacturing processes for thin-film materials, multicomponent polymer OPV blends, and the reaction conditions of a stereoselective Suzuki coupling [93,109,139]. Further work from Aspuru-Guzik's group is described in their review [140]. Other AI-based automation platforms are also being reported.
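Chimera's central idea is to rank candidates by a hierarchy of objectives with tolerances. That idea can be illustrated with a simplified lexicographic scheme. This is not the published Chimera scalarization, which builds a single merit function that any optimizer can use. It is a sketch that assumes all objectives are minimized and that every objective except the last has an absolute threshold below which further improvement is ignored. The objective names and values are hypothetical.

```python
def hierarchical_key(values, thresholds):
    """Sort key for a hierarchy of objectives, all to be minimized.

    values: objective values in priority order.
    thresholds: tolerance for every objective except the last. Once an
    objective is at or below its threshold it counts as "good enough", and
    the comparison falls through to the next objective in the hierarchy.
    """
    key = [max(v, t) for v, t in zip(values[:-1], thresholds)]
    key.append(values[-1])  # lowest-priority objective is compared as-is
    return tuple(key)


def rank(candidates, thresholds):
    """Order candidates from best to worst under the hierarchy."""
    return sorted(candidates,
                  key=lambda c: hierarchical_key(c["objectives"], thresholds))


# Hypothetical objectives in priority order: impurity (%), cost (USD/g), E-factor.
candidates = [
    {"name": "run1", "objectives": [0.8, 12.0, 30.0]},
    {"name": "run2", "objectives": [0.4, 15.0, 18.0]},
    {"name": "run3", "objectives": [2.5, 5.0, 10.0]},
    {"name": "run4", "objectives": [0.9, 9.0, 25.0]},
]
# Impurity at or below 1% and cost at or below 10 USD/g are acceptable.
print([c["name"] for c in rank(candidates, thresholds=[1.0, 10.0])])
# ['run4', 'run1', 'run2', 'run3']
```

Run3 has the best cost and E-factor but ranks last, because it fails the top-priority impurity tolerance. This is the kind of preference a hierarchy is meant to encode.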
Vlachos's group developed the NEXTorch platform, which uses state-of-the-art Bayesian optimization algorithms and supports sampling of both continuous variables and discrete variables of different types [141]. It can support not only chemical synthesis in laboratory experiments but also multiscale computational tasks, from molecular-scale design to reactor-scale optimization.
Nevertheless, AI in automated synthesis still faces many challenges. First, the inline/online analysis still needs further development, especially in terms of measurement accuracy, instrument response speed, and compatibility with heterogeneous synthesis. In addition, the equipment for automated synthesis is too expensive for research laboratories in developing countries to afford.

Conclusions and Outlook
This review has discussed different aspects of artificial intelligence-enabled chemical process intensification. In chemistry, AI enables structure-function relationship analysis, including the prediction of molecular properties and the design of molecules. We have also briefly summarized the use of computer-aided synthesis planning (CASP), namely retrosynthetic planning, condition recommendation, and forward reaction prediction, in the pharmaceutical and chemical industries. Moreover, robotic lab platforms enable automated organic synthesis, reducing the repetitive work of laboratory staff. Finally, AI techniques enable the multi-objective optimization of chemical reaction conditions, identifying trade-offs between conflicting optimization objectives (e.g., yield, cost, space-time yield (STY), and E-factor).
Although AI is booming in the chemical industry, it still faces many challenges. First, optimal predictions depend on stable, high-quality datasets, and obtaining sufficient reliable data is difficult. Second, although computing power (including quantum and cloud-based approaches) is improving, limitations remain from the user's perspective. Third, there is a shortage of data science talent in chemical engineering; increased collaboration between chemistry and other scientific disciplines may help accelerate the integration of AI with other fields.