Integrating Synthetic Accessibility Scoring and AI-Based Retrosynthesis Analysis to Evaluate AI-Generated Drug Molecules Synthesizability

Motente, Mokete; Chude-Okonkwo, Uche A. K.

doi:10.3390/ddc4020026

by Mokete Motente and Uche A. K. Chude-Okonkwo *

Institute for Artificial Intelligent Systems, University of Johannesburg, Auckland Park 2006, South Africa

* Author to whom correspondence should be addressed.

Drugs Drug Candidates 2025, 4(2), 26; https://doi.org/10.3390/ddc4020026

Submission received: 14 April 2025 / Revised: 22 May 2025 / Accepted: 27 May 2025 / Published: 31 May 2025

(This article belongs to the Section In Silico Approaches in Drug Discovery)

Abstract

Background: One of the challenges of applying artificial intelligence (AI) methods to drug discovery is the difficulty of laboratory synthesizability for many AI-discovered molecules. Often, in silico techniques and metrics such as the computationally enabled synthesizability score and AI-based retrosynthesis analysis are used. Methods: In this paper, we present a predictive synthesizability method that integrates the gains of synthetic accessibility scoring and the benefits of AI-driven retrosynthesis analysis tools to evaluate the synthesizability of AI-generated lead drug molecules. Results: We explored the proposed method by using it to analyze the synthesizability of a set of 123 novel molecules generated using AI models. The analysis of the synthesis route of the four best molecules from the set in terms of synthesizability, as identified using the proposed method, is presented. Conclusions: This strategy enables quick initial screening and more comprehensive actionable synthetic pathways, thereby balancing speed and detail, and favoring simple routes to avoid the risk of pursuing non-synthesizable compounds in the drug development pipeline.

Keywords: drug discovery; artificial intelligence; synthesizability; synthetic accessibility score; retrosynthesis

1. Introduction

Drug discovery aims to find novel molecules that can modulate specific molecular targets implicated in diseases to produce the desired therapeutic responses. The traditional drug discovery process is often labor-intensive, typically spanning over a decade and costing upwards of a billion dollars per successful drug [1]. Despite significant investments, the success rate remains low, with only about 10% of drug candidates entering clinical trials, and only one may eventually receive approval [2]. To address the challenges of the complexity, development time, and cost of drug discovery, the integration of AI techniques, particularly generative models, has demonstrated promising results that can transform drug discovery by improving the development time, efficiency, and accuracy of candidate molecule design and optimization [3,4,5]. For instance, in [3], a deep generative technique was used to identify several active Discoidin Domain Receptor 1 (DDR1) inhibitors in just 21 days.

Typically, AI models and computational models that combine combinatorial methods and machine learning can be used to generate a large number of new drug molecules, as shown in [5,6], where thousands of new drug molecules were generated for the treatment of hypertension. However, translating AI-generated molecules into tangible compounds requires practical synthesis. In fact, the synthesizability of molecules generated by AI methods remains a challenge, as discussed in [7,8,9], where the importance of enhancing the synthesizability through recent synthetic planning methods is emphasized. Traditionally, the determination of the synthesizability of molecular compounds is carried out experimentally by expert chemists relying on heuristic methods based on empirical rules and experience. However, for a large set of molecules typical of the output of the AI model, the traditional approach to determining the synthesizability of molecules would be cumbersome. Therefore, there is a need to consider other methods that are consistent and less cumbersome. An option that can be adopted to address this challenge is to ensure that the synthesizability of the novel molecules is accounted for in the generation process. This requires the development of generative models with reaction-aware architectures, which is not an easy task. Hence, most contemporary AI-based molecular generative models basically generate as many molecules as possible and do post-filtering to determine their synthesizability.

A synthesizability estimation option for a large set of molecules is in silico-based synthetic accessibility scoring. Synthetic accessibility scoring [10] is a computational method for estimating how easy it is to synthesize a drug-like molecule considering molecular fragment contributions and molecular complexity. The method in [10] was validated by comparing its scores with the ease of synthesis as estimated by experienced medicinal chemists for a set of 40 molecules. Synthetic accessibility scoring is often based on simplified and heuristic techniques, which may not adequately capture the complexities of current synthetic chemistry [11]. As a result, even if a molecule has a favorable score, poor yields or expensive reagents may make its synthesis impracticable. The score also does not provide reaction pathways to synthesis. Therefore, it is mainly suitable for providing a quick estimation of the synthesizability of molecules.

An in silico synthesizability analytical technique that offers reaction pathways to synthesis and can handle large sets of molecules is data-driven retrosynthetic analysis. This analytical approach integrates AI to enhance the efficiency of synthesizing complex molecules by automating the identification of synthetic routes and optimizing reaction conditions. However, compared to synthetic accessibility scoring, data-driven retrosynthetic analysis involves significantly more computational complexity, with computational tasks for large datasets running for hours or days. Therefore, it is essential that only molecules with a high probability of being synthesizable undergo retrosynthetic analysis.

In this paper, we integrate synthetic accessibility scoring and a data-driven retrosynthesis reliability assessment method to evaluate the synthesizability of AI-generated lead drug molecules. We term this integrated strategy predictive synthetic feasibility analysis. Specifically, this integrated strategy combines traditional computational synthetic accessibility scoring and an AI-driven predictive retrosynthesis confidence assessment method to determine the synthesizability of molecules from a set of novel lead drug molecules generated in [12] using AI. Unlike the synthetic accessibility scoring method, the AI-driven retrosynthesis confidence assessment method considers factors like the context of reactions involved. The integrated strategy enables quick initial qualitative and quantitative screening of large sets of molecules for actionable synthetic routes, thereby balancing speed and detail and favoring easy synthesis routes to avoid the risk of pursuing non-synthesizable compounds in the drug development pipeline. Once a set of molecules is identified as being easy to synthesize, the full retrosynthesis analysis will be conducted. In this paper, the retrosynthetic analysis of the top molecules identified by the proposed method as being the easiest regarding synthesizability is presented. Note that the term lead compound is used in this work to designate a potential drug compound that is yet to undergo preclinical evaluation.

2. Results

In this section, we employ the method described in Figure 10 to present the synthesizability analysis of the molecules in the dataset, D. First, we determine the values of $\Phi_{score}$ and $CI$ for the molecules in D, then we plot the $\Phi_{score}$-$CI$ characteristics of the molecules for different thresholds that indicate their predictive synthetic feasibility. Second, we present the AI-predicted retrosynthetic routes of the four (4) molecules with the best predictive synthesis feasibility and the expert chemist’s opinion on retrosynthesis routes. We note that all the figures in this section were generated using the RDKit in Python version 3.12.

The values of $\Phi_{score}$ for all the elements of D are calculated using the RDKit tool, which is based on the method developed in [10]. Figure 1 shows the $\Phi_{score}$ violin plot for the 123 molecules in D. It can be seen that the synthetic accessibility of most of the molecules is concentrated between $\Phi_{score} = 3$ and $\Phi_{score} = 4$. However, the threshold of $\Phi_{score}$ that offers good synthesizability of the molecules cannot be determined from this information alone. In Figure 2, we show the $CI$ violin plot for the 123 molecules in D. The values of the CIs for all the elements of D are calculated using the IBM RXN for Chemistry AI tool [13]. The results in the graph show that a considerable number of molecules can be synthesized with over 80% confidence. However, the plot alone does not clearly indicate the threshold CI value that marks a ‘good’ value for a synthesizable molecule.

Combining the information in $\Phi_{score}$ and $CI$, we present the predictive synthesis feasibility analysis, $\Gamma_{Th1/Th2}$, for arbitrary values of the thresholds, Th1 and Th2. The $\Phi_{score}$-$CI$ characteristics are shown in Figure 3 for different threshold values of Th1 and Th2. The $\Phi_{score}$ and $CI$ for the best four molecules with the most promising synthetic scores are shown in Table 1.

2.1. Retrosynthetic Feasibility Analysis of Compound A

The principal synthesis precursors to realizing the target molecule are shown in Table 2. The precursor 1,4-dioxane is a cyclic ether used as a solvent, and palladium is used as a catalyst in cross-coupling reactions. Potassium carbonate is a base, butylboronic acid is a reactant used in Suzuki coupling, and ethyl 2-(3-bromo-4-hydroxyphenyl)acetate is an ester containing bromo and hydroxy substituents on a phenyl ring. The reaction occurs in two steps, as shown in Figure 4. The first step entails debromination of the starting material, ethyl 2-(3-bromo-4-hydroxyphenyl)acetate, and the debromination reaction is catalyzed by tetrakis(triphenylphosphine)palladium(0), Pd(PPh₃)₄. The base (K₂CO₃) facilitates the conversion of n-butylboronic acid into a more reactive species, and the two starting materials react at elevated temperatures (50–80 °C) to enhance the reaction rate. This type of reaction is referred to as the Suzuki–Miyaura reaction [14], since it forms a new carbon–carbon (C–C) bond between the phenyl group of the aryl bromide and the alkyl group of the boronic acid. The second step entails ammonolysis (reaction with ammonia (NH₃), which here converts the ester into the corresponding amide) of the first-step product, ethyl 2-(3-butyl-4-hydroxyphenyl)acetate. The reaction is carried out in methanol (CH₃OH) as a solvent, also at elevated temperatures to increase the speed of the reaction.

2.2. Retrosynthetic Feasibility Analysis of Compound B

The principal synthesis precursors to realizing the target molecule are shown in Table 3. The precursors THF and dichloromethane are a polar solvent and a non-flammable chlorinated solvent, respectively. Triethylamine and triphenylphosphine are an organic base and a nucleophilic catalyst, respectively. The compound 1-(2-azidoethyl)-4-methoxy-2-methylbenzene is an azide-functionalized aromatic compound. The reaction to synthesize O=C(NCCc1ccc(O)cc1C)Cc2cccc3ccccc32 occurs in three steps, as shown in Figure 5. The first step entails converting the azide 1-(2-azidoethyl)-4-methoxy-2-methylbenzene to the corresponding amine, 2-(4-methoxy-2-methylphenyl)ethan-1-amine, in the presence of triphenylphosphine (PPh₃). This type of reaction is called the Staudinger reaction [15], in which an organic azide reacts with a phosphine to produce an iminophosphorane. The second step involves deprotonation of the amine group in 2-(4-methoxy-2-methylphenyl)ethan-1-amine, which is enhanced by using triethylamine as a base. The reactant (1-naphthoyl chloride) also loses its chloride, so that the two starting materials react to form N-(4-methoxy-2-methylphenethyl)-2-(naphthalen-2-yl)acetamide and hydrogen chloride (HCl). The last step entails the formation of the alcohol N-(4-hydroxy-2-methylphenethyl)-2-(naphthalen-2-yl)acetamide, and the reaction is carried out at −78 °C to avoid side reactions that may occur because of utilizing boron tribromide (BBr₃).

2.3. Retrosynthetic Feasibility Analysis of Compound C

The principal synthesis precursors to realizing the target molecule are shown in Table 4. The precursor 2-hydroxy-5-(3-(4-hydroxyphenyl)propyl)benzaldehyde is a phenolic aldehyde with antioxidant and potential bioactive properties. Ammonia and sulfuric acid are reactants in the synthesis route, while sodium chlorite acts as a strong oxidizing agent. Methanol and ethanol are solvents in the reactions. The reaction to synthesize Oc1ccc(CCCc2ccc(O)cc2)cc1C(=O)N occurs in three steps, as shown in Figure 6. Reaction step 1 is the reaction between 2-hydroxy-5-(3-(4-hydroxyphenyl)propyl)benzaldehyde and sodium chlorite ($NaClO_{2}$), which typically results in the oxidation of the aldehyde group (-CHO) to a carboxylic acid group (-COOH). Reaction step 2 is the reaction between 2-hydroxy-5-(3-(4-hydroxyphenyl)propyl)benzoic acid, sulfuric acid, and methanol, which is a classic esterification reaction. This process is often referred to as Fischer esterification [16], in which the carboxylic acid group (-COOH) reacts with methanol in the presence of a strong acid catalyst (sulfuric acid) to form the acid’s methyl ester. Reaction step 3 entails the nucleophilic substitution of the alkoxy group of the ester group with an amine group, resulting in the formation of the amide species and, consequently, the target molecule. However, for the AI-predicted synthesis pathway presented above, substituting an alkoxy group with an amine is not straightforward because the alkoxy group is a poor leaving group under normal conditions [17]. Therefore, an alternative reaction pathway is presented in Figure 7.

Reaction step 1 (Figure 7a) is the same as reaction step 1 in Figure 6a. Reaction step 2 (Figure 7b) entails converting the carboxylic acid group to the corresponding acyl chloride, in which chloride is a much better leaving group, using thionyl chloride (SOCl₂), which is readily available. Reaction step 3 entails reacting the acyl chloride group with ammonia ($NH_{3}$) in methanol at elevated temperatures (reflux). As shown in Figure 7c, the ammonia acts as a nucleophile and replaces the chloride with an amino group via nucleophilic substitution.

2.4. Retrosynthetic Feasibility Analysis of Compound D

The principal synthesis precursors to realizing the target molecule are shown in Table 5. The precursors N-(3-amino-4-methoxyphenyl)acetamide and vinyl acetate chloride are used as reactants in the synthesis process, while dichloromethane and triethylamine are a solvent and a base or catalyst, respectively. Reaction step 1, shown in Figure 8a, involves the acylation of the amino group on the aromatic ring of the acetamide derivative and is expected to proceed via nucleophilic acyl substitution: the lone pair of the amine (–NH₂) attacks the electrophilic carbonyl carbon of vinyl acetate chloride, and the chloride ion (Cl⁻) acts as the leaving group, resulting in the formation of the amide bond between the amine and the vinyl acetate chloride. Reaction step 2, shown in Figure 8b, involves the demethylation of the methoxy group (–OCH₃) on the aromatic ring, yielding a hydroxyl group (–OH). Boron tribromide is a strong Lewis acid commonly used to cleave methyl ethers in organic compounds [18]. The step is carried out at low temperatures (e.g., −78 °C to 0 °C) to minimize side reactions and ensure selective demethylation. Reaction step 3, shown in Figure 8c, entails the conversion of the acrylamide double bond to form the corresponding butyramide via hydrogenation of the acrylamide derivative in the presence of a hydrogenation catalyst (Pd catalyst), with methanol acting as a hydrogen source.

2.5. The CI of Synthesis vs. Steps Analysis

In Figure 9, the CI of synthesis vs. steps graph for the retrosynthesis of each of the compounds is presented to visualize the progression of the retrosynthetic analysis over the steps to each target. It can be observed that there are only two steps to compound A (Oc1ccc(cc1CCCC)CC(=O)N), with an overall CI of 0.946, which makes it stand out as the most synthesizable due to its high overall confidence and fewer synthesis steps. The overall CIs for compounds C and D are close at 0.887 and 0.885, respectively. The overall CI for compound B is the lowest at 0.861, which makes it the least synthesizable of the four compounds. From Figure 9, it can be seen that step 2 is the significant bottleneck step that affects the synthesizability of compound B. To improve the synthesizability of compound B, alternative reaction types that achieve the same transformation with easier reactions could be considered.
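The comparison above can be reproduced with a short sketch. The step counts and overall CIs are the values quoted in the text; the ordering rule (higher overall CI first, fewer steps as a tie-breaker) and the names `Route` and `rank_routes` are assumptions made for illustration, not part of the published method.

```python
from typing import NamedTuple


class Route(NamedTuple):
    compound: str
    steps: int
    overall_ci: float


# Step counts and overall CIs as quoted in the text for compounds A-D.
routes = [
    Route("A", 2, 0.946),
    Route("B", 3, 0.861),
    Route("C", 3, 0.887),
    Route("D", 3, 0.885),
]


def rank_routes(candidates):
    """Order routes from most to least synthesizable.

    Assumed rule: higher overall CI first, fewer steps breaks ties.
    """
    return sorted(candidates, key=lambda r: (-r.overall_ci, r.steps))


ranking = [r.compound for r in rank_routes(routes)]
print(ranking)  # ['A', 'C', 'D', 'B']
```

Under this rule the ordering matches the discussion: compound A ranks first and compound B last.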

3. Review of Relevant Literature

The challenge of synthesizing new drug molecules is widely recognized in medicinal chemistry and has attracted researchers to explore strategies for efficient synthetic routes. In [10], a quantitative method was introduced to estimate the synthetic accessibility (SA) score of drug-like molecules. This approach leverages a large database of known synthetic fragments to assess the ease with which a molecule can be synthesized. Although SA scoring provides numerical values that assess the difficulty in synthesizing a molecule, it may not fully account for reaction feasibility or retrosynthetic pathways [11]. Therefore, a method of retrosynthesis is required.

Retrosynthetic analysis is a systematic process of breaking down a target molecule into simpler precursors to devise a synthetic route. In [19], an overview of retrosynthetic analysis is provided, explaining its conceptual framework and application in organic synthesis. The use of artificial intelligence in retrosynthetic analysis is explored in [20,21,22]. In [20], a large database of reaction patterns is used to train a neural network to predict plausible reaction pathways. A review of AI-driven retrosynthesis tools is presented in [21]. In [22], an open-source retrosynthetic planning tool is introduced that combines a Monte Carlo algorithm with a neural network. This tool explores synthetic routes by prioritizing high-probability reactions, enabling rapid pathway generation.

However, while AI-based retrosynthesis models perform considerably well in many cases, these models face challenges in handling rare or novel reactions not well represented in the training data. The authors in [11] emphasized the need for integrated approaches that combine SA scores with retrosynthetic tools to improve predictive reliability. Hence, the method proposed in this paper combines SA scores with retrosynthetic tools to improve molecular synthesis prediction.

4. Methods

Figure 10 shows the schematic block diagram of the proposed method. The framework comprises the generated molecules dataset, the complexity-based synthetic accessibility estimation system, the data-driven retrosynthesis analytical tool, and the predictive synthetic feasibility analysis.

Figure 10. The schematic block diagram of the proposed method.
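One possible reading of the framework is a two-stage screen in which the inexpensive synthetic accessibility score is computed first and the costly retrosynthesis confidence is requested only for molecules that pass it. The sketch below illustrates that reading; the function name `predictive_feasibility`, the callables `sa_score_fn` and `ci_fn`, and the toy identifiers and scores are hypothetical stand-ins, not the authors' implementation or real measurements.

```python
from typing import Callable, Iterable, List, Tuple


def predictive_feasibility(
    molecules: Iterable[str],
    sa_score_fn: Callable[[str], float],
    ci_fn: Callable[[str], float],
    th1: float,
    th2: float,
) -> List[Tuple[str, float, float]]:
    """Return (molecule, score, CI) for molecules with score <= th1 and CI >= th2."""
    selected = []
    for mol in molecules:
        phi = sa_score_fn(mol)
        if phi > th1:
            continue  # fails the cheap filter, so skip the expensive CI call
        ci = ci_fn(mol)
        if ci >= th2:
            selected.append((mol, phi, ci))
    return selected


# Hypothetical toy values, used only to show the control flow.
toy_sa = {"mol_easy": 1.9, "mol_hard": 6.5}
toy_ci = {"mol_easy": 0.97, "mol_hard": 0.40}
ci_calls = []


def toy_ci_fn(mol):
    ci_calls.append(mol)  # record which molecules reach the second stage
    return toy_ci[mol]


hits = predictive_feasibility(
    toy_sa, lambda m: toy_sa[m], toy_ci_fn, th1=2.0, th2=0.94
)
print(hits)      # [('mol_easy', 1.9, 0.97)]
print(ci_calls)  # ['mol_easy']
```

Ordering the two checks this way keeps the number of retrosynthesis queries proportional to the number of molecules that survive the first filter.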

4.1. AI-Generated Molecules Dataset

The dataset $D = \{d_{1}, d_{2}, d_{3}, \dots, d_{123}\}$ contains 123 new beta-blocker-like lead drug molecules generated using a generative model. The dataset and the AI-based method for generating the molecules are presented in [12]. Figure 11 shows the AI-based framework that was used to generate the molecules in D. In the framework, a set of 13,313 molecules was used to train a variational autoencoder (VAE)-based generative model. The VAE structure consists of an encoder, a decoder, and a latent space defined by the mean, $Z_{\mu}$, and standard deviation, $Z_{\sigma}$, of a Gaussian distribution. The output of this process is a set of 123 unique molecules generated by the VAE-based generative model, which forms the dataset, D, used in the rest of this paper.
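To make the role of the latent mean and standard deviation concrete, the sketch below draws a latent vector with the standard VAE reparameterization, z = μ + σ·ε with ε drawn from a standard normal. That the model in [12] samples in exactly this way is an assumption, and the function name `sample_latent` and the example vectors are hypothetical.

```python
import random


def sample_latent(z_mu, z_sigma, rng=random):
    """Draw z = mu + sigma * eps element-wise, with eps ~ N(0, 1)."""
    if len(z_mu) != len(z_sigma):
        raise ValueError("z_mu and z_sigma must have the same length")
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(z_mu, z_sigma)]


# Hypothetical two-dimensional latent parameters, for illustration only.
rng = random.Random(0)
z = sample_latent([0.5, -1.0], [0.1, 0.2], rng=rng)
print(len(z))  # 2
```

A decoder would then map each sampled vector back to a molecular representation; with a standard deviation of zero, the sample reduces to the mean.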

4.1.1. Complexity-Based Synthetic Accessibility Estimation

Given D, we desire to assess the synthesizability of the molecules in the dataset. This can be estimated by employing the complexity-based synthetic accessibility scoring method. Synthetic accessibility is a computational task that assesses how easy or difficult it is to synthesize a chemical compound in a laboratory. It is calculated by considering factors such as the size, the presence of specific functional groups, the number of rings, similarity to known compounds, and the number of stereogenic centers in the molecule. The complexity-based synthetic accessibility score, $\Phi_{score}$, is expressed by [10]:

$$\Phi_{score} = f_{(frag.score)} - f_{(comp.penalty)}, \tag{1}$$

where $f_{(frag.score)}$ reflects the frequency of the occurrence of molecular fragments in an extensive database of known compounds, and $f_{(comp.penalty)}$ penalizes molecules with high structural complexity considering factors such as the number of rings, the number of stereogenic centers, and the presence of unusual structural characteristics. The complexity-based synthetic accessibility score approach helps to quickly assess the overall feasibility of synthesis of compounds in D, allowing the exclusion of molecules that are likely to be difficult to synthesize. The values of $\Phi_{score}$ typically range from 1 to 10, with lower scores indicating easier synthesizability.
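As a minimal sketch of how such a score can be obtained in practice, the snippet below uses the `sascorer` module distributed in the RDKit Contrib directory, which implements the scoring method of [10]. It assumes an RDKit installation that ships the Contrib `SA_Score` folder; the wrapper name `sa_score` and the choice to return `None` when RDKit is unavailable or the SMILES cannot be parsed are illustrative assumptions, and this is not presented as the exact script used for the results.

```python
import os
import sys

try:
    from rdkit import Chem
    from rdkit.Chem import RDConfig

    # sascorer lives in RDKit's Contrib directory rather than the main package.
    sys.path.append(os.path.join(RDConfig.RDContribDir, "SA_Score"))
    import sascorer

    HAVE_RDKIT = True
except ImportError:
    HAVE_RDKIT = False


def sa_score(smiles):
    """Return the synthetic accessibility score for a SMILES string.

    Returns None if RDKit (with the Contrib SA_Score module) is not
    available or if the SMILES string cannot be parsed.
    """
    if not HAVE_RDKIT:
        return None
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    return sascorer.calculateScore(mol)


# Compound A from Table 1; the printed value depends on the RDKit version.
print(sa_score("Oc1ccc(cc1CCCC)CC(=O)N"))
```

Applying `sa_score` to every SMILES string in D would give the per-molecule values summarized in the violin plot of Figure 1.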

4.1.2. Data-Driven Retrosynthetic Analysis

Retrosynthetic analysis, in general, is the method of simplifying a target molecular structure to precursor structures that will help identify multiple synthetic routes to the synthesis of the target molecule [19]. For instance, given a molecule $d_{i} \in D$, retrosynthetic analysis seeks to determine the various possible synthesis routes of $d_{i}$ such that

$$d_{i} = f_{k}(f_{k-1}(\dots f_{1}(E \subseteq P))), \quad f_{0}(P) = P, \tag{2}$$

where $P = \{P_{1}, P_{2}, P_{3}, \dots, P_{N}\}$ is the set of precursors, E is a subset of P, and $F = \{f_{1}, f_{2}, f_{3}, \dots, f_{K}\}$ is the set of reaction steps.

Given the power set of P, $x(P)$, such that $x(P) = \{x_{1}(P), x_{2}(P), x_{3}(P), \dots, x_{M}(P)\}$, we can define the ease of a synthesis route to $d_{i}$ by the confidence level ($CI$) reposed in the synthesis, given by

$$CI = H\{f_{k}(f_{k-1}(\dots f_{1}(x_{i}(P))))\}, \tag{3}$$

where H is the confidence level operator and $0 \leq CI \leq 1$ holds.
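As a worked instance of this notation, the two-step route to compound A described in Section 2.1 can be written out explicitly. The symbols $d_{A}$, $E_{A}$, and the labels on $f_{1}$ and $f_{2}$ are introduced here for illustration only; solvents, base, and catalyst are omitted from $E_{A}$ for brevity, and the CI value is the overall confidence quoted in Section 2.5.

```latex
% Compound A, k = 2 reaction steps
E_{A} = \{\text{ethyl 2-(3-bromo-4-hydroxyphenyl)acetate},\ \text{butylboronic acid},\ \mathrm{NH_{3}}\} \subseteq P

f_{1}:\ \text{Suzuki--Miyaura coupling} \;\rightarrow\; \text{ethyl 2-(3-butyl-4-hydroxyphenyl)acetate}

f_{2}:\ \text{ammonolysis} \;\rightarrow\; d_{A}

d_{A} = f_{2}(f_{1}(E_{A})), \qquad
CI = H\{f_{2}(f_{1}(E_{A}))\} = 0.946, \qquad 0 \leq CI \leq 1
```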

Conventionally, retrosynthesis prediction primarily relies on rule-based algorithms and expert knowledge [19]. However, this approach often limits the exploration of synthetic routes due to high labor costs and constrained search capabilities [20]. Moreover, for a large number of molecules, the conventional manual approach is very laborious. An exciting and more recent approach to retrosynthetic analysis is the data-driven approach, where the potentials of artificial intelligence methods are harnessed to predict synthesis routes [21]. The data-driven retrosynthetic analysis leverages the ability of artificial intelligence algorithms to automatically learn chemistry knowledge from experimental datasets to predict reactions and retrosynthesis routes creatively. Examples of data-driven retrosynthetic analytical tools include IBM RoboRXN for Chemistry [23], AiZynthFinder [22], and Synthia™.

4.1.3. Predictive Synthetic Feasibility Analysis

The complexity-based synthetic accessibility scoring method focuses only on intrinsic complexity and does not evaluate feasibility regarding specific reaction steps or pathways. On the other hand, AI-based retrosynthetic analysis depends heavily on the quality of training data and may fail for uncommon reactions or novel chemistry. However, by combining synthetic accessibility scores and AI-based retrosynthesis, ranked synthetic routes with associated predicted accessibility metrics can be generated, leading to more informed decision making. We term this approach the Predictive Synthetic Feasibility Analysis, prioritizing routes with low complexity $\Phi_{score}$ and high predicted CI. We define the predictive synthetic feasibility analysis for a given threshold, $\Gamma_{Th1/Th2}$, by

$$\Gamma_{Th1/Th2} = (\Phi_{score} \leq \Phi_{score,Th1}) \land (CI \geq CI_{Th2}), \tag{4}$$

where $\Phi_{score,Th1}$ and $CI_{Th2}$ are defined threshold values for the synthetic accessibility score and the confidence level, respectively.
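The threshold rule can be sketched as a simple predicate applied to each molecule. The scores below are the four rows reported in Table 1; the function names `gamma` and `select` and the two stricter threshold settings are illustrative assumptions rather than the authors' code.

```python
def gamma(phi_score, ci, th1, th2):
    """True when the score is at most th1 and the confidence is at least th2."""
    return phi_score <= th1 and ci >= th2


# (synthetic accessibility score, CI) for the four molecules listed in Table 1.
table1 = {
    "Oc1ccc(cc1CCCC)CC(=O)N": (1.977917, 0.976),
    "O=C(NCCc1ccc(O)cc1C)Cc2cccc3ccccc32": (1.907631, 0.969),
    "Oc1ccc(CCCc2ccc(O)cc2)cc1C(=O)N": (1.904253, 0.969),
    "O=C(Nc1c(O)ccc(NC(=O)C)c1)CCC": (1.760964, 0.968),
}


def select(molecules, th1, th2):
    """Return the SMILES strings that satisfy the threshold rule."""
    return [
        smi for smi, (phi, ci) in molecules.items() if gamma(phi, ci, th1, th2)
    ]


captured = select(table1, th1=2.0, th2=0.94)       # thresholds used for Table 1
stricter_ci = select(table1, th1=2.0, th2=0.97)    # hypothetical tighter CI
stricter_sa = select(table1, th1=1.95, th2=0.94)   # hypothetical tighter score

print(len(captured), len(stricter_ci), len(stricter_sa))  # 4 1 3
```

Tightening either threshold shrinks the captured set, which is the behavior the panels of Figure 3 explore across the whole dataset.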

5. Conclusions

In this paper, we have demonstrated the synergistic integration of synthetic accessibility scoring and AI-based retrosynthesis analysis as a strong framework for assessing the synthesizability of AI-generated drug molecules. We explored the proposed method by using it to analyze the synthesizability of 123 novel molecules generated using AI models. An analysis was done of the synthesis route of the four best molecules in terms of synthesizability. The results underscore the strengths of the proposed framework in high-throughput evaluation, where millions of candidate molecules can be quickly screened for their synthetic feasibility. The analysis presented using the proposed method indicates that over 50% of the molecules can be synthesized, with the ease of synthesis varying across the set. Considering mainly the cost of the precursors for the synthesis of the four (4) molecules discussed in this work, the estimated synthesis cost will be between $1000 and $5000. However, we note that while rough estimates can be made using literature and computational tools, costs may remain uncertain until late-stage optimization.

Author Contributions

U.A.K.C.-O. and M.M. conceived the presented idea. U.A.K.C.-O. conducted the computational simulation and analysis, and M.M. provided the synthesis analysis. U.A.K.C.-O. and M.M. wrote the manuscript. U.A.K.C.-O. coordinated the project. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by SAI Postdoctoral Fellowship, University of Johannesburg.

Institutional Review Board Statement

This study did not involve human subjects or animals, and thus no Institutional Review Board approval was required.

Informed Consent Statement

No human participants were involved in this study, and therefore informed consent was not required.

Data Availability Statement

The dataset of the 123 molecules used in this work is available by request made to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Fokunang, E.; Fokunang, C. Overview of the advancement in the drug discovery and contribution in the drug development process. J. Adv. Med. Pharm. Sci 2022, 24, 10–32.
2. Banerjee, S.; Nandi, K.; Badmanaban, R.; Mandal, S.K.; Sen, D.J.; Dholwani, K.K.; Saha, D. Drug discovery & clinical trials: Login passwords to new life. Int. J. Sci. Res. Arch. 2022, 7, 13–35.
3. Zhavoronkov, A.; Ivanenkov, Y.A.; Aliper, A.; Veselov, M.S.; Aladinskiy, V.A.; Aladinskaya, A.V.; Terentiev, V.A.; Polykovskiy, D.A.; Kuznetsov, M.D.; Asadulaev, A.; et al. Deep learning enables rapid identification of potent DDR1 kinase inhibitors. Nat. Biotechnol. 2019, 37, 1038–1040.
4. Kumar, R.; Sharma, A.; Alexiou, A.; Ashraf, G.M. Artificial intelligence in de novo drug design: Are we still there? Curr. Top. Med. Chem. 2022, 22, 2483–2492.
5. Chude-Okonkwo, U.A.; Lehasa, O.M. Machine Learning-aided Computational Fragment-based Design of Small Molecules for Hypertension Treatment. Intell.-Based Med. 2024, 10, 100171.
6. Qian, W.; Huang, J.; Guo, S.; Duan, B.; Xie, W.; Liu, J.; Zhang, C. Searching for the analogues of 1,1-dinitro-2,2-diamino ethylene (FOX-7) by high-throughput computation and machine learning. FirePhysChem 2023, 3, 339–349.
7. Bhisetti, G.; Fang, C. Artificial intelligence–enabled de novo design of novel compounds that are synthesizable. Artif. Intell. Drug Des. 2021, 2390, 409–419.
8. Wang, M.; Li, S.; Wang, J.; Zhang, O.; Du, H.; Jiang, D.; Wu, Z.; Deng, Y.; Kang, Y.; Pan, P.; et al. ClickGen: Directed exploration of synthesizable chemical space via modular reactions and reinforcement learning. Nat. Commun. 2024, 15, 10127.
9. Gao, W.; Coley, C.W. The synthesizability of molecules proposed by generative models. J. Chem. Inf. Model. 2020, 60, 5714–5723.
10. Ertl, P.; Schuffenhauer, A. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. J. Cheminform. 2009, 1, 8.
11. Skoraczyński, G.; Kitlas, M.; Miasojedow, B.; Gambin, A. Critical assessment of synthetic accessibility scores in computer-assisted synthesis planning. J. Cheminform. 2023, 15, 6.
12. Chude-Okonkwo, U.A.; Lehasa, O. Integrated Framework of Fragment-based Method and Generative Model for Lead Drug Molecules Discovery. Intell. Syst. Appl. 2025, 26, 200508.
13. O’Neill, S. AI-driven robotic laboratories show promise. Engineering 2021, 7, 1351.
14. Martin, R.; Buchwald, S.L. Palladium-catalyzed Suzuki-Miyaura cross-coupling reactions employing dialkylbiaryl phosphine ligands. Acc. Chem. Res. 2008, 41, 1461–1473.
15. Poulou, E.; Hackenberger, C.P. Staudinger Ligation and Reactions–From Bioorthogonal Labeling to Next-Generation Biopharmaceuticals. Isr. J. Chem. 2023, 63, e202200057.
16. Khan, Z.; Javed, F.; Shamair, Z.; Hafeez, A.; Fazal, T.; Aslam, A.; Zimmerman, W.B.; Rehman, F. Current developments in esterification reaction: A review on process and parameters. J. Ind. Eng. Chem. 2021, 103, 80–101.
17. Senger, N.A.; Bo, B.; Cheng, Q.; Keeffe, J.R.; Gronert, S.; Wu, W. The element effect revisited: Factors determining leaving group ability in activated nucleophilic aromatic substitution reactions. J. Org. Chem. 2012, 77, 9535–9540.
18. Michel, B.Y. Dimethylboron Bromide (Me2BBr): A Scarcely Recognized Mild and Versatile Reagent with Astonishing Potential. Synlett 2008, 2008, 2893–2894.
19. Liu, L.; Zheng, J. Retrosynthesis-Introduction to the Analysis and Mechanism. Appl. Comput. Eng. 2023, 3, 143–160.
20. Zhong, Z.; Song, J.; Feng, Z.; Liu, T.; Jia, L.; Yao, S.; Hou, T.; Song, M. Recent advances in deep learning for retrosynthesis. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2024, 14, e1694.
21. Jiang, Y.; Yu, Y.; Kong, M.; Mei, Y.; Yuan, L.; Huang, Z.; Kuang, K.; Wang, Z.; Yao, H.; Zou, J.; et al. Artificial intelligence for retrosynthesis prediction. Engineering 2023, 25, 32–50.
22. Genheden, S.; Thakkar, A.; Chadimová, V.; Reymond, J.L.; Engkvist, O.; Bjerrum, E. AiZynthFinder: A fast, robust and flexible open-source software for retrosynthetic planning. J. Cheminform. 2020, 12, 70.
23. Schwaller, P.; Laino, T.; Gaudin, T.; Bolgar, P.; Hunter, C.A.; Bekas, C.; Lee, A.A. Molecular transformer: A model for uncertainty-calibrated chemical reaction prediction datasets. ACS Cent. Sci. 2019, 30, 1572–1583.

Figure 1. The $Φ_{score}$ violin plot for the 123 molecules in D.

Figure 2. The $CI$ violin plot for the 123 molecules in D.

Figure 3. The variation in the predictive synthesis feasibility of the molecules in D for (a) Th1 = 5 and Th2 = 0.75, (b) Th1 = 4 and Th2 = 0.75, (c) Th1 = 3 and Th2 = 0.75, (d) Th1 = 2 and Th2 = 0.75, (e) Th1 = 5 and Th2 = 0.9, (f) Th1 = 4 and Th2 = 0.9, (g) Th1 = 3 and Th2 = 0.9, (h) Th1 = 2 and Th2 = 0.9, (i) Th1 = 5 and Th2 = 0.95, (j) Th1 = 4 and Th2 = 0.95, (k) Th1 = 3 and Th2 = 0.95, and (l) Th1 = 2 and Th2 = 0.95.

Figure 4. (a) Reaction step 1 and (b) reaction step 2 to the target molecule, compound A.

Figure 5. (a) Reaction step 1, (b) reaction step 2, and (c) reaction step 3 to the target molecule, compound B.

Figure 6. (a) Reaction step 1, (b) reaction step 2, and (c) reaction step 3 to the target molecule, compound C.

Figure 7. Alternative reaction steps to compound C: (a) reaction step 1, (b) reaction step 2, and (c) reaction step 3 to the target molecule.

Figure 8. (a) Reaction step 1, (b) reaction step 2, and (c) reaction step 3 to the target molecule, compound D.

Figure 9. The CI of synthesis vs. steps graph for the retrosynthesis of each of the compounds.

Figure 11. In silico AI-based framework for generating the molecules in D.

Table 1. The $Φ_{score}$ and $CI$ for molecules captured by $Γ_{2/0.94}$.

Molecule (SMILES)	$Φ_{score}$	$CI$
Oc1ccc(cc1CCCC)CC(=O)N	1.977917	0.976
O=C(NCCc1ccc(O)cc1C)Cc2cccc3ccccc32	1.907631	0.969
Oc1ccc(CCCc2ccc(O)cc2)cc1C(=O)N	1.904253	0.969
O=C(Nc1c(O)ccc(NC(=O)C)c1)CCC	1.760964	0.968
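
The selection behind Table 1 can be illustrated with a minimal sketch. It assumes that $Γ_{Th1/Th2}$ keeps a molecule when its $Φ_{score}$ is at most Th1 and its $CI$ is at least Th2; the direction and strictness of both inequalities are assumptions made here for illustration, and the names `Scored`, `TABLE_1`, and `gamma_filter` are hypothetical. The scores are copied from Table 1.

```python
from typing import NamedTuple


class Scored(NamedTuple):
    smiles: str
    phi_score: float  # synthetic accessibility score (assumed: lower is easier)
    ci: float         # retrosynthesis confidence (assumed: higher is better)


# Values copied from Table 1.
TABLE_1 = [
    Scored("Oc1ccc(cc1CCCC)CC(=O)N", 1.977917, 0.976),
    Scored("O=C(NCCc1ccc(O)cc1C)Cc2cccc3ccccc32", 1.907631, 0.969),
    Scored("Oc1ccc(CCCc2ccc(O)cc2)cc1C(=O)N", 1.904253, 0.969),
    Scored("O=C(Nc1c(O)ccc(NC(=O)C)c1)CCC", 1.760964, 0.968),
]


def gamma_filter(molecules, th1, th2):
    """Assumed rule: keep molecules with phi_score <= th1 and ci >= th2."""
    return [m for m in molecules if m.phi_score <= th1 and m.ci >= th2]


# Thresholds from the Table 1 caption: Th1 = 2, Th2 = 0.94.
selected = gamma_filter(TABLE_1, th1=2.0, th2=0.94)

# A tighter, purely illustrative Th1 drops the first row.
tight = gamma_filter(TABLE_1, th1=1.95, th2=0.94)
```

Under this assumed rule, all four rows of Table 1 pass at Th1 = 2 and Th2 = 0.94, and lowering Th1 or raising Th2 shrinks the selected set.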

Table 2. Principal synthesis precursors to compound A.

Target Molecule: Oc1ccc(cc1CCCC)CC(=O)N
Molecular Formula: $C_{12} H_{17} N O_{2}$
Precursors for Synthesis
i. 1,4-Dioxane (solvent)	ii. Pd(PPh3)4 (palladium catalyst)
iii. Potassium carbonate, K2CO3 (base)	iv. Butylboronic acid (reactant)
v. Ethyl 2-(3-bromo-4-hydroxyphenyl)acetate (reactant)

Table 3. Principal synthesis precursors to compound B.

Target Molecule: O=C(NCCc1ccc(O)cc1C)Cc2cccc3ccccc32
Molecular Formula: $C_{21} H_{21} N O_{2}$
Precursors for Synthesis
i. Tetrahydrofuran (THF)	ii. Triethylamine
iii. Triphenylphosphine (catalyst)	iv. Dichloromethane
v. 1-(2-azidoethyl)-4-methoxy-2-methylbenzene

Table 4. Principal synthesis precursors to compound C.

Target Molecule: Oc1ccc(CCCc2ccc(O)cc2)cc1C(=O)N
Molecular Formula: $C_{16} H_{17} N O_{3}$
Precursors for Synthesis
i. 2-hydroxy-5-(3-(4-hydroxyphenyl)propyl)benzaldehyde	ii. Sodium chlorite
iii. Ethanol	iv. Ammonia
v. Sulfuric acid	vi. Methanol

Table 5. Principal synthesis precursors to compound D.

Target Molecule: O=C(Nc1c(O)ccc(NC(=O)C)c1)CCC
Molecular Formula: $C_{12} H_{16} N_{2} O_{3}$
Precursors for Synthesis
i. N-(3-amino-4-methoxyphenyl)acetamide	ii. Dichloromethane
iii. Triethylamine	iv. Vinylacetate chloride

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
