Article

KRASAVA—An Expert System for Virtual Screening of KRAS G12D Inhibitors

by Oleg V. Tinkov 1,*, Pavel E. Gurevich 2,1, Sergei A. Nikolenko 1, Shamil D. Kadyrov 1, Natalya S. Bogatyreva 1,3, Veniamin Y. Grigorev 4, Dmitry N. Ivankov 5,1 and Marina A. Pak 1

1 Ligand Pro, Moscow 121205, Russia
2 Artificial Intelligence Center, Moscow 121205, Russia
3 Institute of Protein Research, Russian Academy of Sciences, Pushchino 142290, Russia
4 Institute of Physiologically Active Compounds, Federal Research Center of Problems of Chemical Physics and Medicinal Chemistry, Russian Academy of Sciences, Chernogolovka 142432, Russia
5 Center for Molecular and Cellular Biology, Moscow 121205, Russia
* Author to whom correspondence should be addressed.
Int. J. Mol. Sci. 2026, 27(1), 120; https://doi.org/10.3390/ijms27010120 (registering DOI)
Submission received: 9 November 2025 / Revised: 10 December 2025 / Accepted: 18 December 2025 / Published: 22 December 2025
(This article belongs to the Special Issue Recent Advances in Computer-Aided Drug Design)

Abstract

The development of KRAS G12D inhibitors represents an effective therapeutic strategy for treating oncological pathologies. Existing quantitative structure–activity relationship (QSAR) models for KRAS G12D inhibitors have several limitations, primarily the lack of a defined applicability domain and of a virtual screening implementation. In this study, we propose a set of regression QSAR models for KRAS G12D inhibitors built with various molecular descriptors and machine learning methods. Our consensus model achieved a Q2 value of 0.70 on an external test set, with 78% of the test data falling within the applicability domain. We integrated this consensus model into our Python-based framework, KRASAVA. The platform predicts inhibitory activity while taking the applicability domain into account, assesses compounds for compliance with Muegge’s bioavailability rules, and flags PAINS, toxicophores, and Brenk structural alerts. Furthermore, we structurally interpreted the QSAR models to propose several promising inhibitors and docked these candidates using GNINA. For the reference inhibitor MRTX1133, we reproduced the crystal structure pose (PDB ID: 7T47) with an RMSD of 0.76 Å. The key interactions with residues Asp12, Asp69, His95, Arg68, and Gly60, identified for both MRTX1133 and our proposed compounds, demonstrate strong consistency between the molecular docking and QSAR results.

1. Introduction

The KRAS gene (Kirsten rat sarcoma viral oncogene) encodes a protein of the Ras GTPase family that plays a central role in cellular signaling by regulating proliferation, differentiation, and cell survival [1,2]. Normally, the KRAS protein cycles between an active (GTP-bound) and an inactive (GDP-bound) state. The G12D mutation, a substitution of glycine by aspartic acid at position 12, is among the most common and aggressive KRAS mutations [3,4,5,6]. It constitutively activates KRAS, directly causing uncontrolled cell proliferation, tumor formation, and alterations in the tumor microenvironment; this direct causal link makes KRAS G12D a critically important oncological target whose inhibition can fundamentally alter disease progression [7]. Mechanistically, the G12D mutation locks KRAS in the active state, continuously stimulating downstream signaling pathways such as MAPK/ERK and PI3K/AKT; persistent activation of these pathways drives uncontrolled cell growth, tumor formation, and modification of the tumor microenvironment [8]. G12D is the most common KRAS mutation subtype in pancreatic cancer and a frequent, often dominant, subtype in colorectal cancer [9]. Consequently, G12D is not merely a marker but an active driver of cancer pathogenesis; targeting it addresses the core mechanism of oncogenesis, which explains its importance in drug development [10]. Recent progress in the development of KRAS G12D inhibitors, including MRTX1133 and ASP3082, demonstrates the clinical relevance of targeting this mutation [11,12].
In modern pharmacology, drug discovery remains a complex and resource-intensive process. Quantitative structure–activity relationships (QSAR) serve as a fundamental statistical tool to correlate the chemical structure of molecules with their biological activity through molecular descriptors [13]. By predicting the biological activity of novel compounds, QSAR reduces the need for costly and labor-intensive experimental assays and accelerates the drug discovery timeline. QSAR models play a pivotal role in the early stages of drug development, enabling compounds with undesirable properties to be eliminated before extensive laboratory testing [14]. A critical aspect of drug development is predicting and understanding toxicity and bioavailability profiles: toxicity causes preclinical failure for approximately 30% of compounds [15], and inadequate pharmacokinetics eliminates up to 15% of candidates at the preclinical stage [16].
MRTX1133, for example, is a highly selective, non-covalent inhibitor of the mutant KRAS G12D protein. Preclinical studies demonstrated its exceptional potency and specificity (IC50 < 2 nM; ~700–1000-fold selectivity over wild-type KRAS), and it induced significant tumor regression in xenograft and immunocompetent mouse models [17,18]. Despite this promise, preclinical pharmacokinetic studies revealed challenges for its therapeutic development. A 2024 study in rats [19] reported a very low oral bioavailability of only 2.92% and a short plasma half-life of 1.12 h after oral administration. These data indicated potential difficulties in achieving and maintaining adequate therapeutic concentrations in humans, a critical factor for successful clinical translation. Phase 1/2 clinical trials of MRTX1133, initiated in March 2023, were terminated prematurely in early 2025 after completion of phase 1; the reasons included unstable and inadequate pharmacokinetics as well as highly variable and unsatisfactory bioavailability [20].
This example highlights the necessity of thoroughly evaluating both the inhibitory activity and the ADMET properties of KRAS G12D inhibitors. Oral bioavailability is traditionally evaluated using the rules introduced by Lipinski [21], Muegge [22], Ghose [23], Veber [24], and Egan [25]. Among these, Muegge’s rules incorporate the largest set of parameters, making them especially useful at early development stages. They define an expanded drug-likeness filter with explicit thresholds (molecular weight between 200 and 600 Da, logP ≤ 5, ≤10 hydrogen-bond acceptors, ≤5 hydrogen-bond donors, ≤15 rotatable bonds, TPSA ≤ 150 Å², and ≤7 rings) that is used to prioritize compounds likely to be orally bioavailable during early-stage virtual screening. Notably, MRTX1133 does not comply with Muegge’s criteria: its molecular weight exceeds 600 Da, and it contains more than seven rings. Various medicinal chemistry filters are applied early in drug development to assess compliance with established bioavailability rules and to exclude compounds bearing structural alerts, such as Brenk and PAINS substructures. The Brenk filters comprise 105 structural fragments that increase toxicity risk, impair pharmacokinetic properties, and generally reduce suitability as drug candidates [26]. Pan-assay interference compounds (PAINS) are chemical entities that frequently produce false-positive results in high-throughput screening because they interact nonspecifically with multiple biological targets rather than selectively with the intended target [27].
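The threshold check described above can be expressed as a short Python sketch. It is illustrative rather than KRASAVA’s actual code: it covers only the thresholds listed in this paragraph, the container class `MueggeInput` and the function names are hypothetical, and the descriptor values are assumed to be precomputed elsewhere (for example, with a cheminformatics toolkit such as RDKit), so the example itself needs no external library.

```python
from dataclasses import dataclass


@dataclass
class MueggeInput:
    """Precomputed descriptors for one molecule (hypothetical container)."""
    mol_weight: float  # molecular weight, Da
    logp: float        # calculated logP
    hba: int           # hydrogen-bond acceptors
    hbd: int           # hydrogen-bond donors
    rot_bonds: int     # rotatable bonds
    tpsa: float        # topological polar surface area, Å^2
    n_rings: int       # number of rings


def muegge_violations(d: MueggeInput) -> list:
    """Return labels of the thresholds listed in the text that `d` violates."""
    checks = {
        "MW 200-600 Da": 200 <= d.mol_weight <= 600,
        "logP <= 5": d.logp <= 5,
        "HBA <= 10": d.hba <= 10,
        "HBD <= 5": d.hbd <= 5,
        "rotatable bonds <= 15": d.rot_bonds <= 15,
        "TPSA <= 150": d.tpsa <= 150,
        "rings <= 7": d.n_rings <= 7,
    }
    return [label for label, ok in checks.items() if not ok]


def passes_muegge(d: MueggeInput) -> bool:
    """A compound passes only if it violates none of the thresholds."""
    return not muegge_violations(d)


# Hypothetical descriptor values, chosen only to illustrate the output.
example = MueggeInput(mol_weight=610.0, logp=3.2, hba=8, hbd=2,
                      rot_bonds=6, tpsa=90.0, n_rings=8)
print(muegge_violations(example))  # ['MW 200-600 Da', 'rings <= 7']
```

Returning the list of violated thresholds, rather than a bare pass/fail flag, makes it easy to report why a compound fails the filter, as in the MRTX1133 case discussed above.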
Given the relevance of KRAS inhibition as a therapeutic strategy, several research groups have developed satisfactory QSAR models linking chemical structure to inhibitory activity against KRAS [28,29,30,31]. Srisongkram et al. constructed robust QSAR models using a dataset of 1033 compounds, achieving $Q^2_{cv}$ = 0.60 and $Q^2_{ext}$ = 0.62 [28]. Srisongkram and Weerapreeyakul [29] proposed both classification and regression QSAR models aimed at repurposing FDA-approved drugs as KRAS G12C inhibitors, using a dataset of 1255 molecules and reporting $Q^2_{cv}$ = 0.60, $Q^2_{ext}$ = 0.62, Accuracy$_{cv}$ = 0.84, and Accuracy$_{ext}$ = 0.85. Both studies [28,29] used compounds with experimentally measured IC50 values against KRAS G12C and applied the XGBoost gradient boosting algorithm together with molecular descriptors, including PubChem and substructure fingerprints. The authors used SHAP-based interpretation to identify key fragment contributions, and the QSAR results of both studies were further validated by molecular docking.
Studies [30,31] proposed classification QSAR models for KRAS G12D inhibitors. Despite acceptable statistical performance, these models have a major limitation: they lack an explicitly defined applicability domain, violating a key QSAR modeling principle established by the OECD expert group [32]. Without such a domain, QSAR models cannot meet regulatory suitability criteria. Additionally, these studies [30,31] do not clearly detail their data curation and preprocessing procedures, a critical step given the risk of errors and inconsistencies in publicly available chemical databases and other sources [33]. Finally, to identify the most active compounds during virtual screening, regression models are preferable, because they predict inhibitory activity on a continuous scale rather than as discrete activity classes.
Importantly, the effective application of QSAR models for KRAS inhibitors in medicinal chemistry requires appropriate software tools, either desktop or web-based. Unfortunately, the aforementioned studies [28,29,30,31] did not provide such tools.
In this study, we aimed to construct regression QSAR models for KRAS G12D inhibitors using various molecular descriptors and machine learning methods and to integrate the developed QSAR model into our web-based Python v3.11 framework, KRASAVA (KRAS Automated Virtual Assistant), enabling virtual screening of KRAS G12D inhibitors with simultaneous bioavailability assessment based on Muegge’s rules.
The main stages of this study are presented in Figure 1: (1) collection of experimental data on KRAS G12D inhibitors; (2) data validation following generally accepted recommendations [33]; (3) division of the total dataset into training and test sets; (4) exploratory data analysis; (5) calculation of molecular descriptors; (6) development, validation, and structural interpretation of regression QSAR models; (7) rational molecular design; (8) development of the KRASAVA framework as a reproducible Jupyter Notebook v7.5, executable via Google Colab, implementing the best QSAR models; (9) molecular docking of the promising compounds. A detailed description of each stage is provided in the Methods and Materials section.

2. Results and Discussion

We developed several QSAR models for KRAS G12D inhibitors as described in Methods and Materials; Table 1 lists their statistical parameters. All developed models exhibit satisfactory statistical characteristics and comparable predictive performance. The CatBoost models built on ECFP4 fingerprints, Topological Path-Based fingerprints, and 2D RDKit descriptors showed the best statistical performance (highlighted in bold in Table 1). Under y-randomization, the p-values for these models are below 0.02, indicating that their performance is not the result of chance correlation. The consensus model combines these three best models. For consensus predictions, the applicability domain is calculated from ECFP4 fingerprints, because they give the smallest data coverage among the descriptor types used in the consensus model and therefore the most conservative domain.
The predictive ability of the consensus QSAR model was further validated using a second, independent external test set. Because the proposed QSAR model covers a relatively small region of chemical space, only 21 of the 1266 compounds in this set fell within the applicability domain; for these compounds, $Q^2_{ts}$ = 0.73, indicating a sufficiently high predictive ability of the consensus QSAR model.
Additionally, we developed 63 QSAR models using the OCHEM platform; their statistical parameters are given in Appendix A, Table A1. The best results were again obtained with ECFP4 descriptors, here combined with the Random Forest method: $Q^2_{cv}$ = 0.66; $Q^2_{ts}$ = 0.69. The model is publicly available at https://ochem.eu/model/20748154 (accessed on 10 December 2025).
To investigate the influence of structural features on the inhibitory activity of KRAS inhibitors, we performed a structural interpretation of the QSAR models developed using Klekota-Roth and PubChem descriptors (Figure 2 and Table A2 and Table A3). In addition, we carried out a matched molecular pair analysis (MMPA) for the best QSAR model developed using the OCHEM platform (Table 2).
Summarizing the obtained results, we can identify the following main structural modifications that increase the inhibitory activity of KRAS G12D inhibitors:
1. Substitution of a nitrile group with a hydroxy or alkyne group (molecular transformations 1 and 5 in Table 2), which is consistent with the high contribution to activity of descriptors No. 4 and 9 shown in Figure 2A and Table A2, as well as descriptors No. 4, 6, and 9 shown in Figure 2B and Table A3;
2. Replacement of a methoxy group with an alkyne group (molecular transformation 2 in Table 2);
3. Replacement of a pyridine fragment with a 1-methylpyrrolidine fragment (molecular transformation 3 in Table 2);
4. Substitution of a methoxy group with a hydroxy group (molecular transformation 4 in Table 2);
5. Replacement of an ethylene fragment with a pyrrolizidine fragment (molecular transformation 6 in Table 2);
6. Introduction of a hydroxyl group in the para-position (molecular transformation 7 in Table 2), which correlates with the high contribution to activity of descriptors No. 4 and 9 (Figure 2A, Table A2) and descriptors No. 4, 6, and 9 (Figure 2B, Table A3);
7. Elongation of the linker connected to the imidazole ring (molecular transformation 8 in Table 2);
8. Transformation of a phenyl group into a naphthyl group (molecular transformation 9 in Table 2), consistent with the high contribution to activity of descriptors No. 2, 3, and 5 (Figure 2A, Table A2);
9. Replacement of a pyridine fragment with an imidazole ring (molecular transformation 10 in Table 2).
We integrated the consensus QSAR model into the KRASAVA framework, a reproducible Jupyter Notebook executable via Google Colab and freely available at https://github.com/ovttiras/QSAR_KRAS_inhibitors_v2/blob/main/KRASAVA%20v2.ipynb (accessed on 10 December 2025). In KRASAVA, users can supply the chemical structures of the compounds under investigation as SMILES strings [34] or as CSV or SDF files. One of the processing steps in KRASAVA is the automatic validation and standardization of the input chemical structures. If a user enters an invalid structure, the application returns the index of the compound in the uploaded CSV or SDF file together with the SMILES notation of the corresponding structure.
For chemical structures that pass validation and standardization, the application checks whether experimental IC50 values for KRAS G12D are available in the ChEMBL database. If so, the framework displays the mean and standard deviation of the experimental IC50 values together with the corresponding ChEMBL compound ID [35], and no activity prediction is performed.
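The routing logic described in the two paragraphs above can be sketched as follows. This is a hypothetical outline, not the KRASAVA source: the function `screen` and its three callables (`standardize`, `lookup_experimental`, `predict`) are placeholders for structure standardization, the ChEMBL lookup, and the consensus model, passed in as plain functions so that the sketch runs without RDKit or network access.

```python
import statistics


def screen(records, standardize, lookup_experimental, predict):
    """Route each input structure either to experimental data or to the model.

    records: iterable of (index, smiles) pairs from the uploaded file.
    standardize(smiles) -> standardized SMILES, or None if invalid.
    lookup_experimental(smiles) -> (compound_id, [values]) or None.
    predict(smiles) -> predicted activity.
    All three callables are hypothetical placeholders.
    """
    invalid, results = [], []
    for index, smiles in records:
        std = standardize(smiles)
        if std is None:
            # Report the position and the raw SMILES so the user can fix it.
            invalid.append((index, smiles))
            continue
        hit = lookup_experimental(std)
        if hit:
            compound_id, values = hit
            # Measurements exist: summarize them and skip the prediction.
            results.append({
                "index": index, "smiles": std, "source": "experimental",
                "id": compound_id,
                "mean": statistics.mean(values),
                "sd": statistics.stdev(values) if len(values) > 1 else 0.0,
            })
        else:
            # No measurement available: fall back to the model prediction.
            results.append({"index": index, "smiles": std,
                            "source": "predicted", "value": predict(std)})
    return results, invalid
```

Returning invalid entries separately mirrors the behavior described above, where the user receives the index and SMILES of each structure that failed validation.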
When analyzing individual compounds, KRASAVA also checks compliance with Muegge’s rules. In previous work, we proposed a set of structural fragments (toxicophores) that consistently increase acute oral toxicity in rats [36]. We integrated the detection of these fragments, together with Brenk filters and PAINS, into the KRASAVA framework (Figure 3), enabling a preliminary evaluation of oral bioavailability and potential toxicity.
In addition, to support molecular design, the framework identifies the molecular fragments associated with the most significant molecular descriptors determined through structural interpretation (Figure 2).
Based on the identified structure–activity relationships and using the capabilities of the KRASAVA framework, we performed rational molecular design starting from 4-(3,8-diazabicyclo[3.2.1]octan-3-yl)-8-fluoro-2-[[(2R,8S)-2-fluoro-1,2,3,5,6,7-hexahydropyrrolizin-8-yl]methoxy]-7-[5-methoxy-2-(trifluoromethoxy)phenyl]pyrido[4,3-d]pyrimidine (PubChem CID 156124915, SCHEMBL23053462, BDBM573509, Example 361), which has a pIC50 of 5.57 according to patent US-11453683-B1 [37].
Replacing the methoxy group of BDBM573509 with a hydroxy group (pattern No. 4) and the trifluoromethoxy group (a methoxy derivative) with an alkyne group (pattern No. 2) increases the predicted inhibitory activity: the calculated pIC50 of the resulting compound (compound 1) is 7.98 (Figure 4). Unlike the original molecule BDBM573509, compound 1 also satisfies pattern No. 6, according to which a hydroxy group in the para-position relative to other substituents enhances inhibitory activity against KRAS G12D.
A further modification, removal of the fluorine substituent from the pyrrolizidine ring, also increases the predicted inhibitory activity: for compound 2, the predicted pIC50 is 8.05 (Figure 4). For comparison, according to the ChEMBL database, the experimental pIC50 of the known KRAS G12D inhibitor MRTX1133 (PubChem CID 162369732, CHEMBL5081048, PDB Chemical Component ID 6IC) is 8.25 ± 0.47.
In addition, to improve bioavailability, we modified training set compounds and obtained the proposed compound 3 (Figure 4), with a predicted pIC50 of 7.49. Using the KRASAVA framework, we predicted the inhibitory activities of compounds 1–3, taking into account whether they fall within the applicability domain of the consensus QSAR model.
For a comparative assessment of predictive performance, we also predicted the inhibitory activities of compounds 1–3 with the best QSAR model developed on the OCHEM platform (https://ochem.eu/model/20748154) (accessed on 10 December 2025). This model predicts a pIC50 of 8.80 for both compound 1 and compound 2, and 8.1 for compound 3.
Compared to MRTX1133, the investigated compounds 2 and 3 (Figure 5 and Table 3) exhibit a better combination of physicochemical properties, suggesting a potentially acceptable level of bioavailability.
It should be noted that we did not find the structures or experimental IC50 values of compounds 1–3 in the SureChEMBL (https://www.surechembl.org/) (accessed on 10 December 2025), PubChem (https://pubchem.ncbi.nlm.nih.gov/) (accessed on 10 December 2025), or BindingDB (https://www.bindingdb.org/rwd/bind/index.jsp) (accessed on 10 December 2025) databases. However, further studies are required to assess the patent landscape for compounds 1–3 through a detailed analysis of Markush structures in existing patents. Moreover, compounds 1 and 3 are considered solely as examples of rational molecular design based on the structural interpretation of QSAR models, aimed at the balanced search for active KRAS G12D inhibitors with acceptable bioavailability.
For comparison with the QSAR modeling results, we additionally performed molecular docking of compound BDBM573509 and compounds 1–3, using MRTX1133 as the reference compound. MRTX1133 binds to a specific pocket (the switch-II pocket, S-IIP) on the surface of mutant KRAS G12D, which forms only in the presence of the mutation and in the inactive (GDP-bound) state of the protein. According to [38], MRTX1133 forms key interactions with Asp69, Asp12, Gly60, Glu62, His95, and Arg68, as well as weaker interactions with Gln99, Glu63, and Ser65 (Figure 6).
Table 4 presents the molecular docking results for compound BDBM573509, compounds 1–3, and MRTX1133. According to Table 4, the pIC50 values correlate only weakly with the docking scores. A previous study that applied molecular docking in parallel with QSAR modeling likewise found that docking failed to correctly discriminate between experimentally active and inactive compounds [39]. Therefore, the priority in this study was not the docking scores themselves but the analysis of the interactions of the studied compounds with key amino acid residues in the active site of KRAS G12D.
Figure 6, Figure 7, Figure A1, Figure A2, Figure A3, Figure A4 and Figure A5 show the interactions of compound BDBM573509, compounds 1–3, and MRTX1133 with key amino acid residues in the KRAS G12D active site. Figure A1, Figure A2, Figure A3, Figure A4 and Figure A5 reveal the following key interactions:
  • Ionic interactions with Asp12—compound MRTX1133;
  • Gly60 hydrogen bond—compound MRTX1133, compound 1 and compound 2;
  • His95 hydrogen bond—compounds MRTX1133, BDBM573509, compound 1 and compound 2;
  • Arg68 hydrogen bond—compounds MRTX1133, BDBM573509, compound 1 and compound 3;
  • Asp69 hydrogen bond—compound MRTX1133 and compound 3.
The 2D visualization of the interactions of compound BDBM573509, compounds 1–3, and MRTX1133 with key amino acid residues is provided in Appendix A (Figure A1, Figure A2, Figure A3, Figure A4 and Figure A5). The identified interactions are supported by previous studies on similar compounds [38].
Analyzing the data in Table 2, Figure 2, and Figure 6, we noted consistent agreement between the structural interpretation of the QSAR models and the molecular docking results. According to the QSAR model interpretation, descriptors encoding the hydroxy substituent (descriptors No. 2 and 4 in Figure 2A and descriptor No. 5 in Figure 2B) contribute significantly to activity, which is consistent with the hydrogen bond that MRTX1133 forms with Asp69.
The present study differs in several respects from existing QSAR studies of KRAS inhibitors. Compared with the previous work of Srisongkram et al. [28,29], its main advantages are twofold. First, we developed and validated QSAR models for KRAS G12D inhibitors, which are of greater medicinal chemistry interest than the KRAS G12C inhibitors investigated by Srisongkram et al.; the development of KRAS G12D inhibitors is particularly promising for treating pancreatic and colorectal cancers, where the clinical need is high. Second, we provide a freely available framework that substantially automates the virtual screening of KRAS G12D inhibitors. Table 5 compares this study with existing ones and shows several advantages of our approach. In particular, when developing the regression QSAR models, we defined their applicability domain, performed structural interpretation, and, most importantly, integrated the consensus model into the KRASAVA framework, which streamlines virtual screening of KRAS G12D inhibitors while providing preliminary assessments of oral bioavailability and toxicity. The developed QSAR models (Table 1), the scripts used to build them and compute their statistical characteristics (as Jupyter Notebook files), and the molecular docking results are freely available in the GitHub repository https://github.com/ovttiras/QSAR_KRAS_inhibitors_v2 (accessed on 10 December 2025).

3. Methods and Materials

3.1. Dataset of KRAS G12D Inhibitors

For QSAR modeling, we extracted a dataset of compounds with experimentally reported IC50 values against KRAS G12D from publication [40]. In accordance with established recommendations for data curation in cheminformatics [33], we verified and preprocessed the initial dataset, consisting of 645 compounds and publicly available at https://zenodo.org/records/11137638 (accessed on 10 December 2025). Mixtures, compounds with incorrect or inconsistent chemical structures, and salts were excluded from further consideration. During the curation process, nine structurally invalid entries and four salt forms were identified and removed. The original Python code used for dataset validation is publicly available at https://github.com/ovttiras/QSAR_KRAS_inhibitors_v2/blob/main/DATA%20curation%20and%20cleaning.ipynb (accessed on 10 December 2025).
Since the objective of this study was to construct regression QSAR models capable of predicting IC50 values toward KRAS G12D, only entries with explicit numerical IC50 values were retained; records with “>” or “<” qualifiers were excluded. Experimental IC50 values in molar units were converted to their negative decimal logarithm (pIC50), which is conventionally used in QSAR studies because of its improved linearity with respect to biological response.
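As a worked example of the conversion, an IC50 of 10 nM equals 10⁻⁸ M, so pIC50 = −log10(10⁻⁸) = 8. The sketch below applies this conversion together with the qualifier filter; the record layout `(id, relation, value, unit)` and the unit labels are assumptions made for illustration, not the format of the source dataset.

```python
import math

# Conversion factors to mol/L (the unit labels are an assumption).
TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def pic50(value, unit="nM"):
    """Convert an IC50 value to pIC50 = -log10(IC50 in mol/L)."""
    molar = value * TO_MOLAR[unit]
    if molar <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(molar)


def keep_exact(records):
    """Keep (id, pIC50) for records with an '=' relation and a numeric value.

    Each record is a hypothetical (id, relation, value, unit) tuple;
    censored values carrying '>' or '<' qualifiers are dropped.
    """
    kept = []
    for cid, relation, value, unit in records:
        if relation != "=":
            continue
        try:
            numeric = float(value)
        except (TypeError, ValueError):
            continue  # non-numeric entries cannot be used for regression
        kept.append((cid, pic50(numeric, unit)))
    return kept


print(round(pic50(10, "nM"), 6))  # 8.0
```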
For compounds having two or more reported experimental pIC50 measurements, we calculated the mean and standard deviation. We retained only entries with a standard deviation not exceeding 0.5 log units, in accordance with a previously described procedure for handling duplicate measurements [41].
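The duplicate-handling rule can be sketched as below. The text does not state whether the sample or the population standard deviation is used, so the sample standard deviation is assumed here; the grouping key (for instance, a canonical SMILES or InChIKey) is likewise an assumption, and `merge_replicates` is a hypothetical name.

```python
import statistics
from collections import defaultdict


def merge_replicates(pairs, max_sd=0.5):
    """Average replicate pIC50 values per compound, dropping inconsistent ones.

    pairs: iterable of (compound_key, pIC50); the key could be a canonical
    SMILES or InChIKey (assumption).
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    merged = {}
    for key, values in groups.items():
        if len(values) == 1:
            merged[key] = values[0]
        elif statistics.stdev(values) <= max_sd:  # sample SD (assumption)
            merged[key] = statistics.mean(values)
        # otherwise the measurements disagree too much and the compound is dropped
    return merged
```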
To evaluate the predictive performance of the QSAR models, we divided the curated dataset of 566 compounds into training (ws) and test (ts) sets. The dataset was sorted by increasing pIC50, and every fifth compound was assigned to the test set, with the remaining entries forming the training set. As a result, the training and test sets comprised 452 and 114 compounds, respectively. We exported both subsets to SDF format using RDKit v2025.03.6 in Python v3.11; they are publicly available at https://github.com/ovttiras/QSAR_KRAS_inhibitors_v2/tree/main/datasets (accessed on 10 December 2025).
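A minimal sketch of this systematic split follows. Which position within each block of five goes to the test set is not stated; counting from the first (lowest-pIC50) compound is assumed. Because 566 = 5 × 113 + 1, this offset is the only one that yields 114 test compounds, matching the set sizes reported above. The function name and the record layout `(id, pIC50)` are hypothetical.

```python
def sorted_split(records, step=5, offset=0):
    """Split (id, pIC50) records into training and test sets.

    Records are sorted by increasing pIC50, and every `step`-th one, starting
    at position `offset` (offset=0 is an assumption), goes to the test set.
    """
    ordered = sorted(records, key=lambda r: r[1])  # stable for tied pIC50 values
    train, test = [], []
    for position, record in enumerate(ordered):
        (test if position % step == offset else train).append(record)
    return train, test


# Hypothetical records: 566 compounds with distinct pIC50 values.
demo = [(f"cpd{i}", 4.0 + 0.01 * i) for i in range(566)]
train, test = sorted_split(demo)
print(len(train), len(test))  # 452 114
```

Sorting before sampling makes the test set span the whole activity range of the training set, which is the usual motivation for this kind of split.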
The distribution of activity values in the training and test sets is shown in Figure 8A. As can be seen, the two sets exhibit similar ranges and frequencies of experimental pIC50 values. The investigated compounds cover a sufficiently broad pIC50 range of more than five logarithmic units, which has a positive effect on the descriptive and predictive performance of the developed QSAR models. As previously demonstrated in [42], an adequate QSAR model requires an activity range of at least one logarithmic unit.
Visualization of the chemical space of the training and test sets in the molecular weight (MW)—lipophilicity (LogP) coordinate system is presented in Figure 8B. Analysis of this plot indicates a sufficiently high level of chemical diversity in both subsets, as evidenced by the wide ranges of molecular weight and lipophilicity of the investigated compounds.
To further assess the predictive power of the developed models, we formed a second external test set by aggregating compounds with experimental pIC50 values for KRAS G12D from the ChEMBL [35] and BindingDB [43] databases and from a study devoted to the development of KRAS G12D inhibitors [44]. The data verification and preprocessing methodology described above was applied during data collection. The second independent test set comprised 1266 compounds.

3.2. Development of QSAR Models

Molecular structures were described using ECFP4 (radius = 2, nBits = 1024, useFeatures = False, useChirality = False), MACCS (166-bit structural keys), Klekota-Roth, PubChem descriptors, Topological Torsion, Atom Pairs, and Topological Path-Based fingerprints. Descriptor calculation was performed using the RDKit v2025.03.6 [45] and PaDEL-Descriptor v0.1.11 (PadelPy) [46,47] libraries in Python v3.11. We built QSAR models using the scikit-learn [48] and CatBoost v1.2.6 [49] libraries, applying gradient boosting (CatBoost), support vector machines (SVM), and fully connected neural networks with a multilayer perceptron (MLP) architecture. Hyperparameters of the models were automatically optimized using GridSearchCV implemented in scikit-learn. Both the model hyperparameters and the implementation scripts in Jupyter Notebook format are publicly available at https://github.com/ovttiras/QSAR_KRAS_inhibitors_v2 and can be utilized for QSAR modeling of other types of biological activity.
To assess the robustness of the models, five-fold internal cross-validation (5-fold CV) was performed [50]. The inclusion of test set compounds within the applicability domain (AD) was evaluated using the similarity distance approach [51]. A test compound is considered to belong to the QSAR model’s applicability domain if its similarity distance does not exceed the threshold value Dc, calculated using Equation (1):
$$D_c = Z\sigma + \bar{y}, \qquad (1)$$
where $\bar{y}$ and $\sigma$ are the mean and standard deviation of the Euclidean distances in the descriptor space between the training set compounds and their nearest neighbors; $Z$ is a constant, typically set to 0.5.
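The applicability-domain rule can be sketched with NumPy as follows. Two details are assumptions rather than statements from the text: the distance compared with D_c is taken to be the Euclidean distance from a query compound to its nearest training compound, and σ is the population standard deviation (NumPy’s default). The fraction of flagged compounds gives the data coverage defined in the next paragraph. All function names are hypothetical.

```python
import numpy as np


def pairwise_distances(A, B):
    """Euclidean distances between every row of A and every row of B."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b avoids a large 3-D temporary array
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.sqrt(np.clip(sq, 0.0, None))  # clip tiny negative round-off


def ad_threshold(X_train, z=0.5):
    """D_c = Z * sigma + mean of the training-set nearest-neighbour distances."""
    d = pairwise_distances(X_train, X_train)
    np.fill_diagonal(d, np.inf)  # ignore each compound's zero distance to itself
    nn = d.min(axis=1)
    return z * nn.std() + nn.mean()  # population SD (ddof=0) is an assumption


def in_domain(X_train, X_query, z=0.5):
    """True where the query's nearest training neighbour lies within D_c."""
    # assumption: the nearest-neighbour distance is what gets compared with D_c
    nearest = pairwise_distances(X_query, X_train).min(axis=1)
    return nearest <= ad_threshold(X_train, z)


X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_test = np.array([[0.5, 0.5], [5.0, 5.0]])
flags = in_domain(X_train, X_test)
print(flags, "coverage =", flags.mean())  # [ True False] coverage = 0.5
```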
Data coverage (Cov) within the applicability domain was calculated as the ratio of the number of test set compounds falling within the AD to the total number of compounds in the test set. The predictive performance of the QSAR models was evaluated using the coefficient of determination (Q2):
$$Q^2 = 1 - \frac{\sum_i \left(y_i - \hat{y}_i\right)^2}{\sum_i \left(y_i - y_{mean}\right)^2},$$
and the root mean square error (RMSE):
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{m} \left(y_i - \hat{y}_i\right)^2}{m - 1}},$$
where $y_i$ is the observed activity of the i-th compound, $\hat{y}_i$ is the predicted activity of the i-th compound, $y_{mean}$ is the mean observed activity, and $m$ is the number of compounds in the dataset. We calculated the Q2 and RMSE values using the scikit-learn library.
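The two metrics can be computed directly from the formulas above, as in this sketch. scikit-learn’s `r2_score` implements the same Q² definition, whereas `mean_squared_error` divides by m rather than m − 1; the `ddof` argument below (a name borrowed from NumPy’s convention, not from the text) switches between the two. For a test set of 114 compounds, the two RMSE values differ by a factor of √(114/113) ≈ 1.004.

```python
import numpy as np


def q2(y_obs, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_obs - y_pred) ** 2)
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def rmse(y_obs, y_pred, ddof=1):
    """Root mean square error.

    ddof=1 divides by m - 1, as in the formula above; ddof=0 divides by m,
    the convention of scikit-learn's mean_squared_error.
    """
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.sum((y_obs - y_pred) ** 2) / (y_obs.size - ddof))
```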
To improve predictive power, we developed a consensus model. The predicted consensus activity value was calculated as the average of the predictions from the three best QSAR models.
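Consensus averaging, with the applicability domain taken from ECFP4 fingerprints as described in the Results, can be sketched as below. The variable names and prediction values are hypothetical, and returning NaN for compounds outside the domain is a design choice of this sketch rather than documented KRASAVA behavior.

```python
import numpy as np


def consensus(predictions, in_ad):
    """Average per-model predictions and mask compounds outside the domain.

    predictions: sequence of 1-D arrays, one per model, same compound order.
    in_ad: boolean array, e.g. computed from ECFP4 fingerprints.
    Masking with NaN is a design choice of this sketch (assumption).
    """
    stacked = np.vstack([np.asarray(p, dtype=float) for p in predictions])
    mean = stacked.mean(axis=0)
    return np.where(np.asarray(in_ad, dtype=bool), mean, np.nan)


# Hypothetical predictions of three models for two compounds.
preds_ecfp4 = [7.0, 6.0]
preds_path = [8.0, 6.5]
preds_rdkit2d = [7.5, 7.0]
print(consensus([preds_ecfp4, preds_path, preds_rdkit2d], [True, False]))  # [7.5 nan]
```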
To check the QSAR models for chance correlation, we applied y-randomization with 50 iterations using the permutation_test_score function in the scikit-learn library. Chance correlation was assessed using the determination coefficient of the randomized models, $Q^2_{rand}$, and the p-value. If the p-value falls below a chosen significance threshold (e.g., 0.05), the relationship described by the model is unlikely to be due to chance correlation [52].
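The logic of a permutation-based y-randomization test can be sketched as follows. scikit-learn’s `permutation_test_score` computes p = (C + 1)/(n_permutations + 1), where C is the number of permuted scores greater than or equal to the true score, so with 50 iterations the smallest attainable p-value is 1/51 ≈ 0.0196, which is consistent with the p-values below 0.02 reported in the Results. To keep the example dependency-light, the sketch swaps the cross-validated model score used by scikit-learn for the in-sample R² of a one-descriptor least-squares fit on synthetic data; it illustrates the procedure, not the paper’s models.

```python
import numpy as np


def r2_linear_fit(x, y):
    """In-sample R^2 of an ordinary least-squares fit y ~ a + b*x.

    Stand-in for the cross-validated score of a real QSAR model (assumption).
    """
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    return 1.0 - residuals @ residuals / np.sum((y - y.mean()) ** 2)


def y_randomization(x, y, n_permutations=50, seed=0):
    """Compare the true score with scores obtained after shuffling y.

    Returns (true score, mean randomized score, p-value), where
    p = (C + 1) / (n_permutations + 1) and C counts randomized scores >= true score.
    """
    rng = np.random.default_rng(seed)
    score = r2_linear_fit(x, y)
    randomized = np.array([r2_linear_fit(x, rng.permutation(y))
                           for _ in range(n_permutations)])
    count = np.sum(randomized >= score)
    p_value = (count + 1) / (n_permutations + 1)
    return float(score), float(randomized.mean()), float(p_value)


x = np.linspace(0.0, 1.0, 40)
y = 2.0 * x + 0.05 * np.cos(17.0 * x)  # synthetic, strongly correlated data
score, mean_rand, p = y_randomization(x, y)
print(round(p, 4))  # 0.0196
```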
During structural interpretation, we evaluated the contribution of descriptors to the QSAR models using the SHAP library in Python v3.11 [53]. We assessed feature importance in SHAP v0.44.0 based on Shapley values, which quantify the contribution of each descriptor to the model predictions.
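For intuition about what SHAP estimates, the Shapley value of a descriptor can be computed exactly for a tiny model by enumerating all feature coalitions. This brute-force sketch is not how the SHAP library works (for tree ensembles such as CatBoost it relies on specialized algorithms such as TreeExplainer), and defining an "absent" feature by substituting a baseline value is one assumption among several possible choices. Its cost grows as 2ⁿ, so it is only practical for a handful of features; `exact_shapley` and `linear_model` are hypothetical names.

```python
from itertools import combinations
from math import factorial


def exact_shapley(f, x, baseline):
    """Exact Shapley values of model f at point x by enumerating coalitions.

    Features outside a coalition take their baseline value (an assumption;
    other definitions of an absent feature exist). Cost grows as 2**n.
    """
    n = len(x)

    def value(coalition):
        # Evaluate f with coalition features from x and the rest from baseline.
        return f([x[j] if j in coalition else baseline[j] for j in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                coalition = set(subset)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


def linear_model(z):
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[2]


print([round(v, 6) for v in exact_shapley(linear_model, [1, 1, 1], [0, 0, 0])])
# [2.0, -1.0, 0.5]
```

For a linear model, each Shapley value reduces to the coefficient times the feature’s deviation from the baseline, and for any model the values sum to f(x) − f(baseline).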
For a comparative analysis of the predictive performance of the developed QSAR models of KRAS inhibitors, we additionally employed the OCHEM web platform (https://ochem.eu) (accessed on 10 December 2025), which provides various sets of molecular descriptors and machine learning algorithms. The OCHEM platform integrates several software packages for calculating a wide range of descriptor sets. In this study, we used eleven descriptor sets for QSAR model development, including OEState [54] combined with AlogPS [55], CDK [56], Dragon v7 [57], QNPR descriptors [58], Extended Connectivity Fingerprint 4 (ECFP4) [59], alvaDesc [60], Fragmentor (length: 2–4) [61], MOLD2 [62], and MORDRED [63]. QSAR models were developed using various machine learning methods, including Associative Neural Networks (ASNN) [55], Random Forest, Gradient Boosting implemented in the XGBoost library [64], Deep Neural Networks (DNN) [65], k-Nearest Neighbors (KNN) [66], and Multiple Linear Regression Analysis (MLRA) [67]. In addition, several deep learning approaches were applied: Transformer Convolutional Neural Network (TRANSNNI) [68], Attentive FP [69], and Chemprop [70]. The models were built using optimized parameter settings for each machine learning method provided by the OCHEM platform. The applicability domain was assessed using the “distance-to-model” concept, specifically the “BAGGING-STD” approach described in the OCHEM user manual [71].

3.3. Framework KRASAVA

We implemented the consensus QSAR model in KRASAVA (https://github.com/ovttiras/QSAR_KRAS_inhibitors_v2/blob/main/KRASAVA%20v2.ipynb) (accessed on 10 December 2025), a Python-based framework distributed as a reproducible Jupyter Notebook that can be executed in Google Colab. KRASAVA was written in Python v3.11 and uses the RDKit v2025.03.6 [45], scikit-learn v1.3.1 [48], NumPy v1.22.1 [72], and Pandas v1.3.5 [73] libraries.
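KRASAVA screens compounds for compliance with Muegge's bioavailability rules. A minimal sketch of such a check, assuming the descriptor values have already been computed (for example with RDKit) and are passed in as a dictionary with hypothetical keys; this is not KRASAVA's code. The thresholds are the Muegge criteria as commonly tabulated (e.g., by SwissADME); the original lipophilicity criterion refers to XLOGP, so substituting another logP estimate is an assumption.

```python
# Muegge criteria as commonly tabulated; None means the bound is not applied.
MUEGGE_RULES = {
    "MW": (200, 600),              # molecular weight, Da
    "logP": (-2, 5),               # XLOGP in the original criteria
    "TPSA": (None, 150),           # topological polar surface area, Å²
    "rings": (None, 7),
    "carbons": (5, None),          # more than 4 carbon atoms
    "heteroatoms": (2, None),      # more than 1 heteroatom
    "rotatable_bonds": (None, 15),
    "HBA": (None, 10),             # H-bond acceptors
    "HBD": (None, 5),              # H-bond donors
}


def muegge_violations(props):
    """Return the names of Muegge criteria that a compound violates."""
    violations = []
    for name, (low, high) in MUEGGE_RULES.items():
        value = props[name]
        if (low is not None and value < low) or (high is not None and value > high):
            violations.append(name)
    return violations


# Hypothetical descriptor values for two compounds (not data from the study).
compound_a = {"MW": 452.5, "logP": 3.1, "TPSA": 88.0, "rings": 4, "carbons": 26,
              "heteroatoms": 7, "rotatable_bonds": 6, "HBA": 6, "HBD": 2}
compound_b = dict(compound_a, MW=650.7, logP=5.8)

print(muegge_violations(compound_a))  # []
print(muegge_violations(compound_b))  # ['MW', 'logP']
```

Returning the list of violated criteria, rather than a single pass/fail flag, is what a radar-style bioavailability plot needs, since each axis corresponds to one criterion.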

3.4. Molecular Docking

We used GNINA v1.3 [74] because its machine-learning-based scoring functions outperform the Vina scoring function. GNINA follows the same docking workflow as Vina but, after the initial Vina scoring, rescores the ligand poses with convolutional neural network (CNN) scoring functions; this additional rescoring step improves the reliability of binding pose prediction. GNINA has shown strong performance both in practical applications [75] and in independent benchmark studies [76].
To validate the selected docking protocol, a re-docking procedure was conducted: the best-scored docking pose of the ligand was superimposed onto the co-crystallized ligand conformer, and the root-mean-square deviation (RMSD) between them was measured. An RMSD of ≤2 Å is generally accepted as indicating that a re-docking protocol, with the chosen preprocessing, reproduces the crystallographic pose [77].
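The re-docking criterion reduces to a simple calculation, RMSD = sqrt((1/N) Σ ||x_i − y_i||²) over N matched atoms, compared against the 2 Å cutoff. A minimal sketch, assuming the atom correspondence between the docked and crystallographic poses is already known (the study obtained RMSD values with mcs_rmsd from useful_rdkit_utils) and that both poses are expressed in the receptor's coordinate frame, so no additional fitting step is applied. The coordinates are hypothetical.

```python
import numpy as np


def pose_rmsd(pose, reference):
    """RMSD (Å) between two poses with identical atom ordering, without refitting."""
    pose = np.asarray(pose, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if pose.shape != reference.shape:
        raise ValueError("poses must contain the same atoms in the same order")
    # Squared distance per atom, averaged over atoms, then square-rooted.
    return float(np.sqrt(((pose - reference) ** 2).sum(axis=1).mean()))


REDOCKING_CUTOFF = 2.0  # Å, the acceptance threshold cited from [77]

# Hypothetical coordinates: the docked pose is the crystal pose shifted 0.5 Å along x.
crystal = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.4, 0.0]])
docked = crystal + np.array([0.5, 0.0, 0.0])

rmsd = pose_rmsd(docked, crystal)
print(f"RMSD = {rmsd:.2f} Å; protocol accepted: {rmsd <= REDOCKING_CUTOFF}")
```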
For molecular docking of potential KRAS G12D inhibitors, the 7T47 structure (resolution: 1.27 Å, single chain A) was selected from the RCSB Protein Data Bank (https://www.rcsb.org/) (accessed on 10 December 2025) because it contains the co-crystallized KRAS G12D inhibitor MRTX1133 (PubChem CID: 162369732, CHEMBL5081048, PDB Chemical Component ID: 6IC), which served as the reference compound [11].
Ligand and protein preprocessing was performed in UCSF Chimera [78] using the Dock Prep module. Ligand preprocessing included protonation and geometry optimization with the GAFF2 force field and AM1-BCC charges. During preprocessing of the molecular target, co-crystallized ligands (GCP, GDP, glycerol, and acetate ion), water molecules, and the magnesium ion were removed from the native 7T47 structure. Chain A was then protonated at pH 7.4, partial charges were assigned, and missing amino acid residues were restored.
In GNINA, the binding site was defined from the position of MRTX1133 (coordinates: −23.00, 5.14, 23.02 Å) using the autobox_ligand option with a 4 Å margin in all directions; docking was run with exhaustiveness = 16 and num_modes = 9, with all other parameters at their default values.
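The binding-site definition above can be expressed as a GNINA command line. A sketch that only assembles the arguments, assuming hypothetical file names for the prepared receptor, the ligand to dock, and the crystallographic MRTX1133 pose used for --autobox_ligand; --autobox_add sets the margin added around that ligand's bounding box (4 Å here), and --exhaustiveness and --num_modes match the values reported in the text. Running the command requires a local GNINA installation.

```python
import shlex
from pathlib import Path


def gnina_command(receptor, ligand, reference_ligand, output,
                  autobox_add=4.0, exhaustiveness=16, num_modes=9, seed=None):
    """Build a GNINA docking command whose search box surrounds a reference ligand."""
    cmd = [
        "gnina",
        "-r", str(receptor),
        "-l", str(ligand),
        "--autobox_ligand", str(reference_ligand),
        "--autobox_add", str(autobox_add),
        "--exhaustiveness", str(exhaustiveness),
        "--num_modes", str(num_modes),
        "-o", str(output),
    ]
    if seed is not None:
        cmd += ["--seed", str(seed)]   # optional, for reproducible sampling
    return cmd


# Hypothetical file names for the prepared 7T47 receptor and the ligands.
cmd = gnina_command(Path("7T47_chainA_prepared.pdb"), Path("compound_1.sdf"),
                    Path("MRTX1133_crystal.sdf"), Path("compound_1_docked.sdf"))
print(shlex.join(cmd))
# With GNINA installed, the same list can be passed to subprocess.run(cmd, check=True).
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file paths that contain spaces.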
Re-docking of MRTX1133 yielded an RMSD of 0.76 Å (Figure 9), indicating that the applied docking protocol reproduces ligand conformations consistent with the crystallographic data. The RMSD was calculated using the mcs_rmsd function implemented in the useful_rdkit_utils library v0.93 [79].
Molecular docking results were visualized using the ProteinsPlus web application [80]. All docking data are publicly available at https://github.com/ovttiras/QSAR_KRAS_inhibitors_v2/tree/main/Docking (accessed on 10 December 2025).

4. Conclusions

This study yielded the following outcomes:
  • Development of a series of regression QSAR models for KRAS G12D inhibitors using ECFP4, Klekota-Roth, PubChem, MACCS, Topological Torsion, Atom Pairs, Topological Path-Based fingerprints, and RDKit descriptors, along with CatBoost, SVM, and MLP algorithms, as well as models developed via the OCHEM platform;
  • Structural interpretation of QSAR models for KRAS G12D inhibitors, identifying the most significant fragments and molecular transformations;
  • Integration of the consensus QSAR model into the KRASAVA framework, enabling retrieval of experimental data for the investigated compounds as well as virtual screening of potential KRAS G12D inhibitors, with a preliminary assessment of bioavailability through compliance with Muegge’s rules and an evaluation of acute toxicity through identification of key toxicophores and Brenk filters;
  • Rational molecular design of compounds based on the structural interpretation results and the capabilities of the KRASAVA framework, leading to the proposal of the two most promising KRAS G12D inhibitor candidates;
  • Comparative analysis of the proposed compounds by molecular docking, examining the nature of their interactions with the KRAS G12D binding site and corroborating the results of QSAR structural interpretation.
The results of this study are expected to reduce financial, temporal, and labor costs associated with the synthesis and testing of new KRAS G12D inhibitor drugs. We consider experimental validation an important direction for future studies.

Author Contributions

Conceptualization, O.V.T. and D.N.I.; methodology, O.V.T., V.Y.G. and M.A.P.; software, O.V.T.; validation, S.A.N., S.D.K. and N.S.B.; investigation, O.V.T. and D.N.I.; data curation, O.V.T. and S.A.N.; visualization, P.E.G. and M.A.P.; supervision, D.N.I.; project administration, N.S.B. All authors have read and agreed to the published version of the manuscript.

Funding

Part of this work was supported by the budget of the Institute of Physiologically Active Compounds of the Russian Academy of Sciences (IPAC RAS), State Targets—2024 [topic No. FFSG-2024–0019].

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available in the GitHub repository at https://github.com/ovttiras/QSAR_KRAS_inhibitors_v2 (accessed on 10 December 2025), reference number ovttiras/QSAR_KRAS_inhibitors_v2.

Conflicts of Interest

Authors Oleg V. Tinkov, Pavel E. Gurevich, Sergei A. Nikolenko, Shamil D. Kadyrov, Natalya S. Bogatyreva, Dmitry N. Ivankov, and Marina A. Pak were employed by the company “Ligand Pro”. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The authors declare that this study received funding partially from the Ligand Pro company and partially from the budget of the Institute of Physiologically Active Compounds of the Russian Academy of Sciences (IPAC RAS), State Targets—2024 [topic No. FFSG-2024–0019]. The funders were not involved in the study design; the collection, analysis, or interpretation of data; the writing of this article; or the decision to submit it for publication.

Abbreviations

The following abbreviations are used in this manuscript:
QSAR: Quantitative Structure–Activity Relationships
AD: Applicability Domain
5-fold CV: Five-fold Internal Cross-validation

Appendix A

Appendix A.1

Table A1. Statistical characteristics of the developed QSAR models of KRAS inhibitors.
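Training set: 5-fold CV (Q²cv, RMSEcv); test set: all compounds (Q²ts, RMSEts).

| Descriptors | Algorithm | Q²cv | RMSEcv | Q²ts | RMSEts |
|---|---|---|---|---|---|
| ALogPS, OEstate | RFR | 0.57 | 0.80 | 0.59 | 0.79 |
| | ASNN | 0.59 | 0.78 | 0.50 | 0.87 |
| | KNN | 0.51 | 0.85 | 0.46 | 0.91 |
| | MLRA | 0 | 1.4 | 0.44 | 0.93 |
| | DNN | 0.58 | 0.79 | 0.50 | 0.91 |
| | XGBOOST | 0.57 | 0.80 | 0.59 | 0.79 |
| CDK | RFR | 0.61 | 0.76 | 0.57 | 0.81 |
| | ASNN | 0.59 | 0.78 | 0.53 | 0.85 |
| | KNN | 0.50 | 0.86 | 0.49 | 0.88 |
| | MLRA | 0.47 | 0.89 | 0 | 1.3 |
| | DNN | 0.58 | 0.80 | 0.50 | 0.89 |
| | XGBOOST | 0.61 | 0.76 | 0.60 | 0.79 |
| Dragon | RFR | 0.58 | 0.79 | 0.58 | 0.80 |
| | ASNN | 0.64 | 0.73 | 0.66 | 0.73 |
| | KNN | 0.51 | 0.85 | 0.53 | 0.85 |
| | MLRA | 0 | 1.5 | 0.36 | 0.99 |
| | DNN | 0.54 | 0.83 | 0.58 | 0.80 |
| | XGBOOST | 0.57 | 0.80 | 0.54 | 0.84 |
| Fragmentor (length: 2–4) | RFR | 0.60 | 0.77 | 0.62 | 0.77 |
| | ASNN | 0.55 | 0.82 | 0.61 | 0.77 |
| | KNN | 0.49 | 0.87 | 0.50 | 0.88 |
| | MLRA | 0.54 | 0.83 | 0.54 | 0.83 |
| | DNN | 0.45 | 0.91 | 0.5 | 0.85 |
| | XGBOOST | 0.57 | 0.80 | 0.60 | 0.79 |
| MOLD2 | RFR | 0.59 | 0.78 | 0.58 | 0.80 |
| | ASNN | 0.59 | 0.78 | 0.50 | 0.84 |
| | KNN | 0.48 | 0.88 | 0.49 | 0.88 |
| | MLRA | 0.49 | 0.87 | 0 | 7 |
| | DNN | 0.45 | 0.91 | 0.52 | 0.86 |
| | XGBOOST | 0.58 | 0.79 | 0.59 | 0.79 |
| MORDRED | RFR | 0.59 | 0.78 | 0.58 | 0.8 |
| | ASNN | 0.61 | 0.76 | 0.67 | 0.71 |
| | KNN | 0.52 | 0.85 | 0.49 | 0.89 |
| | MLRA | 0 | 2.8 | 0 | 8 |
| | DNN | 0.51 | 0.85 | 0.5 | 0.88 |
| | XGBOOST | 0.57 | 0.80 | 0.57 | 0.81 |
| QNPR | RFR | 0.59 | 0.78 | 0.62 | 0.77 |
| | ASNN | 0.45 | 0.90 | 0.52 | 0.86 |
| | KNN | 0.38 | 0.96 | 0.36 | 0.99 |
| | MLRA | 0.42 | 0.93 | 0.52 | 0.86 |
| | DNN | 0.43 | 0.92 | 0.61 | 0.77 |
| | XGBOOST | 0.55 | 0.82 | 0.57 | 0.81 |
| ECFP4 | RFR | 0.63 | 0.75 | 0.69 | 0.69 |
| | ASNN | 0.63 | 0.74 | 0.66 | 0.72 |
| | KNN | 0.56 | 0.81 | 0.60 | 0.79 |
| | MLRA | 0.23 | 1.07 | unclear (printed as “0.41”) | unclear |
| | DNN | 0.51 | 0.86 | 0.4 | 0.94 |
| | XGBOOST | 0.61 | 0.76 | 0.67 | 0.71 |
| RDKIT | RFR | 0.55 | 0.82 | 0.56 | 0.82 |
| | ASNN | 0.68 | 0.7 | 0.67 | 0.71 |
| | KNN | 0.53 | 0.84 | 0.51 | 0.87 |
| | MLRA | 0 | 2.4 | unclear (printed as “0.31”) | unclear |
| | DNN | 0.51 | 0.86 | 0.4 | 0.94 |
| | XGBOOST | 0.54 | 0.83 | 0.56 | 0.82 |
| alvaDesc | RFR | 0.59 | 0.78 | 0.59 | 0.79 |
| | ASNN | 0.65 | 0.72 | 0.65 | 0.73 |
| | KNN | 0.51 | 0.86 | 0.51 | 0.87 |
| | MLRA | 0 | 1.6 | 0 | 9 |
| | DNN | 0.53 | 0.83 | 0.56 | 0.82 |
| | XGBOOST | 0.59 | 0.78 | 0.63 | 0.76 |
| – | AttFP | 0.54 | 0.83 | 0.65 | 0.73 |
| – | ChemProp | 0.53 | 0.84 | 0.61 | 0.77 |
| – | TRANSNNI | 0.60 | 0.77 | 0.66 | 0.72 |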
Table A2. Most significant Klekota-Roth descriptors for constructing QSAR models.
| ID | SMARTS or Identifier of Substructure |
|---|---|
| KRFP1932 | [!#1]c1[cH]c([!#1])c([!#1])c([!#1])[cH]1 |
| KRFP4740 | Oc1ccc2ccccc2c1 |
| KRFP3139 | c1ccc2ccccc2c1 |
| KRFP2949 | [OH] |
| KRFP3592 | Cc1cccc2ccccc12 |
| KRFP1566 | [!#1]c1[cH][cH][cH][cH]c1[!#1] |
| KRFP3751 | CCN(C)C |
| KRFP3719 | CCCCCCC |
| KRFP1148 | [!#1][OH] |
Table A3.  Most significant PubChem fingerprints for constructing QSAR models.
| ID | SMARTS or Identifier of Substructure |
|---|---|
| PubchemFP336 | C(~C)(~C)(~C)(~N) |
| PubchemFP157 | >=3 any ring size 5 |
| PubchemFP160 | >=3 saturated or aromatic heteroatom-containing ring size 5 |
| PubchemFP590 | C-C:C-O-[#1] |
| PubchemFP714 | Cc1ccc(O)cc1 |
| PubchemFP659 | O-C-C-N-C |
| PubchemFP152 | >=2 saturated or aromatic nitrogen-containing ring size 5 |
| PubchemFP797 | CC1CC(C)CCC1 |
| PubchemFP699 | O-C-C-C-C-C(C)-C |
Figure A1. Key interactions of the co-crystallized ligand MRTX1133 with amino acid residues in the active site of the KRAS G12D 7T47 model.
Figure A2. Key interactions of BDBM573509 with amino acid residues in the active site of the KRAS G12D 7T47 model.
Figure A3. Key interactions of compound 1 with amino acid residues in the active site of the KRAS G12D 7T47 model.
Figure A4. Key interactions of compound 2 with amino acid residues in the active site of the KRAS G12D 7T47 model.
Figure A5. Key interactions of compound 3 with amino acid residues in the active site of the KRAS G12D 7T47 model.

References

  1. Bannoura, S.F.; Khan, H.Y.; Azmi, A.S. KRAS G12D Targeted Therapies for Pancreatic Cancer: Has the Fortress Been Conquered? Front. Oncol. 2022, 12, 1013902.
  2. Zhu, G.; Pei, L.; Xia, H.; Tang, Q.; Bi, F. Role of Oncogenic KRAS in the Prognosis, Diagnosis and Treatment of Colorectal Cancer. Mol. Cancer 2021, 20, 143.
  3. Zeissig, M.N.; Ashwood, L.M.; Kondrashova, O.; Sutherland, K.D. Next Batter up! Targeting Cancers with KRAS-G12D Mutations. Trends Cancer 2023, 9, 955–967.
  4. Cox, A.D.; Der, C.J. Ras History. Small GTPases 2010, 1, 2–27.
  5. Prior, I.A.; Lewis, P.D.; Mattos, C. A Comprehensive Survey of Ras Mutations in Cancer. Cancer Res. 2012, 72, 2457–2467.
  6. Ryan, M.B.; Corcoran, R.B. Therapeutic Strategies to Target RAS-Mutant Cancers. Nat. Rev. Clin. Oncol. 2018, 15, 709–720.
  7. Li, Y.; Yang, L.; Li, X.; Zhang, X. Inhibition of GTPase KRASG12D: A Review of Patent Literature. Expert Opin. Ther. Pat. 2024, 34, 701–721.
  8. Muñoz-Maldonado, C.; Zimmer, Y.; Medová, M. A Comparative Analysis of Individual RAS Mutations in Cancer Biology. Front. Oncol. 2019, 9, 1088.
  9. Varghese, A.M.; Perry, M.A.; Chou, J.F.; Nandakumar, S.; Muldoon, D.; Erakky, A.; Zucker, A.; Fong, C.; Mehine, M.; Nguyen, B.; et al. Clinicogenomic Landscape of Pancreatic Adenocarcinoma Identifies KRAS Mutant Dosage as Prognostic of Overall Survival. Nat. Med. 2025, 31, 466–477.
  10. Timar, J.; Kashofer, K. Molecular Epidemiology and Diagnostics of KRAS Mutations in Human Cancer. Cancer Metastasis Rev. 2020, 39, 1029–1038.
  11. Hallin, J.; Bowcut, V.; Calinisan, A.; Briere, D.M.; Hargis, L.; Engstrom, L.D.; Laguer, J.; Medwid, J.; Vanderpool, D.; Lifset, E.; et al. Anti-Tumor Efficacy of a Potent and Selective Non-Covalent KRASG12D Inhibitor. Nat. Med. 2022, 28, 2171–2182.
  12. Yoshinari, T.; Nagashima, T.; Ishioka, H.; Inamura, K.; Nishizono, Y.; Tasaki, M.; Iguchi, K.; Suzuki, A.; Sato, C.; Nakayama, A.; et al. Discovery of KRAS(G12D) Selective Degrader ASP3082. Commun. Chem. 2025, 8, 254.
  13. Vasilev, B.; Atanasova, M. A (Comprehensive) Review of the Application of Quantitative Structure–Activity Relationship (QSAR) in the Prediction of New Compounds with Anti-Breast Cancer Activity. Appl. Sci. 2025, 15, 1206.
  14. Tropsha, A.; Isayev, O.; Varnek, A.; Schneider, G.; Cherkasov, A. Integrating QSAR Modelling and Deep Learning in Drug Discovery: The Emergence of Deep QSAR. Nat. Rev. Drug Discov. 2023, 23, 141–155.
  15. Waring, M.J.; Arrowsmith, J.; Leach, A.R.; Leeson, P.D.; Mandrell, S.; Owen, R.M.; Pairaudeau, G.; Pennie, W.D.; Pickett, S.D.; Wang, J.; et al. An Analysis of the Attrition of Drug Candidates from Four Major Pharmaceutical Companies. Nat. Rev. Drug Discov. 2015, 14, 475–486.
  16. van de Waterbeemd, H.; Gifford, E. ADMET in Silico Modelling: Towards Prediction Paradise? Nat. Rev. Drug Discov. 2003, 2, 192–204.
  17. Wei, D.; Wang, L.; Zuo, X.; Maitra, A.; Bresalier, R.S. A Small Molecule with Big Impact: MRTX1133 Targets the KRASG12D Mutation in Pancreatic Cancer. Clin. Cancer Res. 2024, 30, 655–662.
  18. Kemp, S.B.; Cheng, N.; Markosyan, N.; Sor, R.; Kim, I.K.; Hallin, J.; Shoush, J.; Quinones, L.; Brown, N.V.; Bassett, J.B.; et al. Efficacy of a Small-Molecule Inhibitor of KrasG12D in Immunocompetent Models of Pancreatic Cancer. Cancer Discov. 2023, 13, 298–311.
  19. Lu, W.; Zeng, R.; Pan, M.; Zhou, Y.; Tang, H.; Shen, W.; Tang, Y.; Lei, P.; Mikov, M.; Bandyopadhyay, D.; et al. Pharmacokinetics, Bioavailability, and Tissue Distribution of MRTX1133 in Rats Using UHPLC-MS/MS. Front. Pharmacol. 2024, 15, 1509319.
  20. Bristol Exits KRAS G12D. ApexOnco—Clinical Trials News and Analysis. Available online: https://www.oncologypipeline.com/apexonco/bristol-exits-kras-g12d (accessed on 9 December 2025).
  21. Lipinski, C.A.; Lombardo, F.; Dominy, B.W.; Feeney, P.J. Experimental and Computational Approaches to Estimate Solubility and Permeability in Drug Discovery and Development Settings. Adv. Drug Deliv. Rev. 1997, 23, 3–25.
  22. Muegge, I.; Heald, S.L.; Brittelli, D. Simple Selection Criteria for Drug-like Chemical Matter. J. Med. Chem. 2001, 44, 1841–1846.
  23. Ghose, A.K.; Viswanadhan, V.N.; Wendoloski, J.J. A Knowledge-Based Approach in Designing Combinatorial or Medicinal Chemistry Libraries for Drug Discovery. 1. A Qualitative and Quantitative Characterization of Known Drug Databases. J. Comb. Chem. 1998, 1, 55–68.
  24. Veber, D.F.; Johnson, S.R.; Cheng, H.Y.; Smith, B.R.; Ward, K.W.; Kopple, K.D. Molecular Properties That Influence the Oral Bioavailability of Drug Candidates. J. Med. Chem. 2002, 45, 2615–2623.
  25. Egan, W.J.; Merz, K.M.; Baldwin, J.J. Prediction of Drug Absorption Using Multivariate Statistics. J. Med. Chem. 2000, 43, 3867–3877.
  26. Brenk, R.; Schipani, A.; James, D.; Krasowski, A.; Gilbert, I.H.; Frearson, J.; Wyatt, P.G. Lessons Learnt from Assembling Screening Libraries for Drug Discovery for Neglected Diseases. ChemMedChem 2008, 3, 435–444.
  27. Baell, J.B.; Holloway, G.A. New Substructure Filters for Removal of Pan Assay Interference Compounds (PAINS) from Screening Libraries and for Their Exclusion in Bioassays. J. Med. Chem. 2010, 53, 2719–2740.
  28. Srisongkram, T.; Khamtang, P.; Weerapreeyakul, N. Prediction of KRASG12C Inhibitors Using Conjoint Fingerprint and Machine Learning-Based QSAR Models. J. Mol. Graph. Model. 2023, 122, 108466.
  29. Srisongkram, T.; Weerapreeyakul, N. Drug Repurposing against KRAS Mutant G12C: A Machine Learning, Molecular Docking, and Molecular Dynamics Study. Int. J. Mol. Sci. 2023, 24, 669.
  30. Nadee, P.; Prompat, N.; Yamabhai, M.; Sangkhathat, S.; Benjakul, S.; Tipmanee, V.; Saetang, J. In Silico Identification of Selective KRAS G12D Inhibitor via Machine Learning-Based Molecular Docking Combined with Molecular Dynamics Simulation. Adv. Theory Simul. 2024, 7, 2400489.
  31. Ajmal, A.; Danial, M.; Zulfat, M.; Numan, M.; Zakir, S.; Hayat, C.; Alabbosh, K.F.; Zaki, M.E.A.; Ali, A.; Wei, D. In Silico Prediction of New Inhibitors for Kirsten Rat Sarcoma G12D Cancer Drug Target Using Machine Learning-Based Virtual Screening, Molecular Docking, and Molecular Dynamic Simulation Approaches. Pharmaceuticals 2024, 17, 551.
  32. OECD. Guidance Document on the Validation of (Quantitative) Structure-Activity Relationship [(Q)SAR] Models; OECD Publishing: Paris, France, 2014.
  33. Fourches, D.; Muratov, E.; Tropsha, A. Trust, but Verify: On the Importance of Chemical Structure Curation in Cheminformatics and QSAR Modeling Research. J. Chem. Inf. Model. 2010, 50, 1189.
  34. Weininger, D. SMILES, a Chemical Language and Information System. 1. Introduction to Methodology and Encoding Rules. J. Chem. Inf. Comput. Sci. 1988, 28, 31–36.
  35. Mendez, D.; Gaulton, A.; Bento, A.P.; Chambers, J.; De Veij, M.; Félix, E.; Magariños, M.P.; Mosquera, J.F.; Mutowo, P.; Nowotka, M.; et al. ChEMBL: Towards Direct Deposition of Bioassay Data. Nucleic Acids Res. 2019, 47, D930–D940.
  36. Polishchuk, P.; Tinkov, O.; Khristova, T.; Ognichenko, L.; Kosinskaya, A.; Varnek, A.; Kuz’min, V. Structural and Physico-Chemical Interpretation (SPCI) of QSAR Models and Its Comparison with Matched Molecular Pair Analysis. J. Chem. Inf. Model. 2016, 56, 1455–1469.
  37. Wang, X.; Burns, A.C.; Christensen, J.G.; Ketcham, J.M.; Lawson, J.D.; Marx, M.A.; Smith, C.R.; Allen, S.; Blake, J.F.; Chicarelli, M.J.; et al. KRas G12D Inhibitors. U.S. Patent US11453683B1, 27 September 2022. Available online: https://patents.google.com/patent/US11453683B1 (accessed on 27 September 2025).
  38. Issahaku, A.R.; Mukelabai, N.; Agoni, C.; Rudrapal, M.; Aldosari, S.M.; Almalki, S.G.; Khan, J. Characterization of the Binding of MRTX1133 as an Avenue for the Discovery of Potential KRASG12D Inhibitors for Cancer Therapy. Sci. Rep. 2022, 12, 17796.
  39. Alves, V.M.; Bobrowski, T.; Melo-Filho, C.C.; Korn, D.; Auerbach, S.; Schmitt, C.; Muratov, E.N.; Tropsha, A. QSAR Modeling of SARS-CoV Mpro Inhibitors Identifies Sufugolix, Cenicriviroc, Proglumetacin, and Other Drugs as Candidates for Repurposing against SARS-CoV-2. Mol. Inform. 2021, 40, 2000113.
  40. Ghazi Vakili, M.; Gorgulla, C.; Snider, J.; Nigam, A.; Bezrukov, D.; Varoli, D.; Aliper, A.; Polykovsky, D.; Padmanabha Das, K.M.; Cox, H., III; et al. Quantum-Computing-Enhanced Algorithm Unveils Potential KRAS Inhibitors. Nat. Biotechnol. 2025, 12, 1–6.
  41. Zakharov, A.V.; Zhao, T.; Nguyen, D.-T.; Peryea, T.; Sheils, T.; Yasgar, A.; Huang, R.; Southall, N.; Simeonov, A. Novel Consensus Architecture To Improve Performance of Large-Scale Multitask Deep Learning QSAR Models. J. Chem. Inf. Model. 2019, 59, 4613–4624.
  42. Gedeck, P.; Rohde, B.; Bartels, C. QSAR—How Good Is It in Practice? Comparison of Descriptor Sets on an Unbiased Cross Section of Corporate Data Sets. J. Chem. Inf. Model. 2006, 46, 1924–1936.
  43. Liu, T.; Hwang, L.; Burley, S.K.; Nitsche, C.I.; Southan, C.; Walters, W.P.; Gilson, M.K. BindingDB in 2024: A FAIR Knowledgebase of Protein-Small Molecule Binding Data. Nucleic Acids Res. 2025, 53, D1633–D1644.
  44. Aladinskiy, V.; Mantsyzov, A.B.; Kruse, C.; Noev, A.; Petrov, R.; Reshetnikov, V.; Shi, S.; Ding, X.; Cai, X.; Aliper, A.; et al. Identification of Novel Pan-KRAS Inhibitors via Structure-Based Drug Design, Scaffold Hopping, and Biological Evaluation. ACS Med. Chem. Lett. 2025, 16, 1282–1289.
  45. RDKit. Available online: https://github.com/rdkit (accessed on 1 April 2025).
  46. Yap, C.W. PaDEL-Descriptor: An Open Source Software to Calculate Molecular Descriptors and Fingerprints. J. Comput. Chem. 2011, 32, 1466–1474.
  47. Ecrl/Padelpy: A Python Wrapper for PaDEL-Descriptor Software. Available online: https://github.com/ecrl/padelpy (accessed on 9 December 2025).
  48. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
  49. CatBoost—Open-Source Gradient Boosting Library. Available online: https://catboost.ai/ (accessed on 9 December 2025).
  50. Ash, J.R.; Wognum, C.; Rodríguez-Pérez, R.; Aldeghi, M.; Cheng, A.C.; Clevert, D.A.; Engkvist, O.; Fang, C.; Price, D.J.; Hughes-Oliver, J.M.; et al. Practically Significant Method Comparison Protocols for Machine Learning in Small Molecule Drug Discovery. J. Chem. Inf. Model. 2025, 65, 9398–9411.
  51. Alves, V.M.; Capuzzi, S.J.; Braga, R.C.; Korn, D.; Hochuli, J.E.; Bowler, K.H.; Yasgar, A.; Rai, G.; Simeonov, A.; Muratov, E.N.; et al. SCAM Detective: Accurate Predictor of Small, Colloidally Aggregating Molecules. J. Chem. Inf. Model. 2020, 60, 4056–4063.
  52. Ojala, M.; Garriga, G.C. Permutation Tests for Studying Classifier Performance. J. Mach. Learn. Res. 2010, 11, 1833–1863.
  53. Shrikumar, A.; Greenside, P.; Kundaje, A. Learning Important Features through Propagating Activation Differences. In Proceedings of the 34th International Conference on Machine Learning, ICML, Sydney, Australia, 6–11 August 2017; Volume 7, pp. 4844–4866.
  54. Kier, L.B.; Hall, L.H. An Electrotopological-State Index for Atoms in Molecules. Pharm. Res. 1990, 7, 801–807.
  55. Tetko, I.V.; Tanchuk, V.Y. Application of Associative Neural Networks for Prediction of Lipophilicity in ALOGPS 2.1 Program. J. Chem. Inf. Comput. Sci. 2002, 42, 1136–1145.
  56. Steinbeck, C.; Han, Y.; Kuhn, S.; Horlacher, O.; Luttmann, E.; Willighagen, E. The Chemistry Development Kit (CDK): An Open-Source Java Library for Chemo- and Bioinformatics. J. Chem. Inf. Comput. Sci. 2003, 43, 493–500.
  57. Todeschini, R.; Consonni, V. Handbook of Molecular Descriptors; John Wiley & Sons: Hoboken, NJ, USA, 2000; p. 667.
  58. Thormann, M.; Vidal, D.; Almstetter, M.; Pons, M. Nomen Est Omen: Quantitative Prediction of Molecular Properties Directly from IUPAC Names. Open Appl. Inform. J. 2007, 1, 28–32.
  59. Rogers, D.; Hahn, M. Extended-Connectivity Fingerprints. J. Chem. Inf. Model. 2010, 50, 742–754.
  60. AlvaDesc—KNIME—Alvascience. Available online: https://www.alvascience.com/knime-alvadesc/ (accessed on 9 December 2025).
  61. Varnek, A.; Fourches, D.; Hoonakker, F.; Solov’ev, V.P. Substructural Fragments: An Universal Language to Encode Reactions, Molecular and Supramolecular Structures. J. Comput.-Aided Mol. Des. 2005, 19, 693–703.
  62. Hong, H.; Xie, Q.; Ge, W.; Qian, F.; Fang, H.; Shi, L.; Su, Z.; Perkins, R.; Tong, W. Mold2, Molecular Descriptors from 2D Structures for Chemoinformatics and Toxicoinformatics. J. Chem. Inf. Model. 2008, 48, 1337–1344.
  63. Moriwaki, H.; Tian, Y.S.; Kawashita, N.; Takagi, T. Mordred: A Molecular Descriptor Calculator. J. Cheminform. 2018, 10, 4.
  64. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2016, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
  65. Xu, Y.; Ma, J.; Liaw, A.; Sheridan, R.P.; Svetnik, V. Demystifying Multitask Deep Neural Networks for Quantitative Structure–Activity Relationships. J. Chem. Inf. Model. 2017, 57, 2490–2504.
  66. Itskowitz, P.; Tropsha, A. K Nearest Neighbors QSAR Modeling as a Variational Problem: Theory and Applications. J. Chem. Inf. Model. 2005, 45, 777–785.
  67. Rasulev, B.F.; Toropov, A.A.; Hamme, A.T.; Leszczynski, J. Multiple Linear Regression Analysis and Optimal Descriptors: Predicting the Cholesteryl Ester Transfer Protein Inhibition Activity. QSAR Comb. Sci. 2008, 27, 595–606.
  68. Karpov, P.; Godin, G.; Tetko, I.V. Transformer-CNN: Swiss Knife for QSAR Modeling and Interpretation. J. Cheminform. 2020, 12, 17.
  69. Xiong, Z.; Wang, D.; Liu, X.; Zhong, F.; Wan, X.; Li, X.; Li, Z.; Luo, X.; Chen, K.; Jiang, H.; et al. Pushing the Boundaries of Molecular Representation for Drug Discovery with the Graph Attention Mechanism. J. Med. Chem. 2019, 63, 8749–8760.
  70. Heid, E.; Greenman, K.P.; Chung, Y.; Li, S.C.; Graff, D.E.; Vermeire, F.H.; Wu, H.; Green, W.H.; McGill, C.J. Chemprop: A Machine Learning Package for Chemical Property Prediction. J. Chem. Inf. Model. 2023, 64, 9–17.
  71. OCHEM Introduction—OCHEM User’s Manual—OCHEM Docs. Available online: https://docs.ochem.eu/display/MAN.html (accessed on 9 December 2025).
  72. Harris, C.R.; Millman, K.J.; van der Walt, S.J.; Gommers, R.; Virtanen, P.; Cournapeau, D.; Wieser, E.; Taylor, J.; Berg, S.; Smith, N.J.; et al. Array Programming with NumPy. Nature 2020, 585, 357–362.
  73. McKinney, W. Data Structures for Statistical Computing in Python. In Proceedings of the Python in Science Conference, Austin, TX, USA, 28 June–3 July 2010.
  74. McNutt, A.T.; Li, Y.; Meli, R.; Aggarwal, R.; Koes, D.R. GNINA 1.3: The next Increment in Molecular Docking with Deep Learning. J. Cheminform. 2025, 17, 28.
  75. Carato, P.; Oxombre, B.; Ravez, S.; Boulahjar, R.; Donnier-Maréchal, M.; Barczyk, A.; Liberelle, M.; Vermersch, P.; Melnyk, P. Discovery of Novel Benzamide-Based Sigma-1 Receptor Agonists with Enhanced Selectivity and Safety. Molecules 2025, 30, 3584.
  76. Jiang, Y.; Li, X.; Zhang, Y.; Han, J.; Xu, Y.; Pandit, A.; Zhang, Z.; Wang, M.; Wang, M.; Liu, C.; et al. PoseX: AI Defeats Physics-Based Methods on Protein Ligand Cross-Docking 2025. In Proceedings of the Thirty-ninth Annual Conference on Neural Information Processing Systems, San Diego, CA, USA, 2–7 December 2025.
  77. Macip, G.; Garcia-Segura, P.; Mestres-Truyol, J.; Saldivar-Espinoza, B.; Ojeda-Montes, M.J.; Gimeno, A.; Cereto-Massagué, A.; Garcia-Vallvé, S.; Pujadas, G. Haste Makes Waste: A Critical Review of Docking-Based Virtual Screening in Drug Repurposing for SARS-CoV-2 Main Protease (M-pro) Inhibition. Med. Res. Rev. 2022, 42, 744–769.
  78. Pettersen, E.F.; Goddard, T.D.; Huang, C.C.; Couch, G.S.; Greenblatt, D.M.; Meng, E.C.; Ferrin, T.E. UCSF Chimera—A Visualization System for Exploratory Research and Analysis. J. Comput. Chem. 2004, 25, 1605–1612.
  79. PatWalters/Useful_rdkit_utils: Some Useful RDKit Functions. Available online: https://github.com/PatWalters/useful_rdkit_utils (accessed on 9 December 2025).
  80. Ehrt, C.; Schulze, T.; Graef, J.; Diedrich, K.; Pletzer-Zelgert, J.; Rarey, M. ProteinsPlus: A Publicly Available Resource for Protein Structure Mining. Nucleic Acids Res. 2025, 53, W478–W484.
Figure 1. General workflow of the study.
Figure 2. Results of the structural interpretation of QSAR models of KRAS inhibitors developed using Klekota-Roth (A) and PubChem (B) descriptors. The symbol # denotes the feature number.
Figure 3. Highlighting of an identified structural fragment (toxicophore) that consistently increases acute oral toxicity in rats. When a toxicophore is detected, the Tanimoto similarity between the fragment and the molecule is also reported.
Figure 4. Promising inhibitors of KRAS G12D. (A) Rational molecular design based on the identified structure-activity relationships of KRAS G12D inhibitors (see Table 2, Figure 2). (B) Compound 3 was proposed as a result of the combinatorial modification of the functional groups of the compounds from the training set. Molecular transformations are highlighted in red in chemical structures.
Figure 5. Visualization of the compliance of the investigated compounds with Muegge’s rules using bioavailability radar plots.
Figure 6. Key interactions of amino acid residues in the active site of the KRAS G12D 7T47 model (violet) with (A) the co-crystallized ligand MRTX1133 (gray) and (B) BDBM573509 (yellow). Ionic and hydrogen bonds are shown as magenta dashed lines. Gly60 interacts with the ligand via its backbone oxygen atom. The side chains of active-site amino acids that form hydrophobic contacts with the ligand are also displayed.
Figure 7. Key interactions between ligands and active-site residues of the KRAS G12D model (PDB ID: 7T47; violet): (A) compound 1 (blue); (B) compound 2 (pink); (C) compound 3 (green). Ionic and hydrogen bonds are shown as magenta dashed lines. Gly60 interacts with the ligand via its backbone oxygen atom. Side chains of active-site residues that form hydrophobic contacts with the ligand are also shown.
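Figures 6 and 7 distinguish ionic and hydrogen-bond contacts from hydrophobic contacts, but the criteria used to draw them are not stated here. The sketch below flags residues by contact type using simple distance cutoffs and invented coordinates. The cutoffs are assumptions: 3.5 Å between N/O atoms and 4.0 Å between carbon atoms. These are common rules of thumb, not values from the article:

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float, float]

# Assumed geometric cutoffs (rules of thumb, not values from the article)
POLAR_CUTOFF = 3.5        # Å, N/O to N/O: candidate hydrogen bond or salt bridge
HYDROPHOBIC_CUTOFF = 4.0  # Å, C to C: candidate hydrophobic contact
POLAR = {"N", "O"}


def find_contacts(protein: List[Tuple[str, str, Vec]],
                  ligand: List[Tuple[str, Vec]]) -> Tuple[List[str], List[str]]:
    """Return (residues with polar contacts, residues with hydrophobic contacts).

    protein: (residue label, element, xyz); ligand: (element, xyz).
    """
    polar, hydrophobic = set(), set()
    for residue, p_elem, p_xyz in protein:
        for l_elem, l_xyz in ligand:
            d = math.dist(p_xyz, l_xyz)
            if p_elem in POLAR and l_elem in POLAR and d <= POLAR_CUTOFF:
                polar.add(residue)
            elif p_elem == "C" and l_elem == "C" and d <= HYDROPHOBIC_CUTOFF:
                hydrophobic.add(residue)
    return sorted(polar), sorted(hydrophobic)


# Invented coordinates; residue labels only mimic KRAS numbering
protein_atoms = [
    ("Asp12", "O", (0.0, 0.0, 0.0)),
    ("Gly60", "O", (10.0, 0.0, 0.0)),
    ("Val9", "C", (0.0, 6.0, 0.0)),
]
ligand_atoms = [("N", (2.8, 0.0, 0.0)), ("C", (0.0, 9.5, 0.0))]
print(find_contacts(protein_atoms, ligand_atoms))  # (['Asp12'], ['Val9'])
```

Dedicated interaction profilers also check angles, protonation states and charges. Pure distance cutoffs only nominate candidate contacts.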
Figure 8. Description of the training and test sets. (A) Distribution of experimental pIC50 values for the training and test sets. (B) Distribution of the training and test sets in molecular weight (MW) versus lipophilicity (LogP) coordinates. (C) Visualization of the chemical space.
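Panel A plots pIC50, the negative decimal logarithm of the half-maximal inhibitory concentration expressed in mol/L. The following is a small sketch of the conversion in both directions, assuming IC50 is supplied in nanomolar:

```python
import math


def pic50_from_nM(ic50_nM: float) -> float:
    """pIC50 = -log10(IC50 in mol/L); with IC50 in nM this is 9 - log10(IC50_nM)."""
    if ic50_nM <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nM)


def ic50_nM_from_pic50(pic50: float) -> float:
    """Inverse conversion: IC50 in nM from pIC50."""
    return 10.0 ** (9.0 - pic50)


print(pic50_from_nM(100.0))                # 7.0
print(round(ic50_nM_from_pic50(8.25), 2))  # 5.62
```

For example, the experimental pIC50 of 8.25 listed for MRTX1133 in Table 4 corresponds to an IC50 of roughly 5.6 nM. Working on the log scale also keeps errors such as RMSE in comparable units across the activity range.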
Figure 9. Superimposition of the docked ligand (violet) and the co-crystallized ligand MRTX1133 (gray), obtained when validating the docking procedure on the protein model (PDB ID: 7T47). The RMSD between the two poses was 0.76 Å.
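Figure 9 validates docking by comparing the docked pose with the crystal pose. The following is a minimal sketch of the RMSD calculation. It assumes both poses are already in the receptor frame and that atoms are matched one-to-one by index. It applies no superposition and no symmetry correction, which dedicated tools add. The coordinates are invented:

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def pose_rmsd(pose: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Root-mean-square deviation over index-matched atoms, without superposition."""
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must contain the same, non-zero number of atoms")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(pose, reference))
    return math.sqrt(squared / len(pose))


# Invented three-atom poses: the docked pose is shifted by 0.5 Å along y
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
docked = [(x, y + 0.5, z) for x, y, z in crystal]
print(pose_rmsd(docked, crystal))  # 0.5
```

An RMSD at or below 2 Å is a commonly used criterion for successful pose reproduction, so 0.76 Å indicates close agreement.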
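Table 1. Statistical characteristics of the developed QSAR models of KRAS inhibitors.

| Descriptors | Algorithm | Q²cv (5-fold CV) | RMSE (5-fold CV) | Q²ts (all test compounds) | RMSE (all test compounds) | Cov | Q²ts (AD compounds) | RMSE (AD compounds) |
|---|---|---|---|---|---|---|---|---|
| Topological Torsion fingerprints | CatBoost | 0.65 | 0.73 | 0.67 | 0.71 | 0.75 | 0.61 | 0.74 |
| | SVM | 0.66 | 0.71 | 0.65 | 0.73 | | 0.60 | 0.74 |
| | MLP | 0.63 | 0.74 | 0.60 | 0.79 | | 0.55 | 0.79 |
| MACCS | CatBoost | 0.48 | 0.88 | 0.46 | 0.91 | 0.67 | 0.37 | 0.97 |
| | SVM | 0.48 | 0.88 | 0.46 | 0.91 | | 0.37 | 0.98 |
| | MLP | 0.44 | 0.91 | 0.44 | 0.93 | | 0.36 | 0.99 |
| PubChem | CatBoost | 0.58 | 0.79 | 0.65 | 0.73 | 0.74 | 0.59 | 0.75 |
| | SVM | 0.60 | 0.78 | 0.60 | 0.78 | | 0.49 | 0.85 |
| | MLP | 0.57 | 0.80 | 0.60 | 0.78 | | 0.51 | 0.81 |
| Klekota-Roth | CatBoost | 0.62 | 0.76 | 0.66 | 0.72 | 0.81 | 0.65 | 0.72 |
| | SVM | 0.63 | 0.74 | 0.64 | 0.74 | | 0.64 | 0.74 |
| | MLP | 0.50 | 0.86 | 0.64 | 0.75 | | 0.63 | 0.75 |
| Atom Pairs fingerprints | CatBoost | 0.57 | 0.80 | 0.53 | 0.85 | 0.79 | 0.52 | 0.87 |
| | SVM | 0.61 | 0.76 | 0.56 | 0.82 | | 0.61 | 0.79 |
| | MLP | 0.57 | 0.80 | 0.59 | 0.79 | | 0.58 | 0.82 |
| ECFP4 | CatBoost | 0.65 | 0.73 | 0.69 | 0.68 | 0.78 | 0.66 | 0.70 |
| | SVM | 0.68 | 0.69 | 0.68 | 0.70 | | 0.66 | 0.70 |
| | MLP | 0.66 | 0.72 | 0.63 | 0.75 | | 0.61 | 0.74 |
| Topological Path-Based fingerprints | CatBoost | 0.60 | 0.77 | 0.64 | 0.75 | 0.82 | 0.67 | 0.70 |
| | SVM | 0.65 | 0.72 | 0.62 | 0.76 | | 0.62 | 0.76 |
| | MLP | 0.54 | 0.82 | 0.58 | 0.80 | | 0.56 | 0.81 |
| RDKit | CatBoost | 0.64 | 0.73 | 0.65 | 0.73 | 0.82 | 0.67 | 0.71 |
| | SVM | 0.60 | 0.77 | 0.58 | 0.81 | | 0.62 | 0.76 |
| | MLP | 0.51 | 0.86 | 0.52 | 0.86 | | 0.53 | 0.85 |
| Consensus (ECFP4 + Topological Path-Based fingerprints + RDKit) | CatBoost | 0.68 | 0.69 | 0.71 | 0.69 | 0.78 | 0.70 | 0.66 |

Q²cv and RMSE (5-fold CV) refer to five-fold cross-validation on the training set. Q²ts and RMSE refer to the test set, computed either over all compounds or only over compounds within the applicability domain (AD). Cov is the fraction of test-set compounds within the AD and is reported once per descriptor set.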
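Table 1 reports Q², RMSE and applicability-domain coverage, together with the same metrics restricted to AD compounds. The following is a minimal sketch of these quantities under three assumptions. First, Q² is computed as 1 − SS_res/SS_tot using the mean of the evaluated set; other Q² variants exist and give different values. Second, the consensus is an unweighted mean of the member models' predictions. Third, the AD membership mask is supplied directly, because the AD method is not described in this section. All numbers are toy values:

```python
import math
from statistics import fmean
from typing import Dict, List, Sequence


def q2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """1 - SS_res / SS_tot with the mean of the evaluated set (one of several Q2 variants)."""
    mean_y = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("Q2 is undefined for constant y_true")
    return 1.0 - ss_res / ss_tot


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return math.sqrt(fmean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def consensus(*model_preds: Sequence[float]) -> List[float]:
    """Assumed consensus rule: unweighted mean of the member models' predictions."""
    return [fmean(values) for values in zip(*model_preds)]


def evaluate(y_true, y_pred, in_ad) -> Dict[str, float]:
    """Metrics for all test compounds, AD coverage, and metrics inside the AD."""
    y_ad = [t for t, ok in zip(y_true, in_ad) if ok]
    p_ad = [p for p, ok in zip(y_pred, in_ad) if ok]
    return {
        "Q2_all": q2(y_true, y_pred),
        "RMSE_all": rmse(y_true, y_pred),
        "Cov": sum(in_ad) / len(in_ad),
        "Q2_AD": q2(y_ad, p_ad),
        "RMSE_AD": rmse(y_ad, p_ad),
    }


# Toy pIC50 values and predictions from three hypothetical member models
y_test = [6.0, 7.0, 8.0, 9.0]
pred = consensus([6.0, 7.5, 8.0, 8.0], [6.0, 6.5, 8.0, 8.0], [6.0, 7.0, 8.0, 8.0])
in_domain = [True, True, True, False]  # hypothetical AD membership
print(evaluate(y_test, pred, in_domain))
```

In this toy case, excluding the one poorly predicted compound raises Q² and lowers RMSE at the cost of coverage. Table 1 reports this trade-off as Cov alongside the AD-restricted metrics.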
Table 2. Top 10 molecular transformations (MT) affecting KRAS inhibitory activity.

Reducing inhibitory activity

| # | Molecular transformation (SMIRKS) | # MT | ΔMean | Example molecular pair, pIC50 |
|---|---|---|---|---|
| 1 | `[O] * -> * C#N` | 8 | −2.0 ± 0.85 | 9.30 → 7.69 |
| 2 | `[C][C] * -> [C]O *` | 5 | −1.9 ± 1.0 | 8.96 → 7.60 |
| 3 | `[C]N1[C][C][C][C@H]1 * -> [C]c1[c][c]c(*)n[c]1` | 5 | −1.5 ± 0.23 | 8.57 → 6.81 |
| 4 | `[O] * -> [C]O *` | 9 | −1.4 ± 1.5 | 9.40 → 5.57 |
| 5 | `[C][C] * -> * C#N` | 4 | −1.4 ± 0.63 | 8.96 → 7.69 |

Increasing inhibitory activity

| # | Molecular transformation (SMIRKS) | # MT | ΔMean | Example molecular pair, pIC50 |
|---|---|---|---|---|
| 6 | `* [C][C] * -> * [C]C12[C][C][C]N1[C](*)[C][C]2` | 6 | 2.5 ± 0.22 | 5.30 → 8.12 |
| 7 | `* c1[c][c]c(*)[c][c]1 -> [O]c1[c]c(*)[c]c(*)[c]1` | 4 | 2.0 ± 0.2 | 6.71 → 8.82 |
| 8 | `* c1[c][c]n[c][c]1 -> [C]c1n[c][c]n1[C] *` | 4 | 1.8 ± 0.36 | 5.82 → 8.02 |
| 9 | `* c1[c][c]c(*)[c][c]1 -> * [C]1[C][C](*)c2[c][c][c][c]c2[C]1` | 11 | 1.8 ± 1.0 | 5.79 → 9.30 |
| 10 | `* c1[c][c]n[c][c]1 -> [C]c1n[c][c]n1[C] *` | 4 | 1.6 ± 0.3 | 6.18 → 8.02 |

# denotes number; # MT is the number of molecular transformations of each type. Yellow highlighting in the chemical structures marks the molecular transformations.
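Table 2 summarizes each transformation by its count and a mean change with its spread. The following is a minimal sketch of that aggregation. It assumes ΔMean is the mean pIC50 change (after minus before) and that the ± term is the sample standard deviation. The pairs and the labels `T_a` and `T_b` are hypothetical:

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple

Pair = Tuple[float, float]  # (pIC50 before, pIC50 after the transformation)


def transformation_stats(pairs: List[Pair]) -> Tuple[int, float, float]:
    """Count, mean and sample standard deviation of pIC50 changes for one transformation."""
    deltas = [after - before for before, after in pairs]
    sd = stdev(deltas) if len(deltas) > 1 else 0.0
    return len(deltas), mean(deltas), sd


def rank_transformations(data: Dict[str, List[Pair]]):
    """Rows (transformation, n, mean delta, sd), most activity-reducing first."""
    rows = [(name, *transformation_stats(pairs)) for name, pairs in data.items()]
    return sorted(rows, key=lambda row: row[2])


# Hypothetical matched molecular pairs grouped by placeholder transformation labels
mmp = {
    "T_a": [(8.0, 6.0), (7.5, 5.0), (9.0, 7.5)],
    "T_b": [(6.0, 7.0), (5.5, 7.5)],
}
for row in rank_transformations(mmp):
    print(row)
```

A large standard deviation relative to the mean (as in row 4 of Table 2) signals that the effect of a transformation depends strongly on its structural context.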
Table 3. Key physicochemical parameters of the investigated compounds and reference values according to Muegge’s bioavailability rules.

| Parameter | MRTX1133 | Compound 2 | Compound 3 | Muegge rule |
|---|---|---|---|---|
| Molecular weight (MW), Da | 600.7 | 514.6 | 499.5 | 200–600 |
| Octanol-water partition coefficient (LogP) | 4.71 | 3.47 | 3.88 | ≤5 |
| Number of hydrogen bond donors (HBD) | 2 | 2 | 2 | ≤5 |
| Number of hydrogen bond acceptors (HBA) | 8 | 8 | 7 | ≤10 |
| Number of rotatable bonds | 5 | 5 | 4 | ≤15 |
| Topological polar surface area (TPSA), Å² | 86.64 | 86.64 | 99.52 | ≤150 |
| Number of rings | 8 | 7 | 5 | ≤7 |
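Table 3 can be checked mechanically against the listed limits. The following small sketch uses only the criteria and values shown in Table 3; the dictionary keys are ad hoc names:

```python
from typing import Dict, List, Optional, Tuple

# (lower, upper) limits as listed in Table 3; None means no bound in that direction
MUEGGE_LIMITS: Dict[str, Tuple[Optional[float], Optional[float]]] = {
    "MW": (200.0, 600.0),
    "LogP": (None, 5.0),
    "HBD": (None, 5),
    "HBA": (None, 10),
    "RotB": (None, 15),
    "TPSA": (None, 150.0),
    "Rings": (None, 7),
}

# Values reported in Table 3
TABLE3 = {
    "MRTX1133": {"MW": 600.7, "LogP": 4.71, "HBD": 2, "HBA": 8, "RotB": 5, "TPSA": 86.64, "Rings": 8},
    "Compound 2": {"MW": 514.6, "LogP": 3.47, "HBD": 2, "HBA": 8, "RotB": 5, "TPSA": 86.64, "Rings": 7},
    "Compound 3": {"MW": 499.5, "LogP": 3.88, "HBD": 2, "HBA": 7, "RotB": 4, "TPSA": 99.52, "Rings": 5},
}


def muegge_violations(props: Dict[str, float]) -> List[str]:
    """Names of the properties that fall outside the Table 3 limits."""
    violations = []
    for name, (low, high) in MUEGGE_LIMITS.items():
        value = props[name]
        if (low is not None and value < low) or (high is not None and value > high):
            violations.append(name)
    return violations


for compound, props in TABLE3.items():
    print(compound, muegge_violations(props) or "all listed limits met")
```

Run on the reported values, the check flags MRTX1133 for molecular weight (600.7 > 600 Da) and ring count (8 > 7). Compounds 2 and 3 satisfy every listed limit.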
Table 4. Molecular docking results.

| Compound | pIC50 | Affinity, kcal/mol | Intra, kcal/mol | CNN pose score | CNN affinity, pK |
|---|---|---|---|---|---|
| BDBM573509 | 5.57 * | −10.51 | −0.34 | 0.6809 | 8.134 |
| Compound 1 | 7.98 | −11.81 | −0.91 | 0.7935 | 8.193 |
| Compound 2 | 8.05 | −11.70 | −0.87 | 0.7585 | 8.094 |
| Compound 3 | 7.49 | −8.44 | 3.64 | 0.6612 | 7.348 |
| MRTX1133 | 8.25 ± 0.47 * | −13.55 | −0.55 | 0.8120 | 8.554 |

* Experimental data.
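Table 4 mixes energy-like scores (kcal/mol) with the CNN affinity in pK units. To put them on one scale, a pK value can be converted with ΔG = −RT ln(10) · pK. The following sketch assumes the CNN affinity can be read as −log10 of a dissociation constant in mol/L and uses a temperature of 298.15 K:

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def pk_to_kcal(pk: float, temperature: float = 298.15) -> float:
    """Free energy (kcal/mol) for pK = -log10(Kd / 1 M): dG = -RT ln(10) pK."""
    return -R_KCAL * temperature * math.log(10.0) * pk


# CNN affinity values (pK) from Table 4
cnn_affinity = {
    "BDBM573509": 8.134,
    "compound 1": 8.193,
    "compound 2": 8.094,
    "compound 3": 7.348,
    "MRTX1133": 8.554,
}
for name, pk in sorted(cnn_affinity.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<11} pK {pk:.3f} -> {pk_to_kcal(pk):.2f} kcal/mol")
```

The converted values (for example, about −11.7 kcal/mol for MRTX1133) are only a change of units. They do not make the two scoring functions directly comparable.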
Table 5. Comparative analysis of QSAR studies on KRAS G12D inhibitors.

| Parameter for comparison | Panik et al. [30] | Ajmal et al. [31] | This study |
|---|---|---|---|
| Type of developed QSAR models | Binary classification | Binary classification | Regression |
| Description of the experimental data preprocessing procedure in accordance with mandatory requirements [33] | No | No | Yes |
| Molecular descriptors | PubChem | 2D MOE | ECFP4, Klekota-Roth, PubChem, MACCS, Topological Torsion, Atom Pairs, Topological Path-Based fingerprints, RDKit, OEState, ALogPS, CDK, Dragon, QNPR, alvaDesc, Fragmentor, MOLD2, MORDRED |
| Machine learning methods | Random forest, k-nearest neighbors, support vector machine, XGBoost, LightGBM, CatBoost | Random forest, k-nearest neighbors, support vector machine | Random forest, k-nearest neighbors, support vector machine, XGBoost, LightGBM, CatBoost, multilayer perceptron, deep neural network, associative neural networks, multiple linear regression, transformer convolutional neural network, Attentive FP, Chemprop |
| Definition of the applicability domain (third mandatory OECD principle of QSAR modeling [32]) | No | No | Yes |
| Structural interpretation (fifth recommended OECD principle of QSAR modeling [32]) | No | No | Yes |
| Use of y-randomization to detect chance correlation | No | No | Yes |
| Form of QSAR model implementation | None | None | Jupyter Notebook, executable via Google Colab |
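Table 5 lists y-randomization as a check for chance correlation: the model is refit on shuffled activities, and a sound model should clearly outperform every shuffled refit. The following is a minimal sketch of the procedure. In practice the full modeling pipeline (descriptors, learner, cross-validation) would be refit on each permuted response. For brevity, a hypothetical one-descriptor least-squares fit and its training R² stand in here, and the data are invented:

```python
import random
from statistics import fmean
from typing import List, Sequence, Tuple


def r2_simple_ols(x: Sequence[float], y: Sequence[float]) -> float:
    """Fitted R^2 of a one-descriptor least-squares line (stand-in for a real QSAR model)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot


def y_randomization(x, y, n_rounds: int = 100, seed: int = 0) -> Tuple[float, List[float]]:
    """Return the R^2 on the real responses and the R^2 values after each shuffle."""
    rng = random.Random(seed)
    real = r2_simple_ols(x, y)
    scrambled = []
    for _ in range(n_rounds):
        y_perm = list(y)
        rng.shuffle(y_perm)  # break the descriptor-activity link
        scrambled.append(r2_simple_ols(x, y_perm))
    return real, scrambled


# Toy data: a descriptor that genuinely tracks activity, plus small alternating noise
x = [float(i) for i in range(20)]
y = [2.0 * i + (0.5 if i % 2 else -0.5) for i in range(20)]
real_r2, shuffled_r2 = y_randomization(x, y)
print(f"real R2 = {real_r2:.3f}, mean scrambled R2 = {fmean(shuffled_r2):.3f}")
```

If shuffled refits reached scores close to the real one, the original correlation would likely be due to chance or overfitting.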
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Tinkov, O.V.; Gurevich, P.E.; Nikolenko, S.A.; Kadyrov, S.D.; Bogatyreva, N.S.; Grigorev, V.Y.; Ivankov, D.N.; Pak, M.A. KRASAVA—An Expert System for Virtual Screening of KRAS G12D Inhibitors. Int. J. Mol. Sci. 2026, 27, 120. https://doi.org/10.3390/ijms27010120
