Article

Data-Driven and Structure-Based Modelling for the Discovery of Human DNMT1 Inhibitors: A Pathway to Structure–Activity Relationships

1 Laboratory of Chemistry, Analysis & Design of Food Processes, Department of Food Science and Technology, University of West Attica, Agiou Spyridonos 28, 12243 Egaleo, Greece
2 Department of Biomedical Engineering, University of West Attica, Agiou Spyridonos 28, 12243 Egaleo, Greece
3 Institute of Chemical Biology, National Hellenic Research Foundation, 48 Vassileos Constantinou Ave., 11635 Athens, Greece
4 General Hospital of Veria, 59132 Veria, Greece
* Authors to whom correspondence should be addressed.
Appl. Sci. 2025, 15(22), 11984; https://doi.org/10.3390/app152211984
Submission received: 6 October 2025 / Revised: 6 November 2025 / Accepted: 7 November 2025 / Published: 11 November 2025
(This article belongs to the Special Issue Development and Application of Computational Chemistry Methods)

Abstract

The rapid growth of knowledge in the field of epigenetics has highlighted DNA methyltransferase 1 (DNMT1) as a key regulator of genomic methylation patterns and a promising therapeutic target in several diseases. In light of the increasing clinical interest in epigenetic enzymes, the present study aimed to develop a robust computational framework for the discovery of novel DNMT1 inhibitors, merging structure-based and data-driven strategies. Specifically, a dataset of established DNMT1 inhibitors was compiled and a series of molecular properties was calculated, enabling the training of a machine learning model to capture critical structure–activity relationships (SARs). When benchmarked against known active compounds, the model discriminated between putative inhibitors and non-inhibitors with high accuracy. In parallel, molecular docking was conducted to screen additional uncharacterized compounds and estimate their binding affinity to human DNMT1. Their molecular properties were then calculated and fed into the same model to predict their inhibitory potential. A comparative evaluation against known human DNMT1 inhibitors demonstrated high predictive accuracy, supporting the reliability of the proposed integrated approach. By uniting molecular docking with data-driven SAR modelling, this workflow offers a fast-track avenue for identifying promising human DNMT1 inhibitors while reducing experimental overhead. The results highlight the effectiveness of combining cheminformatics, machine learning, and in silico techniques to guide rational drug design and accelerate the discovery of novel epigenetic inhibitors.

1. Introduction

Epigenetics has emerged as one of the most dynamic and rapidly expanding fields in biomedical research, focusing on the heritable regulation of gene expression without alteration of the underlying DNA sequence [1,2]. These reversible modifications, such as DNA methylation, histone acetylation, and various chromatin-remodelling events, exert a powerful influence on fundamental biological processes, from embryonic development to disease pathogenesis [3,4]. Among the key mediators of these processes are DNA methyltransferases (DNMTs), enzymes responsible for establishing and maintaining methylation patterns throughout the genome. In particular, DNMT1 has attracted significant attention for its central role in perpetuating methylation signatures during DNA replication, thereby acting as a regulator of epigenetic memory [5]. Aberrant DNMT1 activity has been implicated in the dysregulation of multiple gene networks, ultimately contributing to a variety of pathological states, including cancer, neurodegenerative disorders, and other multifactorial diseases [6,7,8]. According to Valente et al. [9], non-nucleoside compounds have been shown to inhibit DNA methyltransferases in cancer cells (leukemia, U937; breast cancer, MDA-MB-231; Burkitt’s lymphoma, RAJI; and prostate cancer, PC-3). Similarly, natural compounds such as Antroquinonol D reduce the methylation status and reactivate the expression of multiple tumor suppressor genes (TSGs) in MDA-MB-231 breast cancer cells [10]. These findings underscore the therapeutic relevance of DNMT1-targeted modulation. Moreover, epigallocatechin-3-gallate (EGCG), a polyphenolic constituent of green tea, is another well-characterized natural DNMT1 inhibitor. It binds within the catalytic pocket of DNMT1, overlapping with the S-adenosyl-L-homocysteine (SAH) site, and inhibits methylation by forming hydrogen bonds with residues such as Glu1265, Arg1310, Arg1311, and Lys1482 [10].
These findings highlight DNMT1 as a highly promising target for the development of novel pharmacological agents and point to natural products as a source of potent hDNMT1 inhibitors.
Although epigenetic therapies have rapidly gained momentum, only a handful of compounds targeting DNMTs and other epigenetic modifiers have entered clinical use to date, with the U.S. Food and Drug Administration (FDA) approving a limited number of “epi-drugs” for the treatment of specific multifactorial diseases [11]. These diseases remain underserved by current therapies, indicating a significant need for novel strategies to modulate epigenetic pathways beyond the currently approved agents [12,13]. Towards this goal, medicinal chemists and drug discovery teams have begun employing advanced computational methods to streamline the identification and optimization of potential inhibitors against epigenetic targets [14]. These in silico strategies, ranging from molecular docking and pharmacophore modelling to machine learning-based structure–activity relationship (SAR) analyses, offer fast, cost-effective avenues for early-stage hit and lead discovery [15,16]. By reducing both time and resource expenditure, computational approaches enable researchers to evaluate vast libraries of chemical entities, effectively narrowing down candidates for experimental validation.
The discovery of DNMT inhibitors poses particular challenges, including the need to balance potency, selectivity, and pharmacokinetic properties. High-throughput screening (HTS) campaigns, while useful, can be time-consuming and costly, and the large volumes of data they generate may be affected by high false-positive or false-negative rates [10]. Early studies primarily focused on direct or indirect DNMT inhibition in oncology; however, recent findings underscore broader therapeutic avenues for epigenetic modulation, prompting a shift toward exploring DNMT1-targeting agents in other disease contexts [4]. This shift has been facilitated by accumulating structural data on DNMT isoforms, which enable more refined in silico approaches such as homology modelling, structure-based virtual screening, and molecular dynamics simulations [17,18,19].
Importantly, the rise of machine learning in drug discovery has further amplified the potential impact of computational pipelines. By leveraging large datasets of known active and inactive compounds, predictive models can capture complex molecular features that govern ligand–target interactions [20]. When combined with structure-based assessments, such as docking simulations to evaluate binding affinity and orientation, machine learning tools can significantly expedite lead identification [21]. This integrated approach allows for rapid hypothesis testing, the refinement of candidate molecules, and a data-driven assessment of compound libraries prior to any wet-lab effort.
In light of these advances, the present study aimed to develop a robust computational framework for the discovery of novel DNMT1 inhibitors by merging structure-based and data-driven strategies. A curated dataset of known human DNMT1 (hDNMT1) inhibitors was used to train a machine learning model capable of discerning critical structure–activity relationships. In parallel, molecular docking simulations were performed on a series of compounds to gauge their binding affinity profiles against hDNMT1. The compounds predicted to be the most promising candidates were subsequently assessed using the predictive model to estimate their inhibitory potential. This two-pronged methodology increased confidence in the selected candidates while minimizing reliance on time-consuming experimental screens. The comprehensive evaluation against existing hDNMT1 inhibitors further underscored the approach’s predictive strength, paving the way for future experimental validation and subsequent lead optimization. By integrating cheminformatics, machine learning, and in silico techniques, the present study emphasizes the promise of a synergistic framework to guide rational drug design and accelerate the discovery of next-generation epigenetic inhibitors. Unlike earlier DNMT1 modelling efforts that applied these methods independently, the present workflow unites similarity screening, molecular docking, and machine learning-based SAR analysis in a single predictive loop, allowing mutual validation of structural and data-driven predictions and helping to reduce false-positive rates.

2. Materials and Methods

2.1. In Silico Methodology

2.1.1. Similarity-Based Virtual Screening (SBVS) Workflow

SwissSimilarity (http://www.swisssimilarity.ch/, accessed on 10 June 2025) [22], a web-based tool, was employed to perform similarity-based virtual screening of a series of chemical libraries in order to identify molecules similar to epigallocatechin-3-gallate (EGCG). For this purpose, the SMILES string of EGCG was used as the input, and the ZINC, Asinex, AsisChem, ChemBridge, Enamine, Maybridge, Otava, Specs, TimTec, Vitas-M, Life Chemicals, and ChemDiv libraries were selected as the pool of screening compounds. A variety of screening methods were then selected, including FP2, ECFP4, MHFP6, 2D Pharmacophore Matching, Electroshape, ErG, and the Combined 2D/3D Screening Method. The inclusion of the Electroshape method adds a valuable layer to the screening process, as it considers not only molecular shape but also the electrostatic potential distribution, both of which are key determinants of molecular interactions, especially in protein binding [22].
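For fingerprint-based methods such as FP2 or ECFP4, similarity between two molecules is commonly scored with the Tanimoto coefficient: the number of on-bits shared by both fingerprints divided by the number of bits set in either. The minimal sketch below assumes each fingerprint has already been computed and stored as a set of on-bit indices. The bit sets are hypothetical, and SwissSimilarity's internal scoring (including the shape- and charge-based Electroshape score) is not reproduced here.

```python
def tanimoto(bits_a: set[int], bits_b: set[int]) -> float:
    """Tanimoto coefficient (intersection size / union size) for sets of on-bit indices."""
    if not bits_a and not bits_b:
        return 0.0  # assumed convention: two empty fingerprints score 0
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)


# Hypothetical on-bit sets (not real EGCG or library fingerprints)
query = {1, 4, 7, 9, 15, 22, 31, 40}
candidate = {1, 4, 7, 9, 15, 22, 33, 41, 50}
score = tanimoto(query, candidate)
print(f"Tanimoto similarity: {score:.3f}")
print("passes 0.60 cut-off:", score > 0.60)
```

With these toy bit sets, 6 bits are shared out of 11 bits set in total. The score is therefore 6/11 ≈ 0.545, which would fall below the 0.60 cut-off used in this study.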
The virtual screening process involved a comprehensive evaluation of 7693 compounds derived from all chemical libraries available in SwissSimilarity, using all of the screening methods described above. After a similarity threshold of >0.60 was applied, 198 compounds were identified as the most promising candidates, with similarity scores ranging from 0.608 to 0.921. Some of these compounds demonstrated significant structural resemblance to EGCG, while others exhibited electrostatic and 3D shape similarity. The Electroshape method identified the largest number of compounds (92 compounds), followed by FP2 (67 compounds), ErG (28 compounds), and the Combined method (11 compounds).
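The shortlisting step can be sketched as a threshold filter followed by a per-method tally. In the example below, the hit records (compound ID, method, score) are hypothetical. Only the 0.60 cut-off and the per-method counts come from the text. The sketch assumes each compound is attributed to a single method, which is consistent with the reported counts summing to 198.

```python
from collections import Counter

SIMILARITY_THRESHOLD = 0.60  # cut-off stated in the text


def shortlist(hits, threshold=SIMILARITY_THRESHOLD):
    """Keep hits scoring above the threshold and count them per screening method.

    `hits` is an iterable of (compound_id, method, score) tuples.
    """
    kept = [hit for hit in hits if hit[2] > threshold]
    per_method = Counter(method for _, method, _ in kept)
    return kept, per_method


# Hypothetical hit records, for illustration only
hits = [
    ("cpd-A", "Electroshape", 0.91),
    ("cpd-B", "FP2", 0.64),
    ("cpd-C", "ErG", 0.58),      # below the cut-off, dropped
    ("cpd-D", "Combined", 0.608),
]
kept, per_method = shortlist(hits)
print(len(kept), dict(per_method))

# The per-method counts reported in the text add up to the 198 shortlisted compounds
reported = {"Electroshape": 92, "FP2": 67, "ErG": 28, "Combined": 11}
assert sum(reported.values()) == 198
```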
Table S1 (Supplementary Materials) lists, for each selected compound, the library name and code, the SMILES (Simplified Molecular Input Line Entry System) string, the screening method, and the similarity score relative to EGCG.

2.1.2. Molecular Docking Protocol

The three-dimensional structure of the hDNMT1 isoform was obtained from the RCSB Protein Data Bank (PDB ID: 4WXX, 2.62 Å). Among the available crystallographic entries of hDNMT1, structure 4WXX was chosen because it includes the co-crystallized ligand S-adenosyl-L-homocysteine (SAH), which is well known to act as a strong inhibitor of both DNA and histone methyltransferases [23]. The retrieved crystal structure was then prepared using AutoDockTools-1.5.7 software (Scripps Research Institute, La Jolla, CA, USA). Water molecules were removed, and missing residues, polar hydrogen atoms, and partial charges (Kollman charges) were added to ensure proper electrostatic interactions. All hits derived from the similarity-based virtual screening pipeline were imported into AutoDockTools-1.5.7, polar hydrogens and Gasteiger charges were added, and the prepared structures were converted into PDBQT files. The grid box was centred at x = −47 Å, y = 61 Å, z = 7 Å, with dimensions of x = 19 Å, y = 19 Å, z = 19 Å. Docking simulations were performed using Webina 1.0.5 (https://durrantlab.pitt.edu/webina/, accessed on 5 July 2025), an online interface for AutoDock Vina [24]. The maximum number of docking modes was set to 10, and the exhaustiveness search parameter was set to 8. The best docking pose was selected on the basis of the lowest-energy docked conformation and the presence of interactions with reported crucial binding-pocket residues. All generated poses were visually inspected and analysed using PyMOL 3.0 (Schrödinger, New York, NY, USA).
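Webina runs AutoDock Vina in the browser, so this study may not have written a configuration file at all. The sketch below only illustrates how the stated search-space settings (box centre, box size, exhaustiveness = 8, num_modes = 10) map onto a standard Vina configuration file, and which region of the receptor the box spans. The receptor and ligand file names are hypothetical placeholders.

```python
def vina_config(receptor, ligand, center, size, exhaustiveness=8, num_modes=10):
    """Return AutoDock Vina configuration text for the given search box (Å)."""
    cx, cy, cz = center
    sx, sy, sz = size
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"center_x = {cx}",
        f"center_y = {cy}",
        f"center_z = {cz}",
        f"size_x = {sx}",
        f"size_y = {sy}",
        f"size_z = {sz}",
        f"exhaustiveness = {exhaustiveness}",
        f"num_modes = {num_modes}",
    ]
    return "\n".join(lines) + "\n"


def box_bounds(center, size):
    """Lower and upper limits of the search box on each axis (Å)."""
    return [(c - s / 2, c + s / 2) for c, s in zip(center, size)]


CENTER = (-47.0, 61.0, 7.0)  # Å, box centre from the text
SIZE = (19.0, 19.0, 19.0)    # Å, box size from the text

# Hypothetical file names, for illustration only
print(vina_config("4wxx_receptor.pdbqt", "ligand_001.pdbqt", CENTER, SIZE))
print(box_bounds(CENTER, SIZE))
```

The resulting box spans −56.5 to −37.5 Å in x, 51.5 to 70.5 Å in y, and −2.5 to 16.5 Å in z.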

2.1.3. ADMET Properties Prediction

The ADMET properties of the most promising compounds were predicted using ADMETlab 3.0 (https://admetmesh.scbdd.com/, accessed on 31 October 2025) [25].

2.2. Machine Learning Model Deployment

2.2.1. Construction of the Prediction Dataset

A dataset of 118 hDNMT1-targeting compounds with experimentally validated half-maximal inhibitory concentration (IC50) values was retrieved from the ChEMBL database (https://www.ebi.ac.uk/chembl/, accessed on 20 May 2025). Each compound was classified as either “active” or “not-active” according to established IC50 thresholds. Molecular descriptors covering a broad range of structural, physicochemical, and electronic properties were calculated using RDKit (https://www.rdkit.org/, accessed on 1 March 2025).
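The text does not state the IC50 cut-off used to separate “active” from “not-active” compounds. The sketch below therefore uses a hypothetical 10 µM threshold purely for illustration. The compound names and IC50 values are also invented.

```python
HYPOTHETICAL_IC50_CUTOFF_UM = 10.0  # assumption: the study's actual cut-off is not stated


def label_activity(ic50_um, cutoff_um=HYPOTHETICAL_IC50_CUTOFF_UM):
    """Return 'active' if IC50 (in µM) is at or below the cut-off, else 'not-active'."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return "active" if ic50_um <= cutoff_um else "not-active"


# Hypothetical IC50 values in µM
records = {"cmpd_1": 0.5, "cmpd_2": 25.0, "cmpd_3": 10.0}
labels = {name: label_activity(ic50) for name, ic50 in records.items()}
print(labels)
```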

2.2.2. Descriptor Pre-Processing and Feature Selection

All descriptor values were standardized using z-score normalization, following the formula:
$$\text{Normalized value} = \frac{\text{descriptor value} - \text{mean}}{\text{standard deviation}}$$
The normalization parameters (mean and standard deviation) were derived from the ChEMBL dataset and applied consistently to all external compounds in the subsequent steps.
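A minimal sketch of this normalization, assuming descriptors are stored as plain dictionaries. The parameters are fitted once on the training (ChEMBL) values and then reused unchanged for every external compound, so external molecules never influence the scaling. The text does not state whether the population or the sample standard deviation was used; the sketch assumes the population form (`statistics.pstdev`). The descriptor names are RDKit descriptor names used only as labels, and all values are hypothetical.

```python
import statistics


def fit_zscore(train_columns):
    """Per-descriptor (mean, SD) from the training set.

    `train_columns` maps descriptor name -> list of training values.
    Population SD is an assumption; the text does not specify which SD was used.
    """
    return {
        name: (statistics.fmean(values), statistics.pstdev(values))
        for name, values in train_columns.items()
    }


def apply_zscore(compound, params):
    """Normalize one compound's descriptors with the stored training parameters."""
    out = {}
    for name, value in compound.items():
        mu, sd = params[name]
        # A constant descriptor (SD = 0) carries no information; map it to 0
        out[name] = 0.0 if sd == 0 else (value - mu) / sd
    return out


# Hypothetical training values and one hypothetical external compound
train = {"MolLogP": [1.0, 2.0, 3.0], "TPSA": [100.0, 120.0, 140.0]}
params = fit_zscore(train)
external = {"MolLogP": 4.0, "TPSA": 120.0}
print(apply_zscore(external, params))
```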
Feature selection was performed in two stages: statistical filtering of the descriptors, followed by model-based importance ranking.
This dual-stage selection pipeline mirrors the strategy introduced in our previous work on NOX2 inhibitors [26], which demonstrated the effectiveness of statistical filtering combined with model-based importance for constructing interpretable and high-performing classification models.

2.2.3. Model Training Using Multiple Classifiers

A total of 15 classification algorithms were implemented: Minimum Distance, K-Nearest Neighbors, Bayesian, LDA, Logistic Regression, Perceptron, Multilayer Perceptron, SVM, Random Forest, CART, XGBoost, Bagging, Extra Trees, AdaBoost, and Gradient Boost.
Each classifier was trained on the ChEMBL dataset using the selected descriptors. To account for random variability in model performance due to stochastic initialization, the entire modelling and classification process was repeated over 50 independent runs (epochs). Within each run, five-fold cross-validation was employed on the training set to obtain robust performance estimates and to mitigate overfitting. The best-performing classifier, as determined by mean accuracy and AUC, was subsequently selected for the prediction tasks. This multi-classifier framework was inspired by the one previously validated in Ladika et al. (2025) [26], where a similar multi-model strategy was employed to predict NOX2 inhibition with high reliability and reproducibility.
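The repeated cross-validation scheme can be sketched in plain Python with the simplest classifier in the list, the Minimum Distance (nearest-centroid) classifier. This is a simplified illustration, not the study's code. The folds are not stratified, only accuracy is computed (not AUC), and the two-cluster toy dataset is hypothetical.

```python
import math
import random


def nearest_centroid_fit(X, y):
    """Minimum Distance classifier: store one mean vector (centroid) per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    """Assign x to the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


def repeated_kfold_accuracy(X, y, k=5, repeats=50, seed=0):
    """Mean accuracy over `repeats` independently shuffled k-fold cross-validations."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    scores = []
    for _ in range(repeats):
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]
        for test in folds:
            test_set = set(test)
            train = [i for i in idx if i not in test_set]
            model = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
            correct = sum(nearest_centroid_predict(model, X[i]) == y[i] for i in test)
            scores.append(correct / len(test))
    return sum(scores) / len(scores)


# Hypothetical, well-separated toy data: 1 = active, 0 = not-active
X = [(0.1 * i, 0.1 * i) for i in range(10)] + [(5 + 0.1 * i, 5 + 0.1 * i) for i in range(10)]
y = [1] * 10 + [0] * 10
print(f"mean accuracy over 50 x 5-fold CV: {repeated_kfold_accuracy(X, y):.3f}")
```

On real descriptor data, the same loop would be run for every candidate classifier, and the one with the best mean score would be retained.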

2.2.4. Application to External Compounds

After model training, predictions were generated for two external compound groups:
  • ZINC-derived compounds: A set of 21 candidate molecules identified via molecular docking and sourced from the ZINC database (https://zinc.docking.org, accessed on 20 June 2024). Descriptors were normalized using ChEMBL-derived parameters prior to classification.
  • In vitro-validated hDNMT1 inhibitors: A reference group of 5 compounds previously tested experimentally in Kritsi et al. (2024) [27]. These compounds, namely 2-(3-(3,4-dimethoxyphenyl)-3-(2-((2-oxo-2H-chromen-7-yl)oxy)acetamido)propanamido)acetic acid, phlorizin, orientin, bergenin, and 2-[(7,8-dihydroxy-6-undecylphenazin-2-yl)formamido]pentanedioic acid, were excluded from the training phase and used instead as a retrospective validation set to evaluate the model’s predictive reliability. Correct classification of these experimentally confirmed inhibitors served as an external benchmark for assessing model performance.

3. Results and Discussion

3.1. In Silico Methodology Results

3.1.1. Similarity-Based Virtual Screening

Similarity-based virtual screening was carried out to identify compounds with potential inhibitory activity against the hDNMT1 isoform. For this purpose, epigallocatechin-3-gallate (EGCG), a polyphenolic compound derived from green tea, was used as the template [10]. EGCG was selected because it has been widely studied for its epigenetic properties, particularly its ability to modulate DNA methylation by inhibiting DNMT1. Moreover, EGCG binds strongly to the hDNMT1 catalytic site and potently inhibits the enzyme, with experimental studies reporting an IC50 value of 0.47 µM [28]. The SwissSimilarity platform (http://www.swisssimilarity.ch/, accessed on 10 June 2025), a freely accessible web-based tool within the SwissDrugDesign environment (https://www.molecular-modelling.ch/swiss-drug-design.html, accessed on 10 June 2025), was employed to conduct the virtual screening procedure.
From the generated results, only compounds with a similarity score greater than 0.60 were selected for further consideration. This threshold was applied to ensure that the shortlisted compounds shared significant structural resemblance with EGCG. Structural similarity is crucial in this context, since molecules with chemical frameworks comparable to EGCG are more likely to display analogous binding affinities and biological activities [29]. High similarity increases the likelihood that the selected compounds retain the key functional groups and molecular configurations responsible for EGCG’s interaction with hDNMT1. These include the ability to form hydrogen bonds with specific amino acid residues within the catalytic domain, which is critical for inhibitory activity. By focusing on structurally related molecules, the probability of identifying functional analogues that preserve or enhance EGCG’s bioactivity is increased.
In total, 7693 compounds derived from all chemical libraries available in SwissSimilarity were virtually screened using a variety of screening methods (FP2, ECFP4, MHFP6, 2D pharmacophore, Electroshape, ErG, and the combined 2D/3D method). After a similarity threshold of >0.60 was applied, 198 commercially available compounds were identified as the most promising. Some of these compounds demonstrated significant structural resemblance to EGCG, while others exhibited electrostatic and 3D shape similarity. This variation in similarity metrics aligns with the study’s objective of identifying potential analogues by considering both structural and electrostatic features critical for effective hDNMT1 binding.
Notably, the similarity scores ranged from 0.608 to 0.921. The compound with the lowest similarity score (0.608) was retrieved using the Combined method. Conversely, the Electroshape method yielded compounds with consistently high similarity scores, ranging from 0.872 to 0.921, demonstrating its effectiveness in identifying analogues that closely mimic EGCG’s electrostatic and 3D shape characteristics.
The number of compounds identified by each method varied considerably. The Electroshape method identified the largest number of compounds (92 compounds), followed by FP2 (67 compounds), ErG (28 compounds), and the Combined method (11 compounds). The predominance of Electroshape highlights its strength in retrieving compounds with electrostatic and shape similarity, while FP2 excels in identifying structurally close analogues.
Structural analysis of the high-similarity compounds (similarity > 0.90) revealed recurring motifs. All of these compounds contained hydroxyl groups (-OH) and aromatic rings, which are characteristic functional groups of EGCG. Additionally, certain compounds included carbonyl groups (C=O) or ether linkages (C-O-C) in positions other than those found in EGCG (the gallate ester carbonyl and the chromane ring oxygen), which may contribute to enhanced chemical properties or binding capacity. EGCG’s polyphenolic structure comprises multiple hydroxyl groups and aromatic rings, which are fundamental for its hDNMT1 inhibitory activity. The comparison shows that, although the identified compounds share key structural features with EGCG, minor variations, such as differently positioned carbonyl groups, could affect their bioactivity.

3.1.2. Molecular Docking Studies

Subsequently, molecular docking analyses were conducted on the compounds that fulfilled the similarity-based virtual screening criteria. In total, 198 molecules were docked at the S-adenosyl-L-homocysteine (SAH) binding pocket of the hDNMT1 isoform (PDB ID: 4WXX; resolution 2.62 Å, accessed on 5 July 2025) [30].
The selection of the most promising candidates was guided by the following criteria: (a) their predicted binding affinity, expressed as docking scores, relative to the co-crystallized reference ligand, and (b) the preservation of key interactions involved in SAH recognition within the catalytic site of hDNMT1. The analysis identified 21 compounds, divided into two main sets (Set 1 and Set 2). Specifically, Set 1 contains 7 compounds (Figure 1) that exhibit structural similarity to EGCG, and Set 2 comprises 14 compounds (Figure 2) that display electrostatic and 3D shape similarity to EGCG, identified through the Electroshape method in SwissSimilarity.
Compounds of Set 1 (Figure 1) predominantly belong to the polyphenol and flavonoid-derivative classes. They are characterized by multiple hydroxyl groups that enhance hydrogen bonding with hDNMT1, aromatic rings common to bioactive phenolic compounds, and, in some cases, carbonyl groups that may contribute to alternative binding interactions. These compounds retain the core polyphenolic backbone of EGCG while incorporating minor substituent variations that could improve chemical stability or binding specificity. In particular, comparative inspection of the molecular scaffolds (Compounds 1–7) allowed their classification into three structural groups. Group 1 (Compounds 1–2) consists of polyphenolic esters rich in acyl groups (Compound 1) and polyphenolic frameworks rich in hydroxyl substituents (Compound 2), resembling the general structural features of EGCG and preserving interaction motifs relevant for hDNMT1 inhibition. Group 2 (Compounds 3–5) exhibits greater substitutional diversity, incorporating moieties such as methoxy and acyl groups that influence steric orientation and electronic distribution, thereby enabling alternative binding modes. Group 3 (Compounds 6–7) represents hybrid scaffolds combining polyphenolic backbones with extended aromatic systems, integrating features of both Groups 1 and 2 and potentially offering novel binding mechanisms. Collectively, the three groups represent a continuum from structural conservation, through physicochemical diversity, to hybrid scaffolds that broaden the accessible chemical space of hDNMT1 inhibitors.
On the other hand, compounds of Set 2 (Figure 2) are chemically diverse and include heterocyclic compounds with polarized groups, oxygen- and nitrogen-rich structures favouring electrostatic interactions, and aliphatic or aromatic derivatives with multifunctional groups that mimic the three-dimensional charge distribution of EGCG. Although the compounds in Set 2 do not share the polyphenolic backbone of EGCG, they offer innovative interaction profiles with the target due to their electrostatic and shape properties. Specifically, the fourteen candidate molecules (Compounds 8–21, Figure 2) can be organized into three scaffold-based families. Group 1 (Compounds 8–11) comprises polyphenolic frameworks rich in hydroxyl substituents and fused aromatic rings, preserving interaction motifs reminiscent of EGCG (π–π stacking and hydrogen bonding) that are pertinent to hDNMT1. Group 2 (Compounds 12–17) consists of heteroaromatic and more heavily substituted derivatives (e.g., N-containing motifs, methoxy groups, and/or halogens), introducing greater electronic diversity and steric complexity that may support alternative binding orientations relative to Group 1. Group 3 (Compounds 18–21) includes hybrid scaffolds that combine extended aromatic systems with polar functionalities (carbonyls, amides, additional hydroxyls), integrating elements of both previous groups and offering opportunities for mixed interaction patterns. Collectively, these families outline a continuum from conserved polyphenolic chemotypes (Group 1), through chemically and electronically diverse heteroaromatics (Group 2), to hybrids bridging the two extremes (Group 3), thereby broadening the chemical space relevant to hDNMT1 inhibitor design.
Binding pose evaluation indicated that compounds in Set 1 appear to preserve key interaction patterns with hDNMT1 through their structural resemblance to EGCG. Those in Set 2 provide chemically diverse scaffolds that may engage the enzyme via alternative binding modes (Table 1). Taken together, the two sets constitute a complementary library of candidate inhibitors, with Set 1 offering structural conservation and Set 2 contributing chemical diversity and innovation. A heatmap of the key residue interactions of the most promising compounds is provided in Supplementary Figure S2 (Supplementary Materials).
Considering the overall docking poses, Set 1 comprised several compounds with docking scores ranging from −8.45 to −10.21 kcal·mol⁻¹. Among these, Compound 5 (Figure 1) emerged as the most promising candidate, with a docking score of −10.21 kcal·mol⁻¹. This is markedly more favorable than the score of the co-crystallized reference ligand SAH (−8.61 kcal·mol⁻¹) and suggests a stronger interaction with the hDNMT1 isoform. Docking pose analysis revealed that Compound 5 forms hydrogen bonds with critical residues, including Phe1145, Gly1150, Leu1151, Asn1578, and Val1580 (Figure 3), all of which are also engaged by SAH. This overlap supports the notion that the compound mimics the binding profile of SAH. Furthermore, Compound 5 forms additional hydrogen bonds with Gly1149, Glu1266, and Arg1310. Although these residues are not directly implicated in the SAH–hDNMT1 interface, they lie in close spatial proximity to key binding residues. These auxiliary interactions are likely to enhance the stability of the complex, supporting Compound 5 as a highly promising candidate for DNMT1 inhibition (Table 1). Moreover, Compound 7 (docking score = −9.39 kcal·mol⁻¹) (Figure 1) forms hydrogen bonds with Gly1150, Leu1151, Glu1168, Asn1578, and Val1580, which are critical residues involved in SAH binding to hDNMT1. It also forms additional hydrogen bonds with Cys1148 and Gly1149, suggesting that these residues could help stabilize the overall binding. These results make Compound 7 a strong candidate for hDNMT1 inhibition (Table 1, Figure 3).
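The two selection criteria described in this section can be expressed as a simple filter: a docking score better than the SAH reference, and preservation of SAH-site contacts. In the sketch below, the reference score and the hydrogen-bond partners of Compound 5 are taken from the text. The SAH-site residue set is an illustrative union of the residues that this section describes as critical for SAH binding, not an exhaustive contact list. The minimum-overlap cut-off `min_shared` is hypothetical.

```python
SAH_SCORE = -8.61  # kcal/mol, docking score of co-crystallized SAH reported in the text

# Illustrative set: residues named in this section as critical for SAH binding
SAH_KEY_RESIDUES = {
    "Phe1145", "Gly1150", "Leu1151", "Glu1168", "Met1169",
    "Asp1190", "Cys1191", "Asn1578", "Val1580",
}


def passes_criteria(score, hbond_residues, min_shared=4, reference=SAH_SCORE):
    """(a) score more negative than the reference and (b) >= min_shared SAH-site residues.

    `min_shared` is a hypothetical cut-off, not a value reported in the study.
    Returns (passes, sorted list of shared SAH-site residues).
    """
    shared = SAH_KEY_RESIDUES & set(hbond_residues)
    return score < reference and len(shared) >= min_shared, sorted(shared)


compound_5 = (-10.21, ["Phe1145", "Gly1150", "Leu1151", "Asn1578", "Val1580",
                       "Gly1149", "Glu1266", "Arg1310"])
ok, shared = passes_criteria(*compound_5)
print(ok, shared, round(compound_5[0] - SAH_SCORE, 2))
```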
Compounds in Set 2 (Figure 2) displayed docking scores from −8.37 to −10.51 kcal·mol⁻¹ and a favorable interaction profile, supporting their potential as hDNMT1 inhibitors. In particular, Compound 10 shows the most favorable docking score (−10.51 kcal·mol⁻¹), stronger than that of SAH. It forms hydrogen bonds with the key residues Phe1145, Gly1150, Leu1151, Glu1168, Met1169, Cys1191, and Val1580, as well as an additional hydrogen bond with Gly1223 that may further stabilize the binding (Table 1, Figure 3). Compound 12 (docking score = −9.56 kcal·mol⁻¹) also presents interesting docking results. Its binding is stabilized by hydrogen bonds with the crucial residues Gly1150, Leu1151, Glu1168, Met1169, Asp1190, Cys1191, and Val1580 (Table 1, Figure 3) and is further reinforced by hydrogen bonds with Cys1148 and Gly1149. Furthermore, Compounds 13 (docking score = −9.73 kcal·mol⁻¹) and 16 (docking score = −9.80 kcal·mol⁻¹) present interesting interaction patterns, including interactions similar to those of SAH (Table 1, Figure 3). The predicted ADMET properties of the four most promising compounds are reported in Table S2 (Supplementary Materials).
The computational results of the present study are consistent with previous reports describing the inhibitory potential of both natural and synthetic hDNMT1 inhibitors. EGCG has been widely studied for its role in epigenetic regulation via DNMT1 inhibition and has been reported to directly inhibit DNMT1 activity and reactivate methylation-silenced genes in cancer cell lines [31]. Similarly, Lee et al. (2005) demonstrated that flavonoid compounds, including EGCG, can interfere with DNMT1 function through direct binding and catalytic inhibition [28]. The compounds identified in Set 1 share significant structural similarities with EGCG, particularly in their polyphenolic composition, which is known to contribute to DNMT1 inhibition. This agreement supports the plausibility of the virtual screening and docking approach used in the present study. Additionally, the compounds in Set 2, though structurally distinct from EGCG, exhibit electrostatic and 3D shape similarities, which may explain their favorable docking interactions. Previous studies [32] have highlighted that hDNMT1 inhibitors do not necessarily require structural resemblance to known inhibitors; rather, key functional interactions can be sufficient for effective inhibition. Recent in silico studies have further expanded the understanding of DNMT1 inhibition. For instance, Kritsi et al. (2024) [27] employed a combinatorial virtual screening approach based on pharmacophore models, molecular docking, and molecular dynamics simulations to identify novel hDNMT1 inhibitors. Their findings demonstrated that small-molecule inhibitors with specific pharmacophore features could achieve binding affinities to hDNMT1 similar to those of the top-ranked compounds in this study, reinforcing the significance of structure-based methodologies [27]. Additionally, Yin et al. (2020) used a hybrid in silico screening method that combined ligand-based and structure-based approaches to identify hDNMT1 inhibitors with favorable pharmacokinetic properties [33]. Collectively, these comparative studies highlight the strengths of different computational techniques. They also indicate that a multi-faceted approach, integrating structure-based docking with pharmacokinetic and free energy analyses, could further enhance hDNMT1 inhibitor discovery. Overall, the compounds in Set 2 provide further evidence supporting the hypothesis that both structural and electrostatic considerations are crucial for hDNMT1 inhibition. The integration of recent computational approaches strengthens the reliability of these findings and suggests that future hDNMT1 inhibitor discovery can be further enhanced through hybrid in silico strategies.
To conclude, the in silico methodology prioritized 21 compounds as potential hDNMT1 inhibitors, since they showed the strongest potential to mimic the binding profile of SAH, and these compounds were subjected to further exploration. To further validate these suggestions, the compounds prioritized by docking were subsequently evaluated using the machine learning (ML) prediction framework. This dual approach ensured that candidates identified as strong binders in the structural context of the hDNMT1 active site were also supported by data-driven classification models. In particular, the overlap between docking-based binding affinities and ML-derived probabilities provided a means of evaluating inhibitory potential while reducing the risk of false positives associated with any single method.
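A minimal sketch of such a dual filter is given below. The cut-off values (−8.0 kcal·mol⁻¹ for the docking score and 0.5 for the ML consensus) and the compound records are hypothetical placeholders chosen for illustration, not the thresholds applied in this study; the rule simply keeps compounds supported by both methods and ranks them by docking score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    docking_score: float  # kcal/mol; more negative = more favorable (Vina-style score)
    ml_consensus: float   # fraction of ML runs predicting "active", in [0, 1]


def prioritize(candidates, docking_cutoff=-8.0, consensus_cutoff=0.5):
    """Keep compounds supported by BOTH docking and ML; rank by docking score.

    The default cut-offs are illustrative assumptions, not values from the study.
    """
    kept = [
        c for c in candidates
        if c.docking_score <= docking_cutoff and c.ml_consensus >= consensus_cutoff
    ]
    return sorted(kept, key=lambda c: c.docking_score)  # most negative first


if __name__ == "__main__":
    demo = [  # hypothetical compounds
        Candidate("cpd_A", -10.2, 1.00),
        Candidate("cpd_B", -8.4, 0.24),
        Candidate("cpd_C", -7.1, 0.96),
        Candidate("cpd_D", -9.0, 0.88),
    ]
    for c in prioritize(demo):
        print(c.name, c.docking_score, c.ml_consensus)
```

Requiring agreement from both sources trades some sensitivity for specificity: a compound that docks well but is rejected by the classifier (cpd_B above) is set aside rather than promoted on a single line of evidence.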

3.2. Machine Learning

3.2.1. Statistical Insights into hDNMT1 Inhibitor Differentiation

Understanding the molecular characteristics that differentiate hDNMT1 inhibitors from non-inhibitors is fundamental for constructing interpretable and biologically relevant predictive models. To that end, an initial statistical analysis was conducted on 118 curated compounds from the ChEMBL database (https://www.ebi.ac.uk/chembl/, accessed on 20 May 2025), each labelled as “active” or “not-active” based on their experimentally validated IC50 values. Molecular descriptors were calculated for all compounds using RDKit software (https://www.rdkit.org/), yielding a diverse set of topological, electronic, and fragment-based features. These descriptors served as the foundation for understanding structural properties associated with hDNMT1 inhibitory potential.
To identify features with statistically significant class-based variation, the Mann–Whitney U test was applied to each descriptor. This non-parametric test is widely used in cheminformatics to compare distributions between two groups without assuming normality, an important consideration given the heterogeneous chemical nature of the dataset [34]. The analysis revealed 37 descriptors with p-values below 0.05, indicating significant differences between active and not-active compounds. These findings are summarized in Table 2.
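For readers wishing to reproduce this filtering step, the sketch below implements a two-sided Mann–Whitney U test with the normal approximation and tie correction (no continuity correction) using only the Python standard library, and applies it descriptor by descriptor. In practice, scipy.stats.mannwhitneyu would normally be used; the software actually used for this test is not stated here, and the descriptor-table layout (a dictionary of value lists) and the 1/0 label coding are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for sample x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # tie correction: sum of (t^3 - t) over groups of tied values
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if variance <= 0:  # all values identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, p


def significant_descriptors(table, labels, alpha=0.05):
    """Keep descriptors whose active (1) vs not-active (0) distributions differ.

    table: {descriptor_name: [value per compound]}; labels: [1 or 0 per compound].
    Returns {descriptor_name: p_value} for descriptors with p < alpha.
    """
    hits = {}
    for name, values in table.items():
        active = [v for v, lab in zip(values, labels) if lab == 1]
        inactive = [v for v, lab in zip(values, labels) if lab == 0]
        _, p = mann_whitney_u(active, inactive)
        if p < alpha:
            hits[name] = p
    return hits
```

Because the test works on ranks, it is insensitive to the very different scales of descriptors such as MolWt and fragment counts, which is why no normalization is needed before this step.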
Among the most discriminative features were MolWt (molecular weight), SPS (sum of polar surface areas), fr_C_O, fr_furan, NumAromaticCarbocycles, and NumSaturatedHeterocycles. Additionally, the electrotopological indices MaxEStateIndex and MinEStateIndex were identified as significant. These features span a range of structural dimensions, from fragment-level indicators to global physicochemical properties [35], suggesting that hDNMT1 inhibition is influenced by a multifactorial combination of steric, electronic, and substructural factors. Specifically, (a) fr_C_O represents the number of carbonyl (C=O) groups in the molecule and reflects potential hydrogen-bond acceptor capacity; (b) fr_furan counts furan rings, oxygen-containing aromatic heterocycles that may enhance pi–pi or dipole–dipole interactions with polar residues; (c) NumAromaticCarbocycles denotes the number of purely carbon-based aromatic rings, indicative of conjugated hydrophobic frameworks involved in stacking interactions; and (d) NumSaturatedHeterocycles corresponds to the count of fully saturated rings containing at least one heteroatom (e.g., N, O, or S), which often contribute to molecular flexibility and solubility. Finally, MaxEStateIndex and MinEStateIndex are the highest and lowest electrotopological state (E-state) indices among the atoms of the molecule, respectively, summarizing the distribution of electronic character and topological accessibility. These indices therefore serve as electronic descriptors capturing the balance between electron-rich and electron-poor regions.
The biological relevance of these descriptors is strongly supported by prior studies of non-nucleoside hDNMT1 inhibitors [9]. Valente et al. (2014) demonstrated that effective hDNMT1 inhibitors frequently contain extended aromatic systems and polar substituents capable of engaging both the SAM and DNA binding sites of the enzyme [9]. Similarly, Uddin et al. (2021) [36] and Chen et al. (2024) [10] highlighted the role of carbonyl and heterocyclic groups in enhancing binding affinity through hydrogen bonding and pi–pi interactions. These observations are consistent with the significance of fr_C_O, fr_furan, and NumAromaticCarbocycles identified in the present work.
Moreover, the elevated MolWt and SPS values observed among active compounds align with the structural requirements for occupying the relatively large and polar hDNMT1 catalytic site. This is particularly relevant for coumarin-based and natural product–derived inhibitors, which often feature fused aromatic cores with polar or hydrogen bond–accepting groups. Compounds such as RG108 and MC3353 exemplify this pattern, as shown in previous structural studies [36,37].
The inclusion of electronic state indices (MaxEStateIndex, MinEStateIndex) further reinforces the notion that electronic distribution across the molecular framework plays a role in binding interactions. These indices, though less interpretable in isolation, have been widely used in quantitative structure–activity relationship (QSAR) modelling and have shown utility in predicting ligand–protein interactions [38,39].
From a modelling perspective, this statistically guided feature selection serves multiple purposes. First, it ensures that the input features for machine learning are not only statistically grounded but also chemically interpretable—enhancing the model’s transparency. Second, it acts as a regularization step, reducing dimensionality and mitigating overfitting risks, especially in small-to-moderate-sized datasets. This approach is consistent with best practices in cheminformatics, where a combination of statistical filtering and domain knowledge has been shown to improve model robustness and generalizability [40].

3.2.2. Machine Learning Model Performance and Feature Refinement

Following statistical filtering, the subset of 37 descriptors identified as significantly different between hDNMT1 active and not-active compounds was used as input for machine learning modelling. A total of 14 supervised classifiers were implemented to identify the most effective algorithmic strategy for predictive modelling. These included decision trees, support vector machines, ensemble learners (e.g., Random Forest, Extra Trees, Bagging), and boosting-based methods such as AdaBoost, Gradient Boosting, and XGBoost.
To ensure robust performance evaluation, each model was trained and tested using a randomized 70:30 split of the ChEMBL dataset. Given the inherent stochasticity of the training process, particularly in algorithms such as XGBoost or neural network-based classifiers, the modelling and evaluation cycle was repeated 50 times with independent random splits (each repetition is referred to as an epoch below). This strategy allowed averaged performance metrics to be calculated and helped mitigate random fluctuations or overfitting artifacts arising from any specific training–test split.
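The repeated-split protocol can be sketched as follows. Because the XGBoost model itself is not reproduced here, a simple nearest-centroid classifier is swapped in as a hypothetical stand-in so that the example runs with the standard library alone; the split fraction (0.30) and number of repetitions (50) follow the text, whereas the seed, the toy data, and the function names are illustrative assumptions.

```python
import random
from statistics import mean


def fit_nearest_centroid(X, y):
    """Hypothetical stand-in for XGBoost: store the mean descriptor vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict_nearest_centroid(centroids, X):
    """Assign each compound to the class with the closest centroid (squared Euclidean)."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return [min(centroids, key=lambda lab: dist2(x, centroids[lab])) for x in X]


def repeated_holdout(X, y, n_repeats=50, test_fraction=0.30, seed=0):
    """Repeat a random 70:30 train/test split and return the accuracy of each repeat."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    n_test = round(test_fraction * len(X))
    accuracies = []
    for _ in range(n_repeats):
        rng.shuffle(indices)  # new random split for every repeat ("epoch")
        test, train = indices[:n_test], indices[n_test:]
        model = fit_nearest_centroid([X[i] for i in train], [y[i] for i in train])
        predicted = predict_nearest_centroid(model, [X[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(predicted, test))
        accuracies.append(correct / n_test)
    return accuracies


if __name__ == "__main__":
    # toy two-descriptor data: 20 "not-active" (0) and 20 "active" (1) compounds
    X = [[0.01 * i, 0.0] for i in range(20)] + [[1.0 + 0.01 * i, 1.0] for i in range(20)]
    y = [0] * 20 + [1] * 20
    accs = repeated_holdout(X, y)
    print(f"mean accuracy over {len(accs)} repeats: {mean(accs):.3f}")
```

Reporting the mean (and, ideally, the spread) of the per-repeat accuracies rather than a single split makes the estimate less dependent on which compounds happen to fall into the test set.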
Among the tested classifiers, the XGBoost algorithm consistently outperformed all others in terms of classification accuracy and discriminatory power. The XGBoost classifier was selected not only for its highest mean accuracy and AUC but also for interpretability through built-in feature-importance analysis. Although 14 classifiers were initially evaluated, only the best-performing XGBoost model is presented in detail, as the purpose of this study was methodological integration rather than algorithmic benchmarking. The model achieved a mean accuracy of 97.5% and an average area under the ROC curve (AUC) of 0.9894 across all epochs. These metrics reflect a high level of sensitivity and specificity in distinguishing hDNMT1 inhibitors from non-inhibitors. Furthermore, class-wise accuracy rates demonstrated balanced performance, with average accuracies of 95.4% for active compounds and 96.3% for not-active compounds. The ROC curve generated from the XGBoost classifier (Figure 4) showed strong separation between the two classes, further affirming the model’s diagnostic strength.
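The reported metrics can be computed as in the sketch below. The ROC AUC is obtained through its rank interpretation, i.e., the probability that a randomly chosen active compound receives a higher score than a randomly chosen not-active compound (ties counted as one half), and class-wise accuracy is the fraction of correctly classified compounds within each class. The function names are hypothetical, and library routines such as sklearn.metrics.roc_auc_score would normally be used instead.

```python
def roc_auc(y_true, scores):
    """AUC as P(score of random active > score of random not-active); ties count 1/2.

    This pairwise form equals the Mann-Whitney U statistic divided by n_pos * n_neg.
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def class_wise_accuracy(y_true, y_pred):
    """Fraction of correctly predicted compounds within each class label."""
    result = {}
    for label in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == label]
        result[label] = sum(y_pred[i] == label for i in idx) / len(idx)
    return result


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]  # hypothetical predicted probabilities
    y_pred = [1, 1, 0, 1, 0, 0]
    print(roc_auc(y_true, scores), class_wise_accuracy(y_true, y_pred))
```

Note that overall accuracy on a given test set is a class-size-weighted average of the class-wise accuracies, so reporting both helps reveal whether performance is driven by the majority class.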
The success of XGBoost is consistent with its widespread application in cheminformatics, where it is known for its ability to handle high-dimensional descriptor spaces and reduce generalization error through gradient-based boosting and in-built regularization [41]. Prior QSAR studies have shown that boosting algorithms, and XGBoost in particular, are well-suited for identifying subtle, nonlinear structure–activity relationships in pharmacological datasets.
In addition to its performance, XGBoost offers the advantage of built-in feature importance analysis, which was used here to refine the descriptor set from 37 statistically significant features to a core of the eight most informative. These were selected based on their average importance gain across all epochs and included: MaxEStateIndex, MinEStateIndex, SPS, MolWt, NumAromaticCarbocycles, NumSaturatedHeterocycles, fr_C_O, and fr_furan. These results are summarized in Table 3, which presents the top-ranked descriptors exhibiting both statistical significance and high model-based importance.
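Averaging importance gains across epochs and retaining the top-ranked descriptors could be implemented as sketched below. With the xgboost Python package, per-run gains can be retrieved with model.get_booster().get_score(importance_type="gain"), which omits features never used in a split; the sketch therefore counts missing features as zero. The example dictionaries are placeholders, not values from this study.

```python
from collections import defaultdict


def mean_importance(per_epoch_scores, feature_names):
    """Average importance across epochs.

    per_epoch_scores: list of {feature_name: gain} dicts, one per epoch.
    Features absent from an epoch's dict (never used in a split) count as 0.
    """
    totals = defaultdict(float)
    for scores in per_epoch_scores:
        for name in feature_names:
            totals[name] += scores.get(name, 0.0)
    n = len(per_epoch_scores)
    return {name: totals[name] / n for name in feature_names}


def top_k(importances, k=8):
    """Return the k feature names with the highest mean importance."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:k]]


if __name__ == "__main__":
    epochs = [  # hypothetical per-epoch gain dictionaries
        {"MolWt": 4.0, "SPS": 2.0},
        {"MolWt": 2.0, "fr_furan": 3.0},
    ]
    features = ["MolWt", "SPS", "fr_furan", "NumRings"]
    print(top_k(mean_importance(epochs, features), k=2))
```

Averaging over repeated splits, rather than reading importances from a single fitted model, reduces the chance that a descriptor is retained only because of one favourable partition of the data.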
To complement this selection, box plots were generated for these eight descriptors (Figure S1, Supplementary Materials), illustrating their distribution across active and not-active compound classes. The visual separation of descriptor values between classes highlights their discriminative potential and reinforces their relevance for prediction. MolWt and SPS displayed higher median values in active compounds, while fragment-based descriptors such as fr_C_O and fr_furan were enriched in molecules known to inhibit hDNMT1.
This two-tiered feature reduction approach—combining univariate statistical testing with model-driven ranking—offers an effective compromise between interpretability and predictive power. It reflects a growing best practice in cheminformatics, where domain-informed descriptor curation is layered with algorithmic refinement to develop more transparent, reproducible models [42]. The resulting XGBoost model, both high-performing and chemically interpretable, was thus employed for all subsequent compound prediction tasks.

3.2.3. Prediction of hDNMT1 Activity in Derived Docking Hits, Structural Groupings, and Discussion

The finalized XGBoost classifier was applied to a panel of 21 external compounds, namely the hits derived from docking. Descriptor values were computed with RDKit and normalized to the ChEMBL training dataset. Most compounds were consistently predicted as active, often across all epochs, while a small subset received inconsistent classifications (Table 4).
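The normalization and consensus steps can be illustrated with the sketch below. The text does not specify the normalization scheme, so min–max scaling fitted on the training set is assumed here purely for illustration; the consensus value is taken as the fraction of the 50 epochs in which a compound is predicted active, which is one plausible reading of the percentages reported in Table 4. All function names are hypothetical.

```python
def fit_minmax(train_rows):
    """Per-descriptor minimum and maximum, computed on the training set only."""
    cols = list(zip(*train_rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def apply_minmax(rows, mins, maxs):
    """Scale new compounds with training statistics (values may fall outside [0, 1])."""
    scaled = []
    for row in rows:
        scaled.append([
            (v - lo) / (hi - lo) if hi > lo else 0.0  # constant descriptor -> 0
            for v, lo, hi in zip(row, mins, maxs)
        ])
    return scaled


def consensus(predictions_per_epoch):
    """predictions_per_epoch: list (one per epoch) of 0/1 labels per compound.

    Returns, for each compound, the fraction of epochs predicting "active".
    """
    n_epochs = len(predictions_per_epoch)
    return [sum(col) / n_epochs for col in zip(*predictions_per_epoch)]


if __name__ == "__main__":
    mins, maxs = fit_minmax([[100.0, 0.0], [300.0, 2.0]])
    print(apply_minmax([[200.0, 4.0]], mins, maxs))  # external compound
    # two hypothetical compounds over 50 epochs: always active vs. active in 12 epochs
    preds = [[1, 1 if e < 12 else 0] for e in range(50)]
    print(consensus(preds))
```

Fitting the scaler on the training data only, and reusing those statistics for external compounds, keeps the external set from influencing the model inputs.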
Examination of the chemical scaffolds showed that the compounds could be broadly categorized into flavonoid-like and polyphenolic scaffolds, phenolic derivatives and heterocyclic phenols, glycosides, and small rigid heterocycles. These groupings corresponded closely to the descriptors prioritized during model refinement, underscoring the biological plausibility of the predictions.
Several compounds, including Compound 12, Compound 13, Compound 15, and Compound 18, were identified as flavonoid-like scaffolds. These molecules are characterized by multiple aromatic rings, hydroxyl substituents, and moderate rigidity, which result in elevated values for descriptors such as molecular weight, number of aromatic carbocycles, and polar surface area. The high consensus classification of these scaffolds as hDNMT1 inhibitors is consistent with literature evidence that flavonoids and related polyphenols exert inhibitory activity through pi–pi stacking with hDNMT1 residues and hydrogen bonding interactions [43,44].
A second group consisted of phenolic derivatives and heterocyclic phenols, including Compound 5, Compound 10, Compound 16, and Compound 20. These molecules are enriched in hydroxyl and carbonyl substituents, contributing to elevated polar surface area and fragment-based descriptor values (e.g., fr_C_O, fr_furan). Such compounds are frequently reported among natural DNMT inhibitors, particularly phenolic-rich dietary phytochemicals [44], supporting their predicted activity.
Additional molecules clustered with the phenolic derivatives and heterocyclic phenols, sharing features such as hydroxyl and carbonyl substituents, aromatic ring systems, and moderate molecular weight. These structural traits are reflected in their elevated values for descriptors such as polar surface area (SPS), fragment counts (fr_C_O and fr_furan), and electrotopological indices (MaxEStateIndex and MinEStateIndex). The reliable classification of AT compounds as active inhibitors underscores the model's ability to generalize beyond the ChEMBL training set and suggests that these synthetic derivatives occupy a chemical space similar to that of known natural hDNMT1 inhibitors. The structural similarity of this group to established non-nucleoside inhibitors described in the literature [44] reinforces the plausibility of these predictions.
Finally, two compounds, Compound 9 and Compound 21, were predicted as not-active, with activity consensus values of only 24% and 4%, respectively (i.e., the fraction of epochs classifying them as active). Both belong to the group of small rigid heterocycles. Their relatively low molecular weight, fewer aromatic carbocycles, and limited fragment diversity likely placed them near the model's classification boundary, resulting in less stable predictions. This observation is consistent with QSAR-based studies showing that smaller heterocyclic scaffolds without extended aromaticity are less likely to act as hDNMT1 inhibitors unless specific substituents are optimized [45].
Overall, the scaffold distribution among the 21 compounds shows strong alignment between predicted activity and known structural classes of hDNMT1 inhibitors. Flavonoid-like molecules, phenolic derivatives, and glycosides were consistently predicted as active, whereas small rigid heterocycles showed weaker alignment. This convergence between model predictions and literature reports constitutes indirect validation of the predictive framework. Similar hybrid docking–machine learning strategies have been shown to improve hit prioritization in epigenetic drug discovery and enzyme inhibitor screening [46]. The fact that our model independently identified scaffolds already reported as hDNMT1 inhibitors supports its reliability and highlights its utility for guiding future experimental testing.
Comparison of the ML outputs with the docking results revealed substantial agreement between the two methodologies. Compounds in Set 1, which achieved the most favorable docking scores (−8.40 to −10.20 kcal·mol⁻¹), were consistently classified as active inhibitors by the ML consensus model. These molecules exhibited structural features, such as high molecular weight, enriched polar surface area, and aromatic carbocycles, that correspond closely to the descriptors highlighted as significant during ML feature selection. Set 2 compounds, which are structurally more heterogeneous, were likewise recognized by the ML model as candidates with high activity probabilities, consistent with their enriched electrostatic interaction profiles. Taken together, these findings indicate that docking-based interaction patterns and ML-based descriptor importance converge on similar chemical determinants of hDNMT1 inhibition, lending robustness to the prioritization process.

3.2.4. Retrospective Validation with Previously Tested hDNMT1 Inhibitors

To further evaluate the reliability of the XGBoost model, we conducted a retrospective validation using five compounds—bergenin, orientin, phlorizin, 2-(3-(3,4-dimethoxyphenyl)-3-(2-((2-oxo-2H-chromen-7-yl)oxy)acetamido)propanamido)acetic acid and 2-[(7,8-dihydroxy-6-undecylphenazin-2-yl)formamido]pentanedioic acid—that had been previously tested and confirmed as hDNMT1 inhibitors in our earlier in vitro studies [27]. These compounds were reintroduced into the predictive pipeline under identical pre-processing and normalization conditions, allowing for an unbiased assessment of the model’s predictive power.
The model classified all five compounds as active inhibitors with 100% consensus across 50 epochs, in complete agreement with the experimental outcomes (Table 5). This consistency supports the pharmacological relevance of the predictive framework. Importantly, these compounds span a wide range of chemical diversity, from two complex acid derivatives (2-(3-(3,4-dimethoxyphenyl)-3-(2-((2-oxo-2H-chromen-7-yl)oxy)acetamido)propanamido)acetic acid and 2-[(7,8-dihydroxy-6-undecylphenazin-2-yl)formamido]pentanedioic acid) to glycosylated flavonoids (phlorizin and orientin) and a polyphenolic C-glycoside (bergenin). Their correct classification underscores the model's ability to generalize across structurally distinct chemical spaces.
The structural features of these validated inhibitors closely align with the descriptors identified as significant in the feature selection process. For example, bergenin and orientin are enriched in aromatic carbocycles, hydroxyl substituents, and sugar moieties, which contribute to elevated values of molecular weight, polar surface area, and fragment counts such as fr_C_O. Phlorizin, a flavonoid glycoside, shares similar descriptor patterns, reinforcing the model's sensitivity to polyphenolic and glycosylated scaffolds. The two acid derivatives, while formally derivatives of acetic and pentanedioic acid, incorporate extended aromatic systems, heteroatoms, and bulky substituents that contribute to high values for descriptors such as molecular weight, sum of polar surface areas, and electrotopological indices. Their accurate classification suggests that the model effectively integrates both fragment-level contributions (e.g., carbonyl and heteroaromatic substituents) and global physicochemical parameters, capturing the key structural determinants of hDNMT1 inhibition.
The fact that these structurally complex derivatives were all predicted with 100% consensus suggests that the model's decision boundaries do not simply favor minimal or canonical inhibitor motifs but capture the multi-dimensional (steric, electronic, and fragmental) structure–activity relationships required for hDNMT1 inhibition. Literature reviews [47] similarly emphasize that non-nucleoside inhibitors with rich aromatic architectures, heterocycles, and polar substituents often show potent hDNMT1 activity, indicating that our predictive framework is in line with established SAR insights. By correctly predicting inhibitors that include unusual or bulky substituents, the model is supported not only on statistical grounds but also in terms of chemical and biological plausibility, increasing confidence in its use for discovering novel hDNMT1 inhibitors.
The agreement between computational predictions and prior experimental outcomes provides retrospective support for the external validity of the model. Similar validation strategies are increasingly recognized as best practice in cheminformatics, where retrospective evaluation against experimentally confirmed molecules is used to benchmark predictive performance [46]. By correctly predicting compounds of diverse structural classes with known inhibitory activity, the present model demonstrates both sensitivity to relevant chemical features and robustness across heterogeneous scaffolds.
Taken together, the retrospective validation results indicate that the XGBoost model is not only statistically reliable but also experimentally meaningful. The complete agreement with prior in vitro findings supports the model's use in prospective inhibitor discovery, bridging computational predictions with pharmacological reality.
Overall, the agreement between molecular docking and machine learning predictions underscores the robustness of the proposed computational framework. The convergence of favorable docking scores and elevated classification probabilities for the same scaffolds suggests that these compounds represent reliable hDNMT1 inhibitor candidates. While docking reflects the structural complementarity between ligands and key catalytic residues, the machine learning model captures cheminformatic patterns, such as functional group frequency and electronic state distribution, that statistically differentiate active from inactive structures. Interpreted together, the two methodologies provide a consistent, multidimensional validation of the most promising inhibitors and reinforce the predictive power of the integrated structure- and data-driven approach developed in this study.

4. Conclusions

The convergence of docking and machine learning predictions in this study provides compelling evidence for the robustness of the proposed framework. Docking analyses highlighted the capacity of candidate ligands to form key interactions with the hDNMT1 binding pocket, particularly with residues such as Phe1145, Cys1191, Met1169, and Asn1578, while ML-based classification emphasized physicochemical descriptors, including MolWt, SPS, and aromatic substructure counts, that underpin these interactions. The agreement between structural and statistical perspectives is consistent with previous reports demonstrating the value of hybrid pipelines in epigenetic drug discovery, where docking-based binding hypotheses are reinforced by data-driven prioritization strategies. Such integrative approaches can reduce the likelihood of false positives, enhance interpretability, and improve hit enrichment compared with either method in isolation. The present findings therefore underscore the importance of multi-faceted computational strategies in hDNMT1 inhibitor discovery and provide a strong rationale for their continued application in the identification of chemically diverse, biologically relevant scaffolds. Finally, because the framework is modular and relies on publicly available descriptor and docking tools, it can be readily applied to other epigenetic regulators such as DNMT3A/B or HDAC isoforms, facilitating broader virtual screening of epigenetic modulators.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/app152211984/s1, Figure S1: Box plots of the eight top-ranked descriptors, illustrating their distribution across active and not-active compound classes; Figure S2: Heatmap presenting the top candidates and the key residue interactions; Table S1: Final selected compounds from virtual screening. The table includes the compound library, compound name, similarity score, screening method, and SMILES code; Table S2: Predicted ADMET properties of the most promising compounds, using ADMETlab 3.0 open source software.

Author Contributions

Conceptualization, P.C., D.C. and E.K.; methodology, P.C., E.C., M.Z., I.M., D.C. and E.K.; software, P.C., E.C., M.Z., I.M., D.C. and E.K.; validation, P.C., D.C. and E.K.; formal analysis, P.C., E.C., M.Z., I.M., C.K., V.J.S., D.C. and E.K.; investigation, P.C., E.C., M.Z., I.M., C.K., V.J.S., D.C. and E.K.; resources, P.C., E.C., M.Z., I.M., C.K., V.J.S., D.C. and E.K.; data curation, P.C., M.Z., V.J.S., D.C. and E.K.; writing—original draft preparation, P.C., M.Z., V.J.S., D.C. and E.K.; writing—review and editing, P.C., M.Z., V.J.S., D.C. and E.K.; visualization, P.C., D.C. and E.K.; supervision, D.C. and E.K.; project administration, D.C. and E.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article and Supplementary Materials.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Bird, A. Perceptions of Epigenetics. Nature 2007, 447, 396–398.
  2. Bird, A. DNA Methylation: Mega-Year Inheritance with the Help of Darwin. Curr. Biol. 2020, 30, 319–321.
  3. Wilkinson, A.L.; Zorzan, I.; Rugg-Gunn, P.J. Epigenetic Regulation of Early Human Embryo Development. Cell Stem Cell 2023, 30, 1569–1584.
  4. Farsetti, A.; Illi, B.; Gaetano, C. How Epigenetics Impacts on Human Diseases. Eur. J. Intern. Med. 2023, 114, 15–22.
  5. Davletgildeeva, A.T.; Kuznetsov, N.A. The Role of DNMT Methyltransferases and TET Dioxygenases in the Maintenance of the DNA Methylation Level. Biomolecules 2024, 14, 1117.
  6. Linde, J.; Zimmer-Bensch, G. DNA Methylation-Dependent Dysregulation of GABAergic Interneuron Functionality in Neuropsychiatric Diseases. Front. Neurosci. 2020, 14, 586133.
  7. Tsymbalova, E.A.; Chernyavskaya, E.A.; Ryzhkova, D.E.; Bisaga, G.N.; Abdurasulova, I.N.; Lioudyno, V.I. Changes in DNMT1 Expression as a Marker of Epigenetic Regulation Disturbanses in Multiple Sclerosis Patients. Med. Acad. J. 2023, 23, 41–53.
  8. Hazra, A.; Bose, P.; Sunita, P.; Pattanayak, S.P. Molecular Epigenetic Dynamics in Breast Carcinogenesis. Arch. Pharm. Res. 2021, 44, 741–763.
  9. Valente, S.; Liu, Y.; Schnekenburger, M.; Zwergel, C.; Cosconati, S.; Gros, C.; Tardugno, M.; Labella, D.; Florean, C.; Minden, S.; et al. Selective Non-Nucleoside Inhibitors of Human DNA Methyltransferases Active in Cancer Including in Cancer Stem Cells. J. Med. Chem. 2014, 57, 701–713.
  10. Chen, T.; Mahdadi, S.; Vidal, M.; Desbène-Finck, S. Non-Nucleoside Inhibitors of DNMT1 and DNMT3 for Targeted Cancer Therapy. Pharmacol. Res. 2024, 207, 107328.
  11. Dai, W.; Qiao, X.; Fang, Y.; Guo, R.; Bai, P.; Liu, S.; Li, T.; Jiang, Y.; Wei, S.; Na, Z.; et al. Epigenetics-Targeted Drugs: Current Paradigms and Future Challenges. Sig. Transduct. Target Ther. 2024, 9, 332.
  12. Danieli, M.G.; Casciaro, M.; Paladini, A.; Bartolucci, M.; Sordoni, M.; Shoenfeld, Y.; Gangemi, S. Exposome: Epigenetics and Autoimmune Diseases. Autoimm. Rev. 2024, 23, 103584.
  13. Shamsi, M.B.; Firoz, A.S.; Imam, S.N.; Alzaman, N.; Samman, M.A. Epigenetics of Human Diseases and Scope in Future Therapeutics. J. Taibah Univ. Med. Sci. 2017, 12, 205–211.
  14. Lu, W.; Zhang, R.; Jiang, H.; Zhang, H.; Luo, C. Computer-Aided Drug Design in Epigenetics. Front. Chem. 2018, 6, 57.
  15. Ning, X.; Karypis, G. In Silico Structure-Activity-Relationship (SAR) Models from Machine Learning: A Review. Drug Dev. Res. 2011, 72, 138–146.
  16. Temml, V.; Kutil, Z. Structure-Based Molecular Modeling in SAR Analysis and Lead Optimization. Comput. Struct. Biotechnol. J. 2021, 19, 1431–1444.
  17. Jamal, S.; Goyal, S.; Shanker, A.; Grover, A. Machine Learning and Molecular Dynamics Based Insights into Mode of Actions of Insulin Degrading Enzyme Modulators. Comb. Chem. High Throughput Screen. 2017, 20, 279–291.
  18. Crampon, K.; Giorkallos, A.; Deldossi, M.; Baud, S.; Steffenel, L.A. Machine-Learning Methods for Ligand–Protein Molecular Docking. Drug Discov. Today 2022, 27, 151–164.
  19. Boczar, D.; Michalska, K. A Review of Machine Learning and QSAR/QSPR Predictions for Complexes of Organic Molecules with Cyclodextrins. Molecules 2024, 29, 3159.
  20. Wu, H.; Liu, J.; Zhang, R.; Lu, Y.; Cui, G.; Cui, Z.; Ding, Y. A Review of Deep Learning Methods for Ligand Based Drug Virtual Screening. Fundam. Res. 2024, 4, 715–737.
  21. Liu, X.; Jiang, S.; Duan, X.; Vasan, A.; Liu, C.; Tien, C.; Ma, H.; Brettin, T.; Xia, F.; Foster, I.T.; et al. Binding Affinity Prediction: From Conventional to Machine Learning-Based Approaches. arXiv 2024, arXiv:2410.00709.
  22. Bragina, M.E.; Daina, A.; Perez, M.A.S.; Michielin, O.; Zoete, V. The SwissSimilarity 2021 Web Tool: Novel Chemical Libraries and Additional Methods for an Enhanced Ligand-Based Virtual Screening Experience. Int. J. Mol. Sci. 2022, 23, 811.
  23. Saavedra, O.M.; Isakovic, L.; Llewellyn, D.B.; Zhan, L.; Bernstein, N.; Claridge, S.; Raeppel, F.; Vaisburg, A.; Elowe, N.; Petschner, A.J.; et al. SAR around (l)-S-Adenosyl-l-Homocysteine, an Inhibitor of Human DNA Methyltransferase (DNMT) Enzymes. Bioorg. Med. Chem. Lett. 2009, 19, 2747–2751.
  24. Kochnev, Y.; Hellemann, E.; Cassidy, K.C.; Durrant, J.D. Webina: An Open-Source Library and Web App That Runs AutoDock Vina Entirely in the Web Browser. Bioinformatics 2020, 36, 4513–4515.
  25. Xiong, G.; Wu, Z.; Yi, J.; Fu, L.; Yang, Z.; Hsieh, C.; Yin, M.; Zeng, X.; Wu, C.; Lu, A.; et al. ADMETlab 2.0: An Integrated Online Platform for Accurate and Comprehensive Predictions of ADMET Properties. Nucleic Acids Res. 2021, 49, 5–14.
  26. Ladika, G.; Christodoulou, P.; Kritsi, E.; Tsiaka, T.; Sotiroudis, G.; Cavouras, D.; Sinanoglou, V.J. Exploring Postharvest Metabolic Shifts and NOX2 Inhibitory Potential in Strawberry Fruits and Leaves via Untargeted LC-MS/MS and Chemometric Analysis. Metabolites 2025, 15, 321.
  27. Kritsi, E.; Christodoulou, P.; Tsiaka, T.; Georgiadis, P.; Zervou, M. A Computational Approach for the Discovery of Novel DNA Methyltransferase Inhibitors. Curr. Issues Mol. Biol. 2024, 46, 3394–3407.
  28. Lee, W.J.; Shim, J.-Y.; Zhu, B.T. Mechanisms for the Inhibition of DNA Methyltransferases by Tea Catechins and Bioflavonoids. Mol. Pharmacol. 2005, 68, 1018–1030.
  29. Zsidó, B.Z.; Hetényi, C. Molecular Structure, Binding Affinity, and Biological Activity in the Epigenome. Int. J. Mol. Sci. 2020, 21, 4134.
  30. Zhang, Z.-M.; Liu, S.; Lin, K.; Luo, Y.; Perry, J.J.; Wang, Y.; Song, J. Crystal Structure of Human DNA Methyltransferase 1. J. Mol. Biol. 2015, 427, 2520–2531.
  31. Fang, M.Z.; Wang, Y.; Ai, N.; Hou, Z.; Sun, Y.; Lu, H.; Welsh, W.; Yang, C.S. Tea Polyphenol (−)-Epigallocatechin-3-Gallate Inhibits DNA Methyltransferase and Reactivates Methylation-Silenced Genes in Cancer Cell Lines. Cancer Res. 2003, 63, 7563–7570.
  32. Medina-Franco, J.L.; Méndez-Lucio, O.; Dueñas-González, A.; Yoo, J. Discovery and Development of DNA Methyltransferase Inhibitors Using in Silico Approaches. Drug Discov. Today 2015, 20, 569–577.
  33. Yin, J. DNA Methyltransferase and Its Clinical Applications. IOP Conf. Ser. Earth Environ. Sci. 2020, 512, 012082.
  34. Kaveh, S.; Mani-Varnosfaderani, A.; Neiband, M.S. Deriving General Structure–Activity/Selectivity Relationship Patterns for Different Subfamilies of Cyclin-Dependent Kinase Inhibitors Using Machine Learning Methods. Sci. Rep. 2024, 14, 15315.
  35. Kaneko, H. Molecular Descriptors, Structure Generation, and Inverse QSAR/QSPR Based on SELFIES. ACS Omega 2023, 8, 21781–21786.
  36. Uddin, M.G.; Fandy, T.E. DNA Methylation Inhibitors: Retrospective and Perspective View. Adv. Cancer Res. 2021, 152, 205–223.
  37. Zwergel, C.; Schnekenburger, M.; Sarno, F.; Battistelli, C.; Manara, M.C.; Stazi, G.; Mazzone, R.; Fioravanti, R.; Gros, C.; Ausseil, F.; et al. Identification of a Novel Quinoline-Based DNA Demethylating Compound Highly Potent in Cancer Cells. Clin. Epigenet. 2019, 11, 68.
  38. Danishuddin; Khan, A.U. Descriptors and Their Selection Methods in QSAR Analysis: Paradigm for Drug Design. Drug Discov. Today 2016, 21, 1291–1302.
  39. Comesana, A.E.; Huntington, T.T.; Scown, C.D.; Niemeyer, K.E.; Rapp, V.H. A Systematic Method for Selecting Molecular Descriptors as Features When Training Models for Predicting Physiochemical Properties. Fuel 2022, 321, 123836.
  40. Kausar, S.; Falcao, A.O. An Automated Framework for QSAR Model Building. J. Cheminform. 2018, 10, 1.
  41. Riniker, S.; Landrum, G.A. Open-Source Platform to Benchmark Fingerprints for Ligand-Based Virtual Screening. J. Cheminform. 2013, 5, 26.
  42. Goodarzi, M.; Dejaegher, B.; Heyden, Y.V. Feature Selection Methods in QSAR Studies. J. AOAC Int. 2012, 95, 636–651.
  43. Prado-Romero, D.L.; Saldívar-González, F.I.; López-Mata, I.; Laurel-García, P.A.; Durán-Vargas, A.; García-Hernández, E.; Sánchez-Cruz, N.; Medina-Franco, J.L. De Novo Design of Inhibitors of DNA Methyltransferase 1: A Critical Comparison of Ligand- and Structure-Based Approaches. Biomolecules 2024, 14, 775.
  44. Saldívar-González, F.I.; Gómez-García, A.; Chávez-Ponce de León, D.E.; Sánchez-Cruz, N.; Ruiz-Rios, J.; Pilón-Jiménez, B.A.; Medina-Franco, J.L. Inhibitors of DNA Methyltransferases From Natural Sources: A Computational Perspective. Front. Pharmacol. 2018, 9, 1144.
  45. Phanus-Umporn, C.; Prachayasittikul, V.; Nantasenamat, C.; Prachayasittikul, S.; Prachayasittikul, V. QSAR-Driven Rational Design of Novel DNA Methyltransferase 1 Inhibitors. EXCLI J. 2020, 19, 458–475. [Google Scholar] [CrossRef]
  46. Sánchez-Cruz, N.; Medina-Franco, J.L. Epigenetic Target Profiler: A Web Server to Predict Epigenetic Targets of Small Molecules. J. Chem. Inf. Model. 2021, 61, 1550–1554. [Google Scholar] [CrossRef] [PubMed]
  47. Zhang, Z.; Wang, G.; Li, Y.; Lei, D.; Xiang, J.; Ouyang, L.; Wang, Y.; Yang, J. Recent Progress in DNA Methyltransferase Inhibitors as Anticancer Agents. Front. Pharmacol. 2022, 13, 1072651. [Google Scholar] [CrossRef]
Figure 1. Chemical structures of EGCG (reference compound) and the seven candidate compounds (1–7) grouped by scaffold similarity (Set 1). Group 1 (1–2): polyphenolic scaffolds; Group 2 (3–5): structurally diverse derivatives with methoxy/acyl substitutions; Group 3 (6–7): hybrid scaffolds combining polyphenolic and extended aromatic features. Compound IDs: Compound 1: AT358-LC01710, Compound 2: ZINC000004098238, Compound 3: AT358-LC01681, Compound 4: AT358-LC03095, Compound 5: AT358-LC01638, Compound 6: AT358-LC03398 and Compound 7: AT358-MD21478.
Figure 2. Chemical structures of EGCG (reference compound) and the fourteen candidate compounds (8–21) grouped by scaffold similarity (Set 2). Group 1 (8–11): polyphenolic/flavonoid-like scaffolds; Group 2 (12–17): heteroaromatic/substituted derivatives; Group 3 (18–21): hybrid scaffolds with extended aromatic and polar features. Compound IDs: Compound 8: ZINC000005214162, Compound 9: ZINC000005999086, Compound 10: ZINC000003937611, Compound 11: ZINC000257200127, Compound 12: ZINC000253409835, Compound 13: ZINC000253414457, Compound 14: ZINC000085384287, Compound 15: ZINC00006581553, Compound 16: ZINC00002311470, Compound 17: ZINC000019691261, Compound 18: ZINC000017323662, Compound 19: ZINC000015113824, Compound 20: ZINC000014240802, Compound 21: ZINC000257200127.
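The captions of Figures 1 and 2 state that the candidates were grouped by scaffold similarity but do not specify the similarity measure. One widely used approach is Tanimoto similarity on binary fingerprints combined with Butina-style clustering, sketched below in plain Python. This is an illustration under stated assumptions, not the authors' procedure: each fingerprint is assumed to be available as a set of "on" bit indices (in practice such bits might come from RDKit Morgan fingerprints), and both the toy fingerprints and the distance cut-off of 0.6 are hypothetical.

```python
from typing import Dict, List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints stored as sets of 'on' bit indices."""
    if not a and not b:
        return 0.0  # convention used here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def butina_cluster(fps: Dict[str, Set[int]], cutoff: float) -> List[List[str]]:
    """Butina-style clustering on Tanimoto distance (1 - similarity).

    Two molecules are neighbours when their distance is <= cutoff.
    The first ID of each returned cluster is its centroid.
    """
    ids = list(fps)
    neighbours = {
        i: {j for j in ids if j != i and 1.0 - tanimoto(fps[i], fps[j]) <= cutoff}
        for i in ids
    }
    # Candidate centroids in order of decreasing neighbour count (stable sort keeps input order on ties)
    order = sorted(ids, key=lambda i: -len(neighbours[i]))
    assigned: Set[str] = set()
    clusters: List[List[str]] = []
    for centroid in order:
        if centroid in assigned:
            continue
        members = [centroid] + [j for j in ids if j in neighbours[centroid] and j not in assigned]
        assigned.update(members)
        clusters.append(members)
    return clusters


# Hypothetical toy fingerprints (not taken from the study)
fps = {
    "A": {1, 2, 3, 4},
    "B": {1, 2, 3, 5},
    "C": {1, 2, 3, 4, 5},
    "D": {10, 11, 12},
    "E": {10, 11, 13},
}
clusters = butina_cluster(fps, cutoff=0.6)
print(clusters)  # [['A', 'B', 'C'], ['D', 'E']]
```

Because candidates are visited in order of decreasing neighbour count, the most densely populated region of chemical space seeds the first group; a tighter cut-off splits the same compounds into more, smaller groups, so any threshold would need to be justified against the chemistry.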
Figure 3. Representative docking poses of (A) Compound 5 (Set 1), (B) Compound 7 (Set 1), (C) Compound 10 (Set 2) and (D) Compound 12 (Set 2) into the catalytic domain of hDNMT1. Hydrogen bonds are depicted with yellow dashed lines.
Figure 4. (A) Receiver Operating Characteristic (ROC) curves of the XGBoost classifier across five epochs. (B) Most important features contributing to the classification of active and not-active compounds. (C) The model achieved a mean overall accuracy of 97.5%, with class-wise accuracies of 95.4% for active compounds and 96.3% for not-active compounds. Across epochs, the classifier correctly predicted an average of 57 not-active compounds with only 2 misclassifications, and 57 active compounds with an average of 3 misclassifications. The mean area under the ROC curve (AUC) was 0.9894, indicating excellent discriminatory power. These metrics support the stability and reproducibility of the XGBoost model in distinguishing hDNMT1 inhibitors from non-inhibitors. Figure 4 presents representative results from five randomly selected epochs out of the fifty iterations performed during model training. The test set comprised 30% of the total dataset. The consistent performance across all epochs indicates high reproducibility and generalization capacity of the XGBoost classifier, suggesting that the trends illustrated for the representative epochs reflect the overall training behavior.
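Figure 4 reports the ROC AUC together with class-wise and overall accuracies on the held-out test split. As a hedged illustration of how these quantities are obtained from a single split, the stdlib-only Python sketch below computes the AUC through the rank-sum (Mann–Whitney) identity and the per-class accuracies from thresholded probabilities. The 0.5 decision threshold and the toy labels and scores are assumptions, not values from the study; in practice a library routine such as scikit-learn's `roc_auc_score` would typically be used instead.

```python
from typing import Dict, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC from the rank-sum (Mann-Whitney) identity; tied scores receive average ranks.

    labels: 1 = active, 0 = not active; scores: predicted probability of 'active'.
    """
    order = sorted(range(len(scores)), key=lambda k: scores[k])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the tied block
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def class_wise_accuracy(labels: Sequence[int], preds: Sequence[int]) -> Dict[str, float]:
    """Fraction of correctly classified compounds within each class, plus overall accuracy."""
    out: Dict[str, float] = {}
    for name, cls in (("active", 1), ("not_active", 0)):
        members = [k for k, y in enumerate(labels) if y == cls]
        out[name] = sum(preds[k] == cls for k in members) / len(members)
    out["overall"] = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    return out


# Hypothetical test split: true labels and predicted probabilities of 'active'
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
preds = [int(s >= 0.5) for s in scores]  # assumed 0.5 decision threshold

print(round(roc_auc(labels, scores), 3))  # 0.889
print(class_wise_accuracy(labels, preds))
```

The AUC depends only on how the predicted probabilities rank actives above not-actives, whereas the accuracies depend on the chosen threshold; this is why the two kinds of metric are usually reported together.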
Table 1. Docking scores and interaction patterns of the most promising compounds of Set 1 and Set 2 in the hDNMT1 binding site. Residues that form similar interactions with SAH are presented in bold.
| Compounds | Docking Score (kcal·mol⁻¹) | Interaction Pattern |
| --- | --- | --- |
| S-Adenosyl-L-Homocysteine (SAH) (crystal structure) | −8.61 | pi-pi¹ Phe1145, HB² Gly1150, HB Leu1151, Glu1168, HB Met1169, HB Asp1190, HB Cys1191, HB Asn1578, HB Val1580 |
| Epigallocatechin-3-gallate (EGCG) | −7.36 | HB Met1169, HB Gln1227, HB Arg1574, HB Asn1578 |
| Set 1 | | |
| 1 | −8.62 | HB Phe1145, HB Met1169, HB Cys1191, HB Asp1196, HB Glu1266, HB Arg1310, HB Arg1312, HB Asn1578 |
| 2 | −8.45 | HB Cys1148, HB Gly1149, HB Gly1150, HB Leu1151, HB Glu1168, HB Asn1578, HB Val1580 |
| 3 | −8.09 | HB Phe1145, HB Glu1168, HB Asn1578, HB Val1580 |
| 4 | −8.68 | HB Phe1145, HB Glu1168, HB Cys1191, HB Asn1578 |
| 5 | −10.21 | HB Phe1145, HB Gly1149, HB Gly1150, HB Leu1151, HB Glu1266, HB Arg1310, HB Asn1578, HB Val1580 |
| 6 | −8.20 | HB Glu1168, HB Met1169, HB Asn1578, HB Val1580, HB Glu1266, HB Gly1223 |
| 7 | −9.39 | HB Cys1148, HB Gly1149, HB Gly1150, HB Leu1151, HB Glu1169, HB Asn1578, HB Val1580 |
| Set 2 | | |
| 8 | −9.10 | HB Leu1151, HB Glu1169, HB Asn1578, HB Val1580 |
| 9 | −8.44 | HB Glu1150, HB Leu1151, HB Glu1168, HB Met1169, HB Cys1191, HB Asn1578, HB Val1580 |
| 10 | −10.51 | HB Phe1145, HB Gly1150, HB Leu1151, HB Glu1168, HB Met1169, HB Cys1191, HB Gly1223, HB Val1580 |
| 11 | −8.79 | HB Glu1168, HB Met1169, HB Cys1191, HB Asn1578 |
| 12 | −9.56 | HB Cys1148, HB Gly1149, HB Gly1150, HB Leu1151, HB Glu1168, HB Met1169, HB Asp1190, HB Cys1191, HB Val1580 |
| 13 | −9.73 | HB Cys1148, HB Gly1149, HB Gly1150, HB Leu1151, HB Glu1168, HB Met1169, HB Gln1227, HB Val1580 |
| 14 | −8.63 | HB Glu1168, HB Met1169, HB Glu1189, HB Cys1191, HB Asn1578 |
| 15 | −8.96 | HB Met1169, HB Asp1190, HB Cys1191, HB Glu1266, HB Asn1578 |
| 16 | −9.80 | HB Gly1150, HB Leu1151, HB Glu1168, HB Gly1223, HB Asn1578, HB Val1580 |
| 17 | −8.40 | HB Phe1145, HB Ile1167, HB Met1169, HB Cys1191, HB Asn1578 |
| 18 | −8.37 | HB Phe1145, HB Gly1150, HB Leu1151, HB Glu1168, HB Cys1191 |
| 19 | −8.58 | HB Phe1145, HB Gly1150, HB Leu1151, HB Gly1223, HB Arg1310, HB Arg1312, HB Asn1578, HB Val1580 |
| 20 | −8.91 | HB Phe1145, HB Glu1168, HB Cys1191, HB Asn1578 |
| 21 | −8.79 | HB Glu1168, HB Met1169, HB Cys1191, HB Asn1578 |

¹ pi-pi: π–π interaction; ² HB: hydrogen bond.
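A simple way to quantify how closely a docked compound reproduces the SAH interaction pattern in Table 1 is a residue-level set intersection. The Python sketch below parses residue labels from three of the interaction-pattern strings in Table 1 (SAH, EGCG, Compounds 5 and 10, copied verbatim) and ranks the entries by docking score, where a more negative score means more favourable predicted binding. It compares residues only and ignores the interaction type (for example, the π–π contact of SAH with Phe1145 versus a hydrogen bond to the same residue), so it is an approximation and not necessarily the criterion used for the bold marks in the table.

```python
import re
from typing import Dict, List, Set, Tuple

# Interaction patterns and docking scores copied from Table 1 (subset)
SAH = ("pi-pi Phe1145, HB Gly1150, HB Leu1151, Glu1168, HB Met1169, "
       "HB Asp1190, HB Cys1191, HB Asn1578, HB Val1580")
ENTRIES: Dict[str, Tuple[float, str]] = {
    "EGCG": (-7.36, "HB Met1169, HB Gln1227, HB Arg1574, HB Asn1578"),
    "Compound 5": (-10.21, "HB Phe1145, HB Gly1149, HB Gly1150, HB Leu1151, "
                           "HB Glu1266, HB Arg1310, HB Asn1578, HB Val1580"),
    "Compound 10": (-10.51, "HB Phe1145, HB Gly1150, HB Leu1151, HB Glu1168, "
                            "HB Met1169, HB Cys1191, HB Gly1223, HB Val1580"),
}


def residues(pattern: str) -> Set[str]:
    """Extract residue labels such as 'Phe1145' (three-letter code followed by a number)."""
    return set(re.findall(r"[A-Z][a-z]{2}\d+", pattern))


def shared_with_reference(reference: str,
                          entries: Dict[str, Tuple[float, str]]) -> Dict[str, List[str]]:
    """Residues each entry shares with the reference pattern (interaction type ignored)."""
    ref = residues(reference)
    return {name: sorted(residues(pattern) & ref) for name, (_, pattern) in entries.items()}


shared = shared_with_reference(SAH, ENTRIES)
# More negative docking score = more favourable predicted binding
ranking = sorted(ENTRIES, key=lambda name: ENTRIES[name][0])
for name in ranking:
    print(name, ENTRIES[name][0], f"{len(shared[name])}/{len(residues(SAH))}", shared[name])
```

On this subset, Compound 10 shares seven of the nine SAH residues and Compound 5 shares five, whereas EGCG shares two (Met1169 and Asn1578).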
Table 2. Statistically significant molecular descriptors differentiating hDNMT1 active and not-active compounds, as identified by the Mann–Whitney U test (p < 0.05). Descriptor values were computed using RDKit software and reflect physicochemical, topological, and fragment-based properties relevant to hDNMT1 inhibitory activity.
DNMT1_Activity_with_Descriptors
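| No. | Feature | p-value (< 0.05) |
| --- | --- | --- |
| 1 | MinEStateIndex | 0.0002 |
| 2 | BCUT2D_MRHI | 0.0002 |
| 3 | fr_bicyclic | 0.0014 |
| 4 | MaxEStateIndex | 0.0019 |
| 5 | BCUT2D_MWHI | 0.0030 |
| 6 | VSA_EState7 | 0.0033 |
| 7 | VSA_EState10 | 0.0033 |
| 8 | SMR_VSA1 | 0.0038 |
| 9 | EState_VSA1 | 0.0046 |
| 10 | EState_VSA6 | 0.0056 |
| 11 | NumSaturatedRings | 0.0067 |
| 12 | BCUT2D_CHGHI | 0.0085 |
| 13 | PEOE_VSA9 | 0.0098 |
| 14 | NumSaturatedHeterocycles | 0.0110 |
| 15 | FpDensityMorgan1 | 0.0116 |
| 16 | FpDensityMorgan2 | 0.0130 |
| 17 | EState_VSA10 | 0.0151 |
| 18 | SPS | 0.0152 |
| 19 | SlogP_VSA12 | 0.0162 |
| 20 | fr_furan | 0.0166 |
| 21 | fr_C_O | 0.0203 |
| 22 | VSA_EState8 | 0.0226 |
| 23 | VSA_EState5 | 0.0242 |
| 24 | Chi3v | 0.0247 |
| 25 | Chi2v | 0.0263 |
| 26 | HeavyAtomMolWt | 0.0265 |
| 27 | NumAliphaticHeterocycles | 0.0277 |
| 28 | Chi4v | 0.0310 |
| 29 | EState_VSA8 | 0.0329 |
| 30 | NumAliphaticRings | 0.0331 |
| 31 | AvgIpc | 0.0349 |
| 32 | SMR_VSA10 | 0.0362 |
| 33 | NumAromaticCarbocycles | 0.0371 |
| 34 | fr_benzene | 0.0371 |
| 35 | MolWt | 0.0425 |
| 36 | ExactMolWt | 0.0432 |
| 37 | EState_VSA5 | 0.0438 |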
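Table 2 relies on a two-sided Mann–Whitney U test to compare each descriptor between active and not-active compounds. The stdlib-only Python sketch below implements the test with the normal approximation and a tie correction. The source does not state whether the exact or asymptotic variant, or a continuity correction, was used, so this is one plausible variant rather than the authors' implementation; the toy descriptor values are hypothetical, whereas in the actual workflow they would be RDKit descriptors such as MinEStateIndex. With SciPy, `scipy.stats.mannwhitneyu(x, y, alternative="two-sided", method="asymptotic", use_continuity=False)` should give the same p-value.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Mann-Whitney U test, normal approximation with tie correction.

    Returns (U statistic of sample x, two-sided p-value); no continuity correction.
    Assumes the pooled values are not all identical (otherwise the variance is zero).
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0.0  # sum of (t^3 - t) over groups of tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank for the tied block
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    z = (u_x - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u_x, p


# Hypothetical descriptor values for active vs not-active compounds (toy data)
active = [1.0, 2.0, 3.0, 4.0, 5.0]
not_active = [6.0, 7.0, 8.0, 9.0, 10.0]
u, p = mann_whitney_u(active, not_active)
print(u, round(p, 4))  # 0.0 0.009
print("significant" if p < 0.05 else "not significant")
```

Because the test works on ranks, it makes no normality assumption about descriptor distributions, which suits count-type descriptors such as fr_furan or NumSaturatedRings.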
Table 3. Descriptors with both statistically significant differences (p < 0.05) between hDNMT1 activity classes and high model-based importance as ranked by XGBoost feature gain. These features were selected for final model deployment.
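| No. | Feature | p-value (< 0.05) | Model importance (XGBoost classifier) |
| --- | --- | --- | --- |
| 1 | MinEStateIndex | 0.000177 | 0.12 |
| 2 | MaxEStateIndex | 0.001924 | 0.09 |
| 3 | NumSaturatedHeterocycles | 0.011036 | 0.07 |
| 4 | SPS | 0.015236 | 0.09 |
| 5 | fr_furan | 0.016631 | 0.4 |
| 6 | fr_C_O | 0.020296 | 0.09 |
| 7 | NumAromaticCarbocycles | 0.037109 | 0.08 |
| 8 | MolWt | 0.042499 | 0.07 |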
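Table 3 retains descriptors that satisfy two criteria at once: a significant Mann–Whitney difference and a high XGBoost importance. The exact importance cut-off is not stated, so the Python sketch below assumes a hypothetical minimum of 0.07 (the lowest importance listed in Table 3). It uses the p-values and importances from Table 3, plus two illustrative extras: Chi3v, whose p-value is taken from Table 2 but whose importance of 0.01 is invented, and an entirely hypothetical descriptor that is important but not significant.

```python
from typing import Dict, List, Tuple


def select_features(p_values: Dict[str, float], importances: Dict[str, float],
                    alpha: float = 0.05,
                    min_importance: float = 0.07) -> List[Tuple[str, float, float]]:
    """Keep descriptors that are significant (p < alpha) AND important (>= min_importance).

    min_importance is a hypothetical threshold. Returns (name, p-value, importance)
    tuples sorted by decreasing importance.
    """
    kept = [
        (name, p, importances[name])
        for name, p in p_values.items()
        if name in importances and p < alpha and importances[name] >= min_importance
    ]
    return sorted(kept, key=lambda row: -row[2])


# p-values and importances from Table 3, plus two illustrative extras
p_values = {
    "MinEStateIndex": 0.000177, "MaxEStateIndex": 0.001924,
    "NumSaturatedHeterocycles": 0.011036, "SPS": 0.015236, "fr_furan": 0.016631,
    "fr_C_O": 0.020296, "NumAromaticCarbocycles": 0.037109, "MolWt": 0.042499,
    "Chi3v": 0.0247,           # p-value from Table 2
    "HypotheticalDesc": 0.20,  # hypothetical, not significant
}
importances = {
    "MinEStateIndex": 0.12, "MaxEStateIndex": 0.09, "NumSaturatedHeterocycles": 0.07,
    "SPS": 0.09, "fr_furan": 0.4, "fr_C_O": 0.09, "NumAromaticCarbocycles": 0.08,
    "MolWt": 0.07,
    "Chi3v": 0.01,             # hypothetical importance
    "HypotheticalDesc": 0.15,  # hypothetical importance
}
for name, p, imp in select_features(p_values, importances):
    print(f"{name:<26} p = {p:.6f}  importance = {imp:.2f}")
```

Requiring both criteria discards descriptors that differ between classes but contribute little to the trained model, as well as descriptors the model leans on without a significant univariate difference.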
Table 4. Predicted hDNMT1 activity of 21 compounds across 50 epochs, with scaffold group assignments.
| Compounds | Confidence Active (%) | Active in 50 Epochs | Predicted Activity | Structural Group |
| --- | --- | --- | --- | --- |
| Compound 6 | 100 | 50 | Active | Phenolic derivative |
| Compound 15 | 100 | 50 | Active | Flavonoid-like/Polyphenolic |
| Compound 13 | 100 | 50 | Active | Flavonoid-like/Polyphenolic |
| Compound 12 | 100 | 50 | Active | Flavonoid-like/Polyphenolic |
| Compound 16 | 100 | 50 | Active | Phenolic derivative |
| Compound 10 | 100 | 50 | Active | Phenolic derivative |
| Compound 8 | 100 | 50 | Active | Flavonoid-like |
| Compound 19 | 100 | 50 | Active | Flavonoid-like |
| Compound 7 | 100 | 50 | Active | Heterocyclic phenol |
| Compound 5 | 100 | 50 | Active | Phenolic derivative |
| Compound 4 | 100 | 50 | Active | Heterocyclic phenol |
| Compound 2 | 100 | 50 | Active | Heterocyclic phenol |
| Compound 1 | 98 | 49 | Active | Phenolic derivative |
| Compound 18 | 98 | 49 | Active | Flavonoid-like |
| Compound 20 | 96 | 48 | Active | Heterocyclic phenol |
| Compound 14 | 96 | 48 | Active | Heterocyclic phenol |
| Compound 3 | 94 | 47 | Active | Phenolic derivative |
| Compound 17 | 84 | 42 | Active | Phenolic derivative |
| Compound 9 | 24 | 12 | Not Active | Small rigid heterocycle |
| Compound 21 | 4 | 2 | Not Active | Small rigid heterocycle |
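The confidence column of Table 4 equals the share of the 50 epochs in which a compound was predicted active (for example, 42/50 = 84% for Compound 17). The Python sketch below reproduces that arithmetic from a list of per-epoch votes. The 50% majority rule used for the final label is an assumption: it is consistent with every row of Table 4, where the lowest "Active" confidence is 84% and the highest "Not Active" confidence is 24%, but the source does not state the threshold.

```python
from typing import List, Tuple


def consensus(votes: List[bool], threshold: float = 50.0) -> Tuple[float, int, str]:
    """Summarise per-epoch predictions for one compound.

    votes: one boolean per epoch (True = predicted active).
    Returns (confidence in %, number of 'active' epochs, final label).
    The majority-vote threshold is an assumption, not stated in the source.
    """
    n_active = sum(votes)
    confidence = 100.0 * n_active / len(votes)
    label = "Active" if confidence >= threshold else "Not Active"
    return confidence, n_active, label


# Vote counts as reported in Table 4 for three compounds (50 epochs each)
for name, n in [("Compound 17", 42), ("Compound 9", 12), ("Compound 21", 2)]:
    votes = [True] * n + [False] * (50 - n)
    print(name, consensus(votes))
```

Reporting the vote fraction alongside the label separates unanimous predictions (100%, 50/50) from borderline ones, which is useful when prioritizing compounds for experimental testing.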
Table 5. Retrospective validation of the XGBoost classifier using five compounds previously confirmed as hDNMT1 inhibitors in vitro. All compounds were predicted as active with 100% consensus across 50 epochs. The set includes structurally diverse scaffolds, ranging from polyphenolic glycosides (bergenin, orientin, phlorizin) to complex phenyl- and phenazinyl-derived acid analogues, underscoring the model’s ability to generalize across heterogeneous chemical classes.
| Compound Name | Confidence Active (%) | Active in 50 Epochs | Predicted Activity | Structural Group | Experimental Validation Reference |
| --- | --- | --- | --- | --- | --- |
| Bergenin | 100 | 50 | Active | C-glycoside/Polyphenolic | [27] |
| Orientin | 100 | 50 | Active | Flavonoid (C-glycosylated) | [27] |
| Phlorizin | 100 | 50 | Active | Flavonoid glycoside | [27] |
| 2-(3-(3,4-dimethoxyphenyl)-3-(2-((2-oxo-2H-chromen-7-yl)oxy)acetamido)propanamido)acetic acid | 100 | 50 | Active | Aromatic coumarin–phenyl acetic acid derivative | [27] |
| 2-[(7,8-dihydroxy-6-undecylphenazin-2-yl)formamido]pentanedioic acid | 100 | 50 | Active | Phenazinyl pentanedioic acid derivative | [27] |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.


MDPI and ACS Style

Christodoulou, P.; Chytiri, E.; Zervou, M.; Manushin, I.; Kolvatzis, C.; Sinanoglou, V.J.; Cavouras, D.; Kritsi, E. Data-Driven and Structure-Based Modelling for the Discovery of Human DNMT1 Inhibitors: A Pathway to Structure–Activity Relationships. Appl. Sci. 2025, 15, 11984. https://doi.org/10.3390/app152211984

AMA Style

Christodoulou P, Chytiri E, Zervou M, Manushin I, Kolvatzis C, Sinanoglou VJ, Cavouras D, Kritsi E. Data-Driven and Structure-Based Modelling for the Discovery of Human DNMT1 Inhibitors: A Pathway to Structure–Activity Relationships. Applied Sciences. 2025; 15(22):11984. https://doi.org/10.3390/app152211984

Chicago/Turabian Style

Christodoulou, Paris, Ellie Chytiri, Maria Zervou, Igor Manushin, Charalampos Kolvatzis, Vassilia J. Sinanoglou, Dionisis Cavouras, and Eftichia Kritsi. 2025. "Data-Driven and Structure-Based Modelling for the Discovery of Human DNMT1 Inhibitors: A Pathway to Structure–Activity Relationships" Applied Sciences 15, no. 22: 11984. https://doi.org/10.3390/app152211984

APA Style

Christodoulou, P., Chytiri, E., Zervou, M., Manushin, I., Kolvatzis, C., Sinanoglou, V. J., Cavouras, D., & Kritsi, E. (2025). Data-Driven and Structure-Based Modelling for the Discovery of Human DNMT1 Inhibitors: A Pathway to Structure–Activity Relationships. Applied Sciences, 15(22), 11984. https://doi.org/10.3390/app152211984

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
