Article

Computational Phenotypic Drug Discovery for Anticancer Chemotherapy: PTML Modeling of Multi-Cell Inhibitors of Colorectal Cancer Cell Lines

by Alejandro Speck-Planche* and M. Natália D. S. Cordeiro
LAQV/REQUIMTE, Department of Chemistry and Biochemistry, Faculty of Sciences, University of Porto, 4169-007 Porto, Portugal
* Author to whom correspondence should be addressed.
Int. J. Mol. Sci. 2025, 26(23), 11453; https://doi.org/10.3390/ijms262311453
Submission received: 3 November 2025 / Revised: 24 November 2025 / Accepted: 25 November 2025 / Published: 26 November 2025
(This article belongs to the Special Issue In Silico Approaches to Drug Design and Discovery)

Abstract

Colorectal cancer is one of the most dangerous neoplastic diseases in terms of both mortality and incidence; thus, anti-colorectal cancer agents are urgently needed. Computational approaches have great potential to accelerate the phenotypic discovery of versatile anticancer agents. Here, by combining perturbation-theory machine learning (PTML) modeling with the fragment-based topological design (FBTD) approach, we provide computational evidence supporting the computer-aided de novo design and prediction of new molecules virtually exhibiting multi-cell inhibitory activity against different colorectal cancer cell lines. The PTML model created in this study achieved sensitivity and specificity values exceeding 80% in both the training and test sets. The FBTD approach was employed to interpret the PTML model physicochemically and structurally. These interpretations enabled the rational design of six new drug-like molecules, which were predicted as active against multiple colorectal cancer cell lines by both our PTML model and the CLC-Pred 2.0 web server, a well-established virtual screening tool for early anticancer discovery. This work confirms the potential of the joint use of PTML and FBTD as a unified computational methodology for early phenotypic anticancer drug discovery.

1. Introduction

Colorectal cancer (CRC) is a significant global health challenge, ranking among the most prevalent neoplastic diseases and being a leading cause of cancer-related mortality worldwide [1]. In developed countries such as the US, more than 150,000 new CRC cases and over 50,000 deaths are expected in 2025 [2]. CRC is widely recognized to involve an intricate combination of genetic mutations, epigenetic alterations, and environmental factors, leading to uncontrolled proliferation of colonic epithelial cells and metastasis [3]. Despite advances in screening and therapeutic options, including chemotherapy, immunotherapy, and targeted therapies, the prognosis of CRC remains poor in advanced stages due to tumor heterogeneity and drug resistance [4].
These challenges indicate the urgent need for the discovery of new anticancer agents capable of targeting CRC proliferation.
Although experimental screening remains the gold standard in early anticancer drug discovery for evaluating the efficacy of compounds at both the target-based and phenotypic levels [5], these approaches are resource-intensive and time-consuming. In this context, in silico approaches, widely regarded as powerful tools in modern drug discovery campaigns, have played an important role in rationalizing the discovery of anti-CRC agents. In silico approaches such as pharmacophore modeling [6], network-based analysis [7,8], molecular docking and molecular dynamics simulations [6,7,8,9,10,11,12,13], quantum mechanical calculations [10,13], and machine learning [7,13] have proven highly useful. However, several limitations persist, including (a) reliance on small chemical datasets (limiting coverage of chemical space), (b) prediction of activity against only one cancer-related target (e.g., a protein or a cancer cell line), which reduces the potential pharmacological applicability of the modeled/predicted chemicals, and (c) insufficient interpretability in terms of physicochemical properties and structural features (hindering the rational design of multi-target ligands or multi-cell inhibitors with anticancer versatility).
Advanced models based on perturbation-theory machine learning (PTML) have demonstrated the potential to overcome these limitations [14,15]. In this sense, PTML models have been successfully applied in diverse pharmacology-related areas, including antimicrobial research [16,17,18,19,20,21,22], neurodegenerative diseases [23,24,25,26,27], immunology [28,29], nanomedicine [30,31,32,33,34,35], and antineoplastic drug discovery [36,37,38]. Moreover, PTML models can be coupled with the fragment-based topological design (FBTD) approach to enable the de novo design of molecules predicted to possess specific bioactivity profiles [16,36,39].
To date, there are no reports employing PTML modeling for the rational in silico design of CRC-targeted agents that simultaneously consider phenotypic aspects affecting cancer proliferation (e.g., cell doubling times and microsatellite characteristics). In this study, we provide computational evidence that a framework combining a PTML model based on a multilayer perceptron network (PTML-MLP) with the FBTD approach can facilitate both the prediction and the de novo design of drug-like molecules with multi-cell inhibitory activity against different CRC cell lines. The study is fully computational; the predictions and designed molecules generated by the PTML–FBTD framework are intended to guide future experimental validation rather than constitute biological confirmation.

2. Results and Discussion

2.1. PTML Model: Performance

The PTML-MLP model presented in this study uses as inputs the so-called multi-label topological indices D[GTI]ej, which fuse information on the chemical structure of molecules with a combination of experimental aspects (ej) associated with the CRC cell lines. Each ej combines the doubling time of the CRC cell line (dt) [40], the specific type of CRC cell line (ct) [41], and its microsatellite characteristics (mc) [41,42]. The best PTML-MLP model has the notation MLP 20-50-2, meaning that it has 20 input nodes (D[GTI]ej indices), 50 hidden neurons, and two output nodes; the hyperbolic tangent was used as the activation function in both the hidden and output layers. This topology, together with the number of training cases (T = 4108), yields a value of the parameter ρ = 3.57 (the formula for ρ is given in the Materials and Methods section). This ρ value indicates that the PTML-MLP model is not overfitting the data [43]. A summary of the different D[GTI]ej indices is provided in Table 1.
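As a sketch, the reported ρ can be reproduced if one assumes that ρ is the number of training cases T divided by the number of adjustable connections (weights plus biases) of a fully connected I-H-O network. This formula is an assumption here (the authoritative definition is in Materials and Methods), but it agrees with the reported ρ = 3.57 for MLP 20-50-2 and T = 4108.

```python
def n_connections(n_inputs: int, n_hidden: int, n_outputs: int) -> int:
    """Adjustable parameters of a fully connected I-H-O MLP (weights + biases)."""
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs


def rho(n_train: int, n_inputs: int, n_hidden: int, n_outputs: int) -> float:
    """Assumed definition: training cases per adjustable connection."""
    return n_train / n_connections(n_inputs, n_hidden, n_outputs)


# MLP 20-50-2 trained on T = 4108 cases
print(n_connections(20, 50, 2))        # 1152
print(round(rho(4108, 20, 50, 2), 2))  # 3.57
```

Under this assumed definition, the model has roughly 3.6 training cases per adjustable parameter.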
To analyze the performance of the PTML-MLP model, we used both global and local statistical metrics. As global metrics, we used sensitivity (Sn), specificity (Sp), and the normalized Matthews correlation coefficient (nMCC) [44]. As shown in Table 2, Sn = 89% was achieved in the training set, while Sn > 83% in the test set. For Sp, values higher than 89% and 84% were obtained for the training and test sets, respectively.
These values mean that the PTML-MLP model correctly classifies most chemicals as either active or inactive: Sn is the number of correctly classified active chemicals (CCActive) relative to the total number of chemicals labeled as active (NActive), and Sp is the number of correctly classified inactive chemicals (CCInactive) relative to the total number of chemicals labeled as inactive (NInactive). Furthermore, the nMCC values are close to 1, which means that the correlation between the observed [ACRC(ej)] and the predicted [PACRC(ej)] categorical values of anti-colorectal cancer activity is very strong. The chemical and biological data for the dataset used in the present study can be found in Supplementary Materials S1, and details on the prediction outcome for each chemical used to develop and validate the PTML-MLP model are available in Supplementary Materials S2. Chemicals/cases annotated as active have GI50 ≤ 1900 nM (GI50 is the concentration causing 50% growth inhibition), while the remaining cases are annotated as inactive.
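The global metrics can be illustrated with a minimal sketch computed from confusion-matrix counts. The mapping of nMCC to (MCC + 1)/2, which rescales MCC from [-1, 1] to [0, 1], is an assumption about the normalization used in [44], and the counts below are hypothetical, not the values behind Table 2.

```python
import math


def global_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Sn, Sp, MCC and nMCC from confusion-matrix counts.

    tp = CCActive, tp + fn = NActive, tn = CCInactive, tn + fp = NInactive.
    """
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    nmcc = (mcc + 1.0) / 2.0  # assumed normalization from [-1, 1] to [0, 1]
    return {"Sn": sn, "Sp": sp, "MCC": mcc, "nMCC": nmcc}


# Hypothetical counts, not taken from Table 2
print(global_metrics(tp=90, fn=10, tn=85, fp=15))
```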
We also employed local statistical metrics, namely local sensitivities [Sn(dt), Sn(ct), and Sn(mc)] and local specificities [Sp(dt), Sp(ct), and Sp(mc)], because they assess the performance of the PTML-MLP model separately for each element of ej. For the biological aspect dt, both Sn(dt) and Sp(dt) presented values in the interval 87–91.1% in the training set and 80–86.3% in the test set. For the element ct, Sn(ct) and Sp(ct) displayed values in the ranges 84–93.2% (training set) and 79–87.1% (test set). For the biological aspect mc, the local metrics were Sn(mc) > 88% and Sp(mc) > 87% in the training set, and Sn(mc) > 82% and Sp(mc) > 83% in the test set. Altogether, both global and local statistical metrics demonstrate that the PTML-MLP model can accurately predict anti-CRC activity while simultaneously and explicitly considering the growth/proliferation of the CRC cell lines (doubling times, dt), the specific types of CRC cell lines (ct), and their microsatellite characteristics (mc). Specific values of the local metrics associated with each label can be found in Supplementary Materials S2.
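Local metrics are ordinary sensitivities and specificities computed within each value of one element of ej. The sketch below groups hypothetical (label, observed, predicted) records by label; the labels "MSS" and "MSI" are illustrative placeholders for microsatellite categories, not necessarily the exact labels used in the dataset.

```python
from collections import defaultdict


def local_metrics(records):
    """Local Sn and Sp per value of one element of ej (e.g., each mc label).

    records: iterable of (label, observed, predicted), with observed and
    predicted in {0, 1} (1 = active). Returns {label: (Sn, Sp)}, using None
    when a class is absent for that label.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for label, obs, pred in records:
        c = counts[label]
        if obs == 1:
            c["tp" if pred == 1 else "fn"] += 1
        else:
            c["tn" if pred == 0 else "fp"] += 1
    out = {}
    for label, c in counts.items():
        n_act, n_inact = c["tp"] + c["fn"], c["tn"] + c["fp"]
        out[label] = (c["tp"] / n_act if n_act else None,
                      c["tn"] / n_inact if n_inact else None)
    return out


# Hypothetical records
recs = [("MSS", 1, 1), ("MSS", 1, 0), ("MSS", 0, 0), ("MSI", 1, 1), ("MSI", 0, 1)]
print(local_metrics(recs))
```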
At a deeper chemical level, Figure 1 illustrates that the PTML-MLP model can correctly predict/classify the anti-CRC activity of well-established anticancer drugs.
The anticancer drugs (both investigational and approved by the Food and Drug Administration, FDA) with in vitro inhibitory activity against one or more CRC cell lines that were correctly predicted by the PTML-MLP model are ChEMBL53463 (doxorubicin), ChEMBL185 (fluorouracil), ChEMBL126159 (LMP744), ChEMBL44657 (etoposide), ChEMBL917 (floxuridine), ChEMBL84 (topotecan), and ChEMBL941 (imatinib). This indicates that our PTML-MLP model can identify chemical patterns associated with drugs with well-established anti-CRC activity at the in vitro level. The PTML-MLP model can also identify/predict new chemical patterns (Figure 2) that differ from those present in anticancer drugs.
For simplicity, Figure 2 illustrates a non-exhaustive list of molecules that can be found in the dataset of the present study (Supplementary Materials S1 and S2). This means that the molecules in the aforementioned figure have been experimentally tested; in particular, they have been assayed against all the combinations of ej depicted in Table 3, being reported as multi-cell inhibitors of all seven CRC cell lines. Despite belonging to diverse chemical scaffolds, our PTML-MLP model could correctly predict the multi-cell anti-CRC activity of all the molecules in Figure 2 (and many others reported in Supplementary Materials S1 and S2). These results demonstrate that our PTML-MLP model can serve as a computational tool for predicting or discovering novel chemicals (in virtual screening scenarios) with versatile anti-CRC activity.
Finally, we determined the applicability domain (AD) of the PTML-MLP model using a variant of the bounding-box approach (descriptor space), as reported in recent studies [16,36,45]. For each of the 20 D[GTI]ej indices in the PTML-MLP model, we calculated a local score denoted as LSAD_D[GTI]ej. For a query chemical and a given D[GTI]ej index, LSAD_D[GTI]ej is a binary variable that takes the value of 1 if the D[GTI]ej value of that chemical falls between the minimum and maximum D[GTI]ej values, calculated from the training-set chemicals correctly classified by the PTML-MLP model; otherwise, LSAD_D[GTI]ej takes the value of 0. This operation was repeated for all the chemicals in the dataset and all 20 D[GTI]ej indices, so that each chemical was associated with 20 LSAD_D[GTI]ej values. Next, the total applicability domain score (TSAD) was calculated for each chemical as the sum of its LSAD_D[GTI]ej values. If TSAD = 20, the chemical was considered inside the AD; otherwise, it was deemed outside the AD, and its prediction was regarded as unreliable. In the end, 5475 out of 5478 chemicals/cases were inside the AD of the PTML-MLP model (Supplementary Materials S2).
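The bounding-box AD check described above can be sketched as follows. Inclusive bounds are assumed, and the two-index descriptor rows are hypothetical stand-ins for the 20 D[GTI]ej values of each chemical.

```python
def fit_bounds(train_correct):
    """Min/max of each index over training chemicals that were correctly classified."""
    columns = list(zip(*train_correct))
    return [min(c) for c in columns], [max(c) for c in columns]


def local_scores(x, lower, upper):
    """LSAD for each index: 1 if lower <= value <= upper, otherwise 0."""
    return [1 if lo <= v <= hi else 0 for v, lo, hi in zip(x, lower, upper)]


def in_domain(x, lower, upper):
    """TSAD = sum of LSAD; inside the AD only if TSAD equals the number of indices."""
    tsad = sum(local_scores(x, lower, upper))
    return tsad, tsad == len(lower)


# Hypothetical training rows (two indices instead of 20)
lower, upper = fit_bounds([[0.0, 1.0], [2.0, 3.0], [1.0, 2.0]])
print(in_domain([1.0, 2.5], lower, upper))  # (2, True)
print(in_domain([3.0, 2.0], lower, upper))  # (1, False)
```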

2.2. Physicochemical Interpretation of the PTML-MLP Through the FBTD Approach

The FBTD approach is suitable for the physicochemical and structural interpretation of non-linear machine learning models, in particular those based on the PTML philosophy [16,36,45]. In the context of PTML modeling, and as applied in the present study, FBTD comprises three steps: (1) estimating the relative influence of the D[GTI]ej indices in the PTML-MLP model, (2) determining the tendency of variation in the D[GTI]ej indices, and (3) gathering information on the subgraphs and their corresponding molecular fragments that may be responsible for enhancing the activity against the different CRC cell lines.
For the first step, we computed the sensitivity values (SVs) of the D[GTI]ej indices, which were used as inputs to build the PTML-MLP model (Figure 3). By definition, SVs are quantitative measures of the importance/discriminative power of inputs in a machine learning model [46].
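One common way to compute SVs, assumed here and possibly differing in detail from the procedure in [46], is the ratio between the model error when input j is replaced by its mean and the baseline error: SV > 1 means the model degrades without that input, while SV ≈ 1 means the input has little influence. The toy predictor and data below are hypothetical.

```python
def sensitivity_values(predict, X, y):
    """Ratio of the error with input j replaced by its mean to the baseline error.

    predict: callable mapping one input row (list of floats) to a prediction.
    """
    def sse(rows):
        return sum((predict(r) - t) ** 2 for r, t in zip(rows, y))

    baseline = sse(X)
    if baseline == 0:
        raise ValueError("baseline error is zero; ratios are undefined")
    n_inputs = len(X[0])
    means = [sum(r[j] for r in X) / len(X) for j in range(n_inputs)]
    svs = []
    for j in range(n_inputs):
        # Replace column j by its mean, keep the other inputs unchanged
        X_j = [r[:j] + [means[j]] + r[j + 1:] for r in X]
        svs.append(sse(X_j) / baseline)
    return svs


# Hypothetical toy model that only uses the first input
X = [[0.0, 5.0], [1.0, 3.0], [0.0, 1.0], [1.0, 2.0]]
y = [0.1, 0.9, 0.2, 0.8]
print(sensitivity_values(lambda r: r[0], X, y))  # first SV ~5, second SV 1.0
```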
The D[GTI]ej indices with the highest SVs are therefore the ones with the greatest influence/discriminatory power in the PTML-MLP model, and they are considered to encode the structural features and physicochemical properties that are most relevant for enhancing the multi-cell anti-CRC activity. The second step assesses the tendency of variation in the D[GTI]ej indices (Table 4), i.e., whether the value of each D[GTI]ej index should increase or decrease to enhance the multi-cell anti-CRC activity.
Table 4 reports two sets of averages, both calculated from the training set: one for chemicals labeled and correctly classified as active, and the other for chemicals labeled and correctly classified as inactive. Take, for example, the D[GTI]ej index DGT01: because the active-based average is smaller than the inactive-based average, the value of DGT01 should decrease to increase the multi-cell anti-CRC activity. Applying the same reasoning to the D[GTI]ej index DGT07, the active-based average is larger than the inactive-based average, indicating that the value of DGT07 should increase to enhance the multi-cell anti-CRC activity.
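The tendency rule reduces to comparing two class averages. In this minimal sketch, the input lists are assumed to contain the values of one D[GTI]ej index for training-set chemicals that were labeled and correctly classified as active or inactive; the numbers are hypothetical, not Table 4 values.

```python
from statistics import mean


def tendency(active_values, inactive_values):
    """Direction in which an index should move to favor activity."""
    avg_act, avg_inact = mean(active_values), mean(inactive_values)
    if avg_act > avg_inact:
        return "increase"
    if avg_act < avg_inact:
        return "decrease"
    return "no clear tendency"


# Hypothetical values mimicking the DGT01 and DGT07 cases
print(tendency([0.1, 0.2], [0.5, 0.6]))  # decrease
print(tendency([2.0, 3.0], [1.0, 1.0]))  # increase
```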
The third step is related to the structural aspects associated with the information contained within the D[GTI]ej indices. In this sense, we provide subgraphs (Figure 4), i.e., generic structural representations from which molecular fragments (e.g., polar functional groups, aromatic portions, aliphatic chains and rings, and ramifications) can be easily analyzed. Thus, we will now proceed with the physicochemical and structural interpretations of the D[GTI]ej indices in the PTML-MLP model.
There are 10 D[GTI]ej indices (DGT01 to DGT03, DGT05, DGT07 to DGT10, DGT16, and DGT17) derived from the topological indices known as bond-based spectral moments. By definition, the bond-based spectral moments encode different physicochemical properties and can be expressed as a linear combination of the number of times fragments of different sizes appear in a molecule [47,48,49,50,51,52]. They can also describe 3D parameters such as dihedral angles [53]. This information is retained in the D[GTI]ej indices derived from them. The first of these indices, DGT01 (the thirteenth most important D[GTI]ej index in the PTML-MLP model), indicates a decrease in the polarity-based property known as dipole moment, particularly in fragments containing the subgraphs SG-03 (favorable for isopropyl and tert-butyl groups and, to a lesser degree, a fluorine or chlorine attached to an aromatic carbon), SG-04 (e.g., tert-butyl or tert-butoxy), and SG-06 (cyclopropane moiety). Continuing with polarity-based properties, DGT02, DGT05, and DGT08 rank fifteenth, twelfth, and fourteenth, respectively. These D[GTI]ej indices describe a decrease in the polar surface area: DGT02 and DGT05 account for the global polar surface area, i.e., SG-01 subgraphs (with DGT02 being size-independent), while DGT08 focuses on the same SGs as DGT01 plus the SG-07 subgraphs (cyclobutene preferred over its heteroatom-containing counterparts), thus prioritizing the same molecular fragments/functional groups. This does not mean that polar groups cannot be present; rather, their presence should be minimized, and, if present, only one highly polar group (e.g., urea or amide) should be allowed.
Polarizability is also a very important physicochemical property. The three D[GTI]ej indices derived from bond-based spectral moments DGT03, DGT09, and DGT17 (ranking nineteenth, sixth, and ninth, respectively, among the most important D[GTI]ej indices in the PTML-MLP model) describe a decrease in this property as a means of increasing the multi-cell anti-CRC activity. More specifically, DGT03 describes fragments involving the SG-03, SG-04, SG-06, and SG-07 subgraphs (four-membered rings), while DGT09 focuses only on SG-03 and SG-04; DGT17 is a measure of the global polarizability of a molecule (SG-01 subgraphs). Altogether, they indicate that the tert-butyl, trifluoromethyl, tert-butoxy, and trifluoromethoxy groups are very suitable, particularly when attached to rings. At the same time, high-polarizability atoms such as halogens other than fluorine, sulfur, and phosphorus should be avoided; if present in a molecule, only one of these atoms is allowed. Although aromatic rings are present in most biologically active chemical structures, their relatively high polarizability is detrimental (they contribute to the unfavorable increase in global polarizability); to manage this, aliphatic portions (including aliphatic rings) should be introduced to separate any two aromatic systems. Oxygen-containing functional groups (in particular, methoxy, hydroxyl, and amide) are also very suitable for decreasing the polarizability.
Furthermore, hydrophobicity, atomic weight, and bond distance influence the multi-cell anti-CRC activity. On the one hand, an increase in global hydrophobicity is characterized by DGT07 (SG-01 subgraphs), which prioritizes the presence of trihalomethyl groups, aromatic carbons (except those to which nitrogen or oxygen atoms are attached), tertiary amines, pyrrolic nitrogen atoms, thiol and thioether groups, and halogens (mainly Cl, Br, I) attached to rings. On the other hand, DGT10 involves an increase in atomic weight in SG-03, SG-04, and SG-06 subgraphs, thus prioritizing the presence of trifluoromethyl and trifluoromethoxy groups, as well as Cl, Br, I, and S. DGT16 has steric implications because it indicates the need to increase the bond distance (SG-01 subgraphs), making a molecule bigger; this can be achieved by introducing saturated aliphatic portions as well as Cl, Br, I, and S. DGT07, DGT10, and DGT16 are the tenth, twentieth, and eleventh most important D[GTI]ej indices in the PTML-MLP model, respectively.
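To make the link between spectral moments and fragment counts concrete, here is a minimal sketch assuming the TOPS-MODE convention: the k-th moment is the trace of the k-th power of the bond (edge) adjacency matrix of the hydrogen-depleted molecular graph, whose diagonal may hold bond weights derived from a property such as dipole moment or polarizability. The weights used below are arbitrary placeholders, and this is not the exact descriptor pipeline used to build the D[GTI]ej indices.

```python
from itertools import combinations


def edge_adjacency(bonds, weights=None):
    """Bond (edge) adjacency matrix; the diagonal holds optional bond weights."""
    m = len(bonds)
    B = [[0.0] * m for _ in range(m)]
    for i, j in combinations(range(m), 2):
        if set(bonds[i]) & set(bonds[j]):  # bonds sharing an atom are adjacent
            B[i][j] = B[j][i] = 1.0
    if weights is not None:
        for i, w in enumerate(weights):
            B[i][i] = float(w)
    return B


def matmul(A, C):
    return [[sum(A[i][k] * C[k][j] for k in range(len(C)))
             for j in range(len(C[0]))] for i in range(len(A))]


def spectral_moments(bonds, weights=None, kmax=4):
    """mu_k = trace(B^k) for k = 1..kmax."""
    B = edge_adjacency(bonds, weights)
    P, moments = B, []
    for k in range(1, kmax + 1):
        if k > 1:
            P = matmul(P, B)
        moments.append(sum(P[i][i] for i in range(len(P))))
    return moments


# Isobutane skeleton: central atom 0 bonded to atoms 1, 2 and 3
isobutane = [(0, 1), (0, 2), (0, 3)]
print(spectral_moments(isobutane))             # [0.0, 6.0, 6.0, 18.0]
print(spectral_moments(isobutane, [1, 2, 3]))  # hypothetical bond weights
```

In the unweighted case, the second moment equals twice the number of two-bond fragments (three in isobutane), which illustrates why spectral moments can be written as linear combinations of fragment counts.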
In the PTML-MLP model reported in the present study, we have two D[GTI]ej indices derived from the topological descriptors known as atom-based connectivity indices [54,55,56,57,58,59,60]. These are measures of molecular accessibility, that is, the ability of different molecular regions/fragments to participate in both polar and non-polar interactions with their surrounding environment (solvent molecules, amino acids in the pocket of a protein, molecules present in different locations of a cell, etc.). The first of these D[GTI]ej indices is DGT11 (ranked as the fifth most influential), and the favorable diminution of its value is equivalent to increasing the number of heteroatoms in six-membered rings (SG-09), where aromatic rings are preferred over their aliphatic counterparts. The other D[GTI]ej index, DGT18 (ranked sixteenth), involves the decrease in the molecular accessibility in linear fragments formed by six bonds (SG-10 subgraphs—without counting bond order). The favorable diminution in the value of DGT18 indicates an augmentation in the number of heteroatoms present in both aliphatic portions and aromatic systems, as well as the increase in the number of ramifications and polysubstituted rings (including fused ring systems).
The remaining eight D[GTI]ej indices are derived from the topological descriptors named bond-based connectivity indices, which are direct measures of fragment-based contributions to the molecular volume [61,62,63,64,65]. In this context, DGT04 (ranked eighteenth) characterizes the diminution of the molecular volume in regions containing two-bond fragments (SG-02 subgraphs), indicating the need for ramifications in the central part of a molecule as well as polysubstituted and fused ring systems. The D[GTI]ej index DGT06 is the second most important in the PTML-MLP model and indicates the diminution of the number of fragments based on the SG-06 subgraphs; thus, having one group containing this fragment (such as tert-butyl, trifluoromethyl, and trifluoromethoxy) attached to a ring is allowed. On the other hand, DGT12 and DGT19 characterize the increase in the molecular volume of a molecule (i.e., SG-01 subgraphs, with DGT19 being size-independent). This means that most ramifications should appear in the periphery of a molecule; DGT12 and DGT19 rank third and eighth among the most important D[GTI]ej indices in the PTML-MLP model.
At the same time, DGT13 is the seventh most important D[GTI]ej index and describes the increase in the number of molecular fragments containing the SG-05 subgraphs. Examples of structural moieties containing these subgraphs are N,N-disubstituted amides, isopropyl, N,N-dimethylamino, tert-butyl, and trifluoromethyl groups, as well as fused ring systems. On the other hand, DGT14 (ranked fourth) implies the diminution of the number of five-membered rings (SG-08 subgraphs); if present, no more than two five-membered rings are allowed, and they should have substitutions in two or more positions.
The most important D[GTI]ej index in the PTML-MLP model is DGT15, which characterizes the diminution of the molecular volume in six-membered rings (SG-09), indicating the need for the presence of polysubstituted rings. Finally, DGT20 involves the increase in the molecular volume in six-bond fragments (SG-10 subgraphs); if ramifications are present, they should appear in the peripheral regions of a molecule.
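For orientation, the two families of connectivity descriptors discussed above can be illustrated with their simplest first-order forms on a hydrogen-depleted graph: the atom-based Randić index, summing (d_i·d_j)^(-1/2) over bonds, where d is the atom degree, and a bond-based (edge) connectivity index, summing (e_i·e_j)^(-1/2) over pairs of adjacent bonds, where e is the number of bonds adjacent to a bond. These are textbook simplifications assumed for illustration; the indices in the PTML-MLP model may use other orders and property weights.

```python
from collections import Counter
from itertools import combinations


def randic_index(bonds):
    """First-order atom-based connectivity index over all bonds (i, j)."""
    degree = Counter(a for bond in bonds for a in bond)
    return sum((degree[i] * degree[j]) ** -0.5 for i, j in bonds)


def edge_connectivity_index(bonds):
    """First-order bond-based connectivity index over pairs of adjacent bonds."""
    adjacent = [(p, q) for p, q in combinations(range(len(bonds)), 2)
                if set(bonds[p]) & set(bonds[q])]
    edge_degree = Counter(k for pair in adjacent for k in pair)
    return sum((edge_degree[p] * edge_degree[q]) ** -0.5 for p, q in adjacent)


propane = [(0, 1), (1, 2)]
isobutane = [(0, 1), (0, 2), (0, 3)]
print(randic_index(propane), edge_connectivity_index(propane))      # ~1.414, 1.0
print(randic_index(isobutane), edge_connectivity_index(isobutane))  # ~1.732, 1.5
```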

2.3. Designing Novel Molecules Virtually Exhibiting Multi-Cell Anti-CRC Activity

Guided by the physicochemical and structural interpretation of the D[GTI]ej indices, we designed six structurally related molecules (Figure 5) that differ in key chemical modifications, in order to examine how these structural differences affect their predicted multi-cell anti-CRC activity.
The joint interpretation of all the D[GTI]ej indices in the PTML-MLP model suggests that aliphatic portions are very important in the chemical structure of a molecule expected to exhibit multi-cell anti-CRC activity, particularly if they appear in the central part of the molecule (separating aromatic portions or heterocyclic rings from each other); however, two large aliphatic portions may be detrimental. Heteroaromatic fused systems are also very important, specifically if they are located in the periphery of a molecule. The same applies to substituents such as tert-butyl, trifluoromethyl, and trifluoromethoxy, which can be attached to aromatic rings. The presence of a single highly polar functional group is important, and such a group can be located in either the central or the peripheral part of a molecule. Furthermore, low-polarizability groups such as hydroxyl and methoxy are very suitable as ring substituents, as are heavy atoms such as sulfur and halogens other than fluorine (the number of these heavy atoms should be kept as low as possible). The structural features discussed here are present in the chemical structures of the designed molecules. To obtain computational evidence of the potential of the designed molecules to exhibit multi-cell anti-CRC activity, their activity was predicted with two tools (Table 5).
The results in Table 5 suggest that all six designed molecules exhibit multi-cell anti-CRC activity: they were predicted by the PTML-MLP model as active (ProbAct > 50%) against at least 4 out of 7 combinations of ej (one per CRC cell line), i.e., they are predicted to have GI50 ≤ 1900 nM (the activity cutoff used when developing the PTML-MLP model). In particular, ASP-COLRC-01 and ASP-COLRC-02 were predicted as active in 4 of the 7 combinations of ej, ASP-COLRC-03 in 5 of the 7, and ASP-COLRC-04, ASP-COLRC-05, and ASP-COLRC-06 in all seven. More details regarding the six designed molecules can be found in Supplementary Materials S3.
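The PTML-MLP decision rule above (ProbAct > 50% counts as active for a given ej, and at least 4 of the 7 ej must be active for a multi-cell call) can be written as a short sketch. The ProbAct values and the ej keys other than ej05 are hypothetical and are not taken from Table 5.

```python
def ptml_multicell_call(prob_act, threshold=50.0, min_active=4):
    """Count ej combinations with ProbAct > threshold (%) and apply the >= 4/7 rule."""
    n_active = sum(1 for p in prob_act.values() if p > threshold)
    return n_active, n_active >= min_active


# Hypothetical ProbAct values (%) for the seven ej combinations; not from Table 5
example = {"ej01": 72.0, "ej02": 55.5, "ej03": 48.0, "ej04": 61.2,
           "ej05": 80.3, "ej06": 49.9, "ej07": 50.0}
print(ptml_multicell_call(example))  # (4, True); 50.0 is not strictly above 50
```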
From a chemical point of view, there are marked structural differences among the designed molecules, which ultimately account for the differences in both the ProbAct values (expressed as percentages) and the number of combinations of ej for which these molecules were predicted as active. ASP-COLRC-01 and ASP-COLRC-02 were predicted as active against fewer CRC cell lines because their structures contain an aliphatic region that is too large (the tertiary amine moiety plus the 4-methylpiperazine-1-carbonyl fragment). This feature unfavorably increases the values of the D[GTI]ej indices DGT11 and DGT18, which is detrimental to the multi-cell anti-CRC activity. When the 4-methylpiperazine moiety is replaced by the 3-methoxyphenyl fragment, as in ASP-COLRC-03 to ASP-COLRC-06, the multi-cell anti-CRC activity increases, i.e., the molecules are predicted as active against a higher number of CRC cell lines. Still, as mentioned above, ASP-COLRC-03 is predicted as a multi-cell inhibitor in only 5 of the 7 combinations of ej, while ASP-COLRC-04, ASP-COLRC-05, and ASP-COLRC-06 are predicted as active against all seven. The difference between ASP-COLRC-03 and ASP-COLRC-04 is that, in the latter, the trifluoromethoxy group has been replaced by a trifluoromethyl group, with a consequent favorable increase in the value of the D[GTI]ej index DGT13. For ASP-COLRC-05 and ASP-COLRC-06, the key modification is a second methoxy group introduced at the carbon adjacent to the one bearing the hydroxyl group; this modification also favorably increases the value of DGT13. DGT13 is also mainly responsible for the higher ProbAct values of ASP-COLRC-04 and ASP-COLRC-06 relative to ASP-COLRC-05, because the trifluoromethyl group is more suitable than the trifluoromethoxy group.
The other computational tool whose predictions are shown in Table 5 is the web server CLC-Pred 2.0 [68], a state-of-the-art tool that can predict anticancer activity (GI50) against 391 cancer cell lines based on structural information from more than 125,000 chemicals retrieved from the ChEMBL and PubChem databases [66,67]. Our PTML-MLP model and CLC-Pred 2.0 are machine learning tools that were created using different approaches and that predict different outcomes. The PTML-MLP model employs an integrative machine learning approach that, by using the D[GTI]ej indices (multi-label graph-based indices) as inputs to an MLP network, enables the simultaneous prediction of the anti-CRC activity of any molecule/chemical against seven different CRC cell lines. In contrast, CLC-Pred 2.0 uses as inputs the chemical similarity (fragment-based) descriptors known as multilevel neighborhoods of atoms, with a Naive Bayes classifier as the modeling algorithm [68]; a separate model predicts activity against each of the seven CRC cell lines. Table 5 also illustrates another key difference, namely the prediction outcome: the PTML-MLP model predicts the probability ProbAct that a molecule is active, whereas CLC-Pred 2.0 estimates the probabilities of being active and inactive (Pa and Pi, respectively) based on the similarity of the query compound to the chemicals in its training set, using an activity cutoff of GI50 ≤ 100 nM. Therefore, any molecule with Pa > Pi is labeled as active (predicted GI50 ≤ 100 nM) and may be considered for future experimental validation.
Because of the different approaches used to build the PTML-MLP model and CLC-Pred 2.0, the predictions of multi-cell anti-CRC activity made by these two tools should not be expected to fully converge. Nevertheless, because both are classification models based on the same activity endpoint (GI50) and assay protocol (sulforhodamine B (SRB) assay, 48 h), some agreement between their predictions should be expected, in the sense that both tools should predict that the designed molecules exhibit multi-cell anti-CRC activity (against at least 4 out of 7 combinations of ej, one per CRC cell line).
The CLC-Pred 2.0 results in Table 5 show that all six designed molecules were predicted as active for the experimental aspect ej05, i.e., against the CRC cell line KM12 (see Table 3). In particular, all six designed molecules had Pa > Pi for this CRC cell line, which means that CLC-Pred 2.0 predicted them to exhibit GI50 ≤ 100 nM. Note that the cutoff used by CLC-Pred 2.0 (GI50 ≤ 100 nM) is considerably more stringent than the one employed to create our PTML-MLP model (GI50 ≤ 1900 nM). For certain combinations of ej (specific CRC cell lines), CLC-Pred 2.0 yielded no prediction, indicating that its machine learning models lacked the chemical similarity information needed to estimate Pa and Pi for the designed molecules. Among the designed molecules, the best predicted is ASP-COLRC-05 and the worst predicted is ASP-COLRC-02; with the exception of the latter, all the molecules were predicted to exhibit multi-cell anti-CRC activity, i.e., a predicted GI50 ≤ 100 nM against at least 4 out of 7 combinations of ej. Altogether, the predictions made by our PTML-MLP model and CLC-Pred 2.0 suggest that the designed molecules may behave as versatile and potent anti-CRC agents.
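The CLC-Pred 2.0 side of the comparison can be summarized with a similar sketch: an ej counts as active when Pa > Pi, a missing prediction is recorded separately and never counts as active, and the same at-least-4-of-7 rule is applied. The (Pa, Pi) pairs and most ej keys are hypothetical; this is not the server's actual output format.

```python
def clc_multicell_call(pa_pi, min_active=4):
    """pa_pi maps ej -> (Pa, Pi), or None when no prediction was returned."""
    n_active = sum(1 for v in pa_pi.values() if v is not None and v[0] > v[1])
    n_missing = sum(1 for v in pa_pi.values() if v is None)
    return n_active, n_missing, n_active >= min_active


# Hypothetical (Pa, Pi) pairs; None marks an ej with no CLC-Pred 2.0 prediction
example = {"ej01": (0.42, 0.10), "ej02": None, "ej03": (0.35, 0.05),
           "ej04": (0.20, 0.30), "ej05": (0.55, 0.02), "ej06": None,
           "ej07": (0.31, 0.12)}
print(clc_multicell_call(example))  # (4, 2, True)
```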
To assess the chemical novelty of the six designed molecules, we searched widely used online chemical databases, namely ChEMBL [66,69], SureChEMBL [70], eMolecules [71], and ZINC [72,73,74,75]. The purpose was to check whether any of the six designed molecules resembled molecules already present in these databases. Thus, for each designed molecule, we performed a similarity search in each of the aforementioned databases using Tanimoto’s coefficient (Tc). Using the accepted chemical similarity cutoff of Tc > 0.85 [76], we found no molecules in those databases whose structures resemble those of our six designed molecules. This indicates that the designed molecules are chemically novel and, at the same time, pharmacologically novel, because no chemical similar to them has been reported to exhibit multi-cell anti-CRC activity.
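For binary fingerprints, Tanimoto’s coefficient is Tc = c / (a + b − c), where a and b are the numbers of bits set in each fingerprint and c is the number of bits they share. The Python sketch below assumes the fingerprints are already available as sets of "on" bit indices (in practice they would come from a cheminformatics toolkit or from each database’s own search engine, whose fingerprint type may differ) and flags entries above the Tc > 0.85 cutoff used in the text. All fingerprints and identifiers are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto coefficient of two binary fingerprints given as sets of 'on' bits."""
    shared = len(fp_a & fp_b)
    union = len(fp_a) + len(fp_b) - shared
    return shared / union if union else 0.0  # two empty fingerprints: 0.0 by convention here


def similar_entries(query: Set[int], database: Iterable[Tuple[str, Set[int]]],
                    cutoff: float = 0.85) -> List[Tuple[str, float]]:
    """Return (identifier, Tc) for every database entry with Tc above the cutoff."""
    hits = []
    for identifier, fp in database:
        tc = tanimoto(query, fp)
        if tc > cutoff:
            hits.append((identifier, tc))
    return hits


# Hypothetical fingerprints (bit indices) for a designed molecule and two database entries.
query = set(range(1, 11))               # bits 1..10
database = [
    ("entry_A", set(range(1, 12))),     # bits 1..11: Tc = 10/11
    ("entry_B", {1, 2, 3, 20, 21}),     # Tc = 3/12
]
print(similar_entries(query, database))
```

An empty hit list for every database is the outcome that supports a novelty claim of the kind made above.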

2.4. Druglikeness of the Designed Molecules

Estimating druglikeness is very important because it helps prioritize the molecules that are more likely to succeed in drug discovery campaigns. One well-established approach to assess druglikeness is to check compliance with druglikeness-related rules such as Lipinski’s rule of five [77] and the Veber guidelines [78]. Lipinski’s rule of five considers four physicochemical properties: the number of hydrogen bond acceptors and donors (HBA and HBD, respectively), molecular weight (MW), and the logarithm of the octanol-water partition coefficient (logP). According to this rule, a molecule exhibits drug-like properties if HBA ≤ 10, HBD ≤ 5, MW < 500 Da, and logP ≤ 5. The Veber guidelines, which question the relevance of the MW < 500 Da cutoff, consider only the number of rotatable bonds (NRB) and the polar surface area (PSA); according to them, a molecule exhibits druglikeness if NRB ≤ 10 and PSA < 140 Å². We employed the alvaDesc software (version 1.0.22) [79] to calculate all these physicochemical properties (Table 6) and to verify the compliance of the six designed molecules with the two aforementioned druglikeness-related rules.
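The compliance check described above reduces to comparing a handful of property values with fixed cutoffs. The Python sketch below encodes the cutoffs exactly as stated in the text and returns the list of failed Lipinski criteria, so that the number of violations can be read off directly. The property values are invented for illustration; in the study they were computed with alvaDesc, and the class name `DrugLikenessProperties` is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DrugLikenessProperties:
    hba: int     # number of hydrogen bond acceptors
    hbd: int     # number of hydrogen bond donors
    mw: float    # molecular weight (Da)
    logp: float  # logarithm of the octanol-water partition coefficient
    nrb: int     # number of rotatable bonds
    psa: float   # polar surface area (Å²)


def lipinski_violations(p: DrugLikenessProperties) -> List[str]:
    """Return the Lipinski criteria (as stated in the text) that the molecule fails."""
    criteria = [
        ("HBA <= 10", p.hba <= 10),
        ("HBD <= 5", p.hbd <= 5),
        ("MW < 500 Da", p.mw < 500),
        ("logP <= 5", p.logp <= 5),
    ]
    return [name for name, ok in criteria if not ok]


def complies_with_veber(p: DrugLikenessProperties) -> bool:
    """Veber guidelines as stated in the text: NRB <= 10 and PSA < 140 Å²."""
    return p.nrb <= 10 and p.psa < 140


# Invented property values, for illustration only.
mol = DrugLikenessProperties(hba=8, hbd=2, mw=540.6, logp=5.4, nrb=7, psa=98.0)
print(lipinski_violations(mol), complies_with_veber(mol))
```

This hypothetical molecule fails two Lipinski criteria (MW and logP) while satisfying the Veber guidelines, which is the same pattern of outcomes discussed below for one of the designed molecules.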
Comparing the physicochemical property values of the six designed molecules with the cutoff values defined by Lipinski’s rule of five and the Veber guidelines allowed us to estimate their druglikeness. Except for ASP-COLRC-05, all the designed molecules comply with both druglikeness-related rules. Although ASP-COLRC-05 violates two criteria of Lipinski’s rule of five, it complies with the Veber guidelines. Furthermore, it is well established that chemicals presenting two violations of druglikeness-related rules can still be approved as therapeutic drugs by the Food and Drug Administration [80]. Thus, overall, the six designed molecules present drug-like properties and can be considered for future synthesis and experimental validation.

3. Materials and Methods

3.1. Data Retrieval and Curation

Chemical and biological data were retrieved (as a Microsoft Excel-compatible file) from the ChEMBL database [66,69,81,82]. In the extracted data, the chemical information for each molecule was present in the form of a Simplified Molecular Input Line Entry System (SMILES) code; only molecules with molar mass (M) in the range 130–854 g/mol were considered in the present study. The biological information was present as labels of the combinations of ej mentioned and discussed in the previous section. Furthermore, the biological activity endpoint (GI50) was experimentally determined for each molecule via the SRB assay after 48 h of exposure of the CRC cell lines to the molecules. We eliminated noisy data, such as entries that lacked SMILES codes (or had multi-component SMILES codes), as well as entries with missing activity units or values. If a molecule was experimentally tested more than once against the same CRC cell line, only the entry with the lowest GI50 value was retained. The final dataset contained 5478 cases. Molecules with GI50 ≤ 1900 nM were annotated as active, with the observed categorical variable of anti-CRC activity ACRC(ej) = 1; the remaining molecules were annotated as inactive, i.e., ACRC(ej) = –1. The activity cutoff of GI50 ≤ 1900 nM was chosen for two reasons. On the one hand, chemicals with anticancer activity at least in the low-micromolar range are more likely to exhibit anticancer efficacy later on at the in vivo level [83,84], so this cutoff favored a more rigorous search for anti-CRC chemicals. On the other hand, the cutoff prevented an excessive imbalance between the number of cases annotated as active and the number labeled as inactive.
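The curation and labeling steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: activity values are assumed to be already converted to nM, the raw SMILES string is used as the molecule identifier (a real workflow would more likely use canonical SMILES or ChEMBL identifiers), the molar-mass filter is omitted because it requires a cheminformatics toolkit, and the records are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (SMILES, cell line, GI50 in nM); None marks a missing value.
Record = Tuple[Optional[str], str, Optional[float]]


def curate(records: List[Record],
           cutoff_nm: float = 1900.0) -> Dict[Tuple[str, str], Tuple[float, int]]:
    """Drop noisy entries, keep the lowest GI50 per (molecule, cell line), and label activity."""
    best: Dict[Tuple[str, str], float] = {}
    for smiles, cell_line, gi50 in records:
        # Skip entries without SMILES, with multi-component SMILES ('.'), or without activity.
        if not smiles or "." in smiles or gi50 is None:
            continue
        key = (smiles, cell_line)
        if key not in best or gi50 < best[key]:
            best[key] = gi50
    # ACRC(ej) = 1 if GI50 <= cutoff, otherwise -1.
    return {key: (gi50, 1 if gi50 <= cutoff_nm else -1) for key, gi50 in best.items()}


records = [
    ("CCO", "KM12", 2500.0),
    ("CCO", "KM12", 1200.0),        # repeated test: the lower GI50 is kept
    ("c1ccccc1O", "KM12", 5000.0),
    ("CCN.Cl", "KM12", 10.0),       # multi-component SMILES: removed
    (None, "KM12", 50.0),           # missing SMILES: removed
    ("CCO", "CRC-2", None),         # missing activity value: removed
]
print(curate(records))
```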

3.2. Calculation of the Descriptors

From the SMILES codes of the chemicals (stored in a *.txt file), the MODELAB v1.5 software was employed to calculate a series of molecular descriptors known as topological indices (TIs) [85]. Three families of TIs were calculated. The first family was the weighted bond-based spectral moments [SM(PP)o], where PP is a physicochemical property (based on a bond measure or calculated from atomic contributions) such as bond standard distance (Std), bond dipole moment (Dip), hydrophobicity (Hyd), polar surface area (Psa), molar refractivity (Mol), Gasteiger-Marsili charges (Gas), and atomic weight (Ato). The letter “o” denotes the order (ranging from 1 to 7), i.e., the maximum number of bonds that a fragment (subgraph, SG) can have, without considering bond order. The second and third families were the Kier-Hall valence connectivity indices [Xv(SG)m] and the bond-based connectivity indices [e(SG)m], respectively. In these two families, the subgraph SG is a generic fragment that can be further classified as a path (P), cluster (C), path-cluster (PC), or ring (Ch). The order “m” (ranging from 1 to 6) is the exact number of bonds (without considering bond order) present in a particular SG type. In addition to these families of TIs, a set of partially normalized TIs (NTIs) was obtained as the quotient of each TI and L, where L is the number of bonds in the molecule without considering bond order.
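In Estrada’s formulation, the k-th bond-based spectral moment is the trace of the k-th power of the edge (bond) adjacency matrix, and the weighted variants place a bond property on the diagonal of that matrix. The Python sketch below illustrates this general definition together with the NTI normalization (TI divided by L). It is not the MODELAB implementation: the bond weights are passed in directly as an assumption, and how they are derived from properties such as Std or Hyd is not reproduced here.

```python
from typing import List, Sequence, Tuple

Bond = Tuple[int, int]  # indices of the two atoms joined by a bond (hydrogen-depleted graph)
Matrix = List[List[float]]


def edge_adjacency(bonds: Sequence[Bond], weights: Sequence[float]) -> Matrix:
    """Bond adjacency matrix: 1 for bonds sharing an atom, bond weight on the diagonal."""
    n = len(bonds)
    e = [[0.0] * n for _ in range(n)]
    for i in range(n):
        e[i][i] = weights[i]
        for j in range(i + 1, n):
            if set(bonds[i]) & set(bonds[j]):
                e[i][j] = e[j][i] = 1.0
    return e


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product (adequate for small molecular graphs)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def spectral_moment(bonds: Sequence[Bond], weights: Sequence[float], order: int) -> float:
    """k-th spectral moment: trace of the k-th power of the weighted bond adjacency matrix."""
    e = edge_adjacency(bonds, weights)
    power = e
    for _ in range(order - 1):
        power = matmul(power, e)
    return sum(power[i][i] for i in range(len(power)))


def normalized_index(ti: float, n_bonds: int) -> float:
    """Partially normalized index: NTI = TI / L."""
    return ti / n_bonds


# Propane skeleton C0-C1-C2: two bonds sharing atom C1; unit weights are purely illustrative.
propane = [(0, 1), (1, 2)]
mu2 = spectral_moment(propane, [1.0, 1.0], order=2)
print(mu2, normalized_index(mu2, len(propane)))  # prints: 4.0 2.0
```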

3.3. Dataset Splitting and the Box–Jenkins Approach

We then split the 5478 cases of the dataset into training and test sets using the following procedure. First, we sorted the cases by increasing GI50 value and then by the CRC cell line against which they were experimentally tested. Then, within each consecutive group of four cases, the first three were assigned to the training set and the fourth to the test set; this assignment was repeated over the entire dataset. We would like to emphasize that the test set was never used for model construction, parameter optimization, threshold selection, or descriptor standardization. Therefore, the cases in the test set were completely unknown to the PTML model during training, making the test set a true external hold-out set in the sense of standard machine-learning validation protocols.
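The 3:1 assignment rule can be sketched as follows. The text does not fully specify how the two sorting criteria are combined, so this Python sketch assumes one reading (GI50 as the primary key, cell line as the tie-breaker); the cases are hypothetical.

```python
from typing import List, Tuple

Case = Tuple[str, str, float]  # (molecule id, cell line, GI50 in nM)


def split_three_to_one(cases: List[Case]) -> Tuple[List[Case], List[Case]]:
    """Sort cases, then send every fourth case to the test set and the rest to training."""
    # Assumed reading of the procedure: GI50 as primary key, cell line as secondary key.
    ordered = sorted(cases, key=lambda c: (c[2], c[1]))
    train: List[Case] = []
    test: List[Case] = []
    for position, case in enumerate(ordered, start=1):
        (test if position % 4 == 0 else train).append(case)
    return train, test


# Hypothetical cases with distinct GI50 values.
cases = [(f"m{i}", "KM12", float(g)) for i, g in enumerate([50, 10, 30, 80, 20, 70, 40, 60])]
train, test = split_three_to_one(cases)
print(len(train), [c[0] for c in test])  # prints: 6 ['m6', 'm3']
```

Because the assignment walks through the sorted list, both sets span the whole GI50 range, which is the main reason for sorting before splitting.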
Then, we used an adaptation of the Box–Jenkins approach, which allowed us to fuse numeric chemical (structural) information calculated through the different graph-theoretical indices (GTI) with the labels provided by the combinations of ej:
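$$\mathrm{avg}[\mathrm{GTI}]_{e_j} = \frac{1}{n(e_j)} \sum_{a=1}^{n(e_j)} \mathrm{GTI}_a \tag{1}$$
$$D[\mathrm{GTI}]_{e_j} = \frac{\mathrm{GTI} - \mathrm{avg}[\mathrm{GTI}]_{e_j}}{\mathrm{sdv}[\mathrm{GTI}]} \times p(e_j) \tag{2}$$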
Because ej comprises the experimental aspects dt, ct, and mc, Equations (1) and (2) were applied to each of them separately. In Equation (1), the symbol “GTI” refers to any TI or NTI. The average value avg[GTI]ej is calculated from GTI and n(ej), the latter being the number of training-set cases annotated as active that were experimentally tested under the same element of ej. Thus, if n(ej) = n(dt), then avg[GTI]ej = avg[GTI]dt; that is, Equation (1) is applied to the element dt, and n(dt) is the number of active chemicals in the training set tested under the same doubling time. The same procedure is applied to the elements ct and mc. In Equation (2), sdv[GTI] is the standard deviation of the GTI values (calculated only from chemicals in the training set). The term p(ej) is the a priori probability of finding a chemical tested under a defined element of ej; that is, p(ej) is the quotient of n(ej) and T, the number of cases in the training set. As with n(ej), the p(ej) values were calculated separately for dt, ct, and mc.
It is important to highlight that in PTML modeling, the experimental conditions ej (dt, ct, mc) do not constitute the perturbations themselves. Rather, they define the condition-specific reference values, i.e., the avg[GTI]ej values calculated in Equation (1). A “perturbation” represents the deviation of an individual descriptor GTI from its expected average value, avg[GTI]ej, under that specific experimental condition, as quantified in Equation (2). Thus, the multi-label graph-theoretical index D[GTI]ej is a perturbation descriptor because it encodes the structural change in a molecule relative to the condition-dependent baseline (characterized by avg[GTI]ej).
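To make the Box–Jenkins operator concrete, the Python sketch below computes Equations (1) and (2), as written in this section, for a single descriptor and a single experimental aspect. Assumptions: the training data are hypothetical, sdv[GTI] is taken as the sample standard deviation (the text does not say whether a sample or population estimate was used), and the function names are illustrative only. In the study, the same computation would be repeated for every GTI and separately for dt, ct, and mc.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence


def condition_averages(values: Sequence[float], elements: Sequence[str],
                       active: Sequence[bool]) -> Dict[str, float]:
    """Equation (1): mean GTI over active training cases sharing each element of ej."""
    groups: Dict[str, List[float]] = {}
    for value, element, is_active in zip(values, elements, active):
        if is_active:
            groups.setdefault(element, []).append(value)
    return {element: mean(vals) for element, vals in groups.items()}


def prior_probabilities(elements: Sequence[str], active: Sequence[bool]) -> Dict[str, float]:
    """p(ej) = n(ej) / T, with n(ej) counted over active training cases."""
    total = len(elements)  # T: number of training cases
    counts: Dict[str, int] = {}
    for element, is_active in zip(elements, active):
        if is_active:
            counts[element] = counts.get(element, 0) + 1
    return {element: n / total for element, n in counts.items()}


def perturbation(gti: float, element: str, averages: Dict[str, float],
                 sdv: float, priors: Dict[str, float]) -> float:
    """Equation (2): D[GTI]ej = (GTI - avg[GTI]ej) / sdv[GTI] * p(ej)."""
    # Raises KeyError if the element has no active training cases (avg undefined).
    return (gti - averages[element]) / sdv * priors[element]


# Hypothetical training data for one descriptor and one experimental aspect (e.g., dt).
values = [2.0, 4.0, 6.0, 8.0]
elements = ["A", "A", "B", "B"]
active = [True, True, True, False]

averages = condition_averages(values, elements, active)
priors = prior_probabilities(elements, active)
sdv = stdev(values)  # sample standard deviation over all training cases (assumption)
d = perturbation(5.0, "A", averages, sdv, priors)
print(averages, priors, round(d, 4))
```

A molecule whose descriptor sits exactly at the condition-specific baseline gets D[GTI]ej = 0; the further it deviates, the larger the magnitude of the perturbation descriptor.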

3.4. Development of the PTML Model

We next examined the discriminatory power of the D[GTI]ej indices. To do so, we employed the IMMAN v1.0 program [86] to calculate, for each D[GTI]ej index, three metrics from information theory: differential Shannon entropy [87], gain ratio [88], and symmetric uncertainty [89]. IMMAN v1.0 was then used to calculate the geometric mean value (GMV) of these three metrics. We ranked the D[GTI]ej indices by decreasing GMV; the indices with the largest GMV were those with the greatest discriminatory power. To reduce information redundancy among the D[GTI]ej indices, we used STATISTICA v13.5.0.17 [90] to calculate pair-wise Pearson’s correlation coefficient (PCC) values; D[GTI]ej indices that did not satisfy the condition −0.7 < PCC < 0.7 were eliminated. Furthermore, given the size of the dataset used in the present study and our previous experience with PTML models [16,45,91], we chose to use as inputs for the PTML-MLP model the top 20 non-redundant D[GTI]ej indices, i.e., the D[GTI]ej indices with the highest GMV that satisfied the condition −0.7 < PCC < 0.7.
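The ranking and redundancy filter can be sketched as a greedy pass over descriptors in decreasing GMV order. The text does not state which member of a correlated pair was removed, so this Python sketch assumes the higher-GMV descriptor is kept; the three information-theoretic scores are assumed to be precomputed (they came from IMMAN in the study), and the descriptor columns and scores are hypothetical.

```python
from math import prod, sqrt
from typing import Dict, List, Sequence


def geometric_mean(scores: Sequence[float]) -> float:
    """GMV of the information-theoretic scores of one descriptor."""
    return prod(scores) ** (1.0 / len(scores))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two descriptor columns."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def select_descriptors(columns: Dict[str, List[float]], gmv: Dict[str, float],
                       top_k: int = 20, r_max: float = 0.7) -> List[str]:
    """Greedy pass in decreasing GMV, skipping descriptors with |PCC| >= r_max to any kept one."""
    selected: List[str] = []
    for name in sorted(columns, key=lambda n: gmv[n], reverse=True):
        if all(-r_max < pearson(columns[name], columns[s]) < r_max for s in selected):
            selected.append(name)
            if len(selected) == top_k:
                break
    return selected


# Hypothetical descriptor columns and scores.
columns = {"D1": [1.0, 2.0, 3.0, 4.0], "D2": [2.0, 4.0, 6.0, 8.5], "D3": [4.0, 1.0, 3.0, 2.0]}
gmv = {"D1": geometric_mean([0.30, 0.20, 0.25]), "D2": 0.20, "D3": 0.10}
print(select_descriptors(columns, gmv))  # prints: ['D1', 'D3']
```

Here D2 is dropped because it is almost perfectly correlated with the higher-ranked D1, while D3 (PCC = −0.4 with D1) is retained.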
To develop the PTML-MLP model, we used the artificial neural networks (ANN) package of STATISTICA v13.5.0.17, with the key hyperparameters configured as follows. The number of input nodes was I = 20 (the number of D[GTI]ej indices used as inputs). Exponential, logistic, and hyperbolic tangent activation functions were evaluated in both the hidden and output layers. The minimum and maximum numbers of neurons in the hidden layer (H) were 20 and 60, respectively. The number of output nodes was O = 2, i.e., the number of categories (active and inactive) to be predicted by the PTML-MLP model. The MLP was the preferred ANN type because of the excellent results obtained by PTML models using this architecture [16,45,91]. As mentioned above, the parameter ρ was calculated as a measure of the capacity of the MLP network (PTML-MLP model) to overfit the data [43,92]:
$$\rho = \frac{T}{(I + 1)H + (H + 1)O} \tag{3}$$
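Notice that Equation (3) reflects the topology of an MLP network: the denominator counts the adjustable connections (weights plus biases) between the input and hidden layers, (I + 1)H, and between the hidden and output layers, (H + 1)O. The parameters T, I, H, and O are as defined above.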
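Equation (3) is the ratio between the number of training cases and the number of adjustable connections of a single-hidden-layer MLP. The short Python sketch below evaluates it for I = 20 and O = 2 across the explored hidden-layer range (20 to 60 neurons); the value of T is a placeholder chosen for illustration, not the size of the actual training set.

```python
def n_connections(I: int, H: int, O: int) -> int:
    """Adjustable connections (weights plus biases) of a single-hidden-layer MLP."""
    return (I + 1) * H + (H + 1) * O


def rho(T: int, I: int, H: int, O: int) -> float:
    """Equation (3): ratio of training cases to adjustable connections."""
    return T / n_connections(I, H, O)


T = 4000  # hypothetical number of training cases, for illustration only
for H in (20, 40, 60):  # hidden-layer sizes within the 20-60 range explored
    print(H, n_connections(20, H, 2), round(rho(T, 20, H, 2), 2))
```

Larger hidden layers add connections faster than training cases can be added, so ρ falls as H grows; this is why ρ serves as a quick indicator of overfitting capacity.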
We trained 1000 MLP networks, retaining 300 of them for further analysis. Among the 300 retained MLP networks, the best of them (PTML-MLP model) was the one exhibiting the highest values of global and local metrics (already explained in the Results and Discussion section) in both training and test sets.
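The global and local metrics used to rank the retained networks are defined in the Results and Discussion section, which is not reproduced here. The Python sketch below assumes they include sensitivity and specificity (reported in the Abstract), accuracy, and the Matthews correlation coefficient (MCC) as a global measure, all computed from a confusion matrix with "active" as the positive class. The confusion-matrix counts are hypothetical.

```python
from math import sqrt
from typing import Dict


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Binary classification metrics from a confusion matrix (active = positive class)."""
    total = tp + fn + tn + fp
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn),   # fraction of actives correctly predicted
        "specificity": tn / (tn + fp),   # fraction of inactives correctly predicted
        "accuracy": (tp + tn) / total,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,  # Matthews correlation coefficient
    }


# Hypothetical confusion matrix for one data set.
m = classification_metrics(tp=80, fn=20, tn=90, fp=10)
print({k: round(v, 3) for k, v in m.items()})
```

Computing the same dictionary for the training and test sets of each retained network gives the side-by-side comparison needed to pick a single model.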

4. Conclusions

Among the many neoplastic malignancies, CRC is of great concern due to its high mortality and variable prognosis. At the phenotypic level, discovering novel chemicals with multi-cell anti-CRC activity is of paramount importance on the road to finding versatile, more efficacious therapeutics against CRC. Our computational methodology, which combines our PTML-MLP model with the FBTD approach, enables the chemistry-driven generation of drug-like molecules virtually exhibiting multi-cell anti-CRC activity; these molecules hold great promise for future experimental validation through organic synthesis and subsequent determination of their cytostatic activity against multiple CRC cell lines (multi-cell anti-CRC activity). The present study showcases the applicability of the PTML-FBTD in silico framework to de novo molecular design, opening new horizons for early phenotypic anticancer discovery and beyond.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/ijms262311453/s1

Author Contributions

Conceptualization, A.S.-P. and M.N.D.S.C.; methodology, A.S.-P.; software, A.S.-P.; validation, A.S.-P.; formal analysis, A.S.-P.; investigation, A.S.-P. and M.N.D.S.C.; resources, A.S.-P. and M.N.D.S.C.; data curation, A.S.-P.; writing—original draft preparation, A.S.-P. and M.N.D.S.C.; writing—review and editing, A.S.-P.; visualization, A.S.-P.; supervision, A.S.-P.; project administration, A.S.-P.; funding acquisition, M.N.D.S.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the PT national funds (FCT/MCTES, Fundação para a Ciência e Tecnologia, and Ministério da Ciência, Tecnologia e Ensino Superior) through the project UID/50006—Laboratório Associado para a Química Verde—Tecnologias e Processos Limpos.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are provided within the manuscript and Supplementary Information Files.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Sung, H.; Ferlay, J.; Siegel, R.L.; Laversanne, M.; Soerjomataram, I.; Jemal, A.; Bray, F. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J. Clin. 2021, 71, 209–249.
  2. Siegel, R.L.; Kratzer, T.B.; Giaquinto, A.N.; Sung, H.; Jemal, A. Cancer statistics, 2025. CA Cancer J. Clin. 2025, 75, 10–45.
  3. Haynes, J.; Manogaran, P. Mechanisms and Strategies to Overcome Drug Resistance in Colorectal Cancer. Int. J. Mol. Sci. 2025, 26, 1988.
  4. Oh, J.M.; Kim, S.; Tsung, C.; Kent, E.; Jain, A.; Ruff, S.M.; Zhang, H. Comprehensive review of the resistance mechanisms of colorectal cancer classified by therapy type. Front. Immunol. 2025, 16, 1571731.
  5. Wang, Z.; Hulikova, A.; Swietach, P. Innovating cancer drug discovery with refined phenotypic screens. Trends Pharmacol. Sci. 2024, 45, 723–738.
  6. Jiang, C.; Yang, S.; Wang, Y.; Du, L.; Niu, M.M.; Zhang, D. Structure-based design of new potent and highly selective PARP-1 inhibitor for treating colorectal cancer. J. Enzyme Inhib. Med. Chem. 2025, 40, 2542358.
  7. Sharma, D.; Arumugam, S. A machine learning-Assisted QSAR and integrative computational combined with network pharmacology approach for rational identification of tankyrase inhibitors in colon adenocarcinoma. Comput. Biol. Med. 2025, 197, 111068.
  8. Pushpaveni, C.; Hemavathi, S.; Kurmi, S.P.C.; Patra, B.R.; Esther, V.A.; Yadav, C.K.; Biradar, M.S.; Thapa, S. Repurposing terfenadine and domperidone for inhibition of apoptotic gene association in colorectal cancer: A system pharmacology approach integrated with molecular docking, MD simulations, and post-MD simulation analysis. Bioinform. Biol. Insights 2025, 19, 11779322251365019.
  9. Scalvini, L.; Tagliazucchi, L.; Elisi, G.M.; Zappaterra, D.; Moschella, M.G.; Fantini, S.; Aiello, D.; Guerrini, R.; Albanese, V.; Pacifico, S.; et al. Identification of pyrazolo-piperidinone derivatives targeting YAP-TEAD interface 3 as anticancer agents through integrated virtual screening and mass spectrometry proteomics. Eur. J. Med. Chem. 2025, 300, 118056.
  10. Arooj, M.; Mateen, R.M.; Javed, M.; Ali, M.; Fareed, M.I.; Parveen, R.; Bahadur, A.; Iqbal, S.; Mahmood, S.; Knani, S.; et al. Computational screening of phytochemicals targeting mutant KRAS in colorectal cancer. Sci. Rep. 2025, 15, 28754.
  11. Khalid, M.; Mateen, R.M.; Javed, M.; Ali, M.; Saqab, M.A.N.; Parveen, R.; Asimov, A.; Bibi, S.; Bahadur, A.; Iqbal, S.; et al. In-silico analysis of potential phytochemicals targeting mitogen activating protein kinase-14 (MAPK14) gene in colorectal cancer. Sci. Rep. 2025, 15, 20361.
  12. Oladeji, S.M.; Conteh, D.N.; Bello, L.A.; Adegboyega, A.E.; Shokunbi, O.S. Rational Design and Optimization of Novel PDE5 Inhibitors for Targeted Colorectal Cancer Therapy: An In Silico Approach. Int. J. Mol. Sci. 2025, 26, 1937.
  13. Alshahrani, M.M. Structural stability-guided scaffold hopping and computational modeling of tankyrase inhibitors targeting colorectal cancer. PLoS ONE 2025, 20, e0332798.
  14. Kleandrova, V.V.; Cordeiro, M.N.D.S.; Speck-Planche, A. Perturbation-Theory Machine Learning for Multi-Target Drug Discovery in Modern Anticancer Research. Curr. Issues Mol. Biol. 2025, 47, 301.
  15. Kleandrova, V.V.; Cordeiro, M.N.D.S.; Speck-Planche, A. Optimizing drug discovery using multitasking models for quantitative structure-biological effect relationships: An update of the literature. Expert Opin. Drug Discov. 2023, 18, 1231–1243.
  16. Kleandrova, V.V.; Cordeiro, M.N.D.S.; Speck-Planche, A. In Silico Approach for Antibacterial Discovery: PTML Modeling of Virtual Multi-Strain Inhibitors Against Staphylococcus aureus. Pharmaceuticals 2025, 18, 196.
  17. Velasquez-Lopez, Y.; Ruiz-Escudero, A.; Arrasate, S.; Gonzalez-Diaz, H. Implementation of IFPTML Computational Models in Drug Discovery Against Flaviviridae Family. J. Chem. Inf. Model. 2024, 64, 1841–1852.
  18. Santiago, C.; Ortega-Tenezaca, B.; Barbolla, I.; Fundora-Ortiz, B.; Arrasate, S.; Dea-Ayuela, M.A.; Gonzalez-Diaz, H.; Sotomayor, N.; Lete, E. Prediction of Antileishmanial Compounds: General Model, Preparation, and Evaluation of 2-Acylpyrrole Derivatives. J. Chem. Inf. Model. 2022, 62, 3928–3940.
  19. Dieguez-Santana, K.; Casanola-Martin, G.M.; Torres, R.; Rasulev, B.; Green, J.R.; Gonzalez-Diaz, H. Machine Learning Study of Metabolic Networks vs ChEMBL Data of Antibacterial Compounds. Mol. Pharm. 2022, 19, 2151–2163.
  20. Vasquez-Dominguez, E.; Armijos-Jaramillo, V.D.; Tejera, E.; Gonzalez-Diaz, H. Multioutput Perturbation-Theory Machine Learning (PTML) Model of ChEMBL Data for Antiretroviral Compounds. Mol. Pharm. 2019, 16, 4200–4212.
  21. Quevedo-Tumailli, V.; Ortega-Tenezaca, B.; Gonzalez-Diaz, H. IFPTML Mapping of Drug Graphs with Protein and Chromosome Structural Networks vs. Pre-Clinical Assay Information for Discovery of Antimalarial Compounds. Int. J. Mol. Sci. 2021, 22, 13066.
  22. Barbolla, I.; Hernandez-Suarez, L.; Quevedo-Tumailli, V.; Nocedo-Mena, D.; Arrasate, S.; Dea-Ayuela, M.A.; Gonzalez-Diaz, H.; Sotomayor, N.; Lete, E. Palladium-mediated synthesis and biological evaluation of C-10b substituted Dihydropyrrolo[1,2-b]isoquinolines as antileishmanial agents. Eur. J. Med. Chem. 2021, 220, 113458.
  23. Kleandrova, V.V.; Speck-Planche, A. PTML Modeling for Alzheimer’s Disease: Design and Prediction of Virtual Multi-Target Inhibitors of GSK3B, HDAC1, and HDAC6. Curr. Top. Med. Chem. 2020, 20, 1661–1676.
  24. Baltasar-Marchueta, M.; Llona, L.; M.-Alicante, S.; Barbolla, I.; Ibarluzea, M.G.; Ramis, R.; Salomon, A.M.; Fundora, B.; Araujo, A.; Muguruza-Montero, A.; et al. Identification of Riluzole derivatives as novel calmodulin inhibitors with neuroprotective activity by a joint synthesis, biosensor, and computational guided strategy. Biomed. Pharmacother. 2024, 174, 116602.
  25. Sampaio-Dias, I.E.; Rodriguez-Borges, J.E.; Yanez-Perez, V.; Arrasate, S.; Llorente, J.; Brea, J.M.; Bediaga, H.; Vina, D.; Loza, M.I.; Caamano, O.; et al. Synthesis, Pharmacological, and Biological Evaluation of 2-Furoyl-Based MIF-1 Peptidomimetics and the Development of a General-Purpose Model for Allosteric Modulators (ALLOPTML). ACS Chem. Neurosci. 2021, 12, 203–215.
  26. Diez-Alarcia, R.; Yanez-Perez, V.; Muneta-Arrate, I.; Arrasate, S.; Lete, E.; Meana, J.J.; Gonzalez-Diaz, H. Big Data Challenges Targeting Proteins in GPCR Signaling Pathways; Combining PTML-ChEMBL Models and [(35)S]GTPgammaS Binding Assays. ACS Chem. Neurosci. 2019, 10, 4476–4491.
  27. Ferreira da Costa, J.; Silva, D.; Caamano, O.; Brea, J.M.; Loza, M.I.; Munteanu, C.R.; Pazos, A.; Garcia-Mera, X.; Gonzalez-Diaz, H. Perturbation Theory/Machine Learning Model of ChEMBL Data for Dopamine Targets: Docking, Synthesis, and Assay of New l-Prolyl-l-leucyl-glycinamide Peptidomimetics. ACS Chem. Neurosci. 2018, 9, 2572–2587.
  28. Tenorio-Borroto, E.; Castanedo, N.; Garcia-Mera, X.; Rivadeneira, K.; Vazquez Chagoyan, J.C.; Barbabosa Pliego, A.; Munteanu, C.R.; Gonzalez-Diaz, H. Perturbation Theory Machine Learning Modeling of Immunotoxicity for Drugs Targeting Inflammatory Cytokines and Study of the Antimicrobial G1 Using Cytometric Bead Arrays. Chem. Res. Toxicol. 2019, 32, 1811–1823.
  29. Vazquez-Prieto, S.; Paniagua, E.; Solana, H.; Ubeira, F.M.; Gonzalez-Diaz, H. A study of the Immune Epitope Database for some fungi species using network topological indices. Mol. Divers. 2017, 21, 713–718.
  30. He, S.; Segura Abarrategi, J.; Bediaga, H.; Arrasate, S.; Gonzalez-Diaz, H. On the additive artificial intelligence-based discovery of nanoparticle neurodegenerative disease drug delivery systems. Beilstein J. Nanotechnol. 2024, 15, 535–555.
  31. He, S.; Nader, K.; Abarrategi, J.S.; Bediaga, H.; Nocedo-Mena, D.; Ascencio, E.; Casanola-Martin, G.M.; Castellanos-Rubio, I.; Insausti, M.; Rasulev, B.; et al. NANO.PTML model for read-across prediction of nanosystems in neurosciences. computational model and experimental case of study. J. Nanobiotechnol. 2024, 22, 435.
  32. Ortega-Tenezaca, B.; Gonzalez-Diaz, H. IFPTML mapping of nanoparticle antibacterial activity vs. pathogen metabolic networks. Nanoscale 2021, 13, 1318–1330.
  33. Munteanu, C.R.; Gutierrez-Asorey, P.; Blanes-Rodriguez, M.; Hidalgo-Delgado, I.; Blanco Liverio, M.J.; Castineiras Galdo, B.; Porto-Pazos, A.B.; Gestal, M.; Arrasate, S.; Gonzalez-Diaz, H. Prediction of Anti-Glioblastoma Drug-Decorated Nanoparticle Delivery Systems Using Molecular Descriptors and Machine Learning. Int. J. Mol. Sci. 2021, 22, 11519.
  34. Dieguez-Santana, K.; Gonzalez-Diaz, H. Towards machine learning discovery of dual antibacterial drug-nanoparticle systems. Nanoscale 2021, 13, 17854–17870.
  35. Urista, D.V.; Carrue, D.B.; Otero, I.; Arrasate, S.; Quevedo-Tumailli, V.F.; Gestal, M.; Gonzalez-Diaz, H.; Munteanu, C.R. Prediction of Antimalarial Drug-Decorated Nanoparticle Delivery Systems with Random Forest Models. Biology 2020, 9, 198.
  36. Kleandrova, V.V.; Cordeiro, M.N.D.S.; Speck-Planche, A. Perturbation Theory Machine Learning Model for Phenotypic Early Antineoplastic Drug Discovery: Design of Virtual Anti-Lung-Cancer Agents. Appl. Sci. 2024, 14, 9344.
  37. Cabrera-Andrade, A.; Lopez-Cortes, A.; Munteanu, C.R.; Pazos, A.; Perez-Castillo, Y.; Tejera, E.; Arrasate, S.; Gonzalez-Diaz, H. Perturbation-Theory Machine Learning (PTML) Multilabel Model of the ChEMBL Dataset of Preclinical Assays for Antisarcoma Compounds. ACS Omega 2020, 5, 27211–27220.
  38. Cabrera-Andrade, A.; Lopez-Cortes, A.; Jaramillo-Koupermann, G.; Gonzalez-Diaz, H.; Pazos, A.; Munteanu, C.R.; Perez-Castillo, Y.; Tejera, E. A Multi-Objective Approach for Anti-Osteosarcoma Cancer Agents Discovery Through Drug Repurposing. Pharmaceuticals 2020, 13, 409.
  39. Kleandrova, V.V.; Cordeiro, M.; Speck-Planche, A. Perturbation-theory machine learning for mood disorders: Virtual design of dual inhibitors of NET and SERT proteins. BMC Chem. 2025, 19, 2.
  40. Tsherniak, A.; Vazquez, F.; Montgomery, P.G.; Weir, B.A.; Kryukov, G.; Cowley, G.S.; Gill, S.; Harrington, W.F.; Pantel, S.; Krill-Burger, J.M.; et al. Defining a Cancer Dependency Map. Cell 2017, 170, 564–576.e16.
  41. Robin, T.; Capes-Davis, A.; Bairoch, A. CLASTR: The Cellosaurus STR similarity search tool—A precious help for cell line authentication. Int. J. Cancer 2020, 146, 1299–1306.
  42. van der Meer, D.; Barthorpe, S.; Yang, W.; Lightfoot, H.; Hall, C.; Gilbert, J.; Francies, H.E.; Garnett, M.J. Cell Model Passports-a hub for clinical, genetic and functional datasets of preclinical cancer models. Nucleic Acids Res. 2019, 47, D923–D929.
  43. Schneider, G.; Wrede, P. Artificial neural networks for computer-based molecular design. Prog. Biophys. Mol. Biol. 1998, 70, 175–222.
  44. Chicco, D.; Jurman, G. The Matthews correlation coefficient (MCC) should replace the ROC AUC as the standard metric for assessing binary classification. BioData Min. 2023, 16, 4.
  45. Kleandrova, V.V.; Cordeiro, M.; Speck-Planche, A. In Silico Approach for Early Antimalarial Drug Discovery: De Novo Design of Virtual Multi-Strain Antiplasmodial Inhibitors. Microorganisms 2025, 13, 1620.
  46. Zhou, X.; Lin, H.; Lin, H. Global Sensitivity Analysis. In Encyclopedia of GIS; Shekhar, S., Xiong, H., Eds.; Springer: Boston, MA, USA, 2008; pp. 408–409.
  47. Estrada, E. Spectral moments of the edge adjacency matrix in molecular graphs. 1. Definition and applications for the prediction of physical properties of alkanes. J. Chem. Inf. Comput. Sci. 1996, 36, 844–849.
  48. Estrada, E. Spectral moments of the edge adjacency matrix in molecular graphs. 2. Molecules containing heteroatoms and QSAR applications. J. Chem. Inf. Comput. Sci. 1997, 37, 320–328.
  49. Estrada, E. Spectral moments of the edge adjacency matrix in molecular graphs. 3. Molecules containing cycles. J. Chem. Inf. Comput. Sci. 1998, 38, 23–27.
  50. Estrada, E. How the parts organize in the whole? A top-down view of molecular descriptors and properties for QSAR and drug design. Mini Rev. Med. Chem. 2008, 8, 213–221.
  51. Estrada, E.; Molina, E. Automatic extraction of structural alerts for predicting chromosome aberrations of organic compounds. J. Mol. Graph. Model. 2006, 25, 275–288.
  52. Helguera, A.M.; Cabrera Perez, M.A.; Gonzalez, M.P.; Ruiz, R.M.; Gonzalez Diaz, H. A topological substructural approach applied to the computational prediction of rodent carcinogenicity. Bioorg. Med. Chem. 2005, 13, 2477–2488.
  53. Estrada, E.; Molina, E.; Perdomo-Lopez, I. Can 3D structural parameters be predicted from 2D (topological) molecular descriptors? J. Chem. Inf. Comput. Sci. 2001, 41, 1015–1021.
  54. Estrada, E. Physicochemical Interpretation of Molecular Connectivity Indices. J. Phys. Chem. A 2002, 106, 9085–9091. [Google Scholar] [CrossRef]
  55. Kier, L.B.; Murray, W.J.; Hall, L.H. Molecular connectivity. 4. Relationships to biological activities. J. Med. Chem. 1975, 18, 1272–1274. [Google Scholar] [CrossRef] [PubMed]
  56. Kier, L.B.; Hall, L.H. Molecular connectivity VII: Specific treatment of heteroatoms. J. Pharm. Sci. 1976, 65, 1806–1809. [Google Scholar] [CrossRef]
  57. Hall, L.H.; Kier, L.B. Structure-activity studies using valence molecular connectivity. J. Pharm. Sci. 1977, 66, 642–644. [Google Scholar] [CrossRef]
  58. Kier, L.B.; Hall, L.H. Derivation and significance of valence molecular connectivity. J. Pharm. Sci. 1981, 70, 583–589. [Google Scholar] [CrossRef] [PubMed]
  59. Kier, L.B.; Hall, L.H. Intermolecular accessibility: The meaning of molecular connectivity. J. Chem. Inf. Comput. Sci. 2000, 40, 792–795. [Google Scholar] [CrossRef]
  60. Kier, L.B.; Hall, L.H. Molecular connectivity: Intermolecular accessibility and encounter simulation. J. Mol. Graph. Model. 2001, 20, 76–83. [Google Scholar] [CrossRef]
  61. Estrada, E. Edge adjacency relationship and a novel topological index related to molecular volume. J. Chem. Inf. Comput. Sci. 1995, 35, 31–33. [Google Scholar] [CrossRef]
  62. Estrada, E. Edge adjacency relationships in molecular graphs containing heteroatoms: A new topological index related to molar volume. J. Chem. Inf. Comput. Sci. 1995, 35, 701–707.
  63. Estrada, E.; Rodríguez, L. Edge-Connectivity Indices in QSPR/QSAR Studies. 1. Comparison to Other Topological Indices in QSPR Studies. J. Chem. Inf. Comput. Sci. 1999, 39, 1037–1041.
  64. Estrada, E. Edge-Connectivity Indices in QSPR/QSAR Studies. 2. Accounting for Long-Range Bond Contributions. J. Chem. Inf. Comput. Sci. 1999, 39, 1042–1048.
  65. Estrada, E.; Guevara, N.; Gutman, I. Extension of Edge Connectivity Index. Relationships to Line Graph Indices and QSPR Applications. J. Chem. Inf. Comput. Sci. 1998, 38, 428–431.
  66. Zdrazil, B.; Felix, E.; Hunter, F.; Manners, E.J.; Blackshaw, J.; Corbett, S.; de Veij, M.; Ioannidis, H.; Lopez, D.M.; Mosquera, J.F.; et al. The ChEMBL Database in 2023: A drug discovery platform spanning multiple bioactivity data types and time periods. Nucleic Acids Res. 2024, 52, D1180–D1192.
  67. Kim, S.; Chen, J.; Cheng, T.; Gindulyte, A.; He, J.; He, S.; Li, Q.; Shoemaker, B.A.; Thiessen, P.A.; Yu, B.; et al. PubChem 2019 update: Improved access to chemical data. Nucleic Acids Res. 2019, 47, D1102–D1109.
  68. Lagunin, A.A.; Rudik, A.V.; Pogodin, P.V.; Savosina, P.I.; Tarasova, O.A.; Dmitriev, A.V.; Ivanov, S.M.; Biziukova, N.Y.; Druzhilovskiy, D.S.; Filimonov, D.A.; et al. CLC-Pred 2.0: A Freely Available Web Application for In Silico Prediction of Human Cell Line Cytotoxicity and Molecular Mechanisms of Action for Druglike Compounds. Int. J. Mol. Sci. 2023, 24, 1689.
  69. Mendez, D.; Gaulton, A.; Bento, A.P.; Chambers, J.; De Veij, M.; Felix, E.; Magarinos, M.P.; Mosquera, J.F.; Mutowo, P.; Nowotka, M.; et al. ChEMBL: Towards direct deposition of bioassay data. Nucleic Acids Res. 2019, 47, D930–D940.
  70. Papadatos, G.; Davies, M.; Dedman, N.; Chambers, J.; Gaulton, A.; Siddle, J.; Koks, R.; Irvine, S.A.; Pettersson, J.; Goncharoff, N.; et al. SureChEMBL: A large-scale, chemically annotated patent document database. Nucleic Acids Res. 2016, 44, D1220–D1228.
  71. Gubernator, K.; James, C.A.; Gubernator, N. eMolecules. California, USA. Available online: https://www.emolecules.com/ (accessed on 14 October 2025).
  72. Irwin, J.J.; Tang, K.G.; Young, J.; Dandarchuluun, C.; Wong, B.R.; Khurelbaatar, M.; Moroz, Y.S.; Mayfield, J.; Sayle, R.A. ZINC20 – A Free Ultralarge-Scale Chemical Database for Ligand Discovery. J. Chem. Inf. Model. 2020, 60, 6065–6073.
  73. Sterling, T.; Irwin, J.J. ZINC 15 – Ligand Discovery for Everyone. J. Chem. Inf. Model. 2015, 55, 2324–2337.
  74. Irwin, J.J.; Sterling, T.; Mysinger, M.M.; Bolstad, E.S.; Coleman, R.G. ZINC: A free tool to discover chemistry for biology. J. Chem. Inf. Model. 2012, 52, 1757–1768.
  75. Irwin, J.J.; Shoichet, B.K. ZINC – a free database of commercially available compounds for virtual screening. J. Chem. Inf. Model. 2005, 45, 177–182.
  76. Maggiora, G.; Vogt, M.; Stumpfe, D.; Bajorath, J. Molecular similarity in medicinal chemistry. J. Med. Chem. 2014, 57, 3186–3204.
  77. Lipinski, C.A.; Lombardo, F.; Dominy, B.W.; Feeney, P.J. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug Deliv. Rev. 2001, 46, 3–26.
  78. Veber, D.F.; Johnson, S.R.; Cheng, H.Y.; Smith, B.R.; Ward, K.W.; Kopple, K.D. Molecular properties that influence the oral bioavailability of drug candidates. J. Med. Chem. 2002, 45, 2615–2623.
  79. Mauri, A. alvaDesc: A Tool to Calculate and Analyze Molecular Descriptors and Fingerprints. In Ecotoxicological QSARs; Roy, K., Ed.; Springer: New York, NY, USA, 2020; pp. 801–820.
  80. Pathania, S.; Singh, P.K. Analyzing FDA-approved drugs for compliance of pharmacokinetic principles: Should there be a critical screening parameter in drug designing protocols? Expert Opin. Drug Metab. Toxicol. 2021, 17, 351–354.
  81. Gaulton, A.; Bellis, L.J.; Bento, A.P.; Chambers, J.; Davies, M.; Hersey, A.; Light, Y.; McGlinchey, S.; Michalovich, D.; Al-Lazikani, B.; et al. ChEMBL: A large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2012, 40, D1100–D1107.
  82. Mok, N.Y.; Brenk, R. Mining the ChEMBL database: An efficient chemoinformatics workflow for assembling an ion channel-focused screening library. J. Chem. Inf. Model. 2011, 51, 2449–2454.
  83. Holbeck, S.L.; Collins, J.M.; Doroshow, J.H. Analysis of Food and Drug Administration-approved anticancer agents in the NCI60 panel of human tumor cell lines. Mol. Cancer Ther. 2010, 9, 1451–1460.
  84. Johnson, J.I.; Decker, S.; Zaharevitz, D.; Rubinstein, L.V.; Venditti, J.M.; Schepartz, S.; Kalyandrug, S.; Christian, M.; Arbuck, S.; Hollingshead, M.; et al. Relationships between drug activity in NCI preclinical in vitro and in vivo models and early clinical trials. Br. J. Cancer 2001, 84, 1424–1431.
  85. Estrada, E.; Gutiérrez, Y. MODESLAB, v1.5; Santiago de Compostela, Spain. 2004. Available online: https://insilicomoleculardesign.com/modeslab/ (accessed on 30 September 2025).
  86. Urias, R.W.; Barigye, S.J.; Marrero-Ponce, Y.; Garcia-Jacas, C.R.; Valdes-Martini, J.R.; Perez-Gimenez, F. IMMAN: Free software for information theory-based chemometric analysis. Mol. Divers. 2015, 19, 305–319.
  87. Stahura, F.L.; Godden, J.W.; Bajorath, J. Differential Shannon entropy analysis identifies molecular property descriptors that predict aqueous solubility of synthetic compounds with high accuracy in binary QSAR calculations. J. Chem. Inf. Comput. Sci. 2002, 42, 550–558.
  88. Quinlan, J.R. Induction of decision trees. Mach. Learn. 1986, 1, 81–106.
  89. Press, W.H.; Flannery, B.P.; Teukolsky, S.A.; Vetterling, W.T. Numerical Recipes in C: The Art of Scientific Computing, 1st ed.; Cambridge University Press: New York, NY, USA, 1988.
  90. TIBCO Software Inc. STATISTICA (Data Analysis Software System), v13.5.0.17; TIBCO Software Inc.: Palo Alto, CA, USA, 2018.
  91. Kleandrova, V.V.; Cordeiro, M.N.D.S.; Speck-Planche, A. Perturbation-Theory Machine Learning for Multi-Objective Antibacterial Discovery: Current Status and Future Perspectives. Appl. Sci. 2025, 15, 1166.
  92. Manallack, D.T.; Livingstone, D.J.; A.-Razzak, M.; Glen, R.C. Neural Networks and Expert Systems in Molecular Design. In Advanced Computer-Assisted Techniques in Drug Discovery; van de Waterbeemd, H., Ed.; Methods and Principles in Medicinal Chemistry; Wiley: Hoboken, NJ, USA, 1994; pp. 293–331.
Figure 1. Chemical structures of anticancer drugs correctly predicted by the PTML-MLP model.
Figure 2. Chemicals labeled as multi-cell inhibitors of the CRC cell lines and correctly predicted as such by the PTML-MLP model.
Figure 3. Relative importance of the D[GTI]ej indices in the PTML-MLP model, assessed through their SVs.
Figure 4. Most common subgraphs (SGs) described by the D[GTI]ej indices that characterize molecular fragments in the dataset used to create the PTML-MLP model.
Figure 5. The FBTD approach applied to the design of new molecules virtually exhibiting multi-cell anti-CRC activity.
Table 1. Definitions of the different D[GTI]ej indices present in the PTML-MLP model.
| Codes (a, b, c) | Symbology | Definition |
|---|---|---|
| DGT01 | D[NSM(Dip)5]dt | Multi-label topological index derived from the normalized bond-based spectral moment of order 5, weighted by the bond dipole moment. |
| DGT02 | D[NSM(Psa)1]dt | Multi-label topological index derived from the normalized bond-based spectral moment of order 1, weighted by the polar surface area. |
| DGT03 | D[NSM(Mol)4]dt | Multi-label topological index derived from the normalized bond-based spectral moment of order 4, weighted by the atomic contributions to the molar refractivity. |
| DGT04 | D[Ne(P)2]dt | Multi-label topological index derived from the normalized bond connectivity of order 2, containing only path subgraphs. |
| DGT05 | D[SM(Psa)1]ct | Multi-label topological index derived from the bond-based spectral moment of order 1, weighted by the atomic contributions to the polar surface area. |
| DGT06 | D[e(C)4]ct | Multi-label topological index derived from the bond connectivity of order 4, containing only cluster subgraphs. |
| DGT07 | D[SM(Hyd)1]mc | Multi-label topological index derived from the bond-based spectral moment of order 1, weighted by the atomic contributions to the hydrophobicity. |
| DGT08 | D[SM(Psa)4]mc | Multi-label topological index derived from the bond-based spectral moment of order 4, weighted by the atomic contributions to the polar surface area. |
| DGT09 | D[SM(Mol)3]mc | Multi-label topological index derived from the bond-based spectral moment of order 3, weighted by the atomic contributions to the molar refractivity. |
| DGT10 | D[SM(Ato)4]mc | Multi-label topological index derived from the bond-based spectral moment of order 4, weighted by the atomic weights. |
| DGT11 | D[Xv(Ch)6]mc | Multi-label topological index derived from the atom-based valence connectivity of order 6, containing only ring (cycle) subgraphs. |
| DGT12 | D[e(P)1]mc | Multi-label topological index derived from the bond connectivity of order 1, containing only path subgraphs. |
| DGT13 | D[e(C)5]mc | Multi-label topological index derived from the bond connectivity of order 5, containing only cluster subgraphs. |
| DGT14 | D[e(Ch)5]mc | Multi-label topological index derived from the bond connectivity of order 5, containing only ring (cycle) subgraphs. |
| DGT15 | D[e(Ch)6]mc | Multi-label topological index derived from the bond connectivity of order 6, containing only ring (cycle) subgraphs. |
| DGT16 | D[NSM(Std)1]mc | Multi-label topological index derived from the normalized bond-based spectral moment of order 1, weighted by the bond standard distance. |
| DGT17 | D[NSM(Mol)1]mc | Multi-label topological index derived from the normalized bond-based spectral moment of order 1, weighted by the atomic contributions to the molar refractivity. |
| DGT18 | D[NXv(P)6]mc | Multi-label topological index derived from the normalized atom-based valence connectivity of order 6, containing only path subgraphs. |
| DGT19 | D[Ne(P)1]mc | Multi-label topological index derived from the normalized bond connectivity of order 1, containing only path subgraphs. |
| DGT20 | D[Ne(P)6]mc | Multi-label topological index derived from the normalized bond connectivity of order 6, containing only path subgraphs. |
a The codes for the D[GTI]ej indices are used throughout the manuscript.
b For the D[GTI]ej indices whose symbology contains “SM”, the order indicates the maximum number of bonds that a fragment can have (regardless of bond order). For the D[GTI]ej indices whose symbology contains “Xv” or “e”, the order is the exact number of bonds present in a fragment (again regardless of bond order).
c The notation dt indicates that a D[GTI]ej index depends on the chemical structure and the cells’ doubling times; likewise, ct indicates dependence on the chemical structure and the specific CRC cell line, and mc indicates dependence on the chemical structure and the microsatellite characteristics.
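Footnote b ties the order of the “SM” indices to fragment size. As a rough illustration of where such numbers come from, the sketch below computes bond-based spectral moments in the spirit of Estrada's edge-adjacency approach (refs. 62–65): it builds the bond (edge) adjacency matrix B of a hydrogen-depleted molecular graph, places an optional per-bond weight on its diagonal, and takes μk = tr(B^k). A closed walk of length k visits at most k bonds, which is why the order bounds the fragment size. The example molecule (n-butane) and the diagonal weights are hypothetical; the normalization behind the “NSM” indices, the property-specific weights computed by MODESLAB, and the multi-label D[...] transformation are not reproduced here.

```python
from itertools import combinations


def bond_adjacency(bonds, weights=None):
    """Bond (edge) adjacency matrix: B[i][j] = 1 if bonds i and j share an atom.

    Diagonal entries hold optional per-bond weights (hypothetical here), else 0.
    """
    n = len(bonds)
    B = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if set(bonds[i]) & set(bonds[j]):  # the two bonds share an atom
            B[i][j] = B[j][i] = 1.0
    for i in range(n):
        B[i][i] = weights[i] if weights is not None else 0.0
    return B


def matmul(X, Y):
    """Plain-Python matrix product X @ Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def spectral_moments(B, max_order):
    """Return [tr(B^1), tr(B^2), ..., tr(B^max_order)]."""
    moments, power = [], B
    for _ in range(max_order):
        moments.append(sum(power[i][i] for i in range(len(power))))
        power = matmul(power, B)
    return moments


# n-butane, hydrogen-depleted: atoms 0-1-2-3 joined by three C-C bonds
butane = [(0, 1), (1, 2), (2, 3)]
print(spectral_moments(bond_adjacency(butane), 4))                   # [0.0, 4.0, 0.0, 8.0]
print(spectral_moments(bond_adjacency(butane, [1.0, 2.0, 3.0]), 2))  # [6.0, 18.0]
```

With bond weights on the diagonal, the first-order moment is simply the sum of the weights, which matches the intuition that an order-1 index such as D[SM(Psa)1] aggregates one-bond contributions.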
Table 2. PTML-MLP model: performance analysis through global metrics.
| Symbols (a) | Training Set | Test Set |
|---|---|---|
| NActive | 1846 | 615 |
| CCActive | 1643 | 512 |
| Sn | 89.00% | 83.25% |
| NInactive | 2262 | 755 |
| CCInactive | 2026 | 635 |
| Sp | 89.57% | 84.11% |
| nMCC | 0.892 | 0.836 |
a NActive—Number of chemicals labeled as active; NInactive—Number of chemicals labeled as inactive; CCActive—Number of chemicals correctly classified as active; CCInactive—Number of chemicals correctly classified as inactive; Sn—Sensitivity (percentage of cases correctly predicted as active); Sp—Specificity (percentage of cases correctly predicted as inactive); nMCC—Normalized Matthews’ correlation coefficient.
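The global metrics in Table 2 follow from the four counts per set. The sketch below recomputes them, assuming the usual definitions (Sn = TP/NActive, Sp = TN/NInactive) and assuming that the normalized MCC is the Matthews coefficient rescaled from [−1, 1] to [0, 1], i.e., nMCC = (MCC + 1)/2. Under these assumptions it reproduces the tabulated values.

```python
from math import sqrt


def classification_metrics(n_active, cc_active, n_inactive, cc_inactive):
    """Sensitivity (%), specificity (%) and normalized MCC from Table 2-style counts."""
    tp, fn = cc_active, n_active - cc_active        # actives: correctly / wrongly classified
    tn, fp = cc_inactive, n_inactive - cc_inactive  # inactives: correctly / wrongly classified
    sn = 100.0 * tp / n_active
    sp = 100.0 * tn / n_inactive
    mcc = (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    nmcc = (mcc + 1.0) / 2.0  # assumed rescaling of MCC to the [0, 1] range
    return round(sn, 2), round(sp, 2), round(nmcc, 3)


print(classification_metrics(1846, 1643, 2262, 2026))  # training set
print(classification_metrics(615, 512, 755, 635))      # test set
```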
Table 3. Biological aspects considered by the PTML-MLP model when predicting anti-CRC activity.
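| ej (a) | dt (b) | ct (c) | mc (d) |
|---|---|---|---|
| ej01 | Slow growth | HCC2998 | MSS |
| ej02 | Fast growth | HT-29 | MSS |
| ej03 | Fast growth | HCT 15 | MSI |
| ej04 | Fast growth | HCT 116 | MSI |
| ej05 | Intermediate growth | KM12 | MSI |
| ej06 | Intermediate growth | SW620 | MSS |
| ej07 | Slow growth | COLO 205 | MSS |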
a Codes for the combinations of experimental aspects; each combination comprises one label from aspect dt, one label from aspect ct, and one label from aspect mc.
b Labels associated with the different growth rates (doubling times) of each CRC cell line; for aspect dt, the labels were assigned according to the doubling times expressed in hours (see the Materials and Methods section).
c Labels for the specific CRC cell line.
d Labels describing the microsatellite characteristics of each CRC cell line: “MSS” denotes microsatellite stability, while “MSI” denotes microsatellite instability.
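Tables 1 and 3 together show that each D[GTI]ej index is tied to one label of one aspect. The exact formula is given in the Materials and Methods section, which is not part of this excerpt. PTML studies commonly build such multi-label indices by contrasting a chemical's graph-theoretical index (GTI) with the average GTI of the active training chemicals that share the same label. The sketch below assumes that simple, unnormalized deviation, so treat it as an illustration of the idea rather than the authors' formula. The label map comes from Table 3; the GTI values and the (gti, ej, is_active) record layout are hypothetical.

```python
from statistics import mean

# Label map from Table 3: each ej combines one label per aspect (dt, ct, mc)
EJ_LABELS = {
    "ej01": {"dt": "Slow growth", "ct": "HCC2998", "mc": "MSS"},
    "ej02": {"dt": "Fast growth", "ct": "HT-29", "mc": "MSS"},
    "ej03": {"dt": "Fast growth", "ct": "HCT 15", "mc": "MSI"},
    "ej04": {"dt": "Fast growth", "ct": "HCT 116", "mc": "MSI"},
    "ej05": {"dt": "Intermediate growth", "ct": "KM12", "mc": "MSI"},
    "ej06": {"dt": "Intermediate growth", "ct": "SW620", "mc": "MSS"},
    "ej07": {"dt": "Slow growth", "ct": "COLO 205", "mc": "MSS"},
}


def label_averages(training, aspect):
    """Average GTI of the ACTIVE training records, grouped by their label for one aspect."""
    groups = {}
    for gti, ej, is_active in training:
        if is_active:
            groups.setdefault(EJ_LABELS[ej][aspect], []).append(gti)
    return {label: mean(values) for label, values in groups.items()}


def multi_label_index(gti, ej, aspect, averages):
    """Assumed form: D[GTI]ej = GTI - average GTI of actives carrying the same label as ej."""
    return gti - averages[EJ_LABELS[ej][aspect]]


# Hypothetical training records: (GTI value, ej, labeled active?)
training = [(2.0, "ej03", True), (4.0, "ej04", True), (3.0, "ej01", True),
            (5.0, "ej02", True), (9.0, "ej05", False)]
avg_mc = label_averages(training, "mc")               # {'MSI': 3.0, 'MSS': 4.0}
print(multi_label_index(3.5, "ej05", "mc", avg_mc))   # 0.5, because ej05 is MSI
```

Because ej03, ej04 and ej05 all carry the MSI label, an mc-type index in this sketch uses the same reference average for all three combinations, which is one way to read footnote c of Table 1.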
Table 4. Relative variability in the values of the different D[GTI]ej indices in the PTML-MLP model.
| Codes (a) | Average Value (Active) | Average Value (Inactive) | Propensity (b) |
|---|---|---|---|
| DGT01 | 1.481 × 10⁻² | 8.723 × 10⁻² | Decrease |
| DGT02 | 5.160 × 10⁻³ | 1.156 × 10⁻¹ | Decrease |
| DGT03 | 3.545 × 10⁻³ | 1.048 × 10⁻¹ | Decrease |
| DGT04 | −2.001 × 10⁻² | 1.414 × 10⁻¹ | Decrease |
| DGT05 | 2.747 × 10⁻³ | 4.112 × 10⁻² | Decrease |
| DGT06 | 6.336 × 10⁻³ | 2.913 × 10⁻² | Decrease |
| DGT07 | −9.067 × 10⁻³ | −4.975 × 10⁻² | Increase |
| DGT08 | 1.135 × 10⁻² | 1.840 × 10⁻¹ | Decrease |
| DGT09 | −1.061 × 10⁻² | 8.230 × 10⁻² | Decrease |
| DGT10 | 1.251 × 10⁻² | −2.358 × 10⁻² | Increase |
| DGT11 | −7.321 × 10⁻³ | −1.911 × 10⁻³ | Decrease |
| DGT12 | −3.556 × 10⁻³ | −1.092 × 10⁻¹ | Increase |
| DGT13 | 1.692 × 10⁻² | 4.808 × 10⁻³ | Increase |
| DGT14 | 9.767 × 10⁻³ | 7.377 × 10⁻² | Decrease |
| DGT15 | −2.721 × 10⁻² | 2.977 × 10⁻² | Decrease |
| DGT16 | −1.179 × 10⁻² | −6.458 × 10⁻² | Increase |
| DGT17 | −1.816 × 10⁻² | 8.865 × 10⁻² | Decrease |
| DGT18 | 1.347 × 10⁻² | 3.415 × 10⁻² | Decrease |
| DGT19 | −7.019 × 10⁻³ | −6.653 × 10⁻² | Increase |
| DGT20 | 2.185 × 10⁻³ | −1.142 × 10⁻¹ | Increase |
a The codes are the same as those reported in Table 1. b Relative variation (decrease or increase) in the value of a given D[GTI]ej index for active chemicals compared with inactive ones.
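The Propensity column can be read as a simple comparison of the two averages: “Increase” when the average for active chemicals exceeds that for inactive chemicals, “Decrease” otherwise. This rule is an interpretation, not one quoted from the paper, but the sketch below checks that it reproduces all 20 entries of Table 4.

```python
# Table 4: (code, average for actives, average for inactives, reported propensity)
TABLE4 = [
    ("DGT01", 1.481e-2, 8.723e-2, "Decrease"), ("DGT02", 5.160e-3, 1.156e-1, "Decrease"),
    ("DGT03", 3.545e-3, 1.048e-1, "Decrease"), ("DGT04", -2.001e-2, 1.414e-1, "Decrease"),
    ("DGT05", 2.747e-3, 4.112e-2, "Decrease"), ("DGT06", 6.336e-3, 2.913e-2, "Decrease"),
    ("DGT07", -9.067e-3, -4.975e-2, "Increase"), ("DGT08", 1.135e-2, 1.840e-1, "Decrease"),
    ("DGT09", -1.061e-2, 8.230e-2, "Decrease"), ("DGT10", 1.251e-2, -2.358e-2, "Increase"),
    ("DGT11", -7.321e-3, -1.911e-3, "Decrease"), ("DGT12", -3.556e-3, -1.092e-1, "Increase"),
    ("DGT13", 1.692e-2, 4.808e-3, "Increase"), ("DGT14", 9.767e-3, 7.377e-2, "Decrease"),
    ("DGT15", -2.721e-2, 2.977e-2, "Decrease"), ("DGT16", -1.179e-2, -6.458e-2, "Increase"),
    ("DGT17", -1.816e-2, 8.865e-2, "Decrease"), ("DGT18", 1.347e-2, 3.415e-2, "Decrease"),
    ("DGT19", -7.019e-3, -6.653e-2, "Increase"), ("DGT20", 2.185e-3, -1.142e-1, "Increase"),
]


def propensity(avg_active, avg_inactive):
    """Interpreted rule: direction in which the index differs for actives vs. inactives."""
    return "Increase" if avg_active > avg_inactive else "Decrease"


mismatches = [code for code, a, i, reported in TABLE4 if propensity(a, i) != reported]
increase = [code for code, a, i, _ in TABLE4 if propensity(a, i) == "Increase"]
print(mismatches)  # []
print(increase)    # ['DGT07', 'DGT10', 'DGT12', 'DGT13', 'DGT16', 'DGT19', 'DGT20']
```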
Table 5. Predictions of multi-cell anti-CRC activity performed by the PTML-MLP model and CLC-Pred 2.0.
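| ID (a) | ej (b) | PTML-MLP: PACRC(ej) (c) | PTML-MLP: ProbAct (%) (d) | CLC-Pred 2.0 (GI50 ≤ 100 nM): Pa (e) | CLC-Pred 2.0 (GI50 ≤ 100 nM): Pi (e) |
|---|---|---|---|---|---|
| ASP-COLRC-01 | ej01 | −1 | 43.43 | 0.254 | 0.148 |
| ASP-COLRC-01 | ej02 | 1 | 58.41 | 0.266 | 0.124 |
| ASP-COLRC-01 | ej03 | 1 | 55.23 |  |  |
| ASP-COLRC-01 | ej04 | 1 | 66.49 | 0.209 | 0.204 |
| ASP-COLRC-01 | ej05 | 1 | 56.13 | 0.282 | 0.117 |
| ASP-COLRC-01 | ej06 | −1 | 42.44 |  |  |
| ASP-COLRC-01 | ej07 | −1 | 46.02 | 0.337 | 0.098 |
| ASP-COLRC-02 | ej01 | 1 | 55.32 |  |  |
| ASP-COLRC-02 | ej02 | 1 | 59.79 |  |  |
| ASP-COLRC-02 | ej03 | −1 | 39.55 | 0.224 | 0.174 |
| ASP-COLRC-02 | ej04 | −1 | 44.55 |  |  |
| ASP-COLRC-02 | ej05 | −1 | 39.64 | 0.254 | 0.133 |
| ASP-COLRC-02 | ej06 | 1 | 50.60 |  |  |
| ASP-COLRC-02 | ej07 | 1 | 55.38 |  |  |
| ASP-COLRC-03 | ej01 | −1 | 46.75 | 0.336 | 0.097 |
| ASP-COLRC-03 | ej02 | 1 | 59.00 | 0.302 | 0.104 |
| ASP-COLRC-03 | ej03 | 1 | 55.76 | 0.323 | 0.107 |
| ASP-COLRC-03 | ej04 | 1 | 61.67 | 0.361 | 0.097 |
| ASP-COLRC-03 | ej05 | 1 | 55.95 | 0.369 | 0.080 |
| ASP-COLRC-03 | ej06 | −1 | 49.10 | 0.243 | 0.153 |
| ASP-COLRC-03 | ej07 | 1 | 50.88 | 0.353 | 0.091 |
| ASP-COLRC-04 | ej01 | 1 | 66.31 |  |  |
| ASP-COLRC-04 | ej02 | 1 | 74.59 |  |  |
| ASP-COLRC-04 | ej03 | 1 | 72.67 | 0.250 | 0.152 |
| ASP-COLRC-04 | ej04 | 1 | 76.79 | 0.268 | 0.148 |
| ASP-COLRC-04 | ej05 | 1 | 73.99 | 0.204 | 0.171 |
| ASP-COLRC-04 | ej06 | 1 | 67.95 | 0.217 | 0.180 |
| ASP-COLRC-04 | ej07 | 1 | 69.47 |  |  |
| ASP-COLRC-05 | ej01 | 1 | 53.80 | 0.343 | 0.093 |
| ASP-COLRC-05 | ej02 | 1 | 61.15 | 0.302 | 0.104 |
| ASP-COLRC-05 | ej03 | 1 | 58.83 | 0.358 | 0.091 |
| ASP-COLRC-05 | ej04 | 1 | 63.40 | 0.376 | 0.090 |
| ASP-COLRC-05 | ej05 | 1 | 58.85 | 0.397 | 0.072 |
| ASP-COLRC-05 | ej06 | 1 | 54.71 | 0.261 | 0.140 |
| ASP-COLRC-05 | ej07 | 1 | 56.60 | 0.346 | 0.094 |
| ASP-COLRC-06 | ej01 | 1 | 76.95 |  |  |
| ASP-COLRC-06 | ej02 | 1 | 79.13 | 0.189 | 0.188 |
| ASP-COLRC-06 | ej03 | 1 | 76.56 | 0.285 | 0.129 |
| ASP-COLRC-06 | ej04 | 1 | 78.33 | 0.282 | 0.138 |
| ASP-COLRC-06 | ej05 | 1 | 77.20 | 0.228 | 0.152 |
| ASP-COLRC-06 | ej06 | 1 | 77.28 | 0.232 | 0.164 |
| ASP-COLRC-06 | ej07 | 1 | 78.12 |  |  |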
a Codes for the molecules designed as multi-cell anti-CRC agents; these codes coincide with those reported in Figure 5. b Combinations of experimental aspects as defined in Table 3. c Categorical activity predicted by the PTML-MLP model: PACRC(ej) = 1 means that the molecule was predicted as active (GI50 ≤ 1900 nM) for the combination ej; otherwise, the molecule was predicted as inactive, i.e., PACRC(ej) = −1. d Probability, according to the PTML-MLP model, that the molecule is classified as active. e Probabilities of being active (Pa) and inactive (Pi) estimated by CLC-Pred 2.0 for the activity cutoff GI50 ≤ 100 nM.
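Footnotes c and d report the categorical output and the probability separately. In every row of Table 5, PACRC(ej) = 1 exactly when ProbAct exceeds 50%, which suggests (although the text does not state it) a 0.5 decision threshold. The sketch re-derives the categorical column from ProbAct under that assumed rule and counts, for each designed molecule, how many of the seven ej combinations are predicted active.

```python
# Table 5 values, ordered ej01..ej07 for each designed molecule
PROB_ACT = {
    "ASP-COLRC-01": [43.43, 58.41, 55.23, 66.49, 56.13, 42.44, 46.02],
    "ASP-COLRC-02": [55.32, 59.79, 39.55, 44.55, 39.64, 50.60, 55.38],
    "ASP-COLRC-03": [46.75, 59.00, 55.76, 61.67, 55.95, 49.10, 50.88],
    "ASP-COLRC-04": [66.31, 74.59, 72.67, 76.79, 73.99, 67.95, 69.47],
    "ASP-COLRC-05": [53.80, 61.15, 58.83, 63.40, 58.85, 54.71, 56.60],
    "ASP-COLRC-06": [76.95, 79.13, 76.56, 78.33, 77.20, 77.28, 78.12],
}
PACRC = {
    "ASP-COLRC-01": [-1, 1, 1, 1, 1, -1, -1],
    "ASP-COLRC-02": [1, 1, -1, -1, -1, 1, 1],
    "ASP-COLRC-03": [-1, 1, 1, 1, 1, -1, 1],
    "ASP-COLRC-04": [1, 1, 1, 1, 1, 1, 1],
    "ASP-COLRC-05": [1, 1, 1, 1, 1, 1, 1],
    "ASP-COLRC-06": [1, 1, 1, 1, 1, 1, 1],
}


def predicted_class(prob_act, threshold=50.0):
    """Assumed decision rule: active (1) if ProbAct exceeds 50%, otherwise inactive (-1)."""
    return 1 if prob_act > threshold else -1


for mol, probs in PROB_ACT.items():
    derived = [predicted_class(p) for p in probs]
    assert derived == PACRC[mol], mol  # the assumed rule reproduces every tabulated label
    print(mol, derived.count(1), "of 7 combinations predicted active")
```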
Table 6. Druglikeness-related physicochemical properties calculated for the six designed molecules.
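| ID | MW (Da) | HBA | HBD | MlogP | AlogP | AvgLogP | RBN | PSA (Å²) |
|---|---|---|---|---|---|---|---|---|
| ASP-COLRC-01 | 473.55 | 10 | 0 | 2.304 | 4.628 | 3.466 | 6 | 61.80 |
| ASP-COLRC-02 | 435.65 | 6 | 0 | 1.941 | 3.050 | 2.496 | 6 | 77.87 |
| ASP-COLRC-03 | 483.48 | 10 | 1 | 2.826 | 5.393 | 4.110 | 7 | 84.78 |
| ASP-COLRC-04 | 467.48 | 9 | 1 | 2.773 | 4.105 | 3.439 | 6 | 75.55 |
| ASP-COLRC-05 | 513.51 | 11 | 1 | 2.277 | 5.377 | 3.827 | 8 | 94.01 |
| ASP-COLRC-06 | 497.51 | 10 | 1 | 2.213 | 4.089 | 3.151 | 7 | 84.78 |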
a The druglikeness-related physicochemical properties have the following symbols and meanings: MW, molecular weight (in daltons, Da); HBA, number of hydrogen bond acceptors; HBD, number of hydrogen bond donors; MlogP, logarithm of the octanol–water partition coefficient according to Moriguchi’s method; AlogP, logarithm of the octanol–water partition coefficient according to the Ghose–Crippen method; AvgLogP, average of MlogP and AlogP; RBN, number of rotatable bonds; PSA, polar surface area (in Å²).
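The properties in Table 6 map directly onto the filters of refs. 77 and 78. The sketch below applies the commonly quoted cutoffs: the rule of five with its MlogP variant (MW ≤ 500 Da, HBA ≤ 10, HBD ≤ 5, MlogP ≤ 4.15) and Veber's criteria (RBN ≤ 10, PSA ≤ 140 Å²). It also checks that AvgLogP is the mean of MlogP and AlogP, up to three-decimal rounding. This excerpt does not state which thresholds the authors applied or how many violations they tolerate (cf. ref. 80), so the sketch is illustrative only.

```python
# Table 6: (MW, HBA, HBD, MlogP, AlogP, AvgLogP, RBN, PSA)
TABLE6 = {
    "ASP-COLRC-01": (473.55, 10, 0, 2.304, 4.628, 3.466, 6, 61.80),
    "ASP-COLRC-02": (435.65, 6, 0, 1.941, 3.050, 2.496, 6, 77.87),
    "ASP-COLRC-03": (483.48, 10, 1, 2.826, 5.393, 4.110, 7, 84.78),
    "ASP-COLRC-04": (467.48, 9, 1, 2.773, 4.105, 3.439, 6, 75.55),
    "ASP-COLRC-05": (513.51, 11, 1, 2.277, 5.377, 3.827, 8, 94.01),
    "ASP-COLRC-06": (497.51, 10, 1, 2.213, 4.089, 3.151, 7, 84.78),
}


def lipinski_violations(mw, hba, hbd, mlogp):
    """Count rule-of-five violations, using the MlogP > 4.15 variant for lipophilicity."""
    return sum([mw > 500, hba > 10, hbd > 5, mlogp > 4.15])


def veber_ok(rbn, psa):
    """Veber criteria: at most 10 rotatable bonds and PSA of at most 140 square angstroms."""
    return rbn <= 10 and psa <= 140


for mol, (mw, hba, hbd, mlogp, alogp, avg, rbn, psa) in TABLE6.items():
    # AvgLogP should equal the mean of MlogP and AlogP, up to 3-decimal rounding
    assert abs((mlogp + alogp) / 2 - avg) <= 6e-4, mol
    print(mol, lipinski_violations(mw, hba, hbd, mlogp), veber_ok(rbn, psa))
```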
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Speck-Planche, A.; Cordeiro, M.N.D.S. Computational Phenotypic Drug Discovery for Anticancer Chemotherapy: PTML Modeling of Multi-Cell Inhibitors of Colorectal Cancer Cell Lines. Int. J. Mol. Sci. 2025, 26, 11453. https://doi.org/10.3390/ijms262311453


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
