Tubulin Inhibitors: A Chemoinformatic Analysis Using Cell-Based Data

Inhibiting the tubulin-microtubules (Tub-Mts) system is a classic and rational approach for treating different types of cancers. A large amount of data on inhibitors in the clinic supports Tub-Mts as a validated target. However, most of the inhibitors reported thus far have been developed around common chemical scaffolds covering a narrow region of the chemical space with limited innovation. This manuscript discusses the first activity landscape and scaffold content analysis of an assembled and curated cell-based database of 851 Tub-Mts inhibitors with reported activity against five cancer cell lines and the Tub-Mts system. The structure–bioactivity relationships of the Tub-Mts system inhibitors were further explored using constellation plots. This recently developed methodology enables the rapid but quantitative assessment of analog series enriched with active compounds. The constellation plots identified promising analog series with high average biological activity that could be the starting points of new and more potent Tub-Mts inhibitors.


Introduction
The α,β-tubulin heterodimer is the basic structural unit of microtubules. It is one of the most studied cancer therapy targets due to its significant role in cellular and tumor proliferation. It actively participates in forming the centrosome, an essential organelle, during the G2/M phase of the cell cycle [1]. The microtubule's dynamic activity is guided by a polymerization and depolymerization process which can be modified by interaction with small molecules with different binding sites on the Tub-Mts system, e.g., colchicine, taxanes, pironetin, vinca alkaloids, and laulimalide derivatives, as shown in Figure 1. In this way, the modulation of polymerization/depolymerization of the microtubules allows for pharmacological regulation of the cell cycle, which is a crucial event in cancer [2]. According to the U.S. National Institutes of Health (www.clinicaltrials.gov, accessed on 23 April 2021), there are several ongoing clinical studies in different phases related to tubulin inhibition: I (1604 studies), II (3771 studies), III (1410 studies), and IV (182 studies), that are analyzing colchicine derivatives (e.g., ombrabulin), taxanes (e.g., docetaxel), vinca alkaloids (e.g., ALB 109564) or laulimalide derivatives (e.g., epothilone D and eribulin). Although currently there are no pironetin analogs in clinical trials, pironetin is the first compound found to have the ability to covalently bind to microtubules, which gives it the capacity to inhibit the growth of cancer cells that are resistant to conventional treatments (vinca or paclitaxel derivatives) [3]. The small molecules are of synthetic, semi-synthetic, or natural origin. Figure 1 shows that the main binding sites are distributed along the microtubule.
Additionally, the flexibility of microtubules' quaternary structure has limited classical structure-based drug design approaches, such as rigid molecular docking. Several manuscripts that report predictive quantitative structure-activity relationships (QSAR) and machine learning models for compounds interacting with the Tub-Mts system have been published [4–11]. These studies have focused on the biological activity measured in biochemical assays. However, there are no reports on the quantitative analysis of the SAR of Tub-Mts system modulators tested in cell-based assays. One consistent approach to characterize the SAR of compound datasets is through the systematic pairwise comparison of the structure with the activity. This approach, termed "activity landscape modeling" [12,13], is based upon the similarity principle, i.e., structurally similar compounds have similar activity. Activity landscape modeling can be generalized to "property landscape modeling", where "property" includes a biological activity measured in in vitro biochemical assays, cell-based assays, or any other type of activity with a measurable outcome. This approach identifies activity cliffs (AC), i.e., pairs of compounds with high structural similarity but large potency differences [14]. Depending on the work scope, an AC, which constitutes a significant exception to the similarity principle, can have beneficial or detrimental effects. For instance, an AC leads directly to essential structural information that influences the activity (property) [14]. Several quantitative and/or visual approaches have been published over the past few years to characterize the activity (property) landscapes [15] of compounds with one or several endpoints.
Virtual screening methods initially deal with many compounds, until they are reduced to a manageable quantity [16]. Each type of evaluation has its challenges and complexity in regard to in vitro assays (e.g., biochemical or cell-based assays). It is not uncommon that compounds which are active in biochemical assays are inactive in cell-based assays. Each system (biochemical or cell-based assays) allows for analyzing the properties of different compounds, evaluating their pros and cons depending on the costs and how representative each system can be for the proposed study. This is shown schematically in Figure 2 [17].
Figure 2. Schematic overview of the differences between biochemical and cell-based assays on the hit-to-lead process for drug discovery. Adapted from Mateus et al. [17].
A common and significant issue in the early phases of drug discovery, particularly in large screening campaigns, is the large number of false positives. This large number can be reduced by analyzing cell-based data instead of biochemical data. However, cell-based data do not provide direct information concerning the specific mechanism of action [17]. SAR studies are typically performed using activity obtained from biochemical assays, with quantitative measures of half-maximal inhibitory concentration (IC50), inhibitory constant (Ki), or the percentage of inhibition [18]. The same happens in the particular case of activity (property) landscape studies [19]. Given that sizable cellular diversity exists and there are significant differences in the outcomes of biochemical and cell-based data analyses, the present study uses cell-based inhibition information to characterize the activity landscape of a herein generated dataset of Tub-Mts inhibitors reported in the literature.
The main goal of this work was to explore and describe the activity landscape and scaffold content [19] of a database built and curated herein, with 851 Tub-Mts inhibitors tested on cell-based assays and reported in the literature. To achieve the main goal, we analyzed the chemical space, structural chemical diversity, and scaffold content of the Tub-Mts inhibitors' dataset.

Dataset
We assembled a dataset of 851 compounds tested as Tub-Mts inhibitors and with reported bioactivity in different cancer cell lines. All compounds were retrieved from original and review articles and patents over a period of 15 years (2005-2020). The list of information sources is shown in Table S1 in the Supplementary Materials. Duplicate molecules were removed using Molecular Operating Environment (MOE) software, version 2019 [42].

Chemical Space
Standard 2D chemical features were used to characterize the chemical space. The chemical space analysis focused on six physicochemical properties (PCP) of pharmaceutical relevance: octanol/water partition coefficient (cLog P), molecular weight (MW), topological polar surface area (TPSA), number of rotatable bonds (RB), number of hydrogen bond donors (HBD), and number of hydrogen bond acceptors (HBA). PCP-based clustering using t-distributed stochastic neighbor embedding (t-SNE) [43] was generated with DataWarrior, version 5.2.1 [44].

Activity Landscape Modeling
A structure-activity similarity (SAS) map is a two-dimensional graph suited for SAR analysis of compound datasets tested against a molecular target or a biological outcome. SAS maps are based on the concept of the activity landscape. They are suited for the rapid identification of AC, defined as compounds with a high structural similarity but unexpected large activity differences [45]. SAS maps also enable one to identify scaffold hops (SH), defined as compounds with low structural similarity due to differences in their scaffold but similar biological activity [45].
SAS maps were generated via systematic pairwise comparisons of the 851 compounds tested as Tub-Mts inhibitors. The structural similarity was calculated with the extended connectivity fingerprint 4 (ECFP4), which systematically records the neighborhood of each non-hydrogen atom within a diameter of four bonds, for each atom within a molecule [46], and the Tanimoto coefficient. This similarity was plotted on the X-axis of the map. The activity difference was plotted on the Y-axis, i.e., the pIC50 difference between each pair of compounds; for example, if the pIC50 of compound "A" is 5 and the pIC50 of compound "B" is 7, the difference in activity of the pair A-B is 2. The four major regions in the SAS map were defined using thresholds along the X- and Y-axes. There are several rational approaches to determine the thresholds [47]. In this work, the criterion to select the X-axis threshold was the "mean + two standard deviations" of the similarity values of compounds in the dataset (calculated with Tanimoto and the ECFP4 fingerprint). The threshold of the activity difference (Y-axis) was set to two logarithmic units [48].
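The pairwise procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the code used in this work: fingerprints are represented as plain sets of feature identifiers standing in for ECFP4 on-bits, the population standard deviation is assumed for the "mean + two standard deviations" threshold, and the labels "smooth SAR" and "uncertain" for the two regions not named in the text are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev


def tanimoto(a, b):
    """Tanimoto coefficient between two sets of fingerprint features."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def sas_pairs(fingerprints, pic50):
    """Return (id_i, id_j, similarity, |delta pIC50|) for every compound pair.

    fingerprints: dict mapping compound ID -> set of features
                  (a stand-in for ECFP4 bits; an assumption of this sketch).
    pic50:        dict mapping compound ID -> pIC50 value.
    """
    pairs = []
    for i, j in combinations(sorted(fingerprints), 2):
        sim = tanimoto(fingerprints[i], fingerprints[j])
        diff = abs(pic50[i] - pic50[j])
        pairs.append((i, j, sim, diff))
    return pairs


def classify(pairs, activity_cut=2.0):
    """Assign each pair to one of the four SAS map regions.

    X threshold: mean + 2 SD of all pairwise similarities
                 (population SD assumed here).
    Y threshold: two logarithmic units by default.
    """
    sims = [p[2] for p in pairs]
    sim_cut = mean(sims) + 2 * pstdev(sims)
    regions = {}
    for i, j, sim, diff in pairs:
        if sim >= sim_cut:
            label = "activity cliff" if diff >= activity_cut else "smooth SAR"
        else:
            label = "uncertain" if diff >= activity_cut else "scaffold hop"
        regions[(i, j)] = label
    return sim_cut, regions
```

With real data, the feature sets would be replaced by the on-bits of a fingerprint computed with a cheminformatics toolkit; the classification logic would remain the same.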
The data points in the SAS map were further colored by the corresponding Structure-Activity Landscape Index (SALI) value [47]. This index, as implemented in Activity Landscape Plotter [47], quantifies AC using the equation proposed by Guha and Van Drie:
$$\mathrm{SALI}_{i,j} = \frac{\left| A_i - A_j \right|}{1 - \mathrm{sim}(i,j)}$$
where $A_i$ and $A_j$ are the activities of the ith and the jth molecules, and sim(i,j) is the similarity coefficient between the two molecules (in this work, computed with the ECFP4 fingerprint and the Tanimoto coefficient). Quantitative analysis of the SAS maps was done with Activity Landscape Plotter, a web server freely available at http://132.248.103.152:3838/ActLSmaps/ (accessed on 23 April 2021) [48].
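As a minimal sketch of how the SALI value of a compound pair can be computed, assuming the usual Guha and Van Drie form (absolute activity difference divided by one minus the similarity), the function below may be helpful. The handling of pairs with a similarity of exactly 1 is an assumption of this sketch and is not taken from Activity Landscape Plotter.

```python
import math


def sali(activity_i, activity_j, similarity):
    """SALI = |A_i - A_j| / (1 - sim(i, j)) for one pair of compounds."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must lie in [0, 1]")
    diff = abs(activity_i - activity_j)
    if similarity == 1.0:
        # Identical fingerprints: the index is undefined (0/0) when the
        # activities are equal and unbounded otherwise (assumed convention).
        return math.nan if diff == 0 else math.inf
    return diff / (1.0 - similarity)
```

A high SALI value therefore arises when a large pIC50 difference coincides with a similarity close to 1, which is the defining feature of an AC.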

Scaffold Content Analysis
To study the molecular scaffolds, we used the methodology implemented by Bemis and Murcko [49]. Briefly, the method involves a graph analysis for each compound where a "scaffold" is defined as the union of ring systems and linkers in a molecule, and the side chains (any non-ring, non-linker atoms) are removed. This was done with the RDKit Fragments node implemented in KNIME, version 3.7.2 [50]. The chemical structures of the scaffolds are available in Table S2 of the Supplementary Materials.

Constellation Plot
A constellation plot is a graphical representation of chemical space based on networks and coordinates. Each node represents a group of chemical analog series. In other words, a constellation plot reduces the number of points depicted in chemical space representations, while increasing the quality and volume of data legibly represented in a 2D chemical space plot. They aim to group analogs that share a common core, which can be compared with other cores and their corresponding associated molecules [51]. A constellation plot was generated using free Python code published elsewhere [52]. RDKit was used for computing Morgan fingerprints and to handle the chemical structures (http://www.rdkit.org, accessed on 23 April 2021); Scikit-learn was used for computing the t-SNE (scaffold-based clustering). The output file was adapted to be viewed interactively using the DataWarrior software (see the file "out_FinalData.dwar" in the Supplementary Materials). This is the first interactive constellation plot reported so far.

Results and Discussion
As mentioned in the Introduction Section, most of the Tub-Mts system's reported inhibitors are derivatives of four principal structures, as shown in Figure 1. Thus, to further develop the inhibitors of the Tub-Mts system, it is common to first perform cell-based screening, followed by a screening using a biochemical assay (i.e., tubulin polymerization) of the most active compounds. The new series of compounds lack information about the mechanism of action and the specific binding site in tubulin. Therefore, we employed chemoinformatic approaches to explore the SAR of 851 compounds with reported cell-based inhibitory activity.
Although there are advantages to using data based on cell line inhibition, there are also disadvantages. For example, the bioactivity results reported in the literature depend on the type of test and protocol used. Hence, it is not possible to compare each compound directly against another, and approximations have to be made to explore the SAR. In this scenario, this work aims to fill part of the information gaps left by conventional SAR and QSAR studies.
Figure 3A shows a visual representation of the chemical space of the 851 compounds using t-SNE coordinates based on the six PCP of pharmaceutical interest described in the Materials and Methods Section. Box plots of these common drug-like properties are shown in Figure 3B-G. For example, in panel B (cLog P), the active compounds (green area) have values of around 2.5 to 4.0, and the inactive compounds (red area) have higher values. This information can be used to generate new knowledge about this kind of inhibitor, e.g., compounds with cLog P values higher than four are generally inactive. It is important to note that the average PCP values of the compounds differ depending on the tubulin binding site. For example, the cLog P values of vinca-like inhibitors (e.g., vinblastine derivatives) are higher than those of colchicine-like inhibitors; in other words, vinca derivatives are more lipophilic than colchicine derivatives. This can be deduced from the higher number of aromatic and non-aromatic rings in vinblastine than in colchicine (Figure 1).

Activity Landscape Modeling
Following the activity landscape concept described in the Introduction and Methods Sections, we analyzed the SAR of the 851 compounds tested in cell-based assays. Figure 4A,B shows the SAS map and amplification on the AC zone, respectively. Each point represents a pair of compounds and is colored by its SALI value, using a color scale from green (low SALI value, i.e., non-AC) to red (high SALI value, i.e., AC). The information helps identify small structural changes in molecules that decrease or increase their bioactivity. Interestingly, 97% of the data points correspond to a series of compounds with low structural similarity: 67% are SH (compounds with low structural similarity and low activity differences or high activity similarity). This result is indicative of the high structural diversity of the compounds in this dataset. Figure 4B,C depicts the chemical structures of representative AC.
We emphasize that the bioactivity data were derived from cell-based assays. Therefore, the measured activity reflects not only the affinity of these compounds for the main target (Tub-Mts system) but also their ability to interact favorably with a biological (cellular) system, e.g., in the presence of membrane barriers and non-specific binding (see Introduction Section). In addition to Figures 1 and 4, Figure 5 illustrates other representative compounds and scaffolds with bioactivity reported against the Tub-Mts system. Except for compound 3BB (designed to interact with the laulimalide binding site), all other compounds shown were designed to interact with the colchicine binding site. However, as emphasized above, the precise binding site remains to be elucidated using biophysical assays.

Scaffold Content Analysis
Figure 6A shows an overview of the ten most common scaffolds of compounds tested in cell-based assays against the Tub-Mts system. In contrast, Figure 6B shows a landscape with the most bioactive compounds' IDs (and their respective scaffold identifiers). Figure 6B plots the pIC50 Max values (maximum pIC50 value reported in any cell line) against a list of compounds (and their respective scaffolds), each represented by a point. The dotted line represents the "cliff path" of activity from one compound (and its respective scaffold) to another. For example, compound 3D is more bioactive (pIC50 Max = 10.1) than 17ZZ (pIC50 Max = 9.6), and the latter, in turn, is more bioactive than 10III (pIC50 Max = 9.2). In Figure 6B, the compounds (represented by points) are colored using a scale from red (lower similarity to the colchicine scaffold) to blue (higher similarity to the colchicine scaffold).
Interestingly, only S128, S140, and S165 (common scaffolds) are contained in five, four, and four cases, respectively, of the most active compounds. Representative examples of the most active compounds with common scaffolds are 22HH, 4OO, and 20TT (see Figure S1 in the Supplementary Materials, which illustrates other active compounds that were not found in the AC or scaffold hops sections). In other words, the most common Bemis-Murcko scaffolds are not necessarily those contained in the most active compounds. The complete list of scaffolds is in Table S2 in the Supplementary Materials.
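The pIC50 Max value used in the scaffold landscape can be illustrated with a short Python sketch. It assumes that IC50 values are given in mol/L and that pIC50 Max is the largest pIC50 of a compound across the cell lines tested; the function names, compound IDs, and cell line names are hypothetical and serve only as an example.

```python
import math


def pic50(ic50_molar):
    """Convert an IC50 in mol/L (assumed unit) to pIC50 = -log10(IC50)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_molar)


def pic50_max(ic50_by_cell_line):
    """Largest pIC50 of one compound over all cell lines with data."""
    return max(pic50(value) for value in ic50_by_cell_line.values())


def rank_by_pic50_max(dataset):
    """Order compound IDs from most to least active by pIC50 Max.

    dataset: dict mapping compound ID -> {cell line name: IC50 in mol/L}.
    """
    scores = {cid: pic50_max(values) for cid, values in dataset.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Because pIC50 is a negative logarithm, the maximum pIC50 of a compound corresponds to its lowest IC50 among the cell lines considered.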

Constellation Plot
Constellation plots were developed as a graphical representation of SAR to integrate coordinate-based chemical space representation with analog series. These plots summarize the scaffold content of a dataset and show the scaffold diversity and their mutual structural relationships [51]. Since the biological activity data can be mapped into a constellation plot, these two-dimensional representations of the chemical space enable the identification of whole regions in chemical space rich in SAR annotations, either active ("bright" SAR in analogy with chemical space) or inactive ("dark regions"), where few or no active molecules have been found. Figure 7 shows the constellation plot of Tub-Mts inhibitors (an interactive version of the plot, which can be visualized with DataWarrior, is available in the Supplementary Materials).
From the initial compound database, the chemical structures of 851 inhibitors have been summarized into 142 analog series (illustrated in Figure 7), which further summarize the 258 Bemis-Murcko scaffolds in Table S2 (Supplementary Materials). Of note, compounds with different Bemis-Murcko scaffolds can share a structural fraction (that is not a complete scaffold). This explains why molecules with different scaffolds are contained in the same analog series. Analog series are defined using RECAP retrosynthetic rules and therefore reflect the synthetic route to the compounds [53]. In contrast, Bemis-Murcko scaffolds do not consider synthetic rules; they are obtained by removing each compound's side chains. Additionally, the constellation plots order the analog series using similarity-based coordinates, i.e., analog series with similar chemical structures are placed close together because they share similar X and Y coordinates in the 2D plots. In contrast, analog series with more different chemical structures remain far apart. As mentioned in the Materials and Methods Section, the similarity and coordinate data are not comparable between Figure 4A,B (calculated similarity between pairs of compounds) and Figure 7 (calculated similarity between analog series).
The analog series ID, SMILES, coordinates (t-SNE coordinates), average activity (average of pIC50), compounds contained per analog series, and standard deviations are included in the file "out_FinalData.dwar" in the Supplementary Materials. Figure 7 illustrates representative "dark" and "bright" regions of the chemical space of the inhibitors. Each point in the graph corresponds to a complete analog series, and the data points are colored by the average pIC50 of all compounds in that particular analog series. The average activity is colored using a scale from blue (less active) to red (more active). The size of each point represents the relative number of molecules contained in the analog series. Additionally, linking lines represent shared molecules between two analog series.
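The per-series quantities described above (average pIC50, standard deviation, number of compounds, and links through shared molecules) can be sketched as follows. This is only an illustration under assumptions: the input layout, function names, and the use of the sample standard deviation are choices made for this sketch and are not taken from the published constellation plot code [52].

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean, stdev


def summarize_series(records):
    """Summarize activity per analog series.

    records: iterable of (series_id, pIC50) tuples; a compound belonging
             to several series appears once per series.
    """
    grouped = defaultdict(list)
    for series_id, value in records:
        grouped[series_id].append(value)
    summary = {}
    for series_id, values in grouped.items():
        summary[series_id] = {
            "n_compounds": len(values),   # drives the point size
            "mean_pic50": mean(values),   # drives the point color
            # Sample SD assumed; set to 0.0 for single-compound series.
            "sd_pic50": stdev(values) if len(values) > 1 else 0.0,
        }
    return summary


def shared_links(members):
    """Count molecules shared by each pair of analog series.

    members: dict mapping series ID -> set of compound IDs.
    Only pairs sharing at least one molecule are returned.
    """
    links = {}
    for a, b in combinations(sorted(members), 2):
        shared = members[a] & members[b]
        if shared:
            links[(a, b)] = len(shared)
    return links
```

In a plot, the mean pIC50 would map to the color scale, the compound count to the point size, and each returned link to a line between two series.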
The constellation plot in Figure 7 allows the clear identification of "dark regions" in the SAR in the chemical space of Tub-Mts inhibitors, e.g., the analog series in light blue with low average pIC50 values: AS16, AS23, AS68, AS92 and AS96. The plot also aids the identification of promising analog series or "emerging bright stars" in the chemical space, e.g., the analog series in green-to-red color with high average pIC50 values: AS2, AS9, AS43, AS112, AS130, and AS133. Although these analog series have been explored on a limited basis (between two and three compounds, as depicted by the smaller size of the data points in Figure 7), they have a high average activity. Thus, these analog series could be the starting point for new and potent inhibitors. However, despite their high average activity, some series (e.g., AS112) could still have limitations such as difficult synthetic accessibility or a poor pharmacokinetic profile. Figure S2 in the Supplementary Materials shows a summary of the major analog series (constellations in chemical space) related to the chemical structures of four principal inhibitors of the Tub-Mts system (illustrated in Figure 1). The analog series (constellations) with labels AS16, AS23, AS68, and AS112 in Figure 7 are analog series of compounds that interact with the vinca binding site. The analog series AS2, AS9, AS43, AS130, and AS133 in Figure 7 interact with the colchicine binding site. Other representative examples are the analog series AS96 and AS107 (Figure 7), which include compounds that interact with the pironetin and paclitaxel binding sites, respectively. Of note, there were no analog series for inhibitors that interact with the laulimalide binding site since none of them complied with the retrosynthesis rules (RECAP) considered in this work, so they are not visualized in the constellation plot.

Conclusions
The present work explored and described the first activity landscape and scaffold content analysis of a newly assembled and curated cell-based database of 851 Tub-Mts inhibitors with reported activity against five cancer cell lines and the Tub-Mts system. The study revealed that the current Tub-Mts inhibitors are a series of compounds with limited molecular and scaffold diversity. It was also concluded that there are differences in the physicochemical profile that depend on the inhibitors' binding site (e.g., against colchicine, vinca, pironetin, paclitaxel, or laulimalide binding site). Cell-based data implicitly contain information that is not possible to analyze via biochemical assays. We propose using this information to generate SAR and QSAR predictive methods to reduce the error rate in biological evaluations of novel inhibitors of the Tub-Mts system. Additionally, the inhibitors of the Tub-Mts system were explored using constellation plots; this novel visualization of the SAR of chemical datasets led to the identification of promising analog series with high average pIC50 values (e.g., AS2, AS9, AS43, AS112, AS130, and AS133); these analog series could be the starting point of new and potent Tub-Mts inhibitors.
Supplementary Materials: The following are available online, Table S1: Compound dataset, Table S2: List of Bemis-Murcko scaffolds, Figure S1: Active compounds that were not found in the AC or scaffold hops sections, Figure S2: Summary of major analog series related to the chemical structures of four principal inhibitors of the Tub-Mts system, out_FinalData.dwar: Interactive output to be viewed using the DataWarrior software.
Funding: We are thankful for the financial support from the NUATEI (Nuevas Alternativas para el Tratamiento de Enfermedades Infecciosas) program IBT-UNAM to purchase an MOE academic license. We also thank the School of Chemistry, UNAM for funding with the program PAIP (Programa de Apoyo a la Investigación y Posgrado).

Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.