Similarity-Based Virtual Screening to Find Antituberculosis Agents Based on Novel Scaffolds: Design, Syntheses and Pharmacological Assays

A method to identify molecular scaffolds potentially active against the Mycobacterium tuberculosis complex (MTBC) is developed. A set of structurally heterogeneous agents against MTBC was used to obtain a mathematical model based on topological descriptors. This model was statistically validated through a Leave-n-Out test. It successfully discriminated between active or inactive compounds over 86% in database sets. It was also useful to select new potential antituberculosis compounds in external databases. The selection of new substituted pyrimidines, pyrimidones and triazolo[1,5-a]pyrimidines was particularly interesting because these structures could provide new scaffolds in this field. The seven selected candidates were synthesized and six of them showed activity in vitro.


Introduction
Tuberculosis is one of the deadliest infections in the world, killing almost 1.45 million people annually [1,2]. Most of these deaths occur in poor countries, and a more rational distribution of wealth could prevent them. However, the increasing incidence of drugresistant MTBC has caused its resurgence in developed countries and makes therapeutic alternatives necessary [3,4]. Although new compounds potentially active against these bacteria are under active research, very few new treatments have been developed in recent years [5][6][7][8][9]. Therefore, more extensive investigations are needed to find new molecular scaffolds capable of generating new inhibitors of mycobacteria.
Although a few pyrimidines [64], pyrimidones [65] and triazolo[1,5-a]pyrimidines [66] have already been synthetized and tested for anti-TB activity, their substitution patterns and synthesis methods are different from the compounds described here.
The identification of new targets requires knowledge of the specific biochemical pathways of mycobacteria, but many metabolic processes are still unknown and the structure-based design of new anti-TB agents is a complex task [67].
Although a few pyrimidines [64], pyrimidones [65] and triazolo[1,5-a]pyrimidines [66] have already been synthetized and tested for anti-TB activity, their substitution patterns and synthesis methods are different from the compounds described here.
The identification of new targets requires knowledge of the specific biochemical pathways of mycobacteria, but many metabolic processes are still unknown and the structurebased design of new anti-TB agents is a complex task [67].
The aim of this study was to develop new discriminant models based on the molecular structures of antituberculosis agents, shown in Table S1 [96][97][98] and inactive substances, shown in Table S2, characterized by topological indices [99] (Table 1), in order to screen structural databases to identify new potentially useful scaffolds against MTBC.
Randić-like indices of order k and type path (p), cluster (c) and path-cluster (pc) δ i , number of bonds, σ or π, of the atom i to non-hydrogen atoms. S j , jth sub-structure of order k and type t. [71,81,82] Kier-Hall indices of order k and type path (p), cluster (c) and path-cluster (pc) Kier-Hall valence of the atom i. S j, jth sub-structure of order k and type t. [71,81,82] G k k = 1-5 Topological charge indices of order k Valence topological charge indices of order k product of the electronegativity-modified adjacency and inverse squared distance matrices for the hydrogen-depleted molecular graph. D, distance matrix. δ, Kronecker delta. [68,101] J k k = 1-5 Pondered topological charge indices of order k [68,101] k D t k = 0-4 t = p, c, pc Connectivity differences of order k and type path (p), cluster (c) and path-cluster (pc)

Symbol
Name Definition Reference Connectivity quotients of order k and type path (p), cluster (c) and path-cluster (pc) Inverse connectivity quotients of order k and type path (p), cluster (c) and path-cluster (pc) 2. Results and Discussion 2.1. Antituberculosis Activity Modelling Figure 1 illustrates the proposed virtual screening procedure. In this method, thresholds on descriptor values were combined with Linear Discriminant Analysis (LDA) to give a qualitative discriminant model. Table S3 in the Supplementary Materials collects all these values.
Given a population, for example of molecules, that can be classified into several groups according to their experimental properties, for example a group of molecules with a pharmacological activity and another without this activity, LDA is a method to find linear combinations of independent variables (for example the aforementioned structural invariants) whose numerical values can be used to distinguish between these different categories. When two categories are defined, the classification is done by the so-called discriminant function (DF). Int. J. Mol. Sci. 2022, 23, x FOR PEER REVIEW 5 of 24 Given a population, for example of molecules, that can be classified into several groups according to their experimental properties, for example a group of molecules with a pharmacological activity and another without this activity, LDA is a method to find linear combinations of independent variables (for example the aforementioned structural invariants) whose numerical values can be used to distinguish between these different categories. When two categories are defined, the classification is done by the so-called discriminant function (DF).
The following equation DF was obtained by stepwise LDA using the set shown in Table 2:  The following equation DF was obtained by stepwise LDA using the set shown in 470 NC a Prob., probability; b When probability active or probability inactive > 0.60, the compound is classified as active "+" or inactive "−", respectively. In any other case, the compound is considered non-classified "NC".
The topological descriptors selected in this equation were: the charge indices (J 1 , J 1 v , J 3 v ); the difference index ( 1 D = 1 χ − 1 χ v ); and the quotient index ( 4 C c = 4 χ c / 4 χ c v ), where m χ t and m χ t v are, respectively, single and valence Randić-Kier-Hall indices of order m and type t. Topological charge-transfer indices, J 1 , J 1 v and J 3 v , are measures of the contribution of molecular topological structure to the charge transfer at topological distance 1 and 3, respectively [68,101]. Difference and quotient indices are related to charge distributions within molecular fragments [68]. Thus, 1 D is the net contribution of the heteroatoms to the electronic charge within fragments of order 1 (bonds). The 4 C c index can be related to electron densities of cluster-type fragments of order 4 (three atoms bonded to a central one) in which there is at least one heteroatom. The maximum correlation between pairs of these selected variables, for the group of actives, was as weak as 0.424, corresponding to the pair J 1 v and J 3 v . Thus, the variables can be considered as independent. Table S4 shows these intercorrelations. Table 2 shows the results of the classification for each one of the compounds included in the LDA.
The linear equation gave good results since most compounds were classified with a probability over 86% (Table 2). When the probabilities are between 40% and 60%, the compounds were counted as non-classified (NC), and finally below 40%, they were considered as inactive. Under this framework, the error percentage in the active set was about 9%, whereas in the inactive was 0%. These probabilities are the so-called posterior probabilities, computed by the Bayes rule as the probability of classifying a case (molecule) conditioned to the model obtained. Let π k the prior probability: π k = N o f cases in class k Total N o f cases . The posterior probability P is given by: is the class-conditional density of the case x in the class k. Assuming that this density for x, given every class k, follows a normal distribution, the density formula for a multivariate Gaussian distribution is applicable.
where x and the mean µ k are both column vectors, Cov is the covariance matrix and p is its dimension. The denominator involves the square root of the determinant of this matrix. The result of the matrix multiplications in the numerator is a scalar number.
The results of the internal validation are illustrated in Table 3, with the percentage of correct classification within each group. Five runs were performed. A number of compounds ranging from 9 to 16 were randomly extracted from the training to a test set.
Wilks λ values are shown for each equation. The lower the Wilks λ value, the better the discrimination. Correct classification percentages are shown for training and test sets, for active and inactive compounds. The number of compounds classified as active (+) or inactive (−) appears in parentheses, where (a/b) = number of (+) compounds/number of (−) compounds. Average values are also shown, as well as the performance of DF function. (a/b) = number of (+) compounds/number of (−) compounds.
The results for the training and test groups are within the same range. The mean percentage of success obtained with the training group for DF (Table 3) was 87% for active (+) and 94% for inactive (−). For the test it was 80% and 92%, respectively. The results were similar to those obtained with the DF equation, which points out the validity of the LDA equation.
The results of the external validation test are shown in Table 4. As can be seen, for the active set there was a misclassified compound, namely ethionamide. The same is found in the opposite group, where glucosamine was misclassified. When probability active or probability inactive > 0.60, the compound is classified as active "+" or inactive "−", respectively. In any other case, the compound is considerate non-classified "NC". Figure 2 shows the PDD obtained from DF. As can be inferred from Figure 2, the optimal range of DF to find active compounds can be established between 0 and 4.5. similar to those obtained with the DF equation, which points out the validity of the LDA equation.
The results of the external validation test are shown in Table 4. As can be seen, for the active set there was a misclassified compound, namely ethionamide. The same is found in the opposite group, where glucosamine was misclassified. Figure 2 shows the PDD obtained from DF. As can be inferred from Figure 2, the optimal range of DF to find active compounds can be established between 0 and 4.5. 0.014 + a Prob., probability; b When probability active or probability inactive > 0.60, the compound is classified as active "+" or inactive "−", respectively. In any other case, the compound is considerate nonclassified "NC".

Similarity-Based Virtual Screening
A virtual library containing new pyrimidine derivatives generated by combining the chemical scaffolds and substituents depicted in Table 5 was screened as described in Section 3.2. After building the database containing the substituted pyrimidines, the virtual screening was performed. Descriptors calculated for the entire database are available to the readers upon request to the corresponding author. Based on these parameters, a virtual screening was carried out so that those molecular structures with the values of the descriptors and DFs within the thresholds, shown in the Table S3, were selected as candidates. Table S5, under Supplementary Materials, shows the calculated descriptors for the

Similarity-Based Virtual Screening
A virtual library containing new pyrimidine derivatives generated by combining the chemical scaffolds and substituents depicted in Table 5 was screened as described in Section 3.2. After building the database containing the substituted pyrimidines, the virtual screening was performed. Descriptors calculated for the entire database are available to the readers upon request to the corresponding author. Based on these parameters, a virtual screening was carried out so that those molecular structures with the values of the descriptors and DFs within the thresholds, shown in the Table S3, were selected as candidates. Table S5, under Supplementary Materials, shows the calculated descriptors for the selected compounds. Seven potentially active molecules were selected. These structures are presented in Figure 3. These compounds selected as possible candidates were synthesized and tested in vitro.  selected compounds. Seven potentially active molecules were selected. These structures are presented in Figure 3. These compounds selected as possible candidates were synthesized and tested in vitro. selected compounds. Seven potentially active molecules were selected. These structures are presented in Figure 3. These compounds selected as possible candidates were synthesized and tested in vitro.
Finally, deprotection of 6b in acidic conditions afforded 7 in 77% yield (Scheme 3). Synthesis of pyrimidine derivatives. Two of us reported on the synthesis of novel 4-alkoxypyrimidines starting from 2-alkylsulfanylpyrimidinones of type 8 [103,104]. The method is based on a selective O-alkylation reaction with bulky aliphatic alcohols using the Mitsunobu conditions (method A, Scheme 4) or in basic medium with sterically demanding agents like α-haloketones (method B, Scheme 4).
Oxidation of the thioether moiety to the corresponding sulfone 10 using m-CPBA and nucleophilic displacement by different nucleophiles produced the corresponding highly molecular diverse pyrimidines of type 11. In addition, when 4-isopropoxipyrimidine 11a was treated with a 1:1 mixture of H2SO4/AcOH at 90 °C for 15 min, the selective cleavage of the 4-isopropoxy group took place yielding 2-aryloxypyrimidinone 12 (Scheme 5).
Finally, deprotection of 6b in acidic conditions afforded 7 in 77% yield (Scheme 3). Synthesis of pyrimidine derivatives. Two of us reported on the synthesis of novel 4-alkoxypyrimidines starting from 2-alkylsulfanylpyrimidinones of type 8 [103,104]. The method is based on a selective O-alkylation reaction with bulky aliphatic alcohols using the Mitsunobu conditions (method A, Scheme 4) or in basic medium with sterically demanding agents like α-haloketones (method B, Scheme 4).
Oxidation of the thioether moiety to the corresponding sulfone 10 using m-CPBA and nucleophilic displacement by different nucleophiles produced the corresponding highly molecular diverse pyrimidines of type 11. In addition, when 4-isopropoxipyrimidine 11a was treated with a 1:1 mixture of H2SO4/AcOH at 90 °C for 15 min, the selective cleavage of the 4-isopropoxy group took place yielding 2-aryloxypyrimidinone 12 (Scheme 5).
On the other hand, the reaction of the deprotected amine 9e with phenylboronic acid and glyoxylic acid (Petasis reaction) [105,106] gave the desired α-phenyl glycine derivative 13 in moderated yield (Scheme 6).    Oxidation of the thioether moiety to the corresponding sulfone 10 using m-CPBA and nucleophilic displacement by different nucleophiles produced the corresponding highly molecular diverse pyrimidines of type 11. In addition, when 4-isopropoxipyrimidine 11a was treated with a 1:1 mixture of H 2 SO 4 /AcOH at 90 • C for 15 min, the selective cleavage of the 4-isopropoxy group took place yielding 2-aryloxypyrimidinone 12 (Scheme 5).  On the other hand, the reaction of the deprotected amine 9e with phenylboronic acid and glyoxylic acid (Petasis reaction) [105,106] gave the desired α-phenyl glycine derivative 13 in moderated yield (Scheme 6). Finally, we prepared the compound 14 by reducing the carbonyl group of compound 9d with NaBH4 in MeOH. This reduction afforded the hydroxy derivative 14 in 80% isolated yield (Scheme 7). Microbiological study. To check the predicted antituberculosis activity of the selected candidates, microbiological tests were performed. The result of in vitro susceptibility test of conventional drugs is shown in Table 6, and the experimental results of the selected compounds are illustrated in Table 7. Two compounds, 7 and 12, showed MIC50 = MIC90 = 32 mg/L, corresponding to 81.5 μM and 121.1 μM, respectively; four compounds, 6a, 9c, 11b and 13, showed MIC90 = 64 mg/L against M. tuberculosis, which corresponds to 146, 167.8, 160.1, 131.8 μM, respectively; and 14 was inactive at the assayed concentrations. The compound 11b showed the best MIC50 = 80 μM, but its MIC90 raised to 160,1 μM. Thus, the results pointed out that 7 was the most active agent, showing molar MIC90 four times lower than Ethambutol but within the same magnitude order.  Finally, we prepared the compound 14 by reducing the carbonyl group of compound 9d with NaBH4 in MeOH. This reduction afforded the hydroxy derivative 14 in 80% iso lated yield (Scheme 7). Microbiological study. To check the predicted antituberculosis activity of the se lected candidates, microbiological tests were performed. The result of in vitro susceptibil ity test of conventional drugs is shown in Table 6, and the experimental results of the selected compounds are illustrated in Table 7. Two compounds, 7 and 12, showed MIC5 = MIC90 = 32 mg/L, corresponding to 81.5 μM and 121.1 μM, respectively; four compounds 6a, 9c, 11b and 13, showed MIC90 = 64 mg/L against M. tuberculosis, which corresponds to 146, 167.8, 160.1, 131.8 μM, respectively; and 14 was inactive at the assayed concentrations The compound 11b showed the best MIC50 = 80 μM, but its MIC90 raised to 160,1 μM. Thus the results pointed out that 7 was the most active agent, showing molar MIC90 four times lower than Ethambutol but within the same magnitude order.  Microbiological study. To check the predicted antituberculosis activity of the selected candidates, microbiological tests were performed. The result of in vitro susceptibility test of conventional drugs is shown in Table 6, and the experimental results of the selected compounds are illustrated in Table 7. Two compounds, 7 and 12, showed MIC 50 = MIC 90 = 32 mg/L, corresponding to 81.5 µM and 121.1 µM, respectively; four compounds, 6a, 9c, 11b and 13, showed MIC 90 = 64 mg/L against M. tuberculosis, which corresponds to 146, 167.8, 160.1, 131.8 µM, respectively; and 14 was inactive at the assayed concentrations. The compound 11b showed the best MIC 50 = 80 µM, but its MIC 90 raised to 160.1 µM. Thus, the results pointed out that 7 was the most active agent, showing molar MIC 90 four times lower than Ethambutol but within the same magnitude order.

Antituberculosis Activity Modelling
A group of 32 compounds with known activity against MTBC were compiled from various sources [96][97][98]. Their structures are shown in Table S1 in the Supplementary Material. A set of 45 compounds with a different pharmacological activity was also used as inactive group. Their structures are shown in Table S2 in the Supplementary Material. The compounds were characterized by a set of 90 descriptors calculated using the DesMol program [99]. The descriptors used with their symbols, definitions and references are shown in Table 1. These were used to build a model capable of discriminating between active and inactive antituberculous compounds.
DFs were calculated with randomly selected subsets of 25 out of 32 active and 35 out of 45 inactive compounds by using BMDP New System [107]. Descriptor selection was based on the Fisher-Snedecor F parameter. The variables were introduced step by step in the DF: in each step, the variable that added the most to the separation of the groups was entered in the equation, or the variable that contributed the least to improving said separation was eliminated. The classification criterion was the shortest Mahalanobis distance. The discriminant ability of the DF was evaluated by two parameters, Wilks λ, and the percentage of correct classification in each group. The independent variables in this study were the calculated structural invariants, and the discrimination property was the presence of activity against MTBC.
The validation of the selected DF was performed by two methods. One internal leaven-out test in which the program randomly chose and pulled out a ratio of compounds, and used them to evaluate the DF obtained with the rest; and another external test with a previously unused data set.
To choose the optimal ranges of values for this equation, the corresponding Pharmacological Distribution Diagram, PDD [108], was obtained. These diagrams are useful to determine the intervals of the equation in which the probability of finding new candidates is maximum. This is a graph similar to a histogram in which expectancies appear on the ordinate axis. For an arbitrary range of values of a given function, the activity expectancy is E a = a/(i + 100), where a is the percentage of active compounds in the range and i is the corresponding percentage of inactive within the same range. The expectancy of inactivity is similarly defined as E i = i/(a + 100). This plot provides a good visualization of the regions of minimal overlap between the active and inactive compounds and helps to select the optimal interval of the DF.
The 90 descriptors and the selected discriminant function were used as filters to select potential candidates in structural databases. For this, the maximum and minimum values of each descriptor were established as thresholds for the 90 variables, while for DF, the optimal interval was taken. Compounds exhibiting all 90 values within the thresholds and DF values within the optimal intervals were considered as candidates.

Similarity-Based Virtual Screening
A virtual library containing 320 structures of substituted pyrimidines was generated by combining the chemical scaffolds and substituents depicted in Table 5, using in-house software. The set of 90 descriptors and DFs were calculated for this library. A virtual screening was carried out based on these parameters to select as candidates those molecular structures whose values for the descriptors and DFs were within the thresholds.

Chemical Methods
DMF was dried over activated molecular sieves (4 Å). THF was dried over Na/ benzophenone prior use. All the other commercially available chemicals were used as purchased without further purification. Reactions involving magnesium acetylides and synthesis of triazolo[1,5-a]pyrimidines were run under a dry Ar atmosphere. Melting points (capillary tube) were measured with an electrothermal digital melting point apparatus IA 91,000 and are uncorrected. IR spectra were recorded on a Mattson-Galaxy Satellite FT-IR. Additionally, 1 H and 13 C NMR spectra were recorded at 200 and 50 MHz, respectively, on a Brucker DPX200 Advance instrument with TMS as internal standard. MS spectra were recorded on a VG Quattro instrument in the positive ionization FAB mode, using 3-NBA or 1-thioglycerol as the matrix or in a Thermo Quest 2000 series apparatus for the EI (70 eV) mode. Analytical TLC was performed on precoated TLC plates, silica gel 60 F 254 (Merck). Flash-chromatography (FC) purifications were performed on silica gel 60 (230-400 mesh, Merck).
To a cooled (0 • C) solution of the corresponding alkyne 1a-b in dry THF (2 mL/mmol), i-PrMgCl (2 M solution in THF) was added dropwise under Ar. The mixture was stirred at that temperature for 4 h. Then, a solution of the piperonal 2 (1.3 equiv) in dry THF (1 mL/mmol) was slowly added dropwise over a period of 15 min. The reaction mixture, under Ar, was stirred from 0 • C to r.t. until total consumption of 1a-b (5-12 h., monitored by TLC). The reaction was quenched with saturated solution of NH 4 Cl (3 mL/mmol) at r.t. and the organic solvent was eliminated under reduced pressure. The aqueous layer was extracted with AcOEt (3 × 3 mL/mmol) and the combined organic layers were dried over MgSO 4 , the solvent was evaporated and the resulting residue purified by flash-chromatography (n-hexane:AcOEt).
Over a cooled (0 • C), mechanically stirred suspension of MnO 2 (5 equiv) in CH 2 Cl 2 (3 mL/mmol), a solution of the propargyl alcohol 3a or of the reaction mixture 3b in CH 2 Cl 2 (3 mL/mmol) was added dropwise. The reaction mixture was stirred from 0 • C to r.t. until total consumption of 3a-b (3-12 h., monitored by TLC) and then filtered through a Celite ® pad. The solvents were removed under reduced pressure and the resulting residue was purified by flash-chromatography (n-hexane:AcOEt).

Synthesis of triazolo[1,5-a]pyrimidines 6a-b. General procedure.
A mixture of 3-amino-5-benzylsulfanyl-1,2,4-triazole 5, DBU and anhydrous MgSO 4 (5 g/g) in dry DMF (3 mL/mmol) was heated at 40 • C under Ar for 30 min. Then, a solution of the corresponding α-acetylenic ketone 4a-b in dry DMF (3 mL/mmol) was slowly added using a syringe pump over a period of 5h. The reaction mixture, under Ar, was stirred at 40 • C until total consumption of 5 (1-2 h., monitored by TLC). The reaction was filtered and the organic solvent was eliminated under reduced pressure. The residue was dissolved with CH 2 Cl 2 (15 mL/mmol) and was washed with saturated solution of NH 4 Cl (3 × 3 mL/mmol). The organic layer was dried over MgSO 4 , the solvent was evaporated and the resulting residue was purified by flash-chromatography (n-hexane:AcOEt).

2-(2-Benzylsulfanyl-pyrimidin-4-yloxy)-1-(4-chloro-phenyl)-ethanone (9b).
According to the general procedure described above, the reaction between 8a (600 mg, 2.8 mmol), (4-chloro-phenyl)-acetyl bromide (850 mg, 3.6 mmol) and TMG (0.45 mL, 3.6 mmol) in dry DMF (8 mL the opinion of the authors, this approach is not able to find new action mechanisms, but it is possible to obtain new unexpected molecular structures acting through the known mechanisms which combine the properties of the known compounds. This feature could be useful in order to avoid problems of resistance.