Molecular Topology for the Search of New Anti-MRSA Compounds

The variability of methicillin-resistant Staphylococcus aureus (MRSA), its rapid adaptive response to environmental changes, and its continued acquisition of antibiotic resistance determinants have made it commonplace in hospitals, where it poses a multidrug resistance problem. In this study, we used molecular topology to develop several discriminant equations capable of classifying compounds according to their anti-MRSA activity. Topological indices were used as structural descriptors, and their relationship with anti-MRSA activity was determined by applying linear discriminant analysis (LDA) to a group of quinolones and quinolone-like compounds. Four additional equations were constructed, named DFMRSA1, DFMRSA2, DFMRSA3 and DFMRSA4 (DFMRSA was built in a previous study), all with good statistical parameters, such as Fisher–Snedecor F (>68 in all cases), Wilks' lambda (<0.13 in all cases), and percentage of correct classification (>94% in all cases), which allows a reliable prediction of antibacterial activity in any organic compound. The results obtained clearly reveal the high efficiency of combining molecular topology with LDA for the prediction of anti-MRSA activity.


Introduction
Staphylococcus aureus is a Gram-positive bacterium that causes important infections in humans. The first strain of methicillin-resistant Staphylococcus aureus (MRSA) was identified in 1960, almost immediately after the introduction of methicillin in therapeutics [1,2].
Antibiotic resistance occurs when the bacteria that cause an infection survive exposure to a drug that, under normal conditions, would kill them or inhibit their growth [3]. As a result, these surviving strains multiply and spread, due to the lack of competition from other strains sensitive to the same drug. This has led to the emergence of so-called "superbugs", such as MRSA, which are difficult to treat with available antibiotics [4].
MRSA is an important cause of infections of both community and hospital origin, and represents a major clinical and public health problem due to the limited treatment options (partly due to its worldwide spread), well-established resistance to vancomycin (formerly the antibiotic of choice for MRSA infections), and high number of therapeutic failures [5,6]. In fact, the World Health Organization (WHO) has placed MRSA as a priority 2 (high-priority) pathogen on its list of priority pathogens for the development of new antibiotics [7]. Furthermore, according to the latest antibacterial resistance report developed by the European Centre for Disease Prevention and Control (ECDC), MRSA strains are present in virtually the entire European continent and are one of the most common causes of nosocomial infections [8].
Several factors contribute to the success of this bacterium as a pathogen, among them its ability to persist as a commensal, its frequent resistance to multiple antibacterials, and its variety of virulence determinants [9]. These bacteria have a great capacity to survive in adverse environments. After entering the hospital environment through patients, visitors, and/or healthcare workers, they spread to other patients, mainly through the hands of healthcare personnel [10]. However, although initially related to the hospital setting, MRSA infections are common at the community level as well [11].
Its morbidity is variable and depends on factors specific to the host, the type of infection, and how early treatment is started. Although most of these infections affect the skin and soft tissues, these organisms are also capable of causing devastating diseases in certain patients, including necrotizing fasciitis, septic thrombophlebitis of the extremities, Waterhouse-Friderichsen syndrome, and rapidly progressive pneumonia [12].
Hence, MRSA can be regarded as a serious health problem worldwide. The current interest in the study of this pathogen is due to its high frequency and to the fact that it is one of the main causes of nosocomial infection outbreaks around the world. The growing number of resistant strains, which are now spreading faster than new molecules are being developed, makes it necessary to investigate new antibacterial agents to expand the current therapeutic arsenal. In addition, the WHO has repeatedly noted that current investment in the development of new antimicrobial compounds is insufficient [13].
Empirically establishing rules or filters for drugs, such as Lipinski's rule of five, will form the knowledge base to produce libraries tailored to drug discovery, which will require high-throughput identification of novel compounds within a large background of known substances [14]. This could be achieved by means of a rational drug design approach such as Quantitative Structure-Activity Relationships (QSAR). Within this field is molecular connectivity or molecular topology (MT), an effective and low-cost method capable of predicting molecular properties in new compounds, without the need to obtain or synthesize them previously [15].
By combining MT with pattern recognition techniques such as linear discriminant analysis (LDA) [16], neural networks [17], multilinear regression [18], factor analysis [19], or principal component analysis [20], and appropriately selecting the molecular descriptors to use, we can build mathematical-topological equations able to predict almost any molecular property. This makes MT a powerful tool for the search and design of new compounds with antibacterial activity, but it is not the only one in the field. In fact, a variety of linear and nonlinear statistical methods are used to develop models based on 2D or 3D representations of molecules [21]. However, 1D to 3D QSAR methods have a series of limitations that have led to the development of higher-dimensional QSAR models (4D to 7D). Multi-dimensional models, although technically more complex, have been developed with the objective of finding the true binding mode [22].
In this context, this study aims to obtain mathematical-topological equations capable of predicting anti-MRSA activity. By combining MT and LDA, we use topological indices (TI) [23] to classify a compound as anti-MRSA or non-anti-MRSA. To do this, we simply select a group of compounds with antibacterial activity and another one lacking it.
We selected quinolones and quinolone-like compounds to build these equations (see Figure 1). Quinolones are a well-known and extensive group that allows us to collect abundant data, leading to greater precision of the predictive equations [24]. In addition, their pharmacokinetic profile, as well as their high antibacterial activity and broad spectrum of action, makes them very versatile [25]. Furthermore, compared with other classes of antibacterials, resistance levels to quinolones are relatively low [26].
The discriminant functions obtained can be applied to quinolones and quinolone-like compounds not used in the study, as well as to structurally unrelated molecules, since they select compounds that share a similar mathematical-topological pattern [27]. Topology is the branch of mathematics that studies the positions and interconnections of elements within a set. Applied to molecules, it gives rise to the discipline called MT, which analyzes the positions and interconnections of atoms within a molecule and yields structural information on length, branching, connections between atoms, shape, unsaturations, etc.; in short, on the topological assembly or connectivity of the molecule [28]. Structurally related compounds will usually have similar values for their topological indices, but this can also occur in compounds with unrelated structures.
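As an illustration of how a topological index turns connectivity into a number, the following minimal sketch computes the first-order Randić connectivity index from the bond list of a hydrogen-depleted graph. This index is chosen here only because it is simple; the text does not state that it is among the descriptors used in the study, and the function name and example molecules are assumptions made for illustration.

```python
import math

def randic_index(n_atoms, bonds):
    """First-order connectivity index of a hydrogen-depleted graph.

    bonds is a list of (i, j) pairs of bonded heavy atoms, numbered from 0.
    """
    degree = [0] * n_atoms
    for i, j in bonds:
        degree[i] += 1
        degree[j] += 1
    # Each bond contributes 1 / sqrt(degree_i * degree_j)
    return sum(1.0 / math.sqrt(degree[i] * degree[j]) for i, j in bonds)

# Carbon skeletons only (hydrogens removed)
n_butane = [(0, 1), (1, 2), (2, 3)]   # linear chain C-C-C-C
isobutane = [(0, 1), (0, 2), (0, 3)]  # central carbon bonded to three carbons

chi_linear = randic_index(4, n_butane)
chi_branched = randic_index(4, isobutane)
print(round(chi_linear, 4), round(chi_branched, 4))
```

The two isomers have the same atoms but different branching, and the index separates them (about 1.914 for the linear chain and about 1.732 for the branched one), which is the kind of structural information the descriptors are meant to capture.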

Results
To obtain the discriminant functions (DF), we used the data from a previous study [34], in which the statistical program BMDP randomly formed two training groups with 26 active and 30 inactive compounds and two test groups with seven active and six inactive compounds. These test groups allowed evaluation of the quality of the selected functions.
The DFs formed in this study, along with their statistical parameters, are shown in Equations (1)-(4).
The classification criterion was determined by the value of the DF: a compound was classified as active if the value of the equation was equal to or greater than 0, and as inactive if the value was smaller than 0.
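The classification rule can be sketched in a few lines. The coefficients, intercept, and index values below are hypothetical placeholders chosen for illustration only; they are not the fitted values of any of the DFs in this study.

```python
def discriminant_value(indices, coefficients, intercept):
    """Linear discriminant function: intercept + sum(coefficient * index)."""
    return intercept + sum(coefficients[name] * indices[name]
                           for name in coefficients)

def classify(df_value):
    """Zero cut-off rule: DF >= 0 is active, DF < 0 is inactive."""
    return "active" if df_value >= 0 else "inactive"

# Hypothetical coefficients and descriptor values (NOT from the paper)
coefficients = {"S_Cl": 1.5, "S_CH3": -0.8}
intercept = -2.0
compound = {"S_Cl": 2.0, "S_CH3": 0.5}

value = discriminant_value(compound, coefficients, intercept)
print(value, classify(value))
```

With these made-up numbers the function value is positive, so the compound would be assigned to the active group; a value of exactly 0 is also classified as active under the stated rule.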

Discussion
All equations have a low value of λ, indicating that there is a low linear dependence between independent variables. Furthermore, the high value of F in the equations indicates that the selected independent variables contribute substantially to the separation of the active and inactive groups. Moreover, all equations assign each compound to its corresponding group with very high success rates (100% in most cases).
We obtained the first three functions using combinations of different types of indices: DFMRSA1 used the electrotopological state indices ($S_i$) [35]; DFMRSA2 used electrotopological and charge indices [36]; and DFMRSA3 used all topological indices except the connectivity indices [37]. We obtained DFMRSA4 using all 136 topological indices.
DFMRSA1 involves four electrotopological indices ($S_{-CH_3}$, $S_{=C<}$, $S_{aSa}$ and $S_{Cl}$). Its value is influenced positively by the presence of $sp^2$ carbons ($S_{=C<}$) and chlorine atoms ($S_{Cl}$), whereas the presence of methyl groups ($S_{-CH_3}$) and sulfur atoms in aromatic rings ($S_{aSa}$) has a negative influence on the DF value. It should be noted that, in the latter case, some of the inactive compounds in the training and test groups contain aromatic sulfur, while all active compounds lack it (the structures of all compounds, as well as bibliographic references about their activity, can be found in Tables S1 and S2).
DFMRSA2 involves one valence charge index ($^{3}J^{V}$) and three electrotopological indices ($S_{=C<}$, $S_{>CH-}$ and $S_{Cl}$). In this case, the presence of $sp^2$ carbons ($S_{=C<}$) and chlorine atoms ($S_{Cl}$) increases its value, while the presence of $sp^3$ carbons ($S_{>CH-}$) and $sp^2$ oxygens ($S_{=O}$) decreases it. Regarding the charge index, the value of the DF is influenced negatively by the topological charge present in third-order sub-pseudographs ($^{3}J^{V}$). This type of index describes the distribution of the global charge in the molecule through the evaluation of charge transfer between pairs of atoms. Moreover, since it is a valence index, it also accounts for heteroatoms and multiple bonds.
DFMRSA3 involves two electrotopological indices ($S_{>N-}$ and $S_{Cl}$), one charge index ($^{3}J$) and one geometric index (PR2). In this case, the value of the equation increases with the presence of chlorine atoms ($S_{Cl}$) and with the geometric index PR2. Geometric indices are related to molecular shape and surface [38]. The PR2 index in particular counts the number of pairs of branches (a branch being a vertex connected to three or more other vertices) separated by two edges. This means that the presence of branching favors the anti-MRSA activity. On the other hand, the value of the equation is negatively influenced by the presence of tertiary amine groups ($S_{>N-}$) and very negatively by the topological charge in third-order sub-graphs ($^{3}J$).
DFMRSA4 involves one connectivity index ($^{3}\chi_{ch}$), two electrotopological indices ($S_{Cl}$ and $S_{=CH-}$), and one charge index ($^{3}J$). The equation also shows a clear dependence of the activity on the Kier-Hall chain-type third-order index ($^{3}\chi_{ch}$), which implies that the presence of a cyclopropyl group greatly enhances the activity against MRSA. Most of the active compounds have this group. On the contrary, the inactive compounds generally lack this group (see Tables S1 and S2). This group has been proven to have a positive influence on the activity of a large number and variety of drugs [39]. The other index that has a positive influence on the value of the equation is $S_{Cl}$, meaning that the presence of chlorine atoms favors the anti-MRSA activity, while the presence of $sp^2$ carbons ($S_{=CH-}$) is detrimental to such activity. The topological charge in third-order sub-graphs ($^{3}J$) also has a negative influence on the anti-MRSA activity.
After analyzing all the equations, we found that the electrotopological index $S_{Cl}$ was common to all of them, with a positive influence in all cases. This indicates that the presence of chlorine favors the anti-MRSA activity, as we concluded in one of our previous studies [34]. Foroumadi et al. [40] experimentally demonstrated the importance of this atom by replacing fluorine or hydrogen with chlorine at positions 6 and/or 8 of the bicyclic aromatic core of the 4-quinolone structure. This change drastically increased the in vitro activity against two clinically isolated strains of MRSA and other bacterial species in numerous analog compounds.
We plotted the corresponding pharmacological distribution diagrams (PDD) for every function in order to visualize the ranges of function values in which the probability of classifying a compound as active or inactive is maximum; in other words, to find areas where the overlap between the two groups of compounds is minimal. The PDDs obtained for the DFs, along with the highest activity range for each function, are shown in the corresponding figures. Compounds with values below the range were considered inactive, while compounds with values above the range were considered unclassified. Thus, the value ranges derived from these PDDs establish the applicability domain for each function [41].
When PDDs are used, the accuracy for active compounds decreases, whereas that for inactive ones increases, so the probability of selecting a false active compound after applying the PDD filter decreases. Averaged over the four equations, the percentage of correctly classified inactive compounds is 100% for both the training and test groups, and the percentage of correctly classified active compounds is 97.4% for the training group and 100% for the test group. Tables 5 and 6 summarize the classification results obtained with all selected functions for the training groups (active and inactive, respectively), and Table 7 summarizes the results for the test groups. As can be inferred from the tables, the training and test groups exhibit an average overall accuracy of 99.1%.

Compound Selection
We collected in vitro activity information on quinolones and quinolone-like compounds using search engines such as ISI Web of Science, Medline, and SciFinder (Caplus). We only considered as valid those activity data from in vitro tests conducted according to the criteria of the CLSI [42].
We finally selected in vitro activity data of 56 quinolones and structurally related compounds (Figure 1) against MRSA, which we classified into two groups, active (26 compounds) and inactive (30 compounds), all of which had a 4-quinolone or a closely related structure (see Tables S1 and S2).
To be considered active, a compound had to have a minimum inhibitory concentration (MIC) equal to or under 1 µg/mL against MRSA, while compounds considered inactive had to have an MIC equal to or over 16 µg/mL against MRSA. Compounds with MICs between 1 and 16 µg/mL against MRSA were not included in the study. Regarding stereoisomers, if any of them was active, the compound was included in the active group. If all individual stereoisomers, or their mixture in any ratio, were inactive, they were included as a single graph in the inactive group.
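The selection thresholds above can be expressed as a small labelling routine. This is a minimal sketch: the function names are hypothetical, MIC values are assumed to be given in the same unit as the thresholds in the text, and the handling of stereoisomers with intermediate MICs (returned as excluded) is an assumption, since the text does not cover that case.

```python
def label_by_mic(mic, active_max=1.0, inactive_min=16.0):
    """Label one MIC value; thresholds use the same unit as the text."""
    if mic <= active_max:
        return "active"
    if mic >= inactive_min:
        return "inactive"
    return None  # intermediate MIC: not included in the study

def label_stereoisomers(mics):
    """Label a compound from the MICs of its stereoisomers or mixtures."""
    labels = [label_by_mic(m) for m in mics]
    if "active" in labels:
        return "active"    # any active stereoisomer puts it in the active group
    if labels and all(lab == "inactive" for lab in labels):
        return "inactive"  # all forms inactive: one entry in the inactive group
    return None            # assumption: otherwise excluded

print(label_by_mic(0.5), label_by_mic(4.0), label_by_mic(32.0))
print(label_stereoisomers([0.25, 32.0]), label_stereoisomers([16.0, 64.0]))
```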

Topological Descriptors
We calculated 301 topological indices for all 56 molecules. Those indices with values of 0 or with identical values for all compounds were removed. Finally, all compounds were characterized by a total of 136 non-redundant, significant descriptors specific to each molecule. These descriptors do not contain 3D parameters. The description of all the molecular descriptors can be found in Table S3 along with their definitions and references. Using MOLCONN-Z [43] and DESMOL13 [44] programs, we computed the adjacency topological matrix obtained from the hydrogen-depleted chemical pseudographs, previously drawn with the ChemBioDraw Ultra 12.0 molecule-editing program of the ChemBioOffice 2010 package.
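The descriptor clean-up step can be sketched as follows, assuming the descriptors are held as one list of values per compound. The function name and the toy data are hypothetical, and the filter interprets "values of 0" as indices that are zero for every compound, which is a special case of a constant column.

```python
def filter_descriptors(rows, names):
    """Drop descriptors whose value is identical for all compounds.

    rows: one list of descriptor values per compound.
    names: descriptor names, in the same order as the columns.
    """
    kept = [k for k in range(len(names))
            if len({row[k] for row in rows}) > 1]  # more than one distinct value
    filtered_rows = [[row[k] for k in kept] for row in rows]
    return filtered_rows, [names[k] for k in kept]

# Toy table: 3 compounds x 4 descriptors (values are made up)
names = ["idx_a", "idx_b", "idx_c", "idx_d"]
rows = [
    [0.0, 1.2, 3.0, 0.5],
    [0.0, 1.2, 4.0, 0.7],
    [0.0, 1.2, 5.0, 0.5],
]
filtered_rows, kept_names = filter_descriptors(rows, names)
print(kept_names)
```

Here `idx_a` (all zeros) and `idx_b` (constant) are removed, since neither can help separate active from inactive compounds.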

Linear Discriminant Analysis (LDA)
LDA is a pattern recognition method that allows the classification of a compound into a given group or category (e.g., active or inactive) based on a combination of variables (e.g., topological indices). We chose the variables used to compute the linear classification functions in a stepwise manner, based on the Fisher-Snedecor parameter F, which relates the variance explained by the equation to the residual variance. At each step, the variable with the greatest value of F (i.e., the variable that contributes most to the differentiation of groups in the discriminant function) is entered. Conversely, previously selected variables with a small value of F (i.e., variables that lower the statistical significance of the classification function) are removed.
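The first step of such a forward selection can be sketched as follows. This is a simplified illustration and not the BMDP procedure: it ranks descriptors by their univariate one-way F statistic, whereas later steps of a real stepwise analysis use an F conditional on the variables already entered. Function names and data are hypothetical.

```python
def f_statistic(active, inactive):
    """One-way ANOVA F for a single descriptor and two groups."""
    groups = [active, inactive]
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    # Assumes ss_within > 0 (the descriptor varies within the groups)
    return (ss_between / df_between) / (ss_within / df_within)

def best_first_variable(active_rows, inactive_rows, names):
    """Return (name, F) of the descriptor with the largest univariate F."""
    scores = []
    for k, name in enumerate(names):
        a = [row[k] for row in active_rows]
        i = [row[k] for row in inactive_rows]
        scores.append((f_statistic(a, i), name))
    f_best, name_best = max(scores)
    return name_best, f_best

# Made-up data: descriptor d2 separates the groups, d1 does not
active_rows = [[1.0, 5.0], [2.0, 5.5], [3.0, 4.5]]
inactive_rows = [[1.5, 1.0], [2.5, 0.5], [2.0, 1.5]]
best_name, best_f = best_first_variable(active_rows, inactive_rows, ["d1", "d2"])
print(best_name, best_f)
```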
The discriminant ability of each DF is assessed by the percentage of correct classifications attained for each set. The classification criterion is the minimal Mahalanobis distance (the distance of each case to the mean of all the cases in a category). The quality of the discriminant function was evaluated through Wilks' U-statistic, λ, which was obtained by a multivariate analysis of variance that tests the equality of group means for the variables in the discriminant model.
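For reference, the two quantities named above are usually defined as follows. This is the standard textbook formulation, with $\mathbf{W}$ and $\mathbf{T}$ the within-group and total sum-of-squares-and-cross-products matrices, $\mathbf{S}$ the pooled within-group covariance matrix, $N$ compounds and $g$ groups. The exact conventions used internally by BMDP (for example, prior probabilities) are not stated in the text and may differ.

```latex
\begin{align}
\lambda &= \frac{\det(\mathbf{W})}{\det(\mathbf{T})}, \qquad 0 \le \lambda \le 1 \\
D_k^2(\mathbf{x}) &= (\mathbf{x} - \bar{\mathbf{x}}_k)^{\top}\,\mathbf{S}^{-1}\,(\mathbf{x} - \bar{\mathbf{x}}_k),
\qquad \mathbf{S} = \frac{\mathbf{W}}{N - g} \\
D_{I}^2(\mathbf{x}) - D_{A}^2(\mathbf{x}) &= 2\left[(\bar{\mathbf{x}}_A - \bar{\mathbf{x}}_I)^{\top}\mathbf{S}^{-1}\mathbf{x}
 - \tfrac{1}{2}(\bar{\mathbf{x}}_A - \bar{\mathbf{x}}_I)^{\top}\mathbf{S}^{-1}(\bar{\mathbf{x}}_A + \bar{\mathbf{x}}_I)\right]
\end{align}
```

Values of $\lambda$ close to 0 indicate well-separated group means. The last line shows that, for two groups (active $A$ and inactive $I$) with equal priors, assigning a compound to the nearest group mean is equivalent to checking the sign of a linear function of the descriptors, which is consistent with a zero cut-off on the discriminant function.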
The BMDP 7M Biomedical package [45] was the software used for the LDA study. The program randomly chooses compounds for the training and test groups. With the training group, a predictive mathematical model relating activity to structural descriptors is obtained. Internal validation is performed by the jackknife (JK) method. Finally, the program performs external validation by applying the classification function to the test group and calculating the percentage of correct classifications.
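The idea behind jackknife (leave-one-out) internal validation can be sketched with a deliberately simple classifier. The one-variable nearest-mean rule below is only a stand-in for the fitted discriminant function, and all names and data are hypothetical; it does not reproduce the BMDP implementation.

```python
def nearest_mean_classifier(train):
    """Fit on a list of (value, label) pairs; return a function value -> label."""
    means = {}
    for label in sorted({lab for _, lab in train}):
        values = [v for v, lab in train if lab == label]
        means[label] = sum(values) / len(values)
    return lambda v: min(means, key=lambda lab: abs(v - means[lab]))

def leave_one_out_accuracy(data, fit=nearest_mean_classifier):
    """Refit without each compound in turn, then predict the left-out one."""
    hits = 0
    for k, (value, label) in enumerate(data):
        model = fit(data[:k] + data[k + 1:])
        hits += model(value) == label
    return hits / len(data)

# Made-up single-descriptor data set
data = [
    (5.0, "active"), (5.5, "active"), (4.5, "active"),
    (1.0, "inactive"), (0.5, "inactive"), (1.5, "inactive"),
    (3.2, "inactive"),
]
accuracy = leave_one_out_accuracy(data)
print(round(accuracy, 3))
```

In this toy set, the borderline inactive compound at 3.2 is misclassified when it is left out, so six of seven compounds are predicted correctly. External validation differs only in that the model is fitted once on the training group and then applied to a separate test group.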

Pharmacological Distribution Diagrams (PDD)
PDDs are histogram-like plots of connectivity functions used to determine the intervals of the discriminant function in which the expectancy, E, of finding active compounds is maximum. In these plots, expectancies appear on the ordinate axis. For an arbitrary interval of values of a given function, we can define the expectancy of activity as $E_a = a/(i + 1)$, where $a$ is the number of active compounds in the interval divided by the total number of active compounds, and $i$ is the number of inactive compounds in the interval divided by the total number of inactive compounds. The expectancy of inactivity is defined symmetrically as $E_i = i/(a + 1)$ [41]. This representation provides good visualization of the regions of minimum overlap and helps to select intervals on the abscissa axis with maximum probability of finding active compounds.
PDDs allowed us to carry out the assignment of thresholds useful to discriminate active from inactive compounds with the highest probability of success.
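The expectancies plotted in a PDD can be computed as in the following minimal sketch. Here both a and i are taken as fractions of their respective groups falling within an interval, on the assumption that i is defined symmetrically to a. The function name, interval edges, and DF values are hypothetical.

```python
def pdd_expectancies(active_values, inactive_values, edges):
    """Return one (low, high, Ea, Ei) tuple per interval [low, high)."""
    rows = []
    for low, high in zip(edges[:-1], edges[1:]):
        # Fraction of each group whose DF value falls in the interval
        a = sum(low <= v < high for v in active_values) / len(active_values)
        i = sum(low <= v < high for v in inactive_values) / len(inactive_values)
        rows.append((low, high, a / (i + 1), i / (a + 1)))
    return rows

# Made-up DF values for four active and four inactive compounds
active_df = [1.0, 2.0, 2.5, 6.0]
inactive_df = [-3.0, -2.0, -1.5, 1.2]
pdd = pdd_expectancies(active_df, inactive_df, edges=[-4, 0, 4, 8])
for low, high, e_active, e_inactive in pdd:
    print(low, high, round(e_active, 3), round(e_inactive, 3))
```

Intervals where the expectancy of activity is high and the expectancy of inactivity is close to zero are the ones that would be kept as the activity range for a given function.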

Conclusions
Currently, the development of resistance of microorganisms such as Staphylococcus aureus is one of the most important problems that has appeared in recent years in the treatment of infectious diseases. Molecular topology has been demonstrated to be a useful methodology for identifying new compounds with antimicrobial activity against MRSA. By combining it with LDA, we developed four mathematical-topological equations with outstanding discriminant ability according to the success rates. These results are supported by internal and external validation performed on all functions, as well as by all the statistical parameters, which can be considered very satisfactory in all cases.
We can conclude that the equations obtained in this study confirm molecular topology as a powerful and efficient tool in the discrimination of anti-MRSA activity, offering new insights in the search for new compounds with this specific activity as opposed to classical structure-activity relationships.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/ijms22115823/s1, Table S1: Active group used in the discriminant functions: Paper name/code, IUPAC name & structure and bibliographic references about activity for each compound, Table S2: Inactive group used in the discriminant functions: Paper name/code, IUPAC name & structure and bibliographic references about activity for each compound, Table S3