Molecular Analysis of Streptomycin Resistance Genes in Clinical Strains of Mycobacterium tuberculosis and Biocomputational Analysis of the MtGidB L101F Variant

Globally, tuberculosis (TB) remains a prevalent threat to public health. In 2019, TB affected 10 million people and caused 1.4 million deaths. The major challenge for controlling this infectious disease is the emergence and spread of drug-resistant Mycobacterium tuberculosis, the causative agent of TB. The antibiotic streptomycin is not a current first-line anti-TB drug. However, WHO recommends its use in patients infected with a streptomycin-sensitive strain. Several mutations in the M. tuberculosis rpsL, rrs and gidB genes have proved association with streptomycin resistance. In this study, we performed a molecular analysis of these genes in clinical isolates to determine the prevalence of known or novel mutations. Here, we describe the genetic analysis outcome. Furthermore, a biocomputational analysis of the MtGidB L101F variant, the product of a novel mutation detected in gidB during molecular analysis, is also reported as a theoretical approach to study the apparent genotype-phenotype association.


Introduction
Worldwide, tuberculosis (TB) is among the top 10 causes of death and the leading disease caused by a single infectious agent, Mycobacterium tuberculosis, ranking above HIV/AIDS [1,2]. TB control has become a global challenge due to the continued emergence of multidrug-resistant (MDR) strains [3,4]. Precise identification of such strains demands bacterial confirmation and drug resistance assessment using culture methods, molecular analysis and DNA sequencing [5,6]. Moreover, patients with MDR-TB require intensive treatment for at least nine months (up to 20 months), supported by constant pharmacovigilance to reduce adverse events [7,8].
Streptomycin (Str) was the first antibiotic used for the therapeutic control of TB [9,10]. However, drug-associated side effects and the emergence of Str-resistant (Str R ) strains prompted its removal from the group of first-line anti-TB drugs [11][12][13]. Despite this, WHO recommends using it in patients infected with a confirmed Str-susceptible strain [8,14,15]. Str is active against growing bacteria by inhibiting protein synthesis. It acts through irreversible binding to S12 protein and 16S rRNA, two molecular constituents of the small subunit of bacterial ribosomes [13,16].
Herein, we performed a molecular analysis of the rpsL, rrs and gidB genes in clinical isolates of M. tuberculosis that showed a noticeable Str R phenotype to determine the current prevalence of known or novel mutations in our region. The combined outcome of DNA sequencing and gene comparisons allowed us to identify point mutations associated with such drug resistance. Furthermore, a biocomputational analysis was used as a theoretical approach to study the apparent genotype-phenotype association observed in two Str R isolates containing the 301c>t mutation in gidB, which produces the non-synonymous substitution L101F in the gene product.

Molecular Analysis of the M. tuberculosis rpsL, rrs and gidB Genes
Patients suspected of having active TB disease who attended the GHT's TB Clinic provided the sputum samples. From a collection period of one year, 11 clinical isolates met the inclusion criteria for this study: positive Ziehl-Nielsen (ZN) staining and streptomycin resistance (minimum inhibitory concentration, MIC > 0.8 µg/mL). Using standard molecular methods for M. tuberculosis gDNA extraction and PCR amplification, the expected gene fragments for molecular analysis: 628 bp (rpsL), 645 bp (rrs) and 719 bp (gidB), were obtained. Once purified, the amplicons were sequenced and analyzed with various bioinformatics tools: NCBI's BLAST, SequentiX's DNA Dragon, SnapGene ® Viewer and EBI's Clustal Omega. Table 1 summarizes the data obtained from dsDNA sequencing and gene analysis.  01R  ND  ND  301c>t (L101F)  II  02R  ND  ND  236t>c (L79S)  II  03R  ND  491c>t *  ND  I  04R  ND  ND  ND  I  05R  ND  ND  ND  I  06R  ND  ND  ND  I  07R  ND  ND  ND  I  08R  ND  ND  37g>c (G13R); 47t>g (L16R)  II  09R  ND  ND  236t>c (L79S)  II  10R  ND  ND  301c>t (L101F)  II  11R  ND  ND  236t>c (L79S)  II ND: no mutation detected (i.e., 100% identical to the reference DNA sequence). * Mutation shared with streptomycin-sensitive clinical isolates. § Grouped by in-house criteria.
As initial analysis of the molecular results, none of the Str R isolates showed mutations in rpsL, i.e., fully identical to the reference gene sequence (Mycobrowser ID: Rv0682). Furthermore, most of them (10 out of 11) showed 100% identity within the 365-978 nucleotide segment of the reference rrs gene (Mycobrowser ID: MTB000019). The exception isolate (03R) showed the 491c>t mutation, which was also detected in Str-susceptible isolates, confirming a lack of genotype-phenotype association. Moreover, it represents an epidemiological biomarker assigned to the M. tuberculosis Latin American and Mediterranean sublineage 3 (LAM3) [28].
In contrast, the gidB analysis produced mixed results. Five isolates, 03R-07R, showed sequences 100% identical to the reference (Mycobrowser ID: Rv3919c). Another three: 02R, 09R and 11R, showed the 236t>c mutation (causing the L79S substitution). This variant is associated with a low-level Str R phenotype when detected as a sole mutation. However, when concurring with mutations in other genes (e.g., rrs 517c>t or rpsL K43R), isolates exhibit significant resistance to Str [24,[29][30][31]. Two others (R01 and R10) showed the 301t>c mutation (causing the L101F substitution). Knowledge about this gene variant and its contribution to the phenotype is limited. However, a common feature of Str R isolates containing such a variation is the absence of mutations in their rrs and rpsL genes, suggesting a genuine association with the observed phenotype [29]. Lastly, isolate 08R showed two mutations, 37g>c and 47t>g, causing the G13R and L16R substitutions. While G13R seems to be a novel mutation, L16R is a natural polymorphism associated with the LAM lineage [20,24,29,31,32].

Significance of the Molecular Analysis
So far, the observed results allow us to separate the analyzed Str R isolates into two groups (I and II, Table 1). Group I comprises of those lacking mutations in the commonly associated genes (i.e., rpsL, rrs and gidB) and those with mutations previously identified as polymorphisms (e.g., 491c>t in rrs). Interestingly, the identification of this group implies the existence of additional genes associated with the phenotype, as suggested by proteomic analyses [33][34][35]. On the other hand, group II includes those containing mutations in gidB and lacking known genetic variations in rpsL and rrs. Furthermore, while a subset of this group involves those carrying a mutation associated with a low level of resistance (i.e., L79S), another subset contains those revealing either G13R or L101F as a novel nonsynonymous substitution.
Regarding the latter, G13R is within the N-terminal domain, and L101F is in the SAM-dependent methyltransferase (SAM-MTase) domain, within the SAM-interacting region [25,36,37]. As there is no prior evidence regarding the G13R genotype contribution to the Str R phenotype, the effect of such a mutation on MtGidB function will not be analyzed further. However, to gain additional knowledge on the molecular basis of Str resistance in M. tuberculosis, we performed a biocomputational analysis of the MtGidB L101F variant to predict its functional consequences (given the conserved structure-function relationship among SAM-MTase domains).

Structural and Functional Analysis
A primary structure-based biocomputational analysis of MtGidB provided the initial information about the effect of the L101F substitution on protein function ( Table 2). The combined results of three bioinformatics tools predicted a negative effect of such a mutation on MtGidB function. Template-based modeling resulted in a reliable 3D structure to test whether L101F affects the functional conformation of MtGidB, as suggested before. The predicted 3D model showed a high confidence score ( Figure 1A): 1.17 (in the range −5 to 2, a higher value means high confidence). After refinement, the quality assessment result validated its structural accuracy: a global score of 0.7045 (values ≥ 0.4 are good scores). Furthermore, the Ramachandran plot showed that 89.1% of the non-Gly/Pro residues are in the most favored regions plus an additional 8.7% in allowed regions ( Figure 1B). In addition, the estimated Z-score for overall quality, −6.44, is within the range of values typically found for proteins of similar size ( Figure 1C). Before further structure-function analysis, the predicted model served as a framework to identify the presumed SAM-interacting residues: G69, S70, G71, L74, E92, P93, L94, R97, G117, R118, A119, E120, R137 and A138. A tertiary structure-based bioinformatic analysis of the MtGidB 3D model provided additional knowledge about the effect of the L101F substitution on protein stability. Overall, four biocomputational methods, along with a thermodynamic assumption: destabilizing (∆∆G < −1.0 Kcal/mol), neutral (−1.0 ≤ ∆∆G ≤ 1.0 Kcal/mol) and stabilizing (∆∆G > 1.0 Kcal/mol) [38], confirmed structural destabilization (Table 3).

Polymorphic Site Interaction Analysis
The analysis of interatomic contacts at the polymorphic site provided further insights into the local interactions and changes derived from residue substitution (Figure 2). In this regard, a comparative examination of the respective networks of non-covalent bonds revealed that the mutant residue (F101) establishes new interactions and loses others, in contrast to the wild-type (L101). A supplementary analysis of interatomic contacts of structural units confirmed this observation (Table 4). Furthermore, as L101 is involved in a residue contact network that includes SAM-interacting residues (i.e., G71, E92 and R97), it seems reasonable to suggest that the L101F substitution affects the MtGidB function by altering the ligand-binding site of the SAM-MTase domain.
the nearest distance between atoms of two residues; S, contact surface area between two residues; HB, hydrophilic-hydrophilic contact (hydrogen bond); Arm, aromatic-aromatic contact; Pho, hydrophobic-hydrophobic contact; DC, hydrophobic-hydrophilic contact (destabilizing contact); +/−, presence/absence of a specific contact.

Protein Dynamics Analysis
Molecular dynamics (MD) simulations are commonly applied to study protein mobility and flexibility [39,40]. Using coarse-grained (CG) models as reduced representations of protein residues, this theoretical approach provided additional knowledge about the conformational structure of MtGidB and its changes due to the L101F substitution in a 2000 ps time frame. Interestingly, both systems (wild-type and mutant) depicted a short phase of continuous decrease in the UNRES (united residue) potential energy followed by an apparent steady-state, which remained until the simulation end ( Figure 3A,B). However, the radius of gyration plots showed that the mutant system exhibits a higher degree of structural mobility than the wild-type system ( Figure 3C,D), supporting the hypothesis that implies changes in local flexibility are the consequence of the L101F mutation on the MtGidB conformation. Supplementary analysis of atomic fluctuations completed the knowledge on residuebased flexibility. Even though both retained the secondary structure, the mutant system showed increased overall flexibility than the wild-type system ( Figure 4A,B), with significant deviations in residues and regions of the SAM-MTase domain ( Figure 4C). This structural behavior suggests that the L101F substitution indirectly affects the flexibility of other residues and, thus, the global MtGidB structural stability.

Significance of the Biocomputational Analysis
Overall, the MD results suggest that the L101F mutation affects the flexibility and stability of the MtGidB structure, probably due to local and global intramolecular perturbations. Furthermore, as the L101 residue is involved in a contact network that includes SAM-interacting residues (i.e., G71, E92 and R97), it seems reasonable to suggest that the Leu>Phe substitution at position 101 affects the MtGidB function by altering the ligandbinding site of the SAM-MTase domain. However, an experimental approach is required to test the latter hypothesis and accurately establish the genotype-phenotype association observed in the clinical isolates of M. tuberculosis.

Sample Collection and Mycobacteriological Analysis
Sputum samples from patients suspected of having active TB were collected at the TB Clinic of the General Hospital of Tijuana (GHT) by qualified personnel. All samples were digested-decontaminated using a BBL MycoPrep™ System (Becton Dickinson). The mycobacteriological analyses and drug-susceptibility assays were performed at the TB Diagnostic Unit (GHT), using standard protocols. Out of 157 independent samples from a one-year collection period, 11 tested positive for two inclusion criteria: acid-fast bacilli by Ziehl-Nielsen (ZN) staining and Str-resistant (MIC > 0.8 µg/mL) by MGIT analysis (BACTEC 960 System, Becton Dickinson). In this case, 15 ZN-positive Str-sensitive samples, selected at random, were used as controls. Sample handling and subsequent procedures were according to standard protocols approved by the GHT's Ethics Committee.

Mycobacterial DNA Extraction
Genomic DNA was isolated using the DNAzol reagent (Becton Dickinson) and the protocol recommended by the manufacturer. Briefly, 0.2 mL of MycoPrep's sediment and 0.5 mL of 1X Dulbecco's PBS solution were mixed, and bacterial lysis was completed by heating at 80 • C (10 min). After cooling for 1 min on an ice bath, 1 mL of DNAzol reagent was added, mixed thoroughly, and centrifuged at 10,000 rpm for 10 min (to remove cell debris). The supernatant was mixed with 0.5 mL of cold ethanol and then centrifuged for 10 min at 14,500 rpm. The precipitated DNA was air-dried for 10 min and then dissolved in 30 µL of 8 mL NaOH.

PCR Amplification of Gene Fragments
Typical PCR reactions (20 µL) contained 10 picomoles of each primer (i.e., Fw/Rv) and 1 µL of template DNA in 1X Taq Master Mix (New England Biolabs). Table 5 lists the synthetic oligonucleotides used as primers for PCR amplification of the M. tuberculosis rpsL, rrs, gidB gene fragments. Reactions were completed in a C1000 Touch™ Thermal Cycler (Biorad) using the following settings: an initial denaturation step (2 min at 94 • C), 45 cycles of exponential amplification (20 s at 94 • C, 20 s at 55 • C, 20 s at 72 • C) and a final extension step (7 min at 72 • C). A reaction lacking template DNA was used as a negative control, while another containing M. tuberculosis H37Rv genomic DNA (1 ng) was positive (and reference DNA for comparative purposes).

Analysis and Purification of Amplicons
The amplification products were analyzed by 2.0% agarose gel electrophoresis, using EtBr as a fluorescent dye (0.5 µg/mL, final), and visualized/documented with a GelDoc™ EZ Imager (Biorad). The 100-bp DNA Ladder and λ-DNA/HindIII Digest (New England Biolabs) were the DNA markers used to assess molecular weights. The amplicons: 628 bp for rpsL, 645 bp for rrs and 719 bp for gidB, were purified using a QIAquick PCR Purification Kit, as recommended by the manufacturer (Qiagen).

Sequence-Based Function Predictions
Three bioinformatic predictors determined the effect of L101F substitution on the function of MtGidB: PolyPhen-2, PROVEAN and SIFT, with default settings. PolyPhen-2 (http://genetics.bwh.harvard.edu/pph2/, accessed on 1 October 2020) is an algorithm that combines sequence and structure-based attributes and uses a naive Bayesian classifier to identify missense mutations with an impact on the phenotype. Output levels of probably (0.85-1.0) and possibly (0.15-0.84) damaging are significant [44][45][46]. PROVEAN (http: //provean.jcvi.org/, accessed on 1 October 2020) is an alignment-based method that estimates the influence of amino acid substitutions on protein function. The final score designates the mutation as deleterious or neutral, according to a predefined threshold. Protein variants with a score equal to or less than −2.5 are deleterious [47,48]. SIFT (https://sift.bii.a-star.edu.sg/, accessed on 1 October 2020) is a sequence homology-based tool that classifies amino acid substitutions as tolerant (neutral) or intolerant (deleterious) mutations. Protein variants with a normalized probability value equal to or less than 0.05 are deleterious [49].

Examination of the Interatomic Contacts
The interatomic contacts at the polymorphic site were estimated using Arpeggio (http://biosig.unimelb.edu.au/arpeggioweb/, accessed on 15 December 2020), a web service for calculating the interatomic interactions in protein structures [68]. The MtGidB model was the wild-type structure, and its L101F variant the mutant. The local network of non-covalent interactions was analyzed using the PyMol system. The specific contacts were detected using the LPC/CSU software (http://oca.weizmann.ac.il/oca-bin/lpccsu, accessed on 15 December 2020) by automatic analysis of interatomic contacts of structural units (CSU) [69].

Coarse-Grained Molecular Dynamics Simulations
Coarse-grained (CG) models for (MD) simulations are an effective biocomputational approach for adequate sampling of the conformational space while maintaining physical rigor [70]. The CG-MD simulations were performed online by the UNRES web server (https://unres.pl/, accessed on 10 January 2021), using the MtGidB 3D model as a wildtype structure and its L101F variant as the mutant structure with default settings for standard protein dynamics. The CG united residue (i.e., UNRES) model is a highlyreduced physics-based representation of proteins, in which only two interaction sites per residue (united side chains and united peptide groups) are present [71][72][73]. The automatic output data, such as plots of potential energy and radius of gyration, were downloaded and analyzed as generated by the server. The fluctuations results were analyzed using the Pymol system.  Data Availability Statement: All data presented in this study are available on request from the corresponding author, without undue reservation, to any qualified researcher.