In Silico Characterization and Virtual Screening of GntR/HutC Family Transcriptional Regulator MoyR: A Potential Monooxygenase Regulator in Mycobacterium tuberculosis

Simple Summary In an era where the world faces new diseases and pathogens, another emerging challenge is neglected pathogens becoming more notorious. Transcriptional regulators play a vital role in the pathogenesis and survival of these pathogens. Hence, characterizing transcriptional regulators, either in vitro or in silico, is of great importance. Here, we present the first structural characterization of a GntR/HutC regulator in Mycobacterium tuberculosis via in silico methods. We have suggested its possible role and potential as a drug target as well as identified possible drug leads that can be used for further improvements. Abstract Mycobacterium tuberculosis is a well-known pathogen due to the emergence of drug resistance associated with it, where transcriptional regulators play a key role in infection, colonization and persistence. The genome of M. tuberculosis encodes many transcriptional regulators, and here we report an in-depth in silico characterization of a GntR regulator: MoyR, a possible monooxygenase regulator. Homology modelling provided a reliable structure for MoyR, showing homology with a HutC regulator DasR from Streptomyces coelicolor. In silico physicochemical analysis revealed that MoyR is a cytoplasmic protein with higher thermal stability and higher pI. Four highly probable binding pockets were determined in MoyR and the druggability was higher in the orthosteric binding site consisting of three conserved critical residues: TYR179, ARG223 and GLU234. Two highly conserved leucine residues were identified in the effector-binding region of MoyR and other HutC homologues, suggesting that these two residues can be crucial for structure stability and oligomerization. Virtual screening of drug leads resulted in four drug-like compounds with greater affinity to MoyR with potential inhibitory effects for MoyR. Our findings support that this regulator protein can be valuable as a therapeutic target that can be used for developing drug leads.


Introduction
Tuberculosis (TB), a disease that has plagued humankind throughout history, is caused mainly by the infection of Mycobacterium tuberculosis. It has been hypothesized that the genus Mycobacterium originated 150 million years ago, and the modern M. tuberculosis strain survived over 70,000 years, claiming millions of lives each year [1,2]. Even though antitubercular chemotherapy is the backbone of TB treatment, deaths due to the emergence of new strains of M. tuberculosis that are resistant to some or all antitubercular drugs (multi-drug resistant TB, MDR-TB) currently form a major health problem. Even decades after Koch's findings, new genetic and molecular insights are still required to divulge the mechanisms involved in the acquisition of drug resistance and the survival of bacteria under stress in the environment. Adaptation to stress responses is primarily mediated through the tight regulation of gene expression, where transcriptional regulators play a

3D Structure Modelling, Structure Assessment and Functional Domain Prediction of the Adjacent Gene Encoding Proteins
Homology modelling of the adjacent genes, Rv0791c, Rv0790c, Rv0789c and Rv0793 encoding proteins were done as mentioned above. The structure assessment for all the modelled structures were done in a similar manner as for the MoyR model. Additionally, functional domains of the adjacent genes were identified using NCBI Conserved Domain Database (CDD) and a blastP analysis was carried out using the UniProtKB as the targeted database. Functional domains were identified using matches with more than 70% identity.

Identifying Effector Binding Site and Druggability of MoyR
In order to determine the possible ligand binding pockets, a structure-based and geometry-based prediction was done using metaPocket 2.0 [33]. The metaPocket2.0 server consists of predictors, LIGSITE, PASS, Q-SiteFinder, SURFNET, Fpocket, GHECOM, Concavity and POCASA. The pockets sites identified by the different methods have different ranking scoring functions. In order to make ranking scores comparable a Z-score calculated for each site in different methods and pocket sites of each method were clustered according to their spatial similarity and total Z-score values of a cluster. CavityPlus web server [34] was also used to identify the cavities and the amino acids which the pockets are made of. Binding pockets of DasR, NagR and MoyR were compared using Pocket Match server [35], and amino acids involved in effector recognition were identified using sequence alignment. The conserved residues in the identified binding pocket of MoyR were determined using ConSurf Server. [36]. Druggability of the pockets were identified using PockDrug server [37].

Virtual Screening Study
The modelled structure of MoyR was used to screen the possible hit compounds, and the virtual screening was performed using AutoDock in PyRx virtual screening tool [38]. As for the preliminary screening, a blind docking was carried out where the protein molecule was set to a rigid file while the ligand was moved and rotated to find the best binding modes. Maybridge and ChEMBL were used as chemical databases for screening and approximately 53,000 compounds in total were used. The first 100 compounds with the lowest binding affinity (kcal/mol) were extracted from the docking results. To eliminate false negative values, the ligand interactions were analyzed using Protein-Ligand Interaction Profiler server (https://plip-tool.biotec.tu-dresden.de/plip-web/plip/index, accessed on 24 February 2020) and Discovery Studio Visualizer [39]. Drug-likeness and pharmacokinetics properties of the resulted compounds were determined by SwissADME.

Secondary Structure of MoyR
Multiple sequence alignment of the HutC regulators were carried out by ClustalW, and the confidence level of the multiple sequence alignment was analyzed using GUIDANCE server. The score for the alignment was 0.8434, and all the sequences scored higher than 0.6, which indicates a reliable alignment for further analysis. In the selected HutC regulators, the length of UTRA domain ranged between 128 to 142 amino acids and the length of MoyR UTRA domain was 128 amino acids (106-246 aa) with an E-value of 1.02 × 10 −15 . The length of the DBD of HutC regulators was about 59 amino acids with two highly conserved residues, namely proline and threonine, in the α-helixes (Figure 1). Secondary structure prediction of HutC regulators according to consensus sequences gave a higher number of β-strands towards the c-terminus, which is characteristic to HutC regulators ( Figure 1) and the secondary structure prediction of MoyR revealed the same pattern ( Figure 2A). The frequency analysis of the bases in the UTRA domain of the selected HutC regulators by WebLogo showed two highly conserved leucine residues at distant positions, in which the height of each base represents the relative frequency at each position ( Figure 2B). Two highly conserved leucine residues were found to be at the positions L131 and L210 in MoyR, corresponding to L121 and L202 in TraR protein of Streptomyces phaeochromogenes (Figure 1). The L121 residue was identified as a structurally important key residue in the oligomerization and repressor function of TraR [40]. These two leucine residues are conserved in both DasR from Streptomyces coelicolor (L130, L208) and NagR from Bacillus subtilis (L120, L198), which are MoyR homologues (Figure 1). higher number of β-strands towards the c-terminus, which is characteristic to HutC regulators ( Figure 1) and the secondary structure prediction of MoyR revealed the same pattern ( Figure 2A). The frequency analysis of the bases in the UTRA domain of the selected HutC regulators by WebLogo showed two highly conserved leucine residues at distant positions, in which the height of each base represents the relative frequency at each position ( Figure 2B). Two highly conserved leucine residues were found to be at the positions L131 and L210 in MoyR, corresponding to L121 and L202 in TraR protein of Streptomyces phaeochromogenes ( Figure 1). The L121 residue was identified as a structurally important key residue in the oligomerization and repressor function of TraR [40]. These two leucine residues are conserved in both DasR from Streptomyces coelicolor (L130, L208) and NagR from Bacillus subtilis (L120, L198), which are MoyR homologues (Figure 1).

Genomic Locus of MoyR
Many fundamental processes in bacteria, including carbon metabolism [41], amino acid metabolism [9], morphogenesis [42], virulence [43,44], biofilm formation [45], antibiotic resistance [46] and antibiotic production [42], are known to be controlled by GntR regulators. GntR regulators are often located adjacent to the genes that they control, and this could provide an insight into the effectors that these regulator proteins could bind to in the process of regulation. The gene locus of moyR consists of many hypothetical proteins ( Figure 3). It has been shown that the Rv0789c, Rv0790c, Rv0791c and moyR are mostly differentially expressed as an operon in the intracellular environment [47]. According to the correlation catalog of M. tuberculosis H37Rv genome, the highest positive correlation with moyR was given in Rv0790c and Rv0791c (http://tuberculosis.bu.edu/tbdb_sysbio/ CC/Rv0792c.html, accessed on 16 March 2019).

Genomic Locus of MoyR
Many fundamental processes in bacteria, including carbon metabolism [41], amino acid metabolism [9], morphogenesis [42], virulence [43,44], biofilm formation [45], antibiotic resistance [46] and antibiotic production [42], are known to be controlled by GntR regulators. GntR regulators are often located adjacent to the genes that they control, and this could provide an insight into the effectors that these regulator proteins could bind to in the process of regulation. The gene locus of moyR consists of many hypothetical proteins ( Figure 3). It has been shown that the Rv0789c, Rv0790c, Rv0791c and moyR are mostly differentially expressed as an operon in the intracellular environment [47]. According to the correlation catalog of M. tuberculosis H37Rv genome, the highest positive correlation with moyR was given in Rv0790c and Rv0791c (http://tuberculosis.bu.edu/tbdb_sysbio/CC/Rv0792c.html, accessed on 16 March 2019).

Homology Modelling of MoyR
The crystal structure of ligand-free HTH type DasR from Streptomyces coelicolor (4ZS8) was automatically selected as the template model in all three webservers that were

Homology Modelling of MoyR
The crystal structure of ligand-free HTH type DasR from Streptomyces coelicolor (4ZS8) was automatically selected as the template model in all three webservers that were used to model MoyR protein. SWISS-MODEL provided a sequence identity of 30.64% with DasR template and the GMQE value was 0.62 and QMEAN was −1.22. The confidence score for estimating the quality of the predicted model given by C-score in I-TASSER was 0.05, which can be considered as a good confidence score. The TM score and RMSD values were used to measure the structural similarity of the model and the known standard (TM value was 0.72 ± 0.11 and RMSD was 5.5 ± 3.5Å), which indicate a model of correct topology. The backbone confirmation of each residue of the modelled structure was calculated using PROCHECK by analyzing ϕ/ψ torsion angles [phi (ϕ) and psi (ψ)] determined by Ramachandran plot.Over 99.8% of the residues were in either the favoured region or the allowed region. Verify 3D further provides a percentage of 73.88 residues with a score of over 0.2 for the MoyR model. The ProQ neural network used for protein quality production in the MoyR model, which gives two scores LGscore and MaxSub. The LG score value was 3.8 (>2.5 very good) and Maxsub was 0.456 (>0.5 very good). The arrangement of different types of atoms with respect to one another in the protein model was assessed by ERRAT, which is sensitive for identifying incorrectly folded regions in preliminary protein models. The overall model quality was assessed by the ProSA-web server, the Z-score value for modelled MoyR is −6.74. The MoyR model was built according to the structural arrangement of DasR regulator. The model of MoyR is a homodimer each consist of two main domains, HTH-DBD and UTRA domain, which is characterized by the sixstranded antiparallel β-sheets in the core of the structure where the effector binding occurs ( Figure 4A). The topology of the MoyR monomer was predicted by the PDBsum server, and the topology map was drawn accordingly ( Figure 4B). NagR protein from Bacillus subtilis also shared a high similarity with the modelled MoyR where both DasR and NagR can be considered as structural homologues of MoyR. Few servers including Gpos-PLoc, PSORTb, CELLO v.2.5 and LoCTree were used to predict the subcellular localization of MoyR, and the cytoplasmic location was predicted with higher confidence values.  Secondary structure elements are displayed as arrows (β-sheets) and cylinders (α-helices). The DBD consist of α-helices, α 1 -α 3 and β-sheets, β 1 -β 2 . Most of the α-helices (α 5 -α 9 ) and β-sheets (β 4 -β 10 ) are concentrated in the EBD where the linker segment between the DBD and EBD is highlighted in bold.

Homology Modelling and Functional Annotation of MoyR Adjacent Gene Encoding Proteins
All the moyR adjacent gene encoding proteins were modelled, and the structure assessment was carried out as mentioned. The reliability of modelled Rv0790c and Rv0789c were very poor; therefore, these two protein models were excluded from further analysis. Structure modelling of Rv0791c revealed a luciferase-like monooxygenase from Bacillus cereus with a sequence identity of 28.79% with a GMQE value of 0.62 and a QMEAN of −2.87. Ramachandran plot analysis of Rv0791c revealed 98.8% of amino acid residues in Secondary structure elements are displayed as arrows (β-sheets) and cylinders (α-helices). The DBD consist of α-helices, α 1 -α 3 and β-sheets, β 1 -β 2 . Most of the α-helices (α 5 -α 9 ) and β-sheets (β 4 -β 10 ) are concentrated in the EBD where the linker segment between the DBD and EBD is highlighted in bold.

Homology Modelling and Functional Annotation of MoyR Adjacent Gene Encoding Proteins
All the moyR adjacent gene encoding proteins were modelled, and the structure assessment was carried out as mentioned. The reliability of modelled Rv0790c and Rv0789c were very poor; therefore, these two protein models were excluded from further analysis. Structure modelling of Rv0791c revealed a luciferase-like monooxygenase from Bacillus cereus with a sequence identity of 28.79% with a GMQE value of 0.62 and a QMEAN of −2.87. Ramachandran plot analysis of Rv0791c revealed 98.8% of amino acid residues in the favored region and allowed region. Verify 3D analysis results revealed that 88.08% of the residues have a score of more than 0.2 (3D-1D score ≥ 0.2) and Pro-Q analysis yielded a LG score of 5.5, which indicates that the Rv0791c model can be used as a reliable 3D structure for further analysis. Bacterial luciferases are in the class of flavin monooxygenases that catalyze the oxidation of long-chain aldehydes and releases energy in the form of visible light. Even though the crystal structure of Rv0793 is available, the modelled structure was used for docking purposes, which was highly similar to its crystal structure. The amino acid sequences of Rv0789c, Rv0790c, Rv0791c and Rv0793 were used to identify the domains using UniProtKB database. The sequence of Rv0789c did not result in any significant match with a functional domain, whereas blast of Rv0790c resulted in seven hits with more than 75% identity to transglutaminase enzyme from different organisms. The sequence of Rv0791c resulted in a conserved functional domain encoding for an F420 dependent oxidoreductase in four hits with more than 75% identity. Two matches with over 70% identity were identified for Rv0793 corresponding to antibiotic biosynthesis monooxygenases from Mycobacterium species and 10 hits with more than 50% identity corresponding to antibiotic biosynthesis binding domains mainly from Mycobacterium spp. and Streptomyces spp. were also found. The Rv0793 gene encodes a putative monooxygenase which is structurally very similar to Streptomyces coelicolor ActVA-Orf6 monooxygenase, which participates in the tailoring of polyketide antibiotic synthesis [48].

Physiochemical Properties of MoyR
MoyR protein monomer consists of 269 amino acids with a molar weight of 28.95 kDa and the theoretical pI is 8.54. A total of 28 negatively charged residues and 30 positively charged residues were identified. The instability index value was 43.8, suggesting that MoyR is unstable outside the cellular environment. The calculated aliphatic index is 101.82, indicating that MoyR is thermally stable, and the GRAVY value (grand average value of hydropathicity) is −0.050 reveals that MoyR is hydrophilic in nature.

Effector Binding Site of MoyR
MoyR binding pockets were determined using metaPocket 2.0 and CavityPlus web server in which two probable pockets were identified in the region between DBD and EBD (pockets 1 and 2) and two highly probable pockets in the EBD of chain A and B (pockets 3 and 4) ( Figure 5A). Hence, pockets 3 and 4 can be considered as the active sites in which ligand binding occurs. Therefore, pockets 1 and 2 can be considered as allosites to which allosteric drugs can bind, whereas pockets 3 and 4 can be considered as orthosites to which orthosteric drugs can bind. These two ligand-binding pockets (pockets 3 and 4) of MoyR and identified NagR and DasR pockets were compared using PocketMatch server. High similarity was obtained in pockets of DasR vs. NagR with a value of 0.8699. Values greater than 0.8 indicate that the pockets are very similar. The value for DasR vs. MoyR pockets was 0.5868 and that for NagR vs. MoyR was 0.6896, suggesting that MoyR shares a pocket similarity to some extent with DasR and NagR. Pairwise sequence identity matrix was generated by Clustal Omega server and overall sequence similarity ranged from 29.15 to 39.17 among the three proteins, indicating high sequence similarity among DasR and NagR. According to the published data, both DasR and NagR respond to the same ligands, glucoseamine-6-phosphate (GlcN-6-P) and N-acetylglucoseamine-6-phosphate (GlcNAc-6-P); for which effector recognition is highly similar [14]. Out of the 16 identified binding site residues in DasR and NagR crystal structure, 12 were similar. When compared with MoyR, only five residues were similar, indicating a lower affinity of glucose moieties to MoyR. To identify the residues that might be conserved in the predicted MoyR binding pocket, a multiple sequence alignment of 150 HutC sequences was generated. LEU 219, ILE 143 and VAL 221 in the antiparallel β strands and TYR 179 and THR 177 in the α helix ( Figure 5B). The identified conserved residues of ligand binding sites of MoyR, DasR and NagR were compared, revealing three highly conserved residues involved in effector binding in all three proteins (Table 1). glucoseamine-6-phosphate (GlcN-6-P) and N-acetylglucoseamine-6-phosphate (GlcNAc-6-P); for which effector recognition is highly similar [14]. Out of the 16 identified binding site residues in DasR and NagR crystal structure, 12 were similar. When compared with MoyR, only five residues were similar, indicating a lower affinity of glucose moieties to MoyR. To identify the residues that might be conserved in the predicted MoyR binding pocket, a multiple sequence alignment of 150 HutC sequences was generated. Conserved residues of MoyR pocket were identified with the aid of the Consurf analysis server. The identified conserved residues of MoyR pocket are ALA 193, ARG 223, GLU 234, ARG 141, ALA 199, LEU 219, ILE 143 and VAL 221 in the antiparallel β strands and TYR 179 and THR 177 in the α helix ( Figure 5B). The identified conserved residues of ligand binding sites of MoyR, DasR and NagR were compared, revealing three highly conserved residues involved in effector binding in all three proteins (Table 1).

Druggability of MoyR and Virtual Screening Analysis
Calculated ligandability and the druggability of the four predicted pockets of MoyR using CavityPlus and PockDrug servers are given in Table 2. Higher druggability of all the predicted four pockets suggests that both allosteric and orthosteric drugs can be used to identify drug leads for MoyR. Overall drug probability of MoyR was calculated and yielded a value of 0.99, suggesting that MoyR has high druggability. Hence, a virtual screening platform was established to screen possible drug candidates for MoyR. The ligands with values lower than −10.0 Kcal/mol were extracted from the virtual screening, and protein-ligand interactions were analyzed. The best four candidate compounds with the lowest binding energy are given in Table 3. The interactions were very similar with all the high-affinity ligands, including conventional hydrogen bonds with highly conserved residues TYR179, ARG223 and GLU234. Many of the predicted binding pocket residues interacted with high affinity ligands via attractive charges, van der Waals bonds, alkyl, Pi-cation and Pi-Pi stacked bonds. Considering the drug-likeness according to the Lipinski rule of five, all the high-affinity compounds can be considered as druglike compounds. Table 2. Estimated druggability of the predicted binding pockets of MoyR using CavityPlus and PockDrug servers. In ligandability Pred. Max pK d value greater than six suggests that all the cavities are suitable as binding sites. Drug score is calculated on the basis of the binding structure alone by using a desolvation-based free energy model.

Discussion
Transcriptional regulators play a crucial role in the survival of bacteria under various stresses and GntR family of HTH-type transcriptional regulators are an important class of proteins in the pathogenesis and survival of bacteria. Even though there are many GntR regulators, in the HutC subfamily, only a few have been crystallized and characterized to date. HutC family members are expected to bind a variety of different effector molecules. Thus far, there is no detailed study that has been carried out on HutC regulators in M. tuberculosis. Therefore, this study can be considered a preliminary piece of work, which can provide insights on MoyR structure, its druggability and regulatory role. Amino acid composition itself could provide important information on the structure of a protein as well as its physiochemical parameters. Here, we have identified MoyR as a thermally stable, cytoplasmic protein with a high isoelectric point (pI). Higher pIs contain more electropositive residues on their surfaces and are thus more likely to bind DNA indicative of DNA binding ability of MoyR.
Recent molecular biology studies of Streptomyces and Mycobacterium have revealed prominent similarities in the developmental and morphological characteristics of the two bacteria. One simple example is the similarities of the two crystal structures, Rv0793 from M. tuberculosis and ActVA-Orf6 from Streptomyces coelicolor. The protein Rv0793 is predicted as a monooxygenase that participates in the biosynthesis of type II polyketide antibiotics [48]. The Streptomyces ActVA-Orf6 monooxygenase is involved in the biosynthesis of actinorhodin produced by type II polyketide synthase (PKSs) [49]. The structural analogue global regulator DasR entailed in signaling cascade from nutrient sensing to development and acts as a switch for antibiotic production in Streptomyces [49]. According to the results of this study, functional domain annotation of the moyR-adjacent gene encoding proteins are homologous to different monooxygenases. We have previously reported the binding of MoyR to the intergenic region of Rv0793 and moyR [50]. Therefore, this study provides evidence that MoyR has a higher probability of regulating a group of monooxygenases that possibly involves a polyketide antibiotics synthesis or a type II polyketide synthesis pathway in the bacteria. There are no previous reports on isolating antibacterial compounds from M. tuberculosis to our knowledge. This finding can be directed towards the probable synthesis of type II polyketides as secondary metabolites. Such antibiotic production would be useful for the bacterium to compete against other bacteria and conquer environmental stresses during survival within the host. We have carried out a preliminary docking study using KEGG pathway intermediates and found that MoyR, Rv0793 and Rv0791c have similar affinities to type II polyketide intermediates (data not shown here). Both the regulators DasR and NagR share numerous effector binding features and respond to the same glucose moieties where MoyR effector binding residues were greatly differing from DasR and NagR, confirming that the affinity for sugar moieties is very weak in MoyR. Hence, by considering the ligand-binding pattern with polyketide intermediates, the genomic locus of moyR gene with possible monooxygenases and previously published data [48], it is highly likely that MoyR can play an important role in a polyketide synthesis pathway in the bacteria.
Ligand binding pockets of MoyR were identified using few servers and the key residues were determined according to the conservation of other HutC regulators and structure superimposition with homologue DasR and NagR. As secondary structure prediction and 3D profile further provide information on the spatial arrangement of the amino acids in the protein, this can yield the most probable binding sites for natural ligands and drugs. The conserved TYR 179, ARG 223 and GLU 234 residues in the binding pocket of MoyR are identified as crucial for its function in effector recognition. Two highly conserved leucine residues in the effector binding domain were identified in the sequence alignment of MoyR with other HutC regulators that can be crucial for structure stability and oligomerization of the protein. Binding of the drug-like compounds occurred in the orthosteric site of the effector binding domain of MoyR, indicating that these drug candidates can possibly compete for binding with natural ligands of the MoyR.
The accuracy of a protein model can be assessed by its 3D profile, regardless of whether the model has been derived by X-ray crystallography, NMR spectroscopy or computational methods. The structure assessment data of the 3D model of MoyR provide information on its reliability as a primary screening study of possible ligands. Even though in silico characterization would not provide a full picture of the regulatory role of MoyR without supporting biochemical analysis, this study identifies the properties of MoyR and its potential as a drug target. These findings can be extended to study the in vitro binding of the possible natural ligands with MoyR protein and predict its possible role in the cell. The strategies used in this study to annotate the function of MoyR transcriptional regulator and its adjacent genes can be beneficial for designing experimental approaches to further evaluate the function of the genes.

Conclusions
TB claims millions of lives each year, and the increased emergence of multi-drugresistant M. tuberculosis constitutes a serious global threat. As M. tuberculosis has developed resistance to current TB drug regimes, the search for new antibacterial agents directed towards novel targets is of paramount importance. Here, we have identified a GntR/HutC regulator, MoyR involving in regulating a group of monooxygenases. Homology modeling of MoyR and validation of the model suggested that MoyR model can be used as a reliable structure for preliminary screening of drug compounds. The high druggability of MoyR indicates that this protein could be useful as a drug target, and we have identified the best hit compounds for MoyR that warrant further validation using in vitro work.
Author Contributions: Conceptualization, I.C.P.; methodology, data curation, investigation, writing original draft, T.D.A.; supervision, writing-review and editing; I.C.P. All authors have read and agreed to the published version of the manuscript.