Comprehensive Comparative Analysis of Cholesterol Catabolic Genes/Proteins in Mycobacterial Species

In dealing with Mycobacterium tuberculosis, the causative agent of the deadliest human disease—tuberculosis (TB)—utilization of cholesterol as a carbon source indicates the possibility of using cholesterol catabolic genes/proteins as novel drug targets. However, studies on cholesterol catabolism in mycobacterial species are scarce, and the number of mycobacterial species utilizing cholesterol as a carbon source is unknown. The availability of a large number of mycobacterial species’ genomic data affords an opportunity to explore and predict mycobacterial species’ ability to utilize cholesterol employing in silico methods. In this study, comprehensive comparative analysis of cholesterol catabolic genes/proteins in 93 mycobacterial species was achieved by deducing a comprehensive cholesterol catabolic pathway, developing a software tool for extracting homologous protein data and using protein structure and functional data. Based on the presence of cholesterol catabolic homologous proteins proven or predicted to be either essential or specifically required for the growth of M. tuberculosis H37Rv on cholesterol, we predict that among 93 mycobacterial species, 51 species will be able to utilize cholesterol as a carbon source. This study’s predictions need further experimental validation and the results should be taken as a source of information on cholesterol catabolism and genes/proteins involved in this process among mycobacterial species.


Introduction
Tuberculosis (TB) is a chronic infectious disease caused by Mycobacterium tuberculosis and is one of the leading causes of death worldwide, killing an estimated two million people annually [1,2]. It is estimated that one third of the world's population (approximately two billion people) is infected with this highly pathogenic organism [3]. Once it has entered the human body, and after ingestion by macrophages, this intracellular pathogen can survive in a modified phagosome and cause latent infection for years and sometimes decades without any symptoms [4]. Tubercle bacilli can persist in this dormant state, from which they may be reactivated and cause TB [4]. The reactivation of latent phase M. tuberculosis into the active phase is observed among people whose immune systems are weakened (by introducing a hydroxyl group onto the side chain), aldehyde and carboxylic acid metabolites. A sterol-CoA ligase catalyzes the final ATP-dependent step [19] (Figure 1).
Research has demonstrated that CYP125 does not play a key role in cholesterol catabolism in the M. tuberculosis H37Rv strain and suggests that this strain carries out compensatory activities [29]. However, investigation of the in vitro enzyme specificities found that CYP125 and CYP142 are the dominant P450 enzymes responsible for initiating sterol side chain degradation in M. tuberculosis [29], although in the CDC1551 strain, CYP142 is present as a pseudogene [30]. In vitro analysis has also demonstrated that CYP142 can support the growth of the H37Rv strain on cholesterol in the absence of cyp125A1 [29]. Using western blot analysis, researchers found that CYP124A1 was not detectably expressed in the H37Rv or CDC1551 strains, but CYP142 was found in H37Rv and not in CDC1551 [29]. In the absence of CYP125 or CYP142, cholest-4-en-3-one accumulates and inhibits bacterial growth on cholesterol [19].
β-oxidation is the pathway by which fatty acids are broken down in the form of acyl-CoA molecules [24]. Before the oxidative reactions of the β-oxidation cycle, the fatty acid is activated to its thioester with coenzyme A (CoA) in a reaction catalyzed by an ATP-dependent ligase. The thioester then undergoes dehydrogenation, catalyzed by acyl-CoA dehydrogenase, to form the enoyl-CoA, which is then hydrated to the hydroxyacyl-CoA by enoyl-CoA hydratase. Next, 3-hydroxyacyl-CoA dehydrogenase catalyzes the oxidation of the hydroxyl group. The thiolase in the next step, which carries out the thiolytic cleavage of β-ketoacyl-CoA into two acyl-CoA molecules, seems to correspond to FadA5. A single round of the β-oxidation cycle of unbranched chain fatty acids produces acetyl-CoA and a CoA thioester of an acid that is shorter by two carbon atoms. The shortened fatty acyl-CoA then undergoes a further round of the β-oxidation cycle [24].
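The two-carbon shortening per round can be illustrated with a small counting sketch. It assumes an unbranched, saturated, even-chain acyl-CoA only (the cholesterol side chain is branched and is not modeled here), and the function name `beta_oxidation_rounds` is introduced purely for illustration.

```python
def beta_oxidation_rounds(carbons: int) -> dict:
    """Count β-oxidation rounds for an unbranched, saturated, even-chain acyl-CoA.

    Assumption: each round (dehydrogenation, hydration, oxidation, thiolysis)
    releases one acetyl-CoA and leaves an acyl-CoA two carbons shorter.
    """
    if carbons < 4 or carbons % 2 != 0:
        raise ValueError("sketch covers even-chain acyl-CoA with at least 4 carbons")

    rounds = 0
    acetyl_coa = 0
    chain = carbons
    while chain > 2:
        chain -= 2        # thiolysis shortens the chain by two carbons
        acetyl_coa += 1   # and releases one acetyl-CoA
        rounds += 1

    # The two-carbon remainder left after the last round is itself acetyl-CoA.
    acetyl_coa += 1
    return {"rounds": rounds, "acetyl_coa": acetyl_coa}


if __name__ == "__main__":
    print(beta_oxidation_rounds(16))
```

For a 16-carbon chain the sketch reports seven rounds and eight acetyl-CoA molecules, which matches the textbook count for a C16 saturated fatty acid.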
Genes believed to be encoding β-oxidation enzymes have been identified in the cholesterol regulons of M. tuberculosis [19]. One of these enzymes, a thiolase encoded by fadA5, catalyzes the thiolysis of acetoacetyl-CoA in vitro, which is consistent with removal of the side chain by β-oxidation, producing androstene metabolites, 4-androstenedione (AD) and 1,4-androstenedione (ADD). This activity is required for growth on cholesterol and virulence, especially during the late (chronic) stage of mouse infection, prior to the onset of the immune response [22,30]. Another set of enzymes, acyl-CoA dehydrogenases, is required to catalyze unsaturation reactions in β-oxidation of steroid-CoA substrates, and the M. tuberculosis genome contains six sets of these enzyme genes (fadE's). Regulated by cholesterol, each set of these genes is found adjacent to another within the same operon [31].

Degradation of Cholesterol: Sterol Ring Degradation
The first step in the breakdown of the sterol ring is the conversion of cholesterol to cholest-4-en-3-one (Figure 1). This reaction is catalyzed by either a 3β-HSD or a cholesterol oxidase (ChoD). As mentioned earlier, Rv1106c encodes a 3β-HSD. This enzyme uses NAD+ as a cofactor and oxidizes cholesterol (among others) to its 3-keto-4-ene product, cholest-4-en-3-one [19]. Rv3409c encodes ChoD and is required for M. tuberculosis virulence [33]. However, in a study by Yang et al. [34] it was found that Rv3409c was not required for growth on cholesterol as a sole carbon source, and they concluded that 3β-HSD is required for the initial conversion of cholesterol and that a second ChoD activity is not present in M. tuberculosis. In addition, mouse infection experiments confirmed the significance of ChoD in the pathogenesis of M. tuberculosis, where it drives the oxidation of 3β-hydroxy-5-ene to 3-keto-4-ene [33].
It is assumed that 3-ketosteroid-∆1-dehydrogenase (∆1KstD) is coded by the Rv3537 gene that is part of the cholesterol regulon [19,25]. This enzyme catalyzes the trans-axial elimination of the C1(α) and C2(β) hydrogen atoms (C1-C2 dehydrogenation) of the 3-ketosteroid A ring of 4-androstenedione (AD) to yield 1,4-androstenedione (ADD) (Figure 2) [19], and targeted disruption of this gene inhibited growth on cholesterol [35]. In research done by Brzostek et al. [35], direct evidence was found that M. tuberculosis degrades cholesterol exclusively via the AD/ADD intermediates, and that KstD plays an essential role in this process.

Genes/Proteins Involved in Cholesterol Catabolism in M. tuberculosis H37Rv
Based on literature, 152 genes/proteins were found to be involved in cholesterol breakdown in M. tuberculosis H37Rv (Table 1). These genes/proteins can be classified into four different categories.

Genes Predicted to be Specifically Required for Growth on Cholesterol
Griffin et al. [26] identified 96 genes that are important for the growth of M. tuberculosis on cholesterol through a deep sequencing-based mapping approach (Table 1). Independent studies confirm the genes identified to be important for M. tuberculosis growth on cholesterol [19,22,25,29,30,41]. A standalone set of genes/proteins predicted to be specifically required for growth on cholesterol is presented in Table S1.

Cholesterol Catabolic Genes Proven to be or Predicted to be Essential for Survival of M. tuberculosis in Macrophage Cells and in Murine Infection
In the article by Ouellet et al. [19], some of the cholesterol catabolic genes of M. tuberculosis were specified as genes proven to be essential for survival in macrophage cells and in murine infection (Table 1), or genes predicted to be essential for survival in macrophage cells and in murine infection (Table 1). Of the 24 genes listed in Table 1 that are proven to be essential for survival in macrophage cells and in murine infection, 17 genes were predicted to be specifically required for growth on cholesterol by Griffin et al. [26] and other studies [22,25,26,29,30,42]. A standalone set of genes/proteins proven to be essential for survival of M. tuberculosis in macrophage cells and in murine infection is presented in Table S2. Genes predicted to be essential for survival of M. tuberculosis in macrophage cells and in murine infection are presented in Table S3.

Genes/Proteins that are Up-Regulated during Growth on Cholesterol
Van Der Geize et al. [25] predicted a total of 28 genes to be involved in cholesterol catabolism in M. tuberculosis H37Rv. Fifty-one genes specifically expressed during growth on cholesterol in Rhodococcus jostii are also found in an 82-gene cluster in the M. tuberculosis and M. bovis bacillus Calmette-Guérin (BCG) genomes. To annotate the cholesterol catabolic genes, the researchers compared the sequence similarity of the gene products of R. jostii RHA1 and M. tuberculosis H37Rv strains and compiled a table with 28 genes annotated for M. tuberculosis H37Rv (Table 1). Independent studies confirmed the importance of these genes in cholesterol catabolism by M. tuberculosis [19,22,26,30]. Out of the 28 genes, 18 were predicted to be specifically required for growth on cholesterol; 10 of these genes were proven to be essential for survival of M. tuberculosis in macrophage cells and in murine infection and 3 were predicted to be essential for survival of M. tuberculosis in macrophage cells and in murine infection (Table 1). A standalone set of genes/proteins predicted to be involved in cholesterol catabolism is presented in Table S4.

Key Cholesterol Catabolic Genes/Proteins are Not Found in a Large Number of Mycobacterial Species
Because of the omission of 1 gene (Rv3512, as mentioned in Section 3.3.4), 151 genes/proteins were selected to assess the different mycobacterial species' ability for cholesterol catabolism instead of the initial 152 (Table 1). Mycobacterial species' ability to catabolize cholesterol was predicted based on the presence of two categories of genes/proteins (i.e., cholesterol catabolic genes/proteins proven or predicted to be essential or specifically required for growth of M. tuberculosis H37Rv on cholesterol). Comprehensive comparative analysis of different categories of genes/proteins in mycobacterial species is presented in Table 2.
Table 2. Comparative analysis of cholesterol degrading genes/proteins in mycobacterial species. M. tuberculosis H37Rv homologs belonging to different categories not found in mycobacterial species were listed under different categories. The relevant data on BLAST analysis, homolog proteins and protein family analysis are presented in Supplementary Datasets 1-3, respectively. The cholesterol catabolic ability of mycobacterial species was predicted following the presence of genes/proteins that are proven to be essential, and predicted to be essential or specifically required for M. tuberculosis H37Rv growth on cholesterol.
(Figure 4 and Table 2). There were 10 mycobacterial species, namely M. tuberculosis (Table 2), thus we did not predict their ability to catabolize cholesterol, considering that the complete cholesterol catabolic pathway had not been elucidated.
[Figure: heatmap with mycobacterial species on the horizontal axis (see Table 3 for species codes) and 151 genes/proteins on the vertical axis.]
Analysis of homologous genes/proteins among MTBC species followed the same criteria as described in Section 3.3, with some exceptions for certain homologs mentioned here. For Rv0495c, homolog proteins were identified based on percentage identity, as the NCBI CDD database did not assign proteins to a particular superfamily. The percentage identity was sourced from KEGG and ranged from 99 to 100%. For Rv0805, homolog proteins in M. tuberculosis RGTB423 and M. bovis BCG ATCC 35743 were not identified, as NCBI CDD did not yield any results. Furthermore, the KEGG database showed only 49% identity compared to other species' homolog proteins that showed 100% identity. Based on this, we concluded that mti and mbx did not have Rv0805 homolog(s). For Rv1432, there were no hit data for M. tuberculosis CAS/NITR204, and KEGG data revealed a different dehydrogenase hit. Thus, it was concluded that the homolog was not present. Upon review of Rv2416c, we found that the homolog protein sequence for M. tuberculosis Haarlem/NITR202 was truncated and presented as 28 amino acids compared to the other species' homologs with more than 360 amino acids. Therefore, it was decided that the homolog of Rv2416c had not been found in M. tuberculosis Haarlem/NITR202.

M. chelonae-abscessus Complex Species Lack Key Cholesterol Catabolic Genes/Proteins
All 10 MCAC species lack the homolog of the M. tuberculosis H37Rv gene Rv3519, which has been proven to be essential for survival of M. tuberculosis H37Rv in macrophage cells and in murine infection (Figure 5 and Table 2). The function of Rv3519 has not been elucidated. In addition, all species lack a few genes that are predicted to be essential or specifically required for growth of M. tuberculosis H37Rv on cholesterol (Figure 5 and Table 2). Due to the absence of key cholesterol catabolic genes/proteins in MCAC species, and considering the limited information available on cholesterol catabolism in mycobacterial species, at present we do not predict MCAC species' ability to catabolize cholesterol. Analysis of homologous genes/proteins among MCAC species followed the same criteria as described in Section 3.3, with the exception of Rv1906, as reported earlier in Section 2.3.1, where more than 40% identity to M. tuberculosis H37Rv was taken as positive across all the categories, as the proteins were hypothetical.
[Figure: heatmap of mycobacterial species (see Table 3 for species codes) with the 151 genes/proteins on the vertical axes.]

Most of the M. avium Complex Species Have the Ability to Catabolize Cholesterol
Among 15 MAC species, 10 were predicted to be positive for their ability to catabolize cholesterol as a carbon source (Figure 5 and Table 2). The remaining five MAC species, M. avium subsp. paratuberculosis MAP4; M. avium subsp. paratuberculosis E1; M. avium 104; M. avium subsp. avium DJO-44271 and M. intracellulare MOTT-02, lacked either one or two of the homologous genes/proteins required for growth on cholesterol (Table 2). Among 151 genes, only 6 M. tuberculosis H37Rv homologs, Rv0153c, Rv1084, Rv3779, Rv3519, Rv3528c and Rv3566A, were not found in different MAC species (Figure 5 and Table 2). Four homologs were not found in M. avium subsp. paratuberculosis E1, and two of these are predicted to be specifically required for growth on cholesterol.
Since only a few genes/proteins were missing in the five species, it is difficult to predict their capability to utilize cholesterol as a carbon source.

Mycobacteria Causing Leprosy Species Do Not Have the Ability to Catabolize Cholesterol
Two MCL species were predicted to be negative for their ability to catabolize cholesterol as a carbon source (Figure 5 and Table 2). Quite a large number of cholesterol catabolic genes/proteins were not found in either MCL species. Furthermore, experimental evidence showed that MCL species do not have the ability to utilize cholesterol as a carbon source [44].

Uncertainty about Non-Tuberculosis Mycobacterium and Saprophyte Species' Ability to Utilize Cholesterol
Among eight NTM species, three species were predicted to be positive for cholesterol utilization as a carbon source (Figure 6 and Table 2). Of the remaining five species, M. ulcerans, M. sinense, M. kansasii 662 and M. kansasii 824 had only one missing cholesterol catabolic homolog gene/protein predicted to be essential or specifically required for M. tuberculosis H37Rv growth on cholesterol, whereas M. haemophilum had three missing cholesterol catabolic homologous genes/proteins proven to be essential (Rv3534c) and predicted to be essential or specifically required for M. tuberculosis H37Rv growth (Rv1130 and Rv3534c) on cholesterol (Figure 6 and Table 2). Because only a few genes/proteins are absent, it is difficult to predict the five NTM species' ability to utilize cholesterol as a carbon source.
In the SAP species, Mycobacterium sp. JS623 (msa) and M. fortuitum (mft) lacked a single homologous gene/protein, and the other SAP species had more than one missing cholesterol catabolic homologous gene/protein predicted to be essential or specifically required for M. tuberculosis H37Rv growth on cholesterol (Figure 6 and Table 2). However, considering the contrasting lifestyle and habitat of SAP species compared to M. tuberculosis H37Rv, the absence from SAP species of cholesterol catabolic genes/proteins proven to be or predicted to be essential for survival of M. tuberculosis in macrophage cells and in murine infection [19] may indicate that these genes/proteins do not play any role in cholesterol utilization by SAP species, and possibly all SAPs can utilize cholesterol as a carbon source. The latest study by Guo et al. [45] strongly supports this argument: quite a number of saprophytes, including M. vanbaalenii, have been shown to degrade cholesterol. However, experimental evidence will shed more light on SAP species' ability to metabolize cholesterol. For this reason, we did not predict SAP species' ability to utilize cholesterol as a carbon source.
[Figure: heatmap of mycobacterial species (see Table 3 for species codes) with the 151 genes/proteins on the vertical axes.]

Species and Database
In total, 93 mycobacterial species belonging to six different categories were used in this study (Table 3). The criteria for separation of the mycobacterial species into six different groups were based on their characteristic features, including ecological niches, as well as the nature and site of infection as described elsewhere [46,47]. Taxonomical grouping of mycobacterial species was also taken into consideration, as described elsewhere [48]. Detailed information on species, their categories and genome database links are listed in Table 3.

Cholesterol Catabolism
Published research and review articles [19,22-27] were consulted to create a schematic diagram of the cholesterol catabolic pathway of M. tuberculosis H37Rv, showing the intermediate metabolites and the enzymes involved in different reactions. According to Ouellet et al. [19], the cholesterol catabolic pathway of M. tuberculosis can be divided into two major phases: first, the initial degradation of the aliphatic side chain, and then the subsequent degradation of the A-D rings. In this study, the two phases were drawn up separately using ChemDraw software [49].

Cholesterol Catabolic Genes/Proteins Analysis in Mycobacterial Species
In total, 152 genes/proteins were identified in the study as part of the cholesterol catabolic pathway in M. tuberculosis H37Rv. These were selected for comparative analysis across 92 mycobacterial species. The selected 152 protein sequences were retrieved from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, using their respective gene codes.

BLAST Analysis
The protein sequences of 152 M. tuberculosis H37Rv proteins were copied and pasted into the Basic Local Alignment Search Tool (BLAST) in the KEGG database (http://www.genome.jp/tools/blast/). The amino acid sequence was entered in the "sequence data" field, then "favorite organism code or category" was selected under the "KEGG GENES" button, "Mycobacterium" was entered in the free text field provided and the "compute" link was selected at the top. Once the BLAST was complete, the "show all results" link was selected. The resulting output was copied and pasted into an Excel program to extract the required data (organism code, enzyme code, enzyme name, identity and homology (positives)) from all of the BLAST output data, which were then tabulated under each organism name and code (Supplementary Dataset 1).

Excel Program for Extracting KEGG BLAST Data
To extract the required data from the BLAST output data obtained from the KEGG database, an Excel program written in an Excel worksheet was used. The generated program is presented in the Supplementary Materials.
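The extraction step can also be sketched in Python. This is not the authors' Excel program, and it does not reproduce the real KEGG BLAST output layout: it assumes a simplified, hypothetical tab-delimited export with four columns (entry as `organism:gene`, definition, identity %, positives %). The sample rows, gene identifiers and the choice of "highest identity" as the top hit are illustrative assumptions.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    organism: str     # KEGG-style organism code, e.g. "mtu"
    gene: str         # gene identifier within that organism
    name: str         # protein/enzyme name (definition line)
    identity: float   # percentage identity
    positives: float  # percentage positives (homology)


# Hypothetical rows for illustration only; not real BLAST results.
SAMPLE = (
    "# entry\tdefinition\tidentity\tpositives\n"
    "mtu:EXAMPLE_0001\texample oxidoreductase\t100\t100\n"
    "mft:EXAMPLE_0002\texample oxidoreductase\t78.5\t88.0\n"
    "mft:EXAMPLE_0003\thypothetical protein\t31.0\t48.5\n"
)


def parse_hits(text: str) -> list[Hit]:
    """Parse the assumed four-column, tab-delimited hit table."""
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comment
        entry, name, identity, positives = line.split("\t")
        organism, gene = entry.split(":", 1)
        hits.append(Hit(organism, gene, name, float(identity), float(positives)))
    return hits


def top_hit_per_organism(hits: list[Hit]) -> dict[str, Hit]:
    """Keep, for each organism code, the hit with the highest identity."""
    best: dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.organism)
        if current is None or hit.identity > current.identity:
            best[hit.organism] = hit
    return best


if __name__ == "__main__":
    for code, hit in sorted(top_hit_per_organism(parse_hits(SAMPLE)).items()):
        print(code, hit.gene, hit.name, hit.identity, hit.positives)
```

Keeping the parsed rows as named tuples makes it straightforward to tabulate them per organism code, as was done for Supplementary Dataset 1.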

Data Collection and Protein Domain/Function Analysis
All the top hit protein sequences in 92 mycobacterial species were collected (Supplementary Dataset 2) and input into the National Center for Biotechnology Information Batch Web CD-search Tool (NCBI CDD) [50]. Based on the NCBI CDD results, proteins belonging to the same family/superfamily were identified (Supplementary Dataset 3). For some proteins, no results were obtained with the NCBI CDD. Thus, the KEGG database was searched for possible functions or domains to determine whether they belonged to the same group (Supplementary Dataset 1).

Assessing the Presence or Absence of Cholesterol Catabolic Gene/Protein Homologs in Mycobacterial Species
The superfamilies, as per the NCBI CDD output, were considered to determine whether the genes/proteins from the 92 mycobacterial species matched those from M. tuberculosis H37Rv. If no data on superfamilies were available in the NCBI database, a secondary review was performed of the KEGG BLAST output data by looking at the percentage identity, percentage homology and name (and thus also the function) of each of the genes/proteins. However, the presence or absence of some proteins in different mycobacterial species was determined based on the information below.
The Rv3512 gene/protein homolog was not identified in many species in the KEGG BLAST output. This may have been due to annotation errors, as M. tuberculosis H37Rv (1998) (mtu) and M. tuberculosis H37Rv (2012) (mtv) showed different results. Furthermore, this gene is not shown to be essential for cholesterol catabolism. Thus, this gene was omitted from the analysis.
For Rv1906, more than 40% identity to M. tuberculosis H37Rv was taken as positive across all categories, as the proteins are hypothetical. According to this, the negative species were M. abscessus
For Rv3566A, Rv3527 and Rv3572, more than 40% identity to M. tuberculosis H37Rv was taken as positive across all categories, as the proteins are hypothetical.
The results were tabulated per complex by colour-coding the cells according to the following criteria: red = gene homolog present; green = gene homolog not found.
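The criteria above can be summarized as a small decision function. This is a sketch of one possible reading, not the procedure actually run in the study: the function name, argument names and superfamily labels are hypothetical, treating non-overlapping superfamily assignments as "not found" is an interpretation, and cases the text resolves by manual review of KEGG data are returned as `None` rather than decided automatically.

```python
from typing import Optional, Sequence

# Hypothetical proteins for which the text applies the >40% identity rule.
IDENTITY_RULE_GENES = {"Rv1906", "Rv3566A", "Rv3527", "Rv3572"}
IDENTITY_THRESHOLD = 40.0  # percent identity to M. tuberculosis H37Rv


def homolog_call(
    query_gene: str,
    ref_superfamilies: Sequence[str],
    hit_superfamilies: Sequence[str],
    identity: Optional[float],
) -> Optional[bool]:
    """Return True (present), False (not found) or None (manual review).

    ref_superfamilies / hit_superfamilies: superfamily assignments for the
    H37Rv protein and for the top hit (empty when no assignment was returned).
    identity: percentage identity of the top hit, or None if there was no hit.
    """
    if query_gene in IDENTITY_RULE_GENES:
        # Hypothetical proteins: more than 40% identity counts as positive.
        return identity is not None and identity > IDENTITY_THRESHOLD

    if ref_superfamilies and hit_superfamilies:
        # Primary criterion: both proteins fall in a shared superfamily.
        return bool(set(ref_superfamilies) & set(hit_superfamilies))

    # No superfamily data: the text describes a secondary, manual review of
    # identity, homology and protein name, which is not automated here.
    return None


if __name__ == "__main__":
    print(homolog_call("Rv1906", [], [], 45.0))
    print(homolog_call("Rv0495c", [], [], 99.0))
```

Returning `None` for the manual-review cases keeps the sketch from inventing a numeric cutoff that the text does not state.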

Generation of Gene/Protein Heatmaps
The presence or absence of genes/proteins in mycobacterial species was shown with heatmaps following the method described elsewhere [51]. Briefly, the data were represented as −3 for gene absence (green) and 3 for gene presence (red). A tab-delimited file was imported into Multi-Experiment Viewer (MeV) [52]. A Euclidean distance metric was used to perform hierarchical clustering. Mycobacterial species are presented on the horizontal axis (see Supplementary Dataset 4 for codes) and the 151 genes on the vertical axis.
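The encoding described above can be sketched as follows. The species codes `spA` and `spB`, the presence values and the header layout are hypothetical, and MeV's exact input requirements are not reproduced here; the sketch only shows the −3/3 encoding, a generic tab-delimited table with genes as rows and species as columns, and the Euclidean distance between two species profiles.

```python
import math

PRESENT, ABSENT = 3, -3  # values used for gene presence and absence


def encode_matrix(presence: dict[str, dict[str, bool]], genes: list[str]) -> str:
    """Build a tab-delimited table: one row per gene, one column per species."""
    species = sorted(presence)
    rows = ["\t".join(["gene"] + species)]
    for gene in genes:
        values = [
            str(PRESENT if presence[sp].get(gene, False) else ABSENT)
            for sp in species
        ]
        rows.append("\t".join([gene] + values))
    return "\n".join(rows) + "\n"


def euclidean(a: list[int], b: list[int]) -> float:
    """Euclidean distance between two equally long presence/absence profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    # Hypothetical presence calls for two placeholder species.
    presence = {
        "spA": {"Rv1106c": True, "Rv3519": True},
        "spB": {"Rv1106c": True, "Rv3519": False},
    }
    genes = ["Rv1106c", "Rv3519"]
    print(encode_matrix(presence, genes), end="")
    print(euclidean([3, 3], [3, -3]))
```

With this encoding, two species that differ in a single gene are separated by a distance of 6, so the clustering groups species by how many homologs they share.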

Conclusions
The study results were intended to predict mycobacterial species' ability to utilize cholesterol as a carbon source. To achieve this task, a comprehensive cholesterol catabolic pathway was deduced from the available literature. Genes/proteins involved in cholesterol catabolism were identified, and comprehensive comparative analysis of M. tuberculosis H37Rv homologous genes/proteins in different mycobacterial species was performed, using a newly developed software tool to extract homologous protein data. Gene/protein sequences were collected and subjected to protein family assignment and functional analysis. Finally, based on the presence of genes/proteins critical for cholesterol catabolism, mycobacterial species' ability to catabolize cholesterol was determined. One point stands out regarding prediction of the cholesterol utilization capability of mycobacterial species belonging to categories such as MAC, SAP and NTM: most of the homologous cholesterol catabolic genes/proteins missing from these species have in fact been proven to be essential for survival of M. tuberculosis H37Rv in macrophage cells and in murine infection, but the number of these missing genes/proteins is limited to a single gene in most cases. Thus, it is difficult to predict the cholesterol utilization ability for MAC and NTM species. It is not clear whether these genes/proteins play any role in cholesterol assimilation in SAP species, since these species have different lifestyle and habitat properties compared to M. tuberculosis H37Rv. Overall, this study opened new vistas on comparative analysis of cholesterol catabolic genes/proteins in mycobacterial species, and study results should be taken as a source of information on cholesterol catabolic genes/proteins in mycobacterial species.