More P450s Are Involved in Secondary Metabolite Biosynthesis in Streptomyces Compared to Bacillus, Cyanobacteria, and Mycobacterium

Unraveling the role of cytochrome P450 monooxygenases (CYPs/P450s), heme-thiolate proteins present in living and non-living entities, in secondary metabolite synthesis is gaining momentum. In this study, we analyzed the genomes of 203 Streptomyces species for P450s and examined their association with secondary metabolism. Our analyses revealed 5460 P450s, grouped into 253 families and 698 subfamilies. The CYP107 family was found to be conserved and highly populated in Streptomyces and Bacillus species, suggesting a key role in the synthesis of secondary metabolites. Streptomyces species had a higher number of P450s than Bacillus and cyanobacterial species. The average number of secondary metabolite biosynthetic gene clusters (BGCs) and the number of P450s located in BGCs were higher in Streptomyces species than in Bacillus, mycobacterial, and cyanobacterial species, corroborating the superior capacity of Streptomyces species to generate diverse secondary metabolites. Functional analysis via data mining showed that many Streptomyces P450s are involved in the biosynthesis of secondary metabolites. This is the first comparative analysis of P450s in such a large number (203) of Streptomyces species, and it reveals the association of P450s with secondary metabolite synthesis in these species. Future studies should select Streptomyces species with higher numbers of P450s and BGCs and explore the biotechnological value of the secondary metabolites they produce.


Streptomyces Species Have a Large Number of P450s
Genome-wide data mining and annotation of P450s in 203 Streptomyces species (Supplementary Table S1) revealed 5460 P450s in their genomes (Figure 1, Table 1, and Supplementary Dataset 1). The P450 count per Streptomyces species ranged from 10 to 69, with an average of 27 P450s. Apart from the complete P450 sequences, pseudo-P450s (6 hit proteins), P450 fragments (114 hit proteins), P450-derived glycosyltransferase activator proteins (22 hit proteins), and P450 false-positive hits (2 hit proteins) were also found in some Streptomyces species (Supplementary Table S2). Such hit proteins are commonly found in genomes; because they are not complete P450s, they were excluded from further analysis. Among the Streptomyces species, Streptomyces albulus ZPM had the highest number of P450s in its genome (69 P450s), followed by S. clavuligerus (65 P450s); the lowest number was found in Streptomyces sp. CNT372 and S. somaliensis DSM 40738 (10 P450s each) (Figure 1 and Table 1). The most frequent (prevalent) P450 count among Streptomyces species was 19 (Table 1). The average number of P450s in Streptomyces species was higher than in Bacillus species [22] and cyanobacterial species [23], and almost the same as in mycobacterial species [21] (Table 2). Note that the number of species analyzed strongly influences the average number of P450s: the larger the number of species, the more accurate the estimate [20,23]. This may explain why Streptomyces species showed a slightly lower average number of P450s than mycobacterial species, for which only 60 species were analyzed [21]. Annotation of P450s in more mycobacterial species will therefore be needed to clarify this comparison.
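The summary statistics reported above (range, average, and most frequent count) can be derived from a per-species table of P450 counts. The sketch below assumes a hypothetical dictionary mapping species names to P450 counts; the species labels and numbers are placeholders for illustration, not data from this study.

```python
from statistics import multimode


def summarize_p450_counts(counts_by_species: dict[str, int]) -> dict:
    """Summarize per-species P450 counts (hypothetical input format)."""
    counts = list(counts_by_species.values())
    return {
        "species": len(counts),
        "total": sum(counts),
        "min": min(counts),
        "max": max(counts),
        # Average number of P450s = total P450s / number of species
        "mean": round(sum(counts) / len(counts), 1),
        # Most frequent ("prevalent") count; all tied values are reported
        "modes": sorted(multimode(counts)),
        # Species with the largest P450 complement (first one if tied)
        "richest": max(counts_by_species, key=counts_by_species.get),
    }


# Placeholder data for illustration only
example = {"sp_A": 19, "sp_B": 19, "sp_C": 69, "sp_D": 10, "sp_E": 27}
print(summarize_p450_counts(example))
```

Reporting all tied modes, rather than one arbitrary value, avoids hiding ties when several counts are equally frequent.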

CYP107 Family Was Found to Be Dominant and Conserved in 203 Streptomyces Species
Analysis of P450 families and subfamilies in the 203 Streptomyces species revealed that the 5460 P450s could be grouped into 253 P450 families and 698 P450 subfamilies (Table 2 and Supplementary Table S3). Among the Streptomyces species, S. clavuligerus had the highest number of P450 families (30) and P450 subfamilies (58) in its genome (Table 1). S. rimosus rimosus ATCC 10970 had the same number of P450 families as S. clavuligerus, but its number of subfamilies ranked third (52 subfamilies) (Table 1). Interestingly, the species with the highest number of P450s did not have the highest number of P450 families, suggesting that some P450 families have expanded (bloomed). Blooming of P450 families is common and has been observed in species belonging to different biological kingdoms [24,26,34–36]. Phylogenetic analysis revealed that members of some P450 families were scattered across the evolutionary tree (Figure 1). This was also observed previously for Streptomyces P450s, and it has been hypothesized that phylogeny-based annotation of P450s may detect similarity cues beyond a simple percentage-identity cutoff [20]. Analysis of P450 families in the 155 Streptomyces species newly data-mined in this study revealed 38 new P450 families: CYP1200A1, CYP1216A1, CYP1223A1, CYP1228A1, CYP1236A1, CYP1238A1, CYP1265A1, CYP1279A1, CYP1369A1, CYP1432A1, CYP1518A1, CYP1529A1, CYP1543A1, CYP1568A1, CYP159A1, CYP1607A1, CYP1658A1, CYP1759A1, CYP1810A1, CYP1832A1, CYP1866A1, CYP1896A1, CYP1920A1, CYP1929A1, CYP1931A1, CYP1940A1, CYP1941A1, CYP1943A1, CYP1972A1, CYP1984A1, CYP1994A1, CYP2076A1, CYP2080A1, CYP2134A1, CYP2180A1, CYP2349A1, CYP2427A1, and CYP2723A1. A detailed analysis of the number of new P450 families found in different Streptomyces species is presented in Supplementary Table S2.
Among the P450 families, CYP107 was dominant, with 1235 P450s in Streptomyces species, followed by CYP105 (684 P450s), CYP157 (525 P450s), and CYP154 (510 P450s) (Figure 2 and Supplementary Table S3), suggesting blooming of these families in Streptomyces species, as observed in species belonging to different biological kingdoms [24,26,34–36]. Notably, CYP107 was also the dominant family in Bacillus species [22], suggesting an important role for this family in secondary metabolite synthesis in both the Streptomyces and Bacillus genera. An interesting pattern emerged when comparing subfamily diversity among the dominant P450 families (Figure 2, Table 3, and Supplementary Table S3). The families CYP107, CYP105, CYP183, and CYP113 showed the highest diversity at the subfamily level, each containing numerous subfamilies (Supplementary Table S3). Such high diversity within P450 families is not uncommon in Streptomyces species and has been proposed as a key contributor to their production of more diverse secondary metabolites compared with mycobacterial species [20]. In support of this argument, CYP105 family members in Streptomyces species have been shown to oxidize numerous endogenous and exogenous compounds and to generate different secondary metabolites [32]. In contrast to the subfamily-level diversity of CYP107, CYP105, CYP183, and CYP113, the remaining dominant P450 families had only one to three subfamilies, indicating subfamily-level blooming in these families (Table 3).
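The distinction drawn above, between families that are large because they span many subfamilies and families that are large because a few subfamilies have bloomed, can be made explicit by tallying annotated names. The sketch below assumes P450 names follow the CYP pattern of family number, subfamily letter(s), and member number (for example CYP107B1); the input list is hypothetical and only illustrates the two patterns.

```python
import re
from collections import defaultdict

# CYP name = family number + subfamily letter(s) + member number, e.g. CYP107B1
CYP_NAME = re.compile(r"^CYP(\d+)([A-Z]+)(\d+)$")


def family_profile(names: list[str]) -> list[tuple[str, int, int]]:
    """Return (family, number of P450s, number of subfamilies), largest family first."""
    size: dict[str, int] = defaultdict(int)
    subfamilies: dict[str, set[str]] = defaultdict(set)
    for name in names:
        match = CYP_NAME.match(name)
        if match is None:
            continue  # skip names that do not follow the CYP nomenclature pattern
        family = f"CYP{match.group(1)}"
        size[family] += 1
        subfamilies[family].add(family + match.group(2))
    profile = [(fam, size[fam], len(subfamilies[fam])) for fam in size]
    # Sort by family size (descending), then by family name for stable output
    return sorted(profile, key=lambda row: (-row[1], row[0]))


# Hypothetical annotations for illustration only
names = ["CYP107B1", "CYP107B1", "CYP107L2", "CYP105A1",
         "CYP157C1", "CYP157C1", "CYP157C1"]
for family, n_p450s, n_subfamilies in family_profile(names):
    print(family, n_p450s, n_subfamilies)
```

In this toy input, CYP157 has many members in a single subfamily (the subfamily-level blooming pattern), whereas CYP107 spreads its members over more than one subfamily.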
P450 family conservation analysis revealed that the CYP107 family is conserved in all 203 Streptomyces species (Figure 3).

Numerous P450s Involved in Secondary Metabolite Production in Streptomyces Compared to Other Bacterial Species
Analysis of the genomes of 144 Streptomyces species revealed 4457 BGCs (Table 2 and Supplementary Table S4). The number of BGCs in these 144 Streptomyces species was higher than in mycobacterial, Bacillus, and cyanobacterial species (Table 2), underscoring the superior capacity of Streptomyces species to produce secondary metabolites; two-thirds of the antibiotics currently used worldwide come from these species [28]. The average number of BGCs in Streptomyces species was twice that in mycobacterial species and close to four times that in Bacillus and cyanobacterial species (Table 2). A larger proportion of Streptomyces P450s are part of BGCs than in the other bacterial species: 1231 P450s (32%) in Streptomyces species, compared with 112 (22%) in Bacillus species, 204 (11%) in mycobacterial species, and 27 (8%) in cyanobacterial species (Table 2). These 1231 P450s belong to 135 P450 families (Figure 4 and Supplementary Table S5). Among these 135 families, CYP107 members were most frequently present in BGCs, followed by CYP105, CYP157, and CYP154 (Figure 4 and Supplementary Table S5). This suggests that the P450 families that have bloomed in Streptomyces species are indeed involved in the production of secondary metabolites, and it supports the hypothesis that Streptomyces P450s have evolved to generate secondary metabolites that help these bacteria thrive in their environment [20]. To assess the in silico predictions of this study, in which a large number of Streptomyces P450s were predicted to be involved in secondary metabolite production, we performed an extensive literature review to identify Streptomyces P450s with reported roles in the production of secondary metabolites.
As shown in Table 4, many P450s belonging to different P450 families, as predicted in this study, have been reported to be involved in the production of different secondary metabolites. This supports the notion that the P450s identified as part of BGCs in this study contribute to secondary metabolite production.

Table 4 (excerpt): CYP248A1 | Streptomyces thioluteus | Aureothin biosynthesis [83]
Note: For some P450s, protein notations are given in parentheses. These P450s were annotated either in this study (marked with an asterisk superscript) or previously [20] (marked with an exclamation mark) by browsing the individual biosynthetic gene-cluster sequences reported in the literature. Protein notations are given in parentheses so that readers can match the P450s with the published literature. Where known, the secondary metabolite whose production involves the P450 is indicated in the table.

Analysis of P450-containing BGCs revealed 235 types of BGCs, among which terpene clusters were dominant, followed by T1PKS, NRPS, and T3PKS clusters (Figure 4 and Supplementary Table S5). A detailed analysis of P450s that are part of BGCs and of the types of BGCs containing P450s is presented in Supplementary Table S5. Analysis of the linkage between particular P450 families and BGCs revealed that some P450s are consistently associated with a particular BGC type (Supplementary Table S4), suggesting horizontal transfer of BGCs between Streptomyces species. For example, CYP283A is linked to bacteriocin and bottromycin clusters, CYP113K3 to bacteriocin-NRPS clusters, CYP124G to melanin clusters, and CYP105A to NRPS and butyrolactone clusters. Notably, horizontal transfer of BGCs among different organisms is well documented in the literature [37].
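The family-to-BGC linkage described above amounts to asking which (P450 family or subfamily, BGC type) pairs recur across species. The sketch below is a minimal co-occurrence count, assuming a hypothetical flat list of (species, P450 label, BGC type) records such as could be exported from Supplementary Table S4; the P450 labels and cluster types are taken from the text, but the species and records are invented for illustration. A recurring pair is consistent with, but does not by itself prove, horizontal transfer.

```python
from collections import defaultdict


def recurrent_links(records, min_species=2):
    """Find (P450 label, BGC type) pairs that occur in several species.

    records: iterable of (species, p450_label, bgc_type) tuples (hypothetical format).
    Returns (p450_label, bgc_type, n_species) for pairs seen in at least
    `min_species` distinct species, most widespread first.
    """
    species_per_pair = defaultdict(set)
    for species, p450, bgc_type in records:
        # Using a set means each species is counted once per pair
        species_per_pair[(p450, bgc_type)].add(species)
    links = [
        (p450, bgc_type, len(species))
        for (p450, bgc_type), species in species_per_pair.items()
        if len(species) >= min_species
    ]
    return sorted(links, key=lambda row: (-row[2], row[0], row[1]))


# Hypothetical records for illustration only
records = [
    ("sp1", "CYP283A", "bacteriocin"),
    ("sp2", "CYP283A", "bacteriocin"),
    ("sp3", "CYP283A", "bottromycin"),
    ("sp3", "CYP105A", "NRPS"),
    ("sp3", "CYP105A", "NRPS"),
]
print(recurrent_links(records))
```

Counting distinct species rather than raw records keeps a single genome with duplicated clusters from inflating the apparent linkage.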

Information on Streptomyces Species and Genome Database
In total, 203 Streptomyces genomes (permanent and finished draft genomes) publicly available at the Joint Genome Institute Integrated Microbial Genomes and Microbiomes (JGI IMG/M) [99] and the Kyoto Encyclopedia of Genes and Genomes (KEGG) [100] were used in this study. These included 48 Streptomyces species for which P450s and BGCs had been annotated previously [20]; for these 48 species, P450 and BGC data were retrieved from the published article [20]. Thus, 155 Streptomyces species were data-mined for P450s and BGCs in this study. Information on the species used in the study is provided in Supplementary Table S1.

Genome Data Mining and Identification of P450s
Identification and annotation of P450s in Streptomyces species were carried out following a method described elsewhere [20–22]. Briefly, each Streptomyces genome available at JGI IMG/M [99] was searched for P450s using the InterPro code "IPR001128". The hit protein sequences were then checked for the P450 characteristic motifs EXXR and CXG [101]. Proteins having only one of these two motifs were considered pseudo-P450s, and proteins that were short in amino acid length and lacked both motifs were considered P450 fragments. Neither the pseudo-P450s nor the P450 fragments were considered for further analysis.
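The motif-based triage described above can be sketched as a small classifier. This is a simplification under stated assumptions: it searches the whole sequence for the EXXR and CXG patterns, whereas a careful annotation checks these motifs in their conserved regions (the K-helix and the heme-binding region), where the patterns are meaningful; the 300-residue cutoff for "short" is a hypothetical value, and the "unclassified" label for long, motif-less hits is an added category for manual inspection, not part of the original procedure.

```python
import re

# Simplified motif patterns; matching anywhere in the sequence is an assumption
# made for illustration (real annotation inspects the conserved regions).
EXXR = re.compile(r"E..R")
CXG = re.compile(r"C.G")


def classify_hit(sequence: str, min_length: int = 300) -> str:
    """Triage an IPR001128 hit by motif content (min_length is hypothetical)."""
    seq = sequence.upper()
    has_exxr = EXXR.search(seq) is not None
    has_cxg = CXG.search(seq) is not None
    if has_exxr and has_cxg:
        return "P450"
    if has_exxr or has_cxg:
        return "pseudo-P450"       # only one of the two motifs present
    if len(seq) < min_length:
        return "P450 fragment"     # short and lacking both motifs
    return "unclassified"          # long but motif-less: flag for manual inspection


# Synthetic sequences for illustration only
full = "M" + "A" * 300 + "EAAR" + "A" * 60 + "CAG" + "A" * 40
pseudo = "M" + "A" * 400 + "EAAR"
fragment = "MAAAAA" * 10
for seq in (full, pseudo, fragment):
    print(len(seq), classify_hit(seq))
```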

Allocating Family and Subfamily to P450s
The collected hit proteins were subjected to BLAST analysis against named bacterial P450s at http://www.p450.unizulu.ac.za/. Following the International P450 Nomenclature Committee rules [17–19], proteins with greater than 40% identity to a named P450 were assigned to the same family as that homolog, and proteins with greater than 55% identity were assigned to the same subfamily. Proteins with less than 40% identity to any named P450 were assigned to a new family.
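The identity thresholds above can be encoded as a small decision function applied to the best BLAST hit. This is only a sketch: official P450 naming is curated by the nomenclature committee and also considers phylogeny, and two points here are assumptions rather than statements from the text, namely that proteins with 40 to 55% identity form a new subfamily within the hit's family, and that an identity of exactly 40% is treated as a new family. The function name and example hits are hypothetical.

```python
import re

CYP_NAME = re.compile(r"^CYP(\d+)([A-Z]+)(\d+)$")


def assign_by_identity(identity_pct: float, best_hit: str) -> str:
    """Suggest a family/subfamily placement from % identity to the best named hit."""
    match = CYP_NAME.match(best_hit)
    if match is None:
        raise ValueError(f"not a CYP name: {best_hit!r}")
    family = f"CYP{match.group(1)}"
    subfamily = family + match.group(2)
    if identity_pct > 55.0:
        return f"subfamily {subfamily}"
    if identity_pct > 40.0:
        # Assumption: same family as the hit, but a new subfamily
        return f"new subfamily in family {family}"
    # Assumption: identity of exactly 40% is treated as a new family
    return "new family"


print(assign_by_identity(62.5, "CYP107B1"))
print(assign_by_identity(48.0, "CYP107B1"))
print(assign_by_identity(35.0, "CYP105A1"))
```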

Streptomyces P450 Phylogenetic Analysis
Phylogenetic analysis of the Streptomyces P450s was carried out following the method described in the literature [102]. First, the Streptomyces P450 sequences were aligned using MAFFT v6.864 with the automatically optimized model option [103], available at the Trex web server [104]. The alignment was then used by the Trex web server to infer and optimize the tree with its embedded weighting procedure, and the best inferred tree was visualized and annotated with iTOL [105].
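The study ran MAFFT through the Trex web server. For readers who prefer a local workflow, the alignment step could be reproduced roughly as below, assuming a locally installed `mafft` executable, whose `--auto` option selects an alignment strategy automatically and which writes the alignment to standard output. The file names are hypothetical, and the Trex tree-inference and weighting step is not reproduced here; a separate tree-building tool would be needed.

```python
import shutil
import subprocess


def mafft_command(fasta_in: str) -> list[str]:
    """Build a MAFFT call using automatic strategy selection (--auto)."""
    return ["mafft", "--auto", fasta_in]


def align_locally(fasta_in: str, fasta_out: str) -> None:
    """Run MAFFT if installed; MAFFT prints the alignment to stdout."""
    if shutil.which("mafft") is None:
        raise RuntimeError("mafft executable not found on PATH")
    with open(fasta_out, "w") as handle:
        # check=True raises CalledProcessError if MAFFT exits with an error
        subprocess.run(mafft_command(fasta_in), stdout=handle, check=True)


# Hypothetical file names
print(mafft_command("streptomyces_p450s.fasta"))
```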

Streptomyces P450 Profile Heat-Maps
P450 profile heat-maps were generated following a method published previously [22,27] to visualize the presence and absence of P450 families in Streptomyces species. Briefly, a tab-delimited file of P450 family presence and absence was imported into Multi-Experiment Viewer (MeV) [106], and the data were clustered hierarchically using a Euclidean distance metric. In total, 203 Streptomyces species formed the vertical axis and P450 family numbers formed the horizontal axis. Family presence was coded as 3 (red) and family absence as −3 (green).
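The heat-map input described above is a species-by-family matrix coded 3 for presence and −3 for absence. The sketch below builds that matrix, computes the Euclidean distance used for clustering, and lists families present in every species (the conservation check reported for CYP107). The input dictionary is hypothetical, and the clustering itself, done in MeV in this study, is left out; the rows could be passed to a hierarchical clustering routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `metric="euclidean"`.

```python
import math


def family_number(family: str) -> int:
    return int(family[3:])  # "CYP107" -> 107; assumes "CYP<number>" labels


def presence_matrix(families_by_species):
    """Rows = species, columns = P450 families; 3 = present, -3 = absent."""
    species = sorted(families_by_species)
    families = sorted({f for fams in families_by_species.values() for f in fams},
                      key=family_number)
    rows = [[3 if f in families_by_species[s] else -3 for f in families]
            for s in species]
    return species, families, rows


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def conserved_families(families_by_species):
    """Families present in every species in the input."""
    shared = set.intersection(*(set(f) for f in families_by_species.values()))
    return sorted(shared, key=family_number)


# Hypothetical presence data for illustration only
data = {"sp1": {"CYP107", "CYP105"},
        "sp2": {"CYP107", "CYP154"},
        "sp3": {"CYP107", "CYP105", "CYP154"}}
species, families, rows = presence_matrix(data)
print(families, rows)
print(conserved_families(data))
```

With ±3 coding, the distance between two species is 6 times the square root of the number of families in which they differ, so it grows only with differences in family content.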

Identification of P450s That Are Part of Secondary Metabolite BGCs
Analysis of secondary metabolite BGCs and identification of P450s that are part of these BGCs were carried out following a previously described procedure [102], with slight modification. For each Streptomyces genome available at JGI IMG/M, the secondary metabolite BGCs were searched for the presence of P450s. The DNA sequences of BGCs containing P450s were collected and converted to FASTA format using the PSPad editor (http://www.pspad.com/en/). The FASTA files were then used to identify the cluster type and the most similar known clusters using the Antibiotics and Secondary Metabolite Analysis Shell (antiSMASH) program [107]. The results were recorded in Excel spreadsheets as species-wise BGCs, BGC type and most similar known BGCs, percentage similarity to known BGCs, and the P450s that are part of specific BGCs. Some Streptomyces genomes could not be processed by antiSMASH; these species were therefore excluded from the analysis of P450s in secondary metabolite BGCs. A list of the Streptomyces species subjected to antiSMASH analysis is presented in Supplementary Table S4.
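The study located P450s within BGCs by inspecting clusters in JGI IMG/M. An equivalent check can be expressed with genomic coordinates, assuming each P450 gene and each BGC is available as a contig name plus start and end positions; the record layout, names, and coordinates below are hypothetical. This sketch requires the gene to lie entirely inside the cluster, so a P450 that only partially overlaps a cluster boundary is excluded; relaxing that rule is a design choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    contig: str
    start: int  # 1-based, inclusive
    end: int    # inclusive


def p450s_in_bgcs(p450_genes, bgcs):
    """Pair each P450 gene with every BGC on the same contig that fully contains it."""
    pairs = []
    for gene in p450_genes:
        for cluster in bgcs:
            if (gene.contig == cluster.contig
                    and cluster.start <= gene.start
                    and gene.end <= cluster.end):
                pairs.append((gene.name, cluster.name))
    return pairs


# Hypothetical coordinates for illustration only
bgcs = [Region("region_1", "contig_1", 10_000, 60_000),
        Region("region_2", "contig_2", 5_000, 25_000)]
p450_genes = [Region("p450_a", "contig_1", 20_000, 21_200),   # inside region_1
              Region("p450_b", "contig_1", 70_000, 71_300),   # outside any BGC
              Region("p450_c", "contig_2", 24_500, 25_700)]   # crosses region_2 end
print(p450s_in_bgcs(p450_genes, bgcs))
```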

Data Analysis
All calculations were performed following the method described in the literature [23]. The average number of P450s was calculated as: Average number of P450s = Number of P450s / Number of species. The average number of BGCs was calculated as: Average number of BGCs = Total number of BGCs / Number of species. The percentage of P450s that are part of BGCs was calculated as: Percentage of P450s part of BGCs = 100 × Number of P450s part of BGCs / Total number of P450s present in the species. For the comparative analysis of P450s and BGCs, data for species belonging to the genera Bacillus [22] and Mycobacterium [21] and to the phylum Cyanobacteria [23] were obtained from published articles.
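The three formulas above translate directly into code. The sketch applies the averages to totals reported in the text (5460 P450s in 203 species; 4457 BGCs in 144 species). For the percentage, the denominator should be the total number of P450s in the genomes that actually underwent BGC analysis, not in all 203 genomes; otherwise the percentage is underestimated. The percentage example uses placeholder numbers.

```python
def average_per_species(total: int, n_species: int) -> float:
    """Average number of P450s (or BGCs) = total count / number of species."""
    return total / n_species


def percent_p450s_in_bgcs(n_in_bgcs: int, n_p450s_total: int) -> float:
    """Percentage of P450s part of BGCs = 100 x P450s in BGCs / total P450s."""
    return 100 * n_in_bgcs / n_p450s_total


# Totals reported in the text
print(round(average_per_species(5460, 203), 1))   # average P450s per species
print(round(average_per_species(4457, 144), 1))   # average BGCs per species
# Placeholder counts for the percentage formula
print(percent_p450s_in_bgcs(25, 100))
```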

Conclusions
In the last five decades, research on cytochrome P450 monooxygenases (CYPs/P450s) has mainly focused on their function and structure, with little attention to their evolution, especially in microbes. The availability of a large number of microbial genomes now provides an opportunity to explore the evolutionary aspects of P450s. Because a standardized nomenclature system has been established for P450s, each genome needs to be data-mined and its P450s annotated (assigned to a family and subfamily) so that researchers around the world can use uniform P450 names. In this study, we therefore annotated a large number of P450s in 203 Streptomyces species and found 38 new P450 families. Some P450 families were found to have bloomed in Streptomyces species, even at the subfamily level. Comparative analysis of key P450 features revealed that Streptomyces species had a greater number of P450s, more secondary metabolite BGCs, and the highest number of P450s in BGCs compared with species belonging to the genera Bacillus and Mycobacterium and the phylum Cyanobacteria. This supports the view that a larger P450 complement is associated with greater secondary metabolite diversity, as seen in Streptomyces species, in which a large number of P450s were found to be involved in generating diverse secondary metabolites. An interesting observation was the linkage between particular P450 families and BGCs, which suggests that these BGCs were horizontally transferred among Streptomyces species. This study adds to the comparative analysis of P450s and BGCs across bacterial populations, and the data presented here will serve as a reference for further annotation of P450s in Streptomyces and other bacterial species. The in silico predicted BGCs need to be experimentally validated to assess the biological properties of the secondary metabolites they produce.