An Unprecedented Number of Cytochrome P450s Are Involved in Secondary Metabolism in Salinispora Species

Cytochrome P450 monooxygenases (CYPs/P450s) are heme-thiolate proteins present in species across the biological kingdoms. By virtue of their broad substrate promiscuity and regio- and stereo-selectivity, these enzymes enhance or contribute diversity to secondary metabolites. Actinomycetes, especially Salinispora species, are well-known producers of secondary metabolites. Despite the importance of P450s, a comprehensive comparative analysis of P450s and their role in secondary metabolism in Salinispora species has not been reported. We therefore analyzed P450s in 126 strains from three different species: Salinispora arenicola, S. pacifica, and S. tropica. The study revealed the presence of 2643 P450s that can be grouped into 45 families and 103 subfamilies. The CYP107 and CYP125 families are conserved, and the CYP105 and CYP107 families are bloomed (a bloomed P450 family is one with many members) across Salinispora species. Analysis of P450s that are part of secondary metabolite biosynthetic gene clusters (smBGCs) revealed that Salinispora species have an unprecedented number of P450s (1236 P450s, 47%) as part of smBGCs compared to other bacterial species belonging to the genera Streptomyces (23%) and Mycobacterium (11%), the phyla Cyanobacteria (8%) and Firmicutes (18%), and the classes Alphaproteobacteria (2%) and Gammaproteobacteria (18%). A peculiar characteristic, the presence of up to six P450s in a single smBGC, was observed in Salinispora species. Future characterization of Salinispora species P450s and their smBGCs has the potential to uncover novel secondary metabolites.


Introduction
Cytochrome P450 monooxygenases (CYPs/P450s) comprise a superfamily of heme-thiolate proteins. P450s are present in all species of the different biological kingdoms, including in viruses, which are considered non-living entities [1,2]. This suggests that these enzymes play an important role in species' primary and secondary metabolism. These enzymes were initially identified as monooxygenases due to their ability to introduce one oxygen atom into a substrate [3]. Subsequent research revealed that P450s are catalytically diverse enzymes performing some unusual enzymatic reactions [4][5][6][7][8]. The regio- and stereospecific oxidation of many substrates by P450s caught the attention of researchers for biotechnological exploration of these enzymes [9][10][11][12]. P450 reactions are essential in designing drugs, such that drug toxicity of prodrugs is primarily assessed against these [...]

Microorganisms 2022, 10, 871

[...] 2022). Information on the species and their genome IDs used in the study is provided in Table S1.

Genome Data Mining and Identification of P450s
Genome data mining and identification of P450s in Salinispora species were carried out following the protocol described elsewhere [56,57]. Each Salinispora species genome available at JGI IMG/M [54,55] was searched for P450s using the InterPro code "IPR001128". The hit protein sequences were then searched for the presence of the P450 characteristic motifs EXXR and CXG [58,59]. Proteins with only one of these motifs, or with a short amino acid length, were considered P450 fragments. P450 fragments were not considered for the final P450 family and subfamily count.
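The motif screen described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the regular expressions simply encode EXXR and CXG with X as any residue, the length cutoff `MIN_LENGTH` is a hypothetical value (the text does not state one), and treating a hit that lacks both motifs as a fragment is likewise an assumption.

```python
import re

# EXXR: Glu, any two residues, Arg. CXG: Cys, any one residue, Gly.
EXXR = re.compile(r"E..R")
CXG = re.compile(r"C.G")

# Hypothetical cutoff for "short amino acid length"; the text gives no number.
MIN_LENGTH = 300


def classify_hit(sequence, min_length=MIN_LENGTH):
    """Label an InterPro hit as a full-length P450 or a P450 fragment."""
    has_exxr = EXXR.search(sequence) is not None
    has_cxg = CXG.search(sequence) is not None
    if has_exxr and has_cxg and len(sequence) >= min_length:
        return "P450"
    # Assumption: anything missing a motif or too short counts as a fragment.
    return "P450-fragment"


# Toy sequences (not real proteins), with a small cutoff for illustration.
full_like = "M" + "A" * 10 + "EAAR" + "A" * 10 + "CAG"
print(classify_hit(full_like, min_length=20))
print(classify_hit("MEAAR" + "A" * 30, min_length=20))
```

A real screen would read the hit sequences from a FASTA file and would probably also check that EXXR precedes CXG; both are left out here to keep the sketch short.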

Assigning Family and Subfamily to P450s
The P450s selected above were assigned to families and subfamilies based on the International P450 Nomenclature Committee rule [60][61][62]: proteins with a percentage identity greater than 40% were assigned to the same family as the named homolog P450s, and those with greater than 55% identity were assigned to the same subfamily as the named homolog P450s. Proteins with a percentage identity of less than 40% were assigned to a new family. Salinispora species P450s, along with P450 fragments, are presented in Table S2.
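The nomenclature rule above amounts to two thresholds on the percentage identity to the best named homolog. The sketch below is only an illustration of that rule; the function name is hypothetical, and the treatment of an identity of exactly 40% (which the rule as stated leaves open) is an assumption.

```python
def assign_rank(identity_pct):
    """Map percentage identity to a named homolog onto a nomenclature decision."""
    if identity_pct > 55:
        return "same subfamily"
    if identity_pct > 40:
        return "same family"
    # Assumption: exactly 40% falls here, since the stated rule only covers
    # "greater than 40%" and "less than 40%".
    return "new family"


for identity in (62.5, 47.0, 33.0):
    print(identity, assign_rank(identity))
```

In practice the identity would come from a pairwise alignment against named P450s, and the best-scoring homolog would determine the family and subfamily name.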

Phylogenetic Analysis of P450s
Phylogenetic analysis of P450s was carried out following the procedure described elsewhere [63,64]. The phylogenetic tree of P450s was constructed using protein sequences. First, the protein sequences were aligned using MAFFT v6.864 [65] on the Trex web server [66]. The alignments were then used to infer the best tree with the Trex web server [66]. Finally, the best-inferred tree was visualized and colored using a web-based tool, VisuaLife [67].

Salinispora Species P450s Profile Heat-Maps
P450 profile heat-maps were generated following a method described elsewhere [64,68] to check the presence or absence, co-presence, and conservation of P450 families in Salinispora species. Briefly, a tab-delimited file was imported into Multi-Experiment Viewer (MeV) [69], and hierarchical clustering using a Euclidean distance metric was used to cluster the data. The 126 Salinispora species formed the vertical axis, and the P450 families formed the horizontal axis. Data were presented as −3 for family absence (green) and 3 for family presence (red).
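The input to such a heat-map is a strain-by-family matrix encoded as 3 (present) or −3 (absent). The following sketch shows one way to build that tab-delimited table and to compute the Euclidean distance that the clustering step relies on. The strain names and family assignments are toy data, and the helper names are hypothetical; the clustering itself was done in MeV and is not reproduced here.

```python
import math

ABSENT, PRESENT = -3, 3  # encoding stated in the text


def profile_matrix(strain_families, families):
    """Return {strain: [3 or -3 per family]} in the given family order."""
    return {
        strain: [PRESENT if fam in present else ABSENT for fam in families]
        for strain, present in strain_families.items()
    }


def to_tsv(matrix, families):
    """Serialize the matrix as tab-delimited text with a header row."""
    lines = ["\t".join(["strain"] + list(families))]
    for strain, row in matrix.items():
        lines.append("\t".join([strain] + [str(v) for v in row]))
    return "\n".join(lines)


def euclidean(a, b):
    """Euclidean distance between two strain profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy data: hypothetical strains and family content.
families = ["CYP105", "CYP107", "CYP125", "CYP154"]
strain_families = {
    "strain_A": {"CYP105", "CYP107", "CYP125"},
    "strain_B": {"CYP107", "CYP125"},
    "strain_C": {"CYP105", "CYP107", "CYP125"},
}
matrix = profile_matrix(strain_families, families)
print(to_tsv(matrix, families))
print(euclidean(matrix["strain_A"], matrix["strain_B"]))
```

With this encoding, two strains that differ in a single family are separated by a distance of 6, and strains with identical profiles have distance 0, so they cluster together.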

Identification of P450s Part of smBGCs
P450s that are part of smBGCs were identified following the method described elsewhere [56,57]. Briefly, for each Salinispora species genome available at JGI IMG/M [54,55], the smBGCs were searched for the presence of P450s using the P450 gene ID. If a P450 was found to be part of a cluster, the cluster type was noted. Results were recorded in Excel spreadsheets, listing, for each species, the smBGCs, the smBGC type, and the P450s that are part of specific smBGCs. Of the 126 Salinispora species, smBGC information is available at JGI IMG/M [54,55] for only 103. Thus, the smBGCs of these 103 Salinispora species were analyzed for the presence of P450s (Table S1).
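The lookup described above is essentially a membership test of P450 gene IDs against the gene lists of each cluster. A minimal sketch, assuming the cluster annotations have already been exported into simple records, is shown below; the record layout, gene IDs, and cluster IDs are all hypothetical and do not reflect the IMG/M export format.

```python
from collections import Counter


def p450s_in_clusters(clusters, p450_gene_ids):
    """Return (gene_id, cluster_id, bgc_type) for every P450 found in a cluster."""
    p450_gene_ids = set(p450_gene_ids)
    hits = []
    for cluster in clusters:
        for gene_id in cluster["gene_ids"]:
            if gene_id in p450_gene_ids:
                hits.append((gene_id, cluster["cluster_id"], cluster["bgc_type"]))
    return hits


# Toy data: hypothetical clusters and P450 gene IDs for one genome.
clusters = [
    {"cluster_id": "c1", "bgc_type": "NRPS", "gene_ids": ["g1", "g2", "g3"]},
    {"cluster_id": "c2", "bgc_type": "T1PKS", "gene_ids": ["g7", "g8"]},
    {"cluster_id": "c3", "bgc_type": "terpene", "gene_ids": ["g11"]},
]
p450_gene_ids = ["g2", "g3", "g8", "g20"]

hits = p450s_in_clusters(clusters, p450_gene_ids)
by_type = Counter(bgc_type for _, _, bgc_type in hits)
percent_in_clusters = 100 * len(hits) / len(p450_gene_ids)
print(hits)
print(dict(by_type), percent_in_clusters)
```

The per-type counter corresponds to the "number of P450s per smBGC type" tallies, and the final ratio corresponds to the percentage of P450s that are part of smBGCs.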

Data Analysis
All calculations were carried out following the procedure reported previously by our laboratory [68]. The average number of P450s was calculated using the formula: Average number of P450s = Number of P450s / Number of species. The P450 diversity percentage was calculated using the formula: P450 diversity percentage = 100 × Total number of P450 families / (Total number of P450s × Number of species with P450s). The percentage of P450s
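As a worked check, the two formulas can be evaluated with the Salinispora totals reported in this article (2643 P450s, 45 families, 126 species, all of which have P450s). The sketch assumes that the denominator of the diversity percentage is the product of the total number of P450s and the number of species with P450s; the function names are illustrative only.

```python
def average_p450s(total_p450s, n_species):
    """Average number of P450s per species."""
    return total_p450s / n_species


def diversity_percentage(n_families, total_p450s, n_species_with_p450s):
    """100 x families / (P450s x species with P450s); grouping is an assumption."""
    return 100 * n_families / (total_p450s * n_species_with_p450s)


# Totals reported in the article for Salinispora species.
average = average_p450s(2643, 126)
diversity = diversity_percentage(45, 2643, 126)
print(round(average), round(diversity, 4))
```

The average rounds to 21 P450s per species, matching the value given in the results, and the diversity percentage comes out at roughly 0.01 under the stated assumption.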

Salinispora Species P450 Profiles
Genome-wide data mining and annotation of P450s in 126 Salinispora species revealed the presence of 2643 P450s in their genomes (Figure 1, Tables 1 and 2). The P450 count in Salinispora species ranged from 10 to 35 P450s, with an average of 21 P450s (Tables 1 and 2). Apart from the complete P450 sequences, 129 P450 fragments were also found in some Salinispora species (Table 2). P450 fragments occur naturally in species [58,70,74], and these were excluded from further analysis. Among Salinispora species, S. arenicola CNY280 has the highest number of P450s (35 P450s), and S. pacifica CNS801 and S. pacifica CNT148 have the lowest number of P450s (10 P450s each) (Table 2). Comparative analysis revealed that Salinispora species have a higher average number of P450s than species belonging to Cyanobacteria, Firmicutes, Alphaproteobacteria, and Gammaproteobacteria (Table 1). However, Salinispora species had a lower average number of P450s than species belonging to Streptomyces and Mycobacterium (Table 1). A point to be noted is that, among bacterial species, species belonging to the phylum Actinobacteria have the highest average number of P450s (Table 1). This indicates selective enrichment of P450s in these species, with P450s helping them adapt to diverse ecological niches, as described elsewhere [58,74,75]. Salinispora species P450s, along with P450 fragments, are presented in Table S2.

Table 1 (fragment).
Category | Salinispora | Streptomyces | Mycobacterium | Cyanobacteria | Firmicutes | Alphaproteobacteria | Gammaproteobacteria
Species analysed | 126 | 203 | 60 | 114 | 972 | 599 | 1261
Species without P450s | 0 | 0 | 0 | 0 | 743 | 370 | 1091
Species with P450s | 126 | 203 | 60 | 114 | 229 | 229 | 169
Percentage of species with P450s | 100 | 100 | 100 | 100 | 24 | 38 | [...]

Figure 1 (caption fragment). [...] Table S2. A high-resolution phylogenetic tree is provided in Figure S1.

CYP105 and CYP107 Families Are Bloomed in Salinispora Species
Based on the International P450 Nomenclature Committee rules [60][61][62], all 2643 P450s can be grouped into 45 families and 103 subfamilies (Tables 1 and 3). Phylogenetic analysis revealed that the large P450 families CYP105 and CYP107 were scattered across the evolutionary tree (Figure 1). This phenomenon was previously observed for these P450 families [56,72], and the authors suggested that phylogeny-based annotation of P450s could detect similarity cues beyond a simple percentage identity cutoff [56,72]. Except for CYP105 and CYP107, the rest of the P450s are grouped as per their families (Figure 1). A point to be noted is that most of the P450s are orthologs, considering that the Salinispora species analyzed in this study are different strains of three species. Comparative analysis revealed that Salinispora species have fewer P450 families and subfamilies than other actinomycetes such as Streptomyces and Mycobacterium (Table 1). Among Salinispora species, S. arenicola CNY280 had the highest number of P450 families (18) and P450 subfamilies (32) in its genome (Table 1). This is an interesting observation, as the species with the highest number of P450s also had the highest number of P450 families and subfamilies. This phenomenon was not found in other actinomycetes such as Streptomyces [56] and Mycobacterium [72,73]. For example, among Streptomyces species, Streptomyces albulus ZPM had the highest number of P450s, but Streptomyces rimosus rimosus ATCC 10970 and Streptomyces clavuligerus had the highest number of P450 families and subfamilies, respectively [56]. Among mycobacterial species, Mycobacterium rhodesiae NBB3 had the highest number of P450s and P450 families, but M. marinum had the highest number of P450 subfamilies [72,73].
Analysis of P450 families and subfamilies suggested that P450s in Salinispora species are bloomed (that is, more copies of the same P450 family are present in a species through duplication of an ancestral gene) (Table 3). Among P450 families, CYP105 was dominant with 600 members, followed by CYP107 with 551 members, CYP211 with 225 members, CYP125 with 164 members, CYP154 with 155 members, CYP1005 with 127 members, and CYP208 with 126 members (Table 3). Together, these P450 families contributed more than 70% of the total P450s (Table 3). This indicates that P450 families such as CYP105, CYP107, CYP211, CYP125, and CYP154 are bloomed, whereas the CYP1005 and CYP208 families are expanded in these species. Comparing the dominant P450 families revealed that CYP105 is the most dominant family only in Salinispora species (Table 1), whereas this family was the second most dominant in Streptomyces species (Table 1). Interestingly, the second most dominant P450 family of Salinispora species, CYP107, was dominant in species belonging to the bacterial groups Streptomyces, Firmicutes, and Gammaproteobacteria (Table 1). The blooming was also observed at the subfamily level, indicating that these P450s are preferred by Salinispora species for a particular reason. For example, subfamily AB was dominant with 124 members in CYP105, subfamily AY was dominant with 116 members in CYP107, subfamily A was dominant with 128 members in CYP125, subfamily M was dominant with 150 members in CYP154, subfamily A was dominant with 126 members in CYP208, and subfamily B was dominant with 124 members in CYP211 (Table 3). Due to the blooming of specific P450s at the family level, Salinispora species had the lowest P450 diversity percentage, the same as Firmicutes species (Table 1). The blooming or expansion of P450s is a common phenomenon in organisms and is observed in other bacterial species (Table 2). It has been hypothesized that species enrich specific P450s in their genomes that are beneficial to them, particularly to adapt to ecological niches [56,72].
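The statement that the seven largest families account for more than 70% of all P450s can be checked directly from the member counts quoted above and the total of 2643 P450s. The snippet below is only an arithmetic check on the figures given in the text.

```python
# Member counts of the seven largest families, as quoted in the text (Table 3).
family_counts = {
    "CYP105": 600,
    "CYP107": 551,
    "CYP211": 225,
    "CYP125": 164,
    "CYP154": 155,
    "CYP1005": 127,
    "CYP208": 126,
}
TOTAL_P450S = 2643  # total P450s reported for the 126 strains

dominant_total = sum(family_counts.values())
dominant_share = 100 * dominant_total / TOTAL_P450S
print(dominant_total, round(dominant_share, 1))
```

The seven families sum to 1948 P450s, or about 73.7% of the total, which is consistent with the "more than 70%" stated above.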

CYP107 and CYP125 Are Conserved in Salinispora Species
P450 family conservation analysis revealed that the CYP107 and CYP125 families are conserved in all 126 Salinispora species (Figure 2). The CYP208, CYP105, CYP211, and CYP1005 families are present in all but a few species (absent from four, one, one, and two species, respectively) (Figure 2). In addition, P450 families such as CYP154, CYP244, CYP245, CYP166, CYP248, and CYP1056 are co-present in many species (Figure 2). This suggests a prominent role of these P450 families in these species, possibly in secondary metabolism, as observed in other bacterial species [58,72,74]. Conservation or co-presence of specific P450s has also been reported in other bacterial species. The CYP107 family is conserved in all 203 Streptomyces species, and P450 families such as CYP156, CYP105, CYP154, and CYP157 are also present in the majority of the Streptomyces species [56]. Ten P450 families, CYP51, CYP123, CYP125, CYP130, CYP135, CYP136, CYP138, CYP140, CYP144, and CYP1128, were conserved in mycobacterial species [73]. Analysis of the conservation of P450 families in 229 Firmicutes species and 114 cyanobacterial species revealed that no P450 family is conserved [70,71]. Still, some of the P450 families were co-present in most of the species. The P450 families CYP152, CYP107, CYP102, and CYP109 were found to be co-present in most Firmicutes species [70], and the P450 families CYP110 and CYP120 were found to be co-present in most cyanobacterial species [71].
If a P450 family is conserved, or a few P450 families are co-present, these families likely play an important role in a species' primary or secondary metabolism. Previous studies showed that P450s of this type prominently play a role in secondary metabolism, helping species adapt to diverse ecological niches [58,59,72,74,75]. The importance of the P450 families that are conserved and co-present in Salinispora species is discussed in detail in the next section.

Unprecedented Number of P450s Involved in smBGCs
Analysis of the P450s that are part of smBGCs revealed that many P450s (47%) are part of these clusters, indicating their involvement in producing different secondary metabolites in Salinispora species (Tables 4 and S1). The percentage of P450s that are part of smBGCs in Salinispora species was found to be unprecedented compared to other bacterial species, including the other actinomycetes, Streptomyces species and mycobacterial species, which had 23% and 11% of P450s as part of smBGCs, respectively (Table 1). This suggests that Salinispora species dedicated close to half of their P450s to the production of secondary metabolites.
Analysis of the P450s in smBGCs revealed a strong correlation between the dominant P450 families (Table 3) and the families that are dominant in smBGCs (Figure 3). This suggests that Salinispora species enriched these P450 families in their genomes, by blooming or expansion (as discussed in the previous section), to produce secondary metabolites. Detailed information on secondary metabolite clusters, species, and P450s is shown in Table S1.
Analysis of P450 smBGCs revealed the presence of 18 types (Table 4 and Table S2). Among the types, Type I PKS (polyketide synthase) (T1PKS) was dominant with 223 clusters, followed by nonribosomal peptide synthetase (NRPS) (205 clusters) and Type II PKS (T2PKS) (76 clusters) (Table 4 and Table S1). This suggests that most of the secondary metabolites produced by P450 smBGCs are type I polyketides. When the P450 smBGCs were further analyzed for the number of P450s and P450 families, the dominant BGC type was not found to be dominant with respect to the number of P450s that are part of that smBGC type (Table 4 and Table S1). NRPS had the highest number of P450s (395 P450s), followed by T1PKS (275 P450s), oligosaccharide (121 P450s), and indole (105 P450s) (Table 4 and Table S1). The reason a dominant smBGC type such as T1PKS does not have the most P450s is that clusters of other types, such as NRPS, more often contain more than one P450 (Table 4 and Table S1). This phenomenon of more than one P450 being part of an smBGC has been reported earlier in other bacterial species [75]. However, having up to six P450s as part of a single smBGC is unprecedented (Table 4), suggesting that these clusters produce diverse secondary metabolites. The P450s co-present in different Salinispora species were part of the same cluster (Table 4). Based on the arrangement of P450s with respect to their family/subfamily and the number of P450s in smBGCs, it is clear that these smBGCs are orthologs (Table 4). These smBGCs were likely passed into the different Salinispora species from a single ancestor before its divergence into S. arenicola, S. pacifica, and S. tropica.
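The contrast between cluster counts and P450 counts becomes clearer when expressed as an average number of P450s per cluster. The snippet below uses only the figures quoted in this article (cluster and P450 counts for NRPS and T1PKS, 1236 P450s in smBGCs, 2643 P450s in total); the per-cluster ratio is a simple average added here for illustration and does not appear in the original tables.

```python
# Cluster and P450 counts quoted in the text (Table 4 and Table S1).
smbgc_types = {
    "T1PKS": {"clusters": 223, "p450s": 275},
    "NRPS": {"clusters": 205, "p450s": 395},
}

# Average P450s per cluster for each type (illustrative ratio only).
p450s_per_cluster = {
    name: counts["p450s"] / counts["clusters"]
    for name, counts in smbgc_types.items()
}

# Share of all P450s that are part of smBGCs.
percent_in_smbgcs = 100 * 1236 / 2643

for name, ratio in p450s_per_cluster.items():
    print(name, round(ratio, 2))
print(round(percent_in_smbgcs))
```

On these figures, NRPS clusters carry about 1.9 P450s each against about 1.2 for T1PKS clusters, which is why NRPS leads in P450 count despite having fewer clusters; the overall share of P450s in smBGCs rounds to 47%.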

Functional Prediction of Salinispora Species P450s
Most of the Salinispora species P450s are orphans without an assigned biological function. Based on the homolog P450s from other organisms and on their being part of smBGCs, some P450 functions can be predicted. CYP105 and CYP107 members are involved in the degradation/biotransformation of xenobiotics and the biosynthesis of secondary metabolites [76][77][78][79][80]. A CYP107 from S. arenicola CNS-205 is involved in secondary metabolite biosynthesis [53]. It catalyzes multiple oxidative rearrangement reactions in the biosynthesis of saliniketal and rifamycin [53]. The enzymatic functions of CYP105 and CYP107 members could help Salinispora species utilize diverse compounds as carbon sources, detoxify toxic compounds, or kill other bacterial species to thrive in the environment. It is likely that, due to these beneficial properties, Salinispora species enriched these family members in their genomes. CYP125 members, which are conserved in Salinispora species, are cholesterol and cholest-4-en-3-one hydroxylases [81,82]. One can assume that CYP125 members possibly help Salinispora species utilize cholesterol or cholesterol-like molecules as carbon sources. The growth of S. arenicola CNS-205 on cholesterol, during which complete degradation of cholesterol was observed [83], strongly supports this assumption, considering that these species have CYP125 in their genomes.
Interestingly, the presence of CYP125 members as part of smBGCs, as observed in Salinispora species (Table 4), is also observed in mycobacterial species [75], indicating that CYP125 members have other functions apart from cholesterol oxidation. CYP146 members are involved in the formation of β-hydroxytyrosine, a precursor for the biosynthesis of vancomycin antibiotics [84]. Interestingly, only a single member was found in Salinispora species (Table 3), and it is not part of smBGCs, which complicates the prediction of its role in these species.
CYP154 members are involved in regio- and stereo-selective hydroxylation of different steroids [85,86]. CYP154 from Nocardia farcinica IFM10152 is a bifunctional enzyme with O-dealkylation and ortho-hydroxylation activities [87]. This P450 converts formononetin, an isoflavone compound, into ortho-dihydroxy-isoflavone [87]. In Salinispora species, CYP154 members are dominant, indicating that they may confer the above-mentioned activities on these species. However, the role of CYP154 in the generation of secondary metabolites, and the properties of these compounds with respect to Salinispora species, is of future interest (Figure 3 and Table 4).
CYP163A and CYP163B members are involved in producing novobiocin, an aminocoumarin antibiotic [88], and skyllamycin, a potent inhibitor of the platelet-derived growth factor [89]. CYP162A members are involved in the synthesis of the peptidyl nucleoside antibiotic nikkomycin [90,91]. CYP161A members are involved in the biosynthesis of the antibiotics pimaricin [92] and amphotericin [93]. CYP113 members are involved in the production of a variety of antibiotics, such as erythromycin [94,95], tylosin [96,97], and himastatin [98,99]. The presence of the CYP161-CYP163 and CYP113 members as part of smBGCs in Salinispora species (Figure 3 and Table 4) suggests that these members are involved in the production of secondary metabolites in these species.
CYP244 and CYP245 members are involved in the biosynthesis of the antibiotic rapamycin [100,101]. The presence of these two P450s together in smBGCs in Salinispora species (Table 4) indicates that they work together in producing a secondary metabolite. CYP248A members are involved in the production of the antibiotic aureothin [102]. Salinispora species have 63 CYP248A members (Table 3), and 40 of them are part of smBGCs (Figure 3 and Table 4), indicating their prominent role in secondary metabolite production. CYP124 members are known for their terminal hydroxylation of methyl-branched lipids in M. tuberculosis [103]. None of these members were found as part of smBGCs in Salinispora species (Table 4), indicating a limited role, possibly in the oxidation of different methylated aliphatic lipids in these species.
It is evident from the data presented in this article that close to half of Salinispora species P450s (1236 P450s) are part of smBGCs. Thus, we predict that these P450s play a role in producing different secondary metabolites characteristic of smBGC types (Table 4 and Table S2). The detailed information on species name, list of P450s part of smBGCs, their cluster information, and BGC type is presented in Table S1.

Conclusions
Salinispora species, being marine organisms within the phylum Actinobacteria, are considered model organisms for studying bacterial diversity and secondary metabolite production. Compared to the genera Streptomyces and Mycobacterium, the genus Salinispora has an unprecedented number of P450s as part of secondary metabolite biosynthetic gene clusters (smBGCs), indicating a great diversity of secondary metabolites produced by these species. The presence of up to six P450s as part of smBGCs is unusual and has not been observed in other bacterial species. Future functional characterization of these P450s will shed more light on the untapped biotechnological potential of secondary metabolites from Salinispora species. Based on the data presented in this article and the literature published on P450 function, we predict that Salinispora species enriched or expanded specific P450s in their genomes to utilize diverse compounds as carbon sources, detoxify toxic compounds, or kill other bacterial species to thrive in the environment.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/microorganisms10050871/s1. Figure S1: Phylogenetic analysis of Salinispora species P450s. 2643 P450s were used to construct the tree, and the members of the eight most abundant P450 families are highlighted in different colors and indicated in the figure. P450 protein sequences used to build the tree are listed in Table S2. Table S1: Identification of P450s that are part of secondary metabolite biosynthetic gene clusters (smBGCs) in Salinispora species. Cluster-ID and BGC type were retrieved from the Integrated Microbial Genomes & Microbiomes (IMG/M) database [54,55]. BGC type was indicated for consistency with the standard BGC type name terminology available in the anti-SMASH database [74]. Table S2: P450 sequences identified and annotated in Salinispora species. Each P450 is presented with its assigned name followed by gene ID (in parentheses) and species name.

Data Availability Statement:
The authors confirm that the data supporting the findings of this study are available within the article and its Supplementary Materials.

Conflicts of Interest:
The authors declare no conflict of interest, and the funders had no role in the study's design, in the collection, analyses, or interpretation of data, in the writing of the manuscript, or in the decision to publish the results.