Lifestyles Shape the Cytochrome P450 Repertoire of the Bacterial Phylum Proteobacteria

For the last six decades, cytochrome P450 monooxygenases (CYPs/P450s), heme thiolate proteins, have been under the spotlight due to their regio- and stereo-selective oxidation activities, which have led to the exploration of their applications in almost all known areas of biology. The availability of many genome sequences allows us to understand the evolution of P450s in different organisms, especially in the domain Bacteria. The phenomenon that “P450s play a key role in organisms’ adaptation vis a vis lifestyle of organisms impacts P450 content in their genome” was proposed based on studies of a handful of individual bacterial groups. To obtain conclusive evidence, one must analyze P450s and their role in secondary metabolism in species that have diverse lifestyles but belong to the same category. To address this research gap, we selected species from five classes of the phylum Proteobacteria (Alpha, Beta, Gamma, Delta, and Epsilon) because of their diverse lifestyles and ancient nature. The study found that lifestyle profoundly affected the P450 profiles in the genomes of alpha-, beta-, gamma-, delta-, and epsilon-proteobacterial species. Irrespective of proteobacterial class, pathogenic species or species adapted to a simple lifestyle had lost P450s or had few P450s in their genomes. In contrast, species with saprophytic or complex lifestyles had many P450s and secondary metabolite biosynthetic gene clusters. The findings confirm the phenomenon mentioned above and indicate that there is no link between the number and diversity of P450s and the age of the bacteria.


Introduction
Cytochrome P450 monooxygenases (CYPs/P450s), heme thiolate proteins, play a key role in organisms' primary and secondary metabolism. These proteins are present across the species of different biological kingdoms, including in non-living entities such as viruses [1,2]. Due to their regio- and stereo-specific oxidation properties, their biotechnological potential has been explored in various fields of biology [3]. One of the best [...] include species belonging to the genera Citrobacter, Enterobacter, Escherichia, Klebsiella, Proteus, Salmonella, Serratia, Shigella, and Yersinia [12]. Information on well-known pathogens of this class is presented elsewhere [12].
Deltaproteobacteria consists of the most peculiar species in Proteobacteria [20,26]. Some species live as predators of other bacteria. For example, species belonging to the genus Bdellovibrio are known parasites of other Gram-negative bacteria. Some species belonging to the genera Desulfovibrio, Desulfobacter, and Desulfuromonas are well-known sulfate-reducing bacteria that use sulfate as the final electron acceptor in the electron transport chain [20,26]. Species belonging to myxobacteria display complex developmental life cycles, such as forming multicellular structures known as fruiting bodies [20,26]. These species live in the soil and feed on other bacteria or decaying material [20,26]. Apart from their predatory behavior, myxobacterial species produce various secondary metabolites, and their biotechnological potential, including in cancer treatment, has been explored [27].
Epsilonproteobacteria consist of well-known enteropathogenic species of humans and animals [20,26]. Species belonging to the genera Campylobacter and Helicobacter cause food poisoning, chronic gastritis, stomach ulcers, and duodenal ulcers [20,26]. These species mainly obtain energy from amino acids or tricarboxylic acid cycle intermediates [20,26]. Other species include sulfate- or sulfur-reducing bacteria belonging to the genus Sulfurospirillum and the plant-symbiotic nitrogen-fixing bacterium Arcobacter nitrofigilis [20,26].
As indicated above, given their ancient nature, highly diverse lifestyles, and the availability of many genomes, proteobacterial species are ideal for studying the evolution and diversity of P450s, their role in secondary metabolism, and the impact of lifestyle, if any, on the P450 repertoire of organisms.

Deltaproteobacterial Species Have the Highest P450 Diversity
Genome-wide data mining for P450s in 2696 proteobacterial species belonging to five different classes (Alpha-, Beta-, Gamma-, Delta-, and Epsilon-proteobacteria) revealed the presence of P450s in only 764 species (28%), indicating that most proteobacterial species do not have P450s in their genomes (Table 1 and Table S1). Among the classes, the highest proportion of species with P450s was found in Betaproteobacteria (57%), followed by Alphaproteobacteria (38%), Epsilonproteobacteria (25%), Deltaproteobacteria (21%), and Gammaproteobacteria (13%). This indicates that most gammaproteobacterial species have no P450s in their genomes (Table 1 and Table S1). A bacterial group in which only a few species have P450s is not unique to Proteobacteria; the same was observed in Firmicutes [13]. Genus-level analysis revealed that, in different classes, species belonging to particular genera have no P450s in their genomes (Table S1). Based on the number of species analyzed, we conclude that species belonging to the genera Neisseria, Taylorella, and Kinetoplastibacterium of Betaproteobacteria; Geobacter, Desulfovibrio, and Pseudodesulfovibrio of Deltaproteobacteria; and Helicobacter (except for Helicobacter winghamensis) of Epsilonproteobacteria do not have P450s (Table S1). Previous studies identified that species belonging to the genera Shewanella, Aeromonas, and Haemophilus of Gammaproteobacteria and Rickettsia, Bartonella, Ehrlichia, Wolbachia, and Anaplasma of Alphaproteobacteria also do not have P450s in their genomes [12,13]. A comparative analysis revealed the highest number of P450s in alphaproteobacterial species (874 P450s) and the lowest in epsilonproteobacterial species (53 P450s) (Table 1). The P450 count in the other proteobacterial classes is as follows: beta-, 603 P450s; delta-, 333 P450s; and gamma-proteobacterial species, 277 P450s (Table 1).
The average number of P450s was found to be highest in delta- (14 P450s), followed by alpha- (4 P450s), beta- and gamma- (2 P450s each), and epsilon-proteobacterial species (a single P450) (Table 1). Among the 764 species belonging to the five proteobacterial classes, the highest number of P450s was found in Archangium gephyra (56 P450s), followed by Archangium violaceum (43 P450s), Cystobacter fuscus (42 P450s), Melittangium boletus (28 P450s), Chondromyces crocatus (26 P450s), and Minicystis rosea (24 P450s) of Deltaproteobacteria (Table S1). The P450 count per species in the other classes is as follows: one to nine P450s in betaproteobacterial species; one to six P450s in gammaproteobacterial species [12]; one to seventeen P450s in alphaproteobacterial species [15]; and only one P450 in epsilonproteobacterial species (Table S1). Detailed information on genera, species, and P450s for Beta-, Delta-, and Epsilon-proteobacteria is presented in Table S1.

Proteobacterial Species Have Highly Diverse P450s
Annotation of the P450s and assignment of their families and subfamilies were done as per the International P450 Nomenclature Committee rules [28][29][30], which include phylogenetic analysis (Figure 1). The 2139 P450s from five different proteobacterial classes can be grouped into 292 P450 families and 564 P450 subfamilies (Table S2). As shown in Figure 1, P450s belonging to the same family are grouped together, indicating the correct annotation of P450s. However, a few CYP107 P450s are scattered across the evolutionary tree (Figure 1). This phenomenon was also observed previously for Streptomyces species P450s [10,11]. It has been hypothesized that the phylogeny-based annotation of P450s could detect similarity cues beyond a simple percentage identity cutoff [10,11]. Beta-, delta-, and epsilon-proteobacterial species P450s identified in this study, along with their protein sequences and species, are presented in Table S3.
Among proteobacterial species, alpha- had the highest number of P450 families (143 families), followed by gamma- (81 families), beta- (79 families), and delta- (74 families), indicating high P450 family diversity in alphaproteobacterial species (Table 1). In contrast, epsilonproteobacterial species have only two P450 families (Table 1 and Table S2). Among P450 families, CYP107 has the highest number of members (94 P450s), followed by CYP153 (84 P450s), CYP229 (74 P450s), CYP202 (70 P450s), and CYP116 (62 P450s) (Figure 2A and Table S2). Proteobacterial species belonging to different classes expanded particular P450 families in their genomes (Figure 2B-F and Table S2), suggesting an important role for these dominant P450 families in their physiology. The dominant P450 families are CYP202 in alpha-, CYP116 in beta-, CYP113 and CYP107 in gamma-, CYP264 and CYP107 in delta-, and CYP172 in epsilon-proteobacterial species (Figure 2B-F and Table S2). The analysis of P450 subfamilies revealed 564 subfamilies, with the dominant P450 families having the highest number of subfamilies, indicating further diversity at the subfamily level (Table 1 and Table S2). Interestingly, a subfamily-level preference was observed, in which a particular subfamily has more members (Table 1 and Table S2), indicating the importance of these P450s in the species' physiology. Among the classes, the highest number of P450 subfamilies was found in alpha- (214 subfamilies), followed by delta- (171 subfamilies), beta- (119 subfamilies), gamma- (102 subfamilies), and epsilon-proteobacterial species (2 subfamilies). This indicates that alphaproteobacterial species have the highest number of P450 families and subfamilies in their genomes.
Figure 1 (phylogenetic tree of proteobacterial P450s). P450 families that are expanded in these species and proteobacterial classes are highlighted in different colors and indicated in the figure. P450 protein sequences used for constructing the phylogenetic tree are presented in Table S3. A high-resolution phylogenetic tree is provided in Figure S1.
An analysis of P450 family conservation was carried out across proteobacterial species belonging to four different classes (Alpha, Beta, Gamma, and Delta); epsilonproteobacterial species were excluded, as they have only two P450 families. A comparative analysis of P450 families among proteobacterial species revealed the conservation of six P450 families, CYP101, CYP102, CYP105, CYP107, CYP117, and CYP152, across species belonging to the four classes (Figure 3), indicating their common ancestral origin. A moderate number of P450 families were shared among alpha-, beta-, gamma-, and delta-proteobacterial species, except that no P450 families were found to be shared between alpha- and delta-proteobacterial species (Figure 3). One interesting feature was that many P450 families were unique to a particular proteobacterial class. The number of unique P450 families is as follows: 104 families in alpha-, 51 in delta-, 43 in gamma-, and 29 in beta-proteobacterial species (Figure 3). This suggests a high diversity of P450 families in these species, possibly reflecting the influence of lifestyle on the P450 repertoire, as observed in other species [10][11][12][13][14][15]. One of the two P450 families in epsilonproteobacterial species, CYP172, was also present in gammaproteobacterial species (Figure 3 and Table S2). Two notable mentions are CYP51, the sterol 14α-demethylase, which is present in gamma- and delta-proteobacterial species, and CYP125, the cholesterol side-chain oxidase, which is present in alpha-, beta-, and delta-proteobacterial species but not in gammaproteobacterial species (Figure 3). The presence of CYP51 in proteobacterial species was reported earlier in support of the bacterial origin of CYP51, which was ultimately passed to eukaryotes, as described elsewhere [31].
A point to be noted is that a recent study showed that alphaproteobacterial species are capable of cholesterol oxidation [15], and the presence of CYP125 in other proteobacterial species suggests this is a common phenomenon in these species.
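The shared and unique family counts of this kind come down to set operations on the per-class family lists. The following is a minimal sketch in Python, not the procedure used in the study: the dictionary holds only families that the text explicitly places in each class (the full repertoires are in Table S2), and the helper names `shared_by_all` and `unique_to` are hypothetical.

```python
# Six families the text reports as conserved across the four classes.
CONSERVED = {"CYP101", "CYP102", "CYP105", "CYP107", "CYP117", "CYP152"}

# Partial, illustrative sets: only families named in the text are listed,
# so these are NOT the complete repertoires behind Figure 3.
families = {
    "alpha": CONSERVED | {"CYP125", "CYP202"},
    "beta": CONSERVED | {"CYP125", "CYP116"},
    "gamma": CONSERVED | {"CYP51", "CYP113", "CYP172"},
    "delta": CONSERVED | {"CYP51", "CYP125", "CYP264"},
}


def shared_by_all(sets):
    """Families present in every class (intersection of all sets)."""
    return set.intersection(*sets.values())


def unique_to(name, sets):
    """Families present in one class and absent from all the others."""
    others = set().union(*(s for k, s in sets.items() if k != name))
    return sets[name] - others


print(sorted(shared_by_all(families)))
print(sorted(unique_to("alpha", families)))
```

Because the input is a deliberately incomplete subset, the "unique" output here only demonstrates the operation; it should not be read as a statement about which families are truly class-specific.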
An analysis of the P450 diversity percentage revealed that deltaproteobacterial species have the highest P450 diversity percentage (0.97%), and betaproteobacterial species have the lowest (0.05%) (Table 1). This indicates the presence of highly diverse P450s in deltaproteobacterial species. Notably, despite having the highest number of P450 families and subfamilies, alphaproteobacterial species had a lower P450 diversity percentage than deltaproteobacterial species (Table 2). This is partly due to the fact that many more alphaproteobacterial than deltaproteobacterial species were analyzed in this study. In the future, the availability of more deltaproteobacterial genomes should provide more insights into this aspect. A detailed analysis of the families, subfamilies, and member counts is presented in Table S2.

More P450s Are Involved in Secondary Metabolism in Delta-, Compared to Alpha-, Gamma-, and Beta-Proteobacterial Species
The analysis of P450s that are part of secondary metabolite biosynthetic gene clusters (smBGCs) revealed that only 22% of the P450s from four proteobacterial classes (Alpha-, Beta-, Gamma-, and Delta-) are involved in the production of secondary metabolites (Table 1). P450s from epsilonproteobacterial species were not part of smBGCs (Table S4), indicating that their role is confined to primary metabolism. Most epsilonproteobacterial species did not have smBGCs in their genomes, and those that did had only a single smBGC (Table S4). In contrast, species belonging to the classes Alpha, Gamma, Beta, and Delta have many smBGCs in their genomes (Table S4). The percentage of P450s that are part of smBGCs in proteobacterial species was higher than in other bacterial groups such as Cyanobacteria (8%) [14], mycobacterial species (11%) [10], and Firmicutes species (18%) [13], and only one percentage point lower than in Streptomyces species (23%) [10,11]. The analysis of smBGC P450s revealed that betaproteobacterial species have the highest number of P450s that are part of smBGCs (107 P450s), followed by delta- (69 P450s), gamma- (49 P450s), and alpha-proteobacterial species (21 P450s) (Table 2). However, when the percentage of P450s that are part of smBGCs was calculated, deltaproteobacterial species had the highest value (21%). In contrast, it was 18% for beta- and gamma-proteobacterial species and only 2% for alphaproteobacterial species (Table 1), indicating that deltaproteobacterial P450s play a prominent role in secondary metabolism. Furthermore, deltaproteobacterial species had the highest number of P450 families as part of smBGCs (37 families), compared to gamma- (22 families), beta- (18 families), and alpha-proteobacterial species, which had the lowest number (16 families) (Table 1).
Interestingly, no correlation was observed between the dominant P450 family in a group of species and the dominant P450 family that is part of smBGCs in the same group (Tables 1 and 2). This means that a P450 family can be dominant in species without playing a dominant role in secondary metabolism. CYP202, CYP116, CYP113, and CYP264 are prevalent in alpha-, beta-, gamma-, and delta-proteobacterial species, respectively (Table 1), but they are not dominant among the P450s that are part of smBGCs. The exception is CYP107, which is prevalent in smBGCs in gamma- and delta-proteobacterial species (Table 2).

P450 Repertoire of Proteobacterial Species Shaped by Their Lifestyle
To understand the effect of lifestyle on P450 profiles vis a vis P450s' role in organism adaptation to particular ecological niches, one should look at an organism's lifestyle and its evolutionary position in the tree of life. Based on the available literature, the evolutionary order from old to young is Proteobacteria, Firmicutes, Actinobacteria, Planctomycetacia, Cyanobacteria, Chloroflexi, and Bacteroidetes [32,33]. Within Proteobacteria, the order from old to young is Epsilon-, Delta-, Alpha-, Gamma-, and Beta-proteobacteria [32,33]. This means that proteobacterial species are ancient compared to the other bacteria mentioned here. Because these species display extreme diversity in their lifestyles and ecological niches, an analysis of their P450 profiles should provide insights into the evolutionary pattern of P450s and reflect the phenomenon discussed above.
Among epsilonproteobacterial genera, Helicobacter and Campylobacter have the highest number of known species (Table S1). These species are enteropathogens or commensals and extract energy from amino acids and tricarboxylic acid cycle intermediates [20,26]. Owing to this lifestyle of surviving on simpler carbon sources, most of these species have no P450s, as observed in other species [34]. Furthermore, epsilonproteobacterial species have the lowest number of P450s and only two P450 families (Table 1). They also have the lowest number of smBGCs, indicating that they hardly produce secondary metabolites, and none of their P450s were found to be part of smBGCs (Table S4).
Deltaproteobacterial species have complex lifestyles, such as living in soil with predatory or saprophytic behavior (living on decaying materials) [20,26]. They form multicellular fruiting bodies, resembling a eukaryotic lifestyle [20,26]. Consistent with this, they have the highest average number of P450s in their genomes compared to other proteobacterial species (Table 1). Furthermore, the percentage of P450s that are part of smBGCs is the highest among proteobacterial species, suggesting that these species produce a diverse array of secondary metabolites that help them survive in their ecological niches. Some of these metabolites target cellular structures and help them prey on other bacteria [20,26,27]. The number of P450s in deltaproteobacterial species and the percentage of P450s that are part of smBGCs are comparable with those of species belonging to the genera Streptomyces and Mycobacterium [10,11]. This strongly indicates that, irrespective of the evolutionary order of origin, lifestyle influences P450 profiles vis a vis P450s helping organisms adapt to diverse ecological niches.
Alphaproteobacterial species are adapted to diverse ecological niches and can survive in low-nutrient environments. In this class, human pathogenic species belonging to the families Rickettsiaceae and Bartonellaceae have no P450s, and species belonging to the family Brucellaceae have a single P450, although some of these species have none [15]. Furthermore, species belonging to the genus Acetobacter have no P450s [15]. These species are well known for their fermentation ability and their adaptation to the utilization of simple sugars, which is consistent with their lack of P450s. The loss of P450s, or having few P450s, due to organisms' adaptation to simpler organic nutrients has previously been reported [34]. Nonetheless, other species in this class with diverse lifestyles do have P450s in their genomes (Table 1 and Table S1). However, alphaproteobacterial species have very few P450s that are part of smBGCs (Table 1), indicating that most alphaproteobacterial P450s play a role in primary metabolism.
The most striking example of pathogenic species losing P450s or having few P450s was reported in Gammaproteobacteria [12]. The study revealed that most pathogenic species have no P450s, mainly those belonging to the genera Citrobacter, Enterobacter, Escherichia, Klebsiella, Proteus, Salmonella, Serratia, Shigella, and Yersinia [12]. In contrast, species of environmental importance had P450s, and 18% of these P450s were part of smBGCs, indicating that they are involved in producing secondary metabolites [12].
Betaproteobacterial species are among the most heterogeneous in Proteobacteria. Clear evidence that a pathogenic lifestyle leads to having no or fewer P450s can be found in species belonging to the genera Bordetella, Burkholderia, Neisseria, Taylorella, Acidovorax, Ralstonia, and Xylophilus (Table S1). On the other hand, biotechnologically valuable species that produce secondary metabolites or degrade various xenobiotics have more P450s in their genomes (Table S1). Owing to this type of lifestyle, Betaproteobacteria has the highest number of species with P450s and the highest number of P450s that are part of smBGCs in Proteobacteria (Table 1).
Considering the above facts, based on the literature published on bacterial P450s and following the evolutionary order, we conclude that the lifestyle of organisms profoundly impacts P450 repertoire vis a vis P450s' key role in organisms' adaptation to ecological niches.

Species and Their Genome Database Information
Proteobacterial species belonging to the Beta- (513 species), Delta- (107 species), and Epsilon- (216 species) classes that are available for public use in the Kyoto Encyclopedia of Genes and Genomes (KEGG) [35] database were used. Information on genera, species names, and species codes is presented in Table S1.

Genome Data Mining and Annotation of P450s
P450 data mining and annotation were carried out following the standard procedure described previously by our laboratory [12,13,15]. Briefly, the proteome of each bacterial species was downloaded from KEGG and subjected to the NCBI Batch Web CD-Search Tool [36]. The results were analyzed, and proteins that belong to the P450 superfamily were selected and searched for the presence of the characteristic P450 motifs EXXR and CXG [37,38]. Proteins that were short in amino acid length and lacked both motifs were regarded as P450 fragments and were not considered for further analysis. Proteins having both motifs were selected for the assignment of families and subfamilies. Following the International P450 Nomenclature Committee rules [28][29][30], proteins with >40% identity and >55% identity were grouped under the same family and subfamily, respectively. P450s with less than 40% identity were assigned to a new P450 family. Beta-, delta-, and epsilon-proteobacterial species P450s identified in this study, along with their protein sequences and species, are presented in Table S3.
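The motif screen and the identity thresholds described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the laboratory's pipeline: the regular expressions are a literal reading of "EXXR" and "CXG" with X as any residue, the function names and toy sequences are hypothetical, and the handling of values exactly at 40% or 55% is an assumption because the text does not specify it.

```python
import re

# Literal reading of the two motifs; "." stands for any residue (X).
EXXR = re.compile(r"E..R")
CXG = re.compile(r"C.G")


def has_p450_motifs(seq: str) -> bool:
    """True if the protein sequence contains both the EXXR and CXG motifs."""
    seq = seq.upper()
    return bool(EXXR.search(seq)) and bool(CXG.search(seq))


def classify_by_identity(percent_identity: float) -> str:
    """Apply the >40% (family) and >55% (subfamily) identity thresholds.

    percent_identity is the identity to the closest named P450;
    boundary behavior at exactly 40 or 55 is assumed, not sourced.
    """
    if percent_identity > 55:
        return "same family, same subfamily"
    if percent_identity > 40:
        return "same family, new subfamily"
    return "new family"


# Toy sequences (hypothetical, far shorter than real P450s).
print(has_p450_motifs("MAEKLRTTCAGQ"))  # contains EKLR and CAG
print(has_p450_motifs("MAEKLRTT"))      # no CXG motif
print(classify_by_identity(47.0))
```

In practice the identity value would come from an alignment against named P450s, and the final name is confirmed by the phylogenetic analysis; this sketch covers only the threshold logic.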

Phylogenetic Analysis of P450s
Phylogenetic analysis of P450s was carried out following the procedure described recently by our laboratory [12,13]. The phylogenetic tree of P450s was constructed using protein sequences (Table S3). First, the protein sequences were aligned using MAFFT v6.864 [39], as available on the Trex web server [40]. The alignments were then used to infer the best tree with the Trex web server [40]. Finally, a web-based tool, VisuaLife, was used to create, visualize, and color the tree [41].

Generation of P450 Profile Heat-Maps
The heat-map profile was generated using the method previously reported by our laboratory [12,13]. The data were represented as (−3) for P450 family/subtype absence (green) and (3) for P450 family/subtype presence (red). A tab-delimited file was imported into MeV (Multi-experiment viewer) [42]. Hierarchical clustering using a Euclidean distance metric was used to cluster the data. P450 families formed the vertical axis, and proteobacterial classes formed the horizontal axis. The P450 families that are shared between the four proteobacterial classes, Alpha, Beta, Gamma, and Delta, are presented in the figure.
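As a rough illustration of the input described above, the sketch below builds a presence/absence matrix with 3 and −3 scores, writes it as tab-delimited text with families as rows and classes as columns, and computes the Euclidean distance between two family profiles. The function names are hypothetical, the toy input contains only families that the text places in these two classes, and the exact header layout that MeV expects is assumed rather than taken from the source.

```python
def presence_matrix(families_by_class, present=3, absent=-3):
    """Return (class names, rows) with one row per P450 family."""
    classes = sorted(families_by_class)
    all_families = sorted(set().union(*families_by_class.values()))
    rows = [
        [fam] + [present if fam in families_by_class[c] else absent for c in classes]
        for fam in all_families
    ]
    return classes, rows


def to_tsv(classes, rows):
    """Serialize the matrix as tab-delimited text (families as rows)."""
    header = "\t".join(["Family"] + classes)
    body = ["\t".join(str(v) for v in row) for row in rows]
    return "\n".join([header] + body) + "\n"


def euclidean(u, v):
    """Euclidean distance between two equal-length score vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


# Toy input: a partial illustration, not the data behind the figure.
toy = {"gamma": {"CYP51", "CYP107"}, "delta": {"CYP51", "CYP107", "CYP264"}}
classes, rows = presence_matrix(toy)
tsv_text = to_tsv(classes, rows)
print(tsv_text)
```

Families with identical presence patterns have a distance of zero and are therefore placed together by the hierarchical clustering.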

Identification of P450s Part of smBGCs
P450s that are part of smBGCs were identified using the procedure described by our laboratory [13,15]. Briefly, the genome IDs of beta-, delta-, and epsilon-proteobacterial species were submitted to anti-SMASH (antibiotics & Secondary Metabolite Analysis Shell) [43,44] to identify smBGCs. Anti-SMASH results were downloaded as gene cluster sequences and Excel spreadsheets representing species-wise cluster information. P450s that formed part of a specific gene cluster were identified by manual data mining of the gene cluster sequences. The standard gene cluster abbreviation terminology available at the anti-SMASH database [43] was maintained in this study.

P450 Key Features Analysis
All calculations were carried out following the procedure reported previously by our laboratory [12]. The average number of P450s was calculated using the formula: Average number of P450s = Number of P450s/Number of species. The P450 diversity percentage was calculated using the formula: P450 diversity percentage = 100 × Total number of P450 families/(Total number of P450s × Number of species with P450s). The percentage of P450s that formed part of smBGCs was calculated using the formula: Percentage of P450s part of smBGCs = 100 × Number of P450s part of smBGCs/Total number of P450s present in species.
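The three formulas can be written as small Python functions. This sketch reads the denominator of the diversity formula as the product of the total number of P450s and the number of species with P450s, which is an interpretation of the formula as written; the function names are hypothetical. The worked example uses only the per-class counts quoted in the Results text (P450s in smBGCs and total P450s), and the diversity and average calls use made-up numbers purely to show the arithmetic.

```python
def average_p450s(n_p450s, n_species):
    """Average number of P450s per species."""
    return n_p450s / n_species


def diversity_percentage(n_families, n_p450s, n_species_with_p450s):
    """100 x families / (P450s x species with P450s); grouping is assumed."""
    return 100 * n_families / (n_p450s * n_species_with_p450s)


def bgc_percentage(n_bgc_p450s, n_p450s):
    """Percentage of P450s that are part of smBGCs."""
    return 100 * n_bgc_p450s / n_p450s


# Counts quoted in the text: (P450s part of smBGCs, total P450s).
counts = {
    "alpha": (21, 874),
    "beta": (107, 603),
    "gamma": (49, 277),
    "delta": (69, 333),
}
percentages = {name: round(bgc_percentage(*c)) for name, c in counts.items()}
print(percentages)

# Hypothetical numbers, only to illustrate the other two formulas.
print(diversity_percentage(10, 20, 5))
print(average_p450s(20, 5))
```

Rounded to whole numbers, the smBGC percentages computed this way match the per-class values reported in the text (2% for alpha-, 18% for beta- and gamma-, and 21% for delta-proteobacterial species).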

Comparative Analysis of P450s and smBGCs Data
P450 and smBGC data for alpha- and gamma-proteobacterial species were retrieved from published articles [12,15] and used for comparative analysis.

Conclusions
Organisms change their gene pool as needed to adapt to diverse ecological niches. The amplification, expansion, gain, or loss of specific genes depends entirely on the organism's lifestyle. Here, we provide conclusive evidence of such a phenomenon concerning cytochrome P450 monooxygenases (CYPs/P450s). P450s are ubiquitously present in organisms due to their important role in primary and secondary metabolism. The analysis of P450s in the ancient and diverse bacterial phylum Proteobacteria revealed that pathogenic species or species adapted to living on simple carbon sources have lost P450s or have fewer P450s than saprophytes or species with complex lifestyles. In addition, most proteobacterial species, especially pathogens, are facultative or obligate anaerobes. This kind of lifestyle may also be a reason for their lack of P450s. Furthermore, P450s were found to play a role in secondary metabolism in saprophytes or species with complex lifestyles, and thus these species have expanded P450s in their genomes.