Bioinformatics Analysis of the Genome of Rhodococcus rhodochrous IEGM 1362, an (−)-Isopulegol Biotransformer

A genome of Rhodococcus rhodochrous IEGM 1362 was sequenced and annotated. This strain can transform monoterpene alcohol (–)-isopulegol with the formation of two novel pharmacologically promising metabolites. Nine genes encoding cytochrome P450, presumably involved in (–)-isopulegol transformation, were found in the genome of R. rhodochrous IEGM 1362. Primers and PCR conditions for their detection were selected. The obtained data can be used for the further investigation of genes encoding enzymes involved in monoterpene biotransformation.


Introduction
(-)-Isopulegol ((1R,3R,4S)-p-menth-8-en-3-ol, C10H18O, CAS 89-79-2) is a monoterpene alcohol widely found in the essential oils of plants. Using (-)-isopulegol for the synthesis of novel bioactive compounds has a number of advantages, such as its low cost, availability, and variety of properties [1]. However, the biological transformation of this monoterpene is poorly investigated. There are only fragmented data on the enzymatic conversion of (-)-isopulegol by cutinases from Aspergillus oryzae and Humicola insolens [2] and its whole cell degradation by Rhodococcus sp. [3]. Rhodococcus actinomycetes are biotechnologically promising microorganisms due to their broad substrate specificity and high catalytic activity even under extreme environmental conditions [4]. Representatives of R. rhodochrous have a wide range of applications due to their ability to degrade ecotoxic compounds, such as pharmacological pollutants [5], polycyclic aromatic hydrocarbons [6], and pulp and paper industry waste [7]. Furthermore, they are known to be effective catalysts for the synthesis of bioactive compounds [8][9][10][11], as well as being enzyme sources [12]. In our recent study, R. rhodochrous IEGM 1362 was shown to biotransform the monoterpene alcohol (-)-isopulegol into two new 10-hydroxy and 10-carboxy derivatives [13,14]. Both compounds may have biological activities, such as anticancer and respiratory-stimulating effects.
The ability of bacterial cells to transform various compounds is due to the action of their enzyme systems. A logical trend in the development of biotechnology is the isolation and purification of bacterial enzymes, as well as their overexpression in model organisms in order to scale the production of target compounds. The use of individual enzymes allows for highly selective reactions with repeated use, reduces sterility requirements, and simplifies the isolation and purification of the target product [15]. In this regard, studies on the catalytic activity of Rhodococcus spp. in relation to terpene substrates should be accompanied by the study of the microbial enzymes involved in the transformations, as well as the functional genes encoding them. Several rhodococcal genes and enzymes have been identified that catalyze the transformations of the monoterpenoids 1,8-cineole, limonene, carveol, p-cymene, and terpineol [16][17][18][19][20]. However, data on bacterial genes and enzymes for the bioconversion of (-)-isopulegol are lacking. Our research focuses on the primary analysis of the whole genome of R. rhodochrous IEGM 1362, an (-)-isopulegol biotransformer, to identify putative genes encoding enzymes involved in (-)-isopulegol bioconversion.
Genes 2024, 15, 992

Whole Genome Sequencing
For DNA extraction, the R. rhodochrous IEGM 1362 cells were grown in LB broth at 28 °C and 160 rpm for 28 h. Genomic DNA was isolated using a DNeasy PowerSoil Pro Kit (Qiagen, Hilden, Germany). The shotgun library was prepared using an NEBNext Ultra II DNA library prep kit (New England Biolabs, Ipswich, MA, USA) and sequenced on an Illumina MiSeq instrument in paired read mode (2 × 300 nt). A total of 506,369 read pairs were generated. Low-quality sequences were trimmed using Sickle v. 1.33 (q = 30). The draft genome was assembled with Flye v. 2.8.1.

Bioinformatics Analysis
The search for rRNA genes and for functional genes presumably involved in the biotransformation of (-)-isopulegol was carried out using the online service RAST (Rapid Annotation using Subsystem Technology, https://rast.nmpdr.org/, accessed on 3 June 2024) [21] based on an automatically annotated whole genome of the biotransformer strain.
A comparison of the target gene sequences was carried out using the BLASTN and BLASTP services available on the NCBI website. The search for biosynthetic gene clusters in the genome of the biotransformer strain was carried out by genome mining with the online service AntiSMASH (https://antismash.secondarymetabolites.org/, accessed on 3 June 2024) [24]. An analysis of the amino acid sequences and the construction of metabolic pathways were carried out using the KEGG database (Kyoto Encyclopedia of Genes and Genomes, https://www.genome.jp/kegg/, accessed on 3 June 2024) and the GhostKOALA service [25].

Primer Design
Primers were designed using Primer-BLAST. When selecting them, we were guided by recommendations from the NCBI, the uniqueness of specific regions in the genome, a product length of 70-1000 bp, a melting temperature of 57-63 °C, and a GC content of 40-60%. In addition, the difference in GC content between the forward and reverse primers should be no more than 10%, the difference in their melting temperatures should be no more than 0.99 °C, the self-complementarity score should be no more than 8 arbitrary units, and the maximum pair complementarity should be no more than 5 arbitrary units.
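The numeric selection criteria listed above can be expressed as a simple screening function. The following is a minimal sketch, not the procedure used in this study: the names `PrimerPair` and `check_pair` and the example sequences are hypothetical, melting temperatures are assumed to be supplied by the design tool rather than recalculated, and the two complementarity scores are left out because their calculation is specific to Primer-BLAST.

```python
from dataclasses import dataclass


def gc_percent(seq: str) -> float:
    """Return the GC content of a primer as a percentage."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


@dataclass
class PrimerPair:
    # Hypothetical container; Tm values are assumed to come from the design tool.
    forward: str
    reverse: str
    tm_forward: float  # melting temperature, °C
    tm_reverse: float  # melting temperature, °C
    product_bp: int    # expected amplicon length, bp


def check_pair(pair: PrimerPair) -> list[str]:
    """Return the list of violated criteria (an empty list means the pair passes)."""
    problems = []
    gc_f, gc_r = gc_percent(pair.forward), gc_percent(pair.reverse)
    if not 70 <= pair.product_bp <= 1000:
        problems.append("product length outside 70-1000 bp")
    for name, tm in (("forward", pair.tm_forward), ("reverse", pair.tm_reverse)):
        if not 57 <= tm <= 63:
            problems.append(f"{name} Tm outside 57-63 °C")
    for name, gc in (("forward", gc_f), ("reverse", gc_r)):
        if not 40 <= gc <= 60:
            problems.append(f"{name} GC content outside 40-60%")
    if abs(gc_f - gc_r) > 10:
        problems.append("GC difference above 10%")
    if abs(pair.tm_forward - pair.tm_reverse) > 0.99:
        problems.append("Tm difference above 0.99 °C")
    return problems


if __name__ == "__main__":
    # Illustrative sequences only; they are not primers from this work.
    example = PrimerPair("ATGCGTACCGGTTCAGGATC", "TCAGGCTTGACCGATCGAAC", 60.0, 59.5, 250)
    print(check_pair(example))
```

Returning the list of violated criteria, rather than a single pass/fail flag, makes it easier to see which constraint excludes a candidate pair.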

Determination of Genes Encoding Enzymes Involved in (-)-Isopulegol Transformation
DNA extraction was carried out using biomass obtained after a 24 h cultivation of the strain in LB nutrient broth (Diaem, Moscow, Russia) according to the protocol for the ExtractDNA Blood genomic DNA isolation kit (Evrogen, Moscow, Russia). The concentration and purity of the isolated DNA were assessed using a Qubit™ fluorimeter (Thermo Fisher Scientific, Waltham, MA, USA) with a QuDye dsDNA BR kit (Lumiprobe, Moscow, Russia) and a NanoPhotometer N50 spectrophotometer (Implen, Munich, Germany), respectively. The resulting DNA was used for PCR with qPCRmix-HS SYBR (Evrogen, Moscow, Russia) and the selected primers on a CFX Connect™ Real-time system (Bio-Rad, Hercules, CA, USA). Species-specific primers based on the 16S rRNA gene for R. rhodochrous served as a positive control. The PCR protocol included the following steps and conditions:
The presence and size of amplicons in the reaction mixture after PCR were determined by horizontal electrophoresis in an agarose gel (1.5% agarose in TBE buffer) using a Bio-Rad Gel Doc XR+ gel documentation system (Bio-Rad, Hercules, CA, USA). Electrophoretic separation was carried out at a voltage of 70 V for 40 min. GelRed (Diaem, Moscow, Russia) was used as a nucleic acid dye. PCR products (5 µL) were loaded onto the agarose gel in 4X Gel Loading Dye, Blue loading buffer (0.5 µL) (Evrogen, Moscow, Russia). To determine the size of the PCR products, a DNA length marker from 50 to 700 bp (Evrogen, Moscow, Russia) was added to the gel.

Phylogeny and Overall Genome Characteristics
The assembly consisted of 140 contigs with a total sequence length of 5,733,046 bp, an N50 value of 126,441 bp, a GC content of 68%, and coverage of 53.0×. A total of 5331 CDSs, 5209 CDSs with proteins, and 67 RNAs were found in the R. rhodochrous IEGM 1362 genome (Table 1). Using phenotypic methods (morphology, physiological tests, chemotaxonomy), the IEGM 1362 strain was identified as belonging to the R. rhodochrous species. Analysis of the genome and of phylogenetic markers, such as the 16S rRNA gene, revealed that R. rhodochrous IEGM 1362 was not a typical member of this species. The scores for the key taxonomic markers (16S rRNA gene similarity, dDDH, and ANI) of R. rhodochrous IEGM 1362 were close to the threshold levels of 98.7%, 70%, and 94-95%, respectively [26]. According to 16S rRNA gene relatedness, R. rhodochrous IEGM 1362 fell into the same phylogenetically close group as R. coprophilus NCTC10994, R. gordoniae DND3, and various R. pyridinivorans and R. rhodochrous strains (Figure 1). Only the identity with the 16S rRNA gene of R. coprophilus NCTC10994 was below the 98.7% threshold [26], while the identities with the 16S rRNA genes of the other strains were ≥98.7% (Figure 1). This indicated that, on the basis of the 16S rRNA gene alone, R. rhodochrous IEGM 1362 could be classified as R. gordoniae, R. pyridinivorans, or R. rhodochrous.
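Assembly summary statistics of the kind reported above (N50 and GC content) follow directly from the contig sequences. The sketch below shows one common way to compute them; the function names are hypothetical, and it is not the tool used to obtain the reported values.

```python
def n50(lengths: list[int]) -> int:
    """Length L such that contigs of length >= L cover at least half of the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs supplied")


def gc_content(contigs: list[str]) -> float:
    """GC content of the whole assembly as a percentage."""
    gc = sum(seq.upper().count("G") + seq.upper().count("C") for seq in contigs)
    total = sum(len(seq) for seq in contigs)
    return 100.0 * gc / total


if __name__ == "__main__":
    # Toy contigs for illustration only.
    toy = ["GGCCGG", "ATGC", "GC"]
    print(n50([len(c) for c in toy]), gc_content(toy))
```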

The Whole Genome Analysis
The category distribution of genes with known functions is shown in Figure 2. In the genome of IEGM 1362, 75 CDSs encoding monooxygenases/hydroxylases, 31 CDSs encoding dioxygenases, 7 CDSs encoding peroxidases, and 351 CDSs encoding dehydrogenases were found. Among monooxygenases, the genes of two alkane 1-monooxygenases, [...]
The dDDH scores were more than 70% in comparisons of the IEGM 1362 genome with genomes of R. rhodochrous type strains and less than 70% in comparisons with genomes of R. gordoniae and R. pyridinivorans type strains (Table 2). This parameter was evidence for the IEGM 1362 strain belonging to R. rhodochrous, which agrees with the phenotypic traits of this strain. However, ANI scores were above the threshold of 94-95% for two species, R. pyridinivorans and R. rhodochrous, and the G+C content differed less (0.05-0.25%) between R. rhodochrous IEGM 1362 and the type strains of R. gordoniae and R. pyridinivorans than between the IEGM 1362 strain and the type strains of R. rhodochrous (0.38-0.45%) (Table 2).
* Formula d4 (GGDC formula 2) was used: sum of all identities found in high-scoring segment pairs (HSPs) divided by overall HSP length [27].
According to the ANI and dDDH values, the IEGM 1362 strain did not belong to R. gordoniae. However, its difference from R. pyridinivorans was not evident. Taking all the scores together (the 16S rRNA gene relatedness, the highest ANI score, and dDDH values within the species borders), we retained the IEGM 1362 strain in R. rhodochrous. Apparently, R. rhodochrous IEGM 1362 accumulated genetic traits of both closely related species, R. pyridinivorans and R. rhodochrous, and occupied an intermediate position between them. The heterogeneity of the R. rhodochrous IEGM 1362 genome, which distances this strain from typical representatives of R. rhodochrous, could be a basis for its advanced catalytic properties, including its ability to transform (-)-isopulegol.
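The way the three taxonomic markers are weighed against their thresholds can be illustrated with a short sketch. The thresholds are those cited in the text [26]; the ANI cut-off is set here to the upper bound of the 94-95% range as an assumption, the function name `species_evidence` is hypothetical, and the scores in the example are invented for illustration rather than taken from Table 2.

```python
# Species-delineation thresholds cited in the text (percent).
# Assumption: the upper bound of the 94-95% ANI range is used as the cut-off.
THRESHOLDS = {"16S": 98.7, "dDDH": 70.0, "ANI": 95.0}


def species_evidence(scores: dict[str, float]) -> dict[str, bool]:
    """For each marker present in scores, report whether it supports same-species status."""
    return {
        marker: scores[marker] >= cutoff
        for marker, cutoff in THRESHOLDS.items()
        if marker in scores
    }


if __name__ == "__main__":
    # Invented scores for a comparison against one hypothetical type strain.
    print(species_evidence({"16S": 99.1, "dDDH": 65.0, "ANI": 95.4}))
```

A result in which the markers disagree, as in this invented example, corresponds to the borderline situation described above, where no single marker settles the assignment.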
[...] 4.6.1.12), 1-hydroxy-2-methyl-2-(E)-butenyl-4-diphosphate synthase (EC 1.17.7.1), and 4-hydroxy-3-methylbut-2-enyl diphosphate reductase (EC 1.17.1.2) (Supplementary Materials, Figure S1) [28]. This pathway provides the formation of isopentenyl diphosphate and dimethylallyl diphosphate, which are necessary for the synthesis of terpenes, sterols, carotenoids, and dolichols. It should be noted that, at present, the vast majority of processes for the biosynthesis of terpene compounds in actinomycetes have been discovered and studied mainly for representatives of the genus Streptomyces [29], whereas for representatives of the genus Rhodococcus, only a few examples have been detected: the biosynthesis of zeatins, isoprenoid cytokinins with a dimethylallyl moiety attached to the N atom of adenine or adenosine [30,31], and of valuable carotenoids such as dihydroxyneurosporene, hydroxyechinenone [32], beta-carotene, zeaxanthin, and isorenieratene [33], among others [34]. Since the strain used in our work is characterized by the red color of its colonies, the discovered pathway is most likely responsible for the synthesis of terpene carotenoids. In addition, the online genome mining service AntiSMASH revealed three biosynthetic gene clusters for the synthesis of terpenes in the genome, as well as biosynthetic gene clusters for the synthesis of polyketides, non-ribosomal peptides, ectoines, butyrolactones, beta-lactones, and ε-poly-L-lysine (Table 3).
Based on the recorded transformations of (-)-isopulegol, including the stages of hydroxylation and oxidation (Figure 3), and the analysis of the literature data [35], we assumed that enzymes of the cytochrome P450 family can be involved in the bioconversion process. Automatic annotation of the obtained sequences of R. rhodochrous IEGM 1362 using the RAST service made it possible to search for genes encoding enzymes presumably involved in the bacterial oxidation of (-)-isopulegol. We discovered nine genes encoding enzymes of the family of cytochrome P450-dependent oxygenases and hydroxylases (Table 4). Pairwise comparison of these genes within the strain using the BLAST service showed no significant matches in Megablast mode and no more than 33.5% similarity according to the blastp (protein-protein BLAST) algorithm (Supplementary Materials, Figure S2). This indicates that the discovered genes are not copies but separate functional units. It is worth noting that a search for genes encoding CYP450 in the NCBI database identified 13 genes, but the products of 4 of them were identified more specifically in the RAST system as steroid C27-monooxygenases (EC 1.14.13.141) and lanosterol 14-alpha demethylase (EC 1.14.13.70); therefore, we did not use them in further experiments. Based on the detected sequences, pairs of primers for individual CYP450 genes of R. rhodochrous IEGM 1362 were selected for the first time using the Primer-BLAST service. Optimal conditions for their amplification were selected. These conditions ensured the formation of products of the expected size without the amplification of additional DNA fragments of other sizes, which is typical for GC-rich templates (Table 4, Figure 4).
Genes encoding CYP450 enzymes in R. rhodochrous IEGM 1362 differed from their homologues in other R. rhodochrous strains; this could be a basis for the advanced transformation capabilities of this strain. The percent identities between the CYP450 nucleotide and amino acid sequences varied over wide ranges of 88.72-100.00% and 56.36-100.00%, respectively (Table 5). The highest (100.00%) and the lowest (88.72% and 56.36%) similarities were not typical and were detected in comparisons of R. rhodochrous IEGM 1362 with only a few R. rhodochrous strains. Some of the R. rhodochrous IEGM 1362 cytochrome genes were rare among bacteria. Gene nos. 4, 6, and 7 (see Table 4) were found in only 13, 59, and 46 bacterial strains, respectively. Among R. rhodochrous strains, gene no. 6 was unique to the IEGM 1362 strain, since it was not found in other representatives of R. rhodochrous. However, according to the amino acid sequence comparison, an enzyme with 94.55% identity with CYP450 no. 6 of IEGM 1362 was detected in one R. rhodochrous strain (Table 5). This enzyme was also found in R. pyridinivorans with high identities of 99.85-99.92% and 99.77% for DNA and amino acid sequences, respectively (Table 5). The high similarity of R. rhodochrous IEGM 1362 cytochromes with those in R. pyridinivorans was also evident from the close identities of these enzymes, which varied in ranges of 82.05-100.00% and 56.11-100.00% for DNA and amino acid sequences, respectively (Table 5). Moreover, cytochromes of R. rhodochrous IEGM 1362 were more frequently detected in R. pyridinivorans. Homologues of these enzymes and their genes were found in 1-27 strains of R. pyridinivorans compared with 0-12 strains of R. rhodochrous (Table 5). This was not related to the over-representation of sequenced genomes for one of the species in the NCBI: there were 31 strains with sequenced genomes for each species.
A search for similar DNA and protein sequences was performed against the standard NCBI nucleotide collection and non-redundant protein sequence databases using the megablast and blastp programs, respectively. Only the percent identities at query coverages ≥ 91% are presented. The gene numbers are the same as those in Table 4. * The numbers of strains found are shown in brackets.
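The query-coverage filter and the per-species identity ranges and strain counts described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' workflow: hits are assumed to have already been parsed into (species, strain, percent identity, query coverage) tuples, the function name `summarize_hits` is hypothetical, and the example values are invented.

```python
from collections import defaultdict


def summarize_hits(hits, min_coverage=91.0):
    """Keep hits with query coverage >= min_coverage and summarize them per species.

    hits: iterable of (species, strain, percent_identity, query_coverage) tuples.
    Returns {species: (lowest identity, highest identity, number of distinct strains)}.
    """
    by_species = defaultdict(list)
    for species, strain, identity, coverage in hits:
        if coverage >= min_coverage:
            by_species[species].append((strain, identity))
    summary = {}
    for species, rows in by_species.items():
        identities = [identity for _, identity in rows]
        strains = {strain for strain, _ in rows}
        summary[species] = (min(identities), max(identities), len(strains))
    return summary


if __name__ == "__main__":
    # Invented hits for one query gene; not data from Table 5.
    example_hits = [
        ("R. pyridinivorans", "strain A", 99.9, 100.0),
        ("R. pyridinivorans", "strain B", 99.8, 98.0),
        ("R. rhodochrous", "strain C", 94.5, 95.0),
        ("R. rhodochrous", "strain D", 80.0, 60.0),  # dropped: coverage below 91%
    ]
    print(summarize_hits(example_hits))
```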
The genetic surroundings of CYP450 genes were analyzed. Genes coding for transcriptional regulators were found directly before or in close proximity to eight CYP450 genes. Only gene no. 6 had no transcriptional regulators nearby. However, partner proteins for cytochromes, such as ferredoxin and ferredoxin reductase, were detected only in the surroundings of CYP450 no. 6. We supposed that the genes encoding CYP450 no. 6, ferredoxin, and ferredoxin reductase constituted one gene cluster expressed as a polycistronic transcript, and that these proteins were responsible for a distinct metabolic process, probably (-)-isopulegol transformation. Other cytochromes in R. rhodochrous IEGM 1362 could be involved in complex metabolic processes requiring global regulation along with other genes. Such processes could be the biosynthesis of lipids, fatty acids, or acylated polymers (glycolipids or lipoproteins), since genes coding for the corresponding enzymes were found near CYP450 genes. Other metabolic processes could include the neutralization of toxic organic compounds, since genes coding for various transport proteins and other oxidoreductases were also found near CYP450 genes. Gene no. 8 was apparently involved in the metabolism of aromatic compounds, since genes encoding benzoate transporters, benzoate and catechol dioxygenases, and some specific dehydrogenases were annotated in the same gene cluster (Table 6). A common characteristic of cytochromes is their wide substrate specificity. Consequently, any of the CYP450 enzymes found in the genome of R. rhodochrous IEGM 1362 can theoretically enable the transformation of (-)-isopulegol, including by providing co-oxidation conditions.
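A neighbourhood scan of this kind can be sketched over an ordered list of product annotations for one contig. The sketch below is an assumption-laden illustration rather than the analysis performed here: the window of five genes on each side, the keyword strings, and the function names are all hypothetical choices, and the text does not state what distance was treated as "close proximity".

```python
def neighbours(annotations: list[str], index: int, window: int = 5) -> list[str]:
    """Return the annotations of up to `window` genes on each side of the gene at `index`."""
    start = max(0, index - window)
    return annotations[start:index] + annotations[index + 1:index + 1 + window]


def flag_context(annotations: list[str], index: int, window: int = 5) -> dict[str, bool]:
    """Report which categories of interest occur near the gene at `index`."""
    # Hypothetical keyword set; real annotation strings may be worded differently.
    keywords = {
        "regulator": "transcriptional regulator",
        "ferredoxin": "ferredoxin",  # also matches "ferredoxin reductase"
        "mobile element": "mobile element",
    }
    nearby = [text.lower() for text in neighbours(annotations, index, window)]
    return {label: any(key in text for text in nearby) for label, key in keywords.items()}


if __name__ == "__main__":
    # Invented gene order for illustration.
    contig = [
        "hypothetical protein",
        "ferredoxin reductase",
        "cytochrome P450",
        "ferredoxin",
        "ABC transporter",
    ]
    print(flag_context(contig, 2, window=2))
```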
Additionally, the possibility of horizontal gene transfer (HGT) of genes coding for cytochrome P450 in R. rhodochrous IEGM 1362 was assessed. Mobile elements were found near only three CYP450 genes. Just one mobile element protein was detected in each case (Table 6). Apparently, HGT plays a minor role in the distribution of cytochrome-coding genes between Rhodococcus species; however, it is probably responsible for the appearance of the unique CYP450 no. 6 in the genome of R. rhodochrous IEGM 1362.
The data obtained expand the understanding of the molecular genetic basis of the transformation of terpenoids by actinomycete cells of the genus Rhodococcus and create the prerequisites for further analysis of gene expression levels in order to identify genes and enzymes for the highly efficient selective biotransformation of the plant monoterpenoid (-)-isopulegol.

Figure 1. Distance tree for pairwise alignments between 16S rRNA gene sequences of R. rhodochrous IEGM 1362 and the most closely related strains. The tree method was Fast Minimum Evolution. The tree was unrooted. Only strains with a complete sequence of a 16S rRNA gene were selected for tree production. Values near the edges show the percentages of identity between 16S rRNA genes.

Figure 2. Distribution of subsystem categories in the genome of R. rhodochrous IEGM 1362. Image obtained using SEED Viewer 2.0.

Figure 4. Electropherogram of PCR products of R. rhodochrous IEGM 1362 with specific primers for CYP450 genes: M, DNA length marker from 50 to 700 bp. The gene numbers are the same as those in Table 4.

Table 2. Overall genomic characteristics of R. rhodochrous IEGM 1362 compared with type strains from the TYGS database.

Table 3. Biosynthetic gene clusters in the genome of R. rhodochrous IEGM 1362.

Table 4. Specific primers for genes encoding CYP450 enzymes of R. rhodochrous IEGM 1362, and optimal PCR conditions for their detection.

Table 5. Percent identities for genes encoding CYP450 enzymes in R. rhodochrous IEGM 1362 compared with those in R. rhodochrous and R. pyridinivorans strains.