Polyketide Synthase and Nonribosomal Peptide Synthetase Gene Clusters in Type Strains of the Genus Phytohabitans

(1) Background: Phytohabitans is a recently established genus of rare actinomycetes. It has been unclear whether its members have the capacity to synthesize diverse secondary metabolites. Polyketide and nonribosomal peptide compounds are major secondary metabolites in actinomycetes and are expected to be a potential source of novel pharmaceuticals. (2) Methods: The whole genomes of Phytohabitans flavus NBRC 107702T, Phytohabitans rumicis NBRC 108638T, Phytohabitans houttuyneae NBRC 108639T, and Phytohabitans suffuscus NBRC 105367T were sequenced by PacBio. Polyketide synthase (PKS) and nonribosomal peptide synthetase (NRPS) gene clusters in the genome sequences were analyzed bioinformatically. (3) Results: These four strains harbored 10, 14, 18 and 14 PKS and NRPS gene clusters, respectively. Most of the gene clusters were annotated as synthesizing unknown chemistries. (4) Conclusions: Members of the genus Phytohabitans are a possible source of novel and diverse polyketides and nonribosomal peptides.


Introduction
The discovery of new bioactive secondary metabolites remains one of the most important tasks in current pharmaceutical development. Actinomycetes are well known as a potential source of diverse secondary metabolites. Numerous actinomycete strains have been isolated from soils and have contributed to the discovery of useful bioactive compounds. However, it is becoming increasingly difficult to find novel compounds from soil-derived strains, because the majority of actinomycete strains isolated from soil samples belong to the genus Streptomyces [1], and intensive exploration of Streptomyces strains is leading to frequent rediscovery of already reported compounds [2]. Consequently, attention has shifted from Streptomyces to rare actinomycetes, especially new genera, because they have not been extensively examined for this purpose. New ecological niches are also drawing attention as sources of new actinomycetes. The microbial flora in plant material was reported to differ from that in soil samples, with rare actinomycetes making up the majority of species in such samples [3].
Post-genomic studies of actinomycetes revealed that each actinomycete generally harbors diverse secondary metabolite-biosynthetic gene clusters (smBGCs), even if the strain had been reported to produce only a few compounds [4]. This led to an approach called genome mining, which searches for new natural products by analyzing smBGCs in whole genomes and has resulted in the effective isolation of new secondary metabolites [5]. The major smBGCs in actinomycetes are associated with polyketide synthase (PKS) and/or nonribosomal peptide synthetase (NRPS) pathways [4,6]. Polyketide and nonribosomal peptide compounds are structurally and pharmacologically diverse and are expected to be a source of pharmaceutical leads. Type-I PKSs and NRPSs are large enzymes, composed of multiple catalytic
spectrometer (LC-MS) (UltiMate 3000 UHPLC coupled with Q Exactive, Thermo Fisher Scientific K.K., Tokyo, Japan). An Acquity UPLC BEH C18 1.7 µm (2.1 × 50 mm) column (Nihon Waters K.K., Tokyo, Japan) was used for reverse-phase separation in the system. Water (solvent A) and acetonitrile (solvent B), both containing 0.1% (v/v) formic acid, were used as the mobile phase with the following linear gradient program: 5% B for 0.5 min, 5% B to 85% B in 5 min, 85% B to 100% B in 0.5 min, and 100% B for 2 min. The flow rate was set to 0.6 mL/min and the column oven temperature to 40 °C. Compounds in the eluate were detected in electrospray ionization positive-ion mode with a spray voltage of 3.5 kV and a capillary temperature of 300 °C. Nitrogen sheath gas and auxiliary gas were set at 50 and 15 arbitrary units, respectively. Full MS scans were performed over m/z 150-2000 at a resolution of 70,000. Data were acquired with Xcalibur 2.0 software (Thermo Fisher Scientific K.K., Tokyo, Japan).
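The gradient program above can be checked more easily when written as a piecewise-linear function of time. The following is a minimal sketch: the breakpoints and flow rate are read directly from the text, but the assumption that the run ends right after the 2 min hold at 100% B (no re-equilibration step is described) is ours, and the names `GRADIENT`, `percent_b` and `solvent_volumes_ml` are hypothetical.

```python
# Piecewise-linear model of the LC gradient described in the methods.
# Breakpoints are (time in min, %B). Assumption: the run ends after the
# 2 min hold at 100% B; any re-equilibration step is not modelled.
GRADIENT = [
    (0.0, 5.0),    # start at 5% B
    (0.5, 5.0),    # hold 5% B for 0.5 min
    (5.5, 85.0),   # 5% -> 85% B over 5 min
    (6.0, 100.0),  # 85% -> 100% B over 0.5 min
    (8.0, 100.0),  # hold 100% B for 2 min
]
FLOW_ML_PER_MIN = 0.6


def percent_b(t_min: float) -> float:
    """Return %B (acetonitrile + 0.1% formic acid) at time t_min."""
    if t_min < GRADIENT[0][0] or t_min > GRADIENT[-1][0]:
        raise ValueError("time outside the gradient program")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            # linear interpolation within the current segment
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable")


def solvent_volumes_ml() -> tuple[float, float]:
    """Integrate flow x composition to get total mL of solvent A and solvent B."""
    vol_b = 0.0
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        # the trapezoid rule is exact for a linear segment; /200 = mean % -> fraction
        vol_b += FLOW_ML_PER_MIN * (t1 - t0) * (b0 + b1) / 200.0
    total = FLOW_ML_PER_MIN * (GRADIENT[-1][0] - GRADIENT[0][0])
    return total - vol_b, vol_b
```

For example, `percent_b(3.0)` returns 45.0, halfway through the 5% to 85% ramp, and one 8 min run consumes 4.8 mL of mobile phase in total under the stated assumption.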

Genome Analysis of Four Type Strains in the Genus Phytohabitans
The whole genomes of P. flavus NBRC 107702T, P. rumicis NBRC 108638T, P. houttuyneae NBRC 108639T and P. suffuscus NBRC 105367T were sequenced by PacBio. Their genome sizes were 9.6 Mb, 10.7 Mb, 11.3 Mb and 10.2 Mb, respectively. The G+C contents ranged from 70.8 to 72.0%. Each genome encoded 10 to 18 PKS and NRPS gene clusters, as summarized in Table 1. No type-II PKS gene clusters were found in any of the genomes.

PKS and NRPS Gene Clusters Common between/among Species
Two type-I PKS (t1pks), two type-III PKS (t3pks), four NRPS (nrps) and one hybrid PKS/NRPS (pks/nrps) gene clusters were shared between/among multiple species (Table 2). The t1pks-1, t3pks-1 and nrps-3 gene clusters were present in all the test strains. The t1pks-1 gene cluster encoded only one PKS gene, whose domain organization is KS/AT/KR/DH. This unusual domain organization, with KR-DH rather than DH-KR order and no ACP domain, is characteristic of the iterative PKSs for enediyne synthesis [19]. The organization of the adjacent genes is similar to those in the maduropeptin, sporolide, calicheamicin and neocarzinostatin BGCs (Figure 1a). Phylogenetic analysis of the PKSs suggested that those of t1pks-1 in P. flavus NBRC 107702T and P. rumicis NBRC 108638T fall in a clade of PKSs for compounds with a 9-membered enediyne moiety, whereas those in P. houttuyneae NBRC 108639T and P. suffuscus NBRC 105367T are closer to that of calicheamicin, whose enediyne moiety is 10-membered (Figure 1b). Thus, the products of P. flavus NBRC 107702T and P. rumicis NBRC 108638T are likely to be similar to these 9-membered enediyne compounds, whereas those of P. houttuyneae NBRC 108639T and P. suffuscus NBRC 105367T would differ from them. The t3pks-1 gene cluster encoded one type-III PKS, an ortholog of agqA for the synthesis of alkyl-O-dihydrogeranyl-methoxyhydroquinones. As this cluster also encoded orthologs of the non-PKS/NRPS genes agqB to agqD [20] (Pfav_061719 to Pfav_06721, Prum_045400 to Prum_045380, Phou_043850 to Phou_045870, Psuf_029230 to Psuf_029250), its product was deduced to be alkyl-O-dihydrogeranyl-methoxyhydroquinones. The nrps-3 gene cluster resembled BGCs for siderophores such as scabichelin and albachelin [21,22].
Life 2020, 10, 257 4 of 16
This cluster was predicted by antiSMASH to synthesize a siderophore composed of four to five amino-acid residues, such as methyl-acetyl-hydroxy-ornithine (mHaOrn), methyl-ornithine (mOrn) and acetyl-hydroxy-ornithine (HaOrn), although some monomers could not be predicted. The selectivity-conferring code [23] of the A domain in the fifth module was DAWEGGLVDK or DAWEVGLVDK, which is identical or nearly identical to that of the hydroxy-ornithine (hOrn)-loading A domain of the scabichelin BGC (DAWEGGLVDK) [21]. The nrps-3 gene clusters of P. flavus NBRC 107702T, P. houttuyneae NBRC 108639T and P. suffuscus NBRC 105367T likely share a similar domain organization (A haorn /MT/T-C/A/T-C/A orn /MT/T-C/A haorn /T-C/A/T/E), whereas that of P. rumicis NBRC 108638T lacks the C/A/T of the last module. Therefore, the products are expected to differ between the three strains and P. rumicis NBRC 108638T, and are predicted to be mHaOrn-x-mOrn-HaOrn-hOrn and mHaOrn-x-mOrn-HaOrn, respectively. The pks/nrps-3 gene cluster was present in all strains except P. rumicis. This cluster harbors ten KR domains and eleven DH-KR domain pairs as optional domains, which catalyze the formation of hydroxy groups and C=C double bonds [7], respectively. Therefore, the product is likely to be a large polyene compound, although it is unclear how the A domain in the last ORF is involved in the synthesis; this may suggest a novel function of the A domain. The t1pks-4 and nrps-4, t3pks-4 and nrps-11, and nrps-2 gene clusters were shared between P. rumicis and P. houttuyneae, between P. houttuyneae and P. suffuscus, and between P. flavus and P. rumicis, respectively. The t1pks-4 gene cluster encoded one single-module PKS whose polyketide backbone could not be predicted from its domain organization. The nrps-4 genes did not show high sequence similarity to characterized genes with identified products. However, based on the number of modules, the product was deduced to be a small molecule.
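The statement that the two fifth-module codes are "identical or nearly identical" to the scabichelin hOrn code can be made concrete by a position-by-position comparison of the 10-residue specificity codes. This is a simple identity count for illustration, not the scoring scheme used by antiSMASH or any other A-domain predictor; the function name `code_identity` is hypothetical.

```python
# Compare 10-residue A-domain specificity codes position by position.
SCABICHELIN_HORN = "DAWEGGLVDK"  # code of the hOrn-loading A domain in the scabichelin BGC


def code_identity(query: str, reference: str) -> tuple[int, list[int]]:
    """Return (number of identical positions, 1-based positions that differ)."""
    if len(query) != len(reference):
        raise ValueError("codes must have the same length")
    mismatches = [i + 1 for i, (q, r) in enumerate(zip(query, reference)) if q != r]
    return len(query) - len(mismatches), mismatches


for code in ("DAWEGGLVDK", "DAWEVGLVDK"):
    same, diff = code_identity(code, SCABICHELIN_HORN)
    print(f"{code}: {same}/10 identical, differing at positions {diff}")
```

The second code differs only at position 5 (V instead of G), that is, 9 of 10 positions are identical, which is why the same hOrn substrate is inferred for both.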
The t3pks-4 gene cluster encoded a type-III PKS and was combined with a terpenoid gene cluster. The hybrid cluster resembled the diazepinomicin BGC, suggesting that it is responsible for the synthesis of this compound. The nrps-11 gene cluster did not show high sequence similarity to characterized genes with identified products. However, based on its domain organization, the product was deduced to be a nonapeptide in which some amino acid residues may be modified by multiple C domains predicted to be involved in modification of the incorporated amino acid residues (C M). The nrps-2 gene cluster encoded seven NRPSs. According to its domain organization (assembly line T-C/A/T-C/A asn /T-C/A ser /T-C/T-C/A asn /T-C/A gly /T), the product is predicted to be a heptapeptide including two Asn, one Ser and one Gly residues. No nonribosomal peptides similar to those of nrps-2, -4 and -11 were found in our database search using Norine.
Slashes separate domains, whereas hyphens separate modules. Amino acid residues in italics were predicted only by antiSMASH. Abbreviations: A, adenylation; Aad, 2-aminoadipic acid; ACP, acyl carrier protein; AHBA, aminohydroxybenzoate; Asx, Asp or Asn; AT, acyltransferase; AT e, AT for ethylmalonyl-CoA; AT m, AT for malonyl-CoA; AT p, AT for methylmalonyl-CoA; C, condensation; C Cy, heterocyclization (Cyc) domain that catalyzes both peptide bond formation and subsequent cyclization of cysteine, serine or threonine residues; D C L, C domain that links an L-amino acid to a growing peptide ending with a D-amino acid; C Du, dual E/C domain that catalyzes both epimerization and condensation; L C L, C domain that catalyzes a peptide bond between two L-amino acids; C M, C domain that appears to be involved in the modification of the incorporated amino acid, for example the dehydration of serine to dehydroalanine; C S, starter C domain (first domain, classified as a separate subtype here) that acylates the first amino acid with a β-hydroxy-carboxylic acid (typically a β-hydroxy fatty acid); CoL, CoA-ligase; DH, dehydratase; Dha, dehydroalanine; DHB, dihydroxybenzoate; dVal, D-Val; dx, D-amino acid; E, epimerase; ER, enoylreductase; HaOrn, acetyl-hydroxy-Orn; hOrn, hydroxy-Orn; KS, ketosynthase; KS 1, KS domain present in the first module of assembly lines; KR, ketoreductase; mHaOrn, methyl-HaOrn; mOrn, methyl-Orn; MT, methyltransferase; mx, methyl-amino acid; n/a, not present; Orn, ornithine; pk, polyketide moiety; s, starter molecule; T, thiolation (peptidyl carrier protein); TE, thioesterase; TD, termination domain; x, unidentified amino acid residue; y, unknown residue due to the lack of an A domain. Subscript three-letter amino acid abbreviations immediately after A domains denote the substrates of those A domains.
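The slash/hyphen convention lends itself to a simple parser that counts modules and A domains, which is the basis of the peptide-length predictions in this section. The sketch below is illustrative: it assumes that an A-domain substrate (a subscript in the tables) is written after an underscore, e.g. `A_orn`, and it models the shorter P. rumicis line by dropping the whole last module, a simplification of "lacks C/A/T of the last module". The function names are hypothetical.

```python
from typing import List, Optional

# Parse a domain organization written in the Table 2 convention:
# "-" separates modules and "/" separates domains within a module.
# Assumption: an A-domain substrate (a subscript in the tables) is written
# after an underscore, e.g. "A_orn"; an unannotated A domain is just "A".


def parse_assembly_line(organization: str) -> List[List[str]]:
    """Split an assembly line into modules, each a list of domain tokens."""
    return [module.split("/") for module in organization.split("-")]


def a_domain_substrates(organization: str) -> List[Optional[str]]:
    """Predicted substrate of each A domain in order (None if not predicted)."""
    substrates: List[Optional[str]] = []
    for module in parse_assembly_line(organization):
        for domain in module:
            name, _, substrate = domain.partition("_")
            if name == "A":  # exact match, so ACP or AT tokens are not counted
                substrates.append(substrate or None)
    return substrates


# nrps-3 of P. flavus, P. houttuyneae and P. suffuscus, as given in the text.
NRPS3 = "A_haorn/MT/T-C/A/T-C/A_orn/MT/T-C/A_haorn/T-C/A/T/E"
# Illustrative only: drop the whole last module to mimic the shorter P. rumicis line.
NRPS3_SHORT = "-".join(NRPS3.split("-")[:-1])

for label, line in (("three strains", NRPS3), ("P. rumicis-like", NRPS3_SHORT)):
    subs = a_domain_substrates(line)
    print(f"{label}: {len(parse_assembly_line(line))} modules, {len(subs)} A domains {subs}")
```

Five A domains match the predicted pentapeptide mHaOrn-x-mOrn-HaOrn-hOrn, and four match mHaOrn-x-mOrn-HaOrn. The fifth residue (hOrn) is inferred from the specificity code rather than from the domain string, which is why the parser returns `None` there; the MT domains in modules 1 and 3 account for the methylated residues.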

PKS and NRPS Gene Clusters Specific to Each Strain
Figure 1 (partial caption). The phylogenetic tree was reconstructed by the neighbor-joining method using ClustalX 2.1. Numbers on the branches represent confidence limits estimated by bootstrap analysis with 1000 replicates; values above 50% are shown at branching points. Accession numbers of the sequences used are as follows: MdpE, AAQ17110; Strop_2697, ABP5514; SgcE, AAL06699; NcsE, AAM78012; CalE8, AAM94794; AcmE, ATV95639; KedE, AFV52145; UcmE, AMK92560; DynE, AAN79725. Kedarcidin also includes a 9-membered enediyne moiety, whereas the enediyne moieties of calicheamicin, uncialamycin and dynemicin are 10-membered.
3.3.1. P. flavus NBRC 107702T
Five gene clusters were specific to P. flavus NBRC 107702T (Table 3). The t1pks-2 gene cluster encoded AT-less PKSs. The domain organization of Pfav_13200 to Pfav_13340 was similar to that of the PKS for anthracimycin synthesis [23]. However, this cluster encodes additional PKSs (Pfav_012890 to Pfav_012980) that are not present in the anthracimycin biosynthetic gene cluster (BGC). Thus, the product was predicted to be a polyketide larger than anthracimycin that includes an anthracimycin-like moiety. The t1pks/t3pks cluster was a hybrid gene cluster encoding 21 type-I PKSs and one type-III PKS. The type-I PKSs harbor five KR domains and nine DH-KR domain pairs. Hence, the product is likely to be a large polyene compound whose starter unit is a chalcone-like moiety derived from the type-III PKS. The nrps-1 gene cluster harbored fewer than two modules; hence, its product would be simple. The pks/nrps-1 gene cluster encoded two hybrid PKS/NRPS proteins.
They are predicted to form only two modules, which load AHBA and methylmalonyl-CoA, respectively. The molecule synthesized by this cluster would therefore be small and simple. The pks/nrps-2 gene cluster encoded six NRPSs and one PKS. These form one loading module, one PKS module and five NRPS modules, and the cluster is deduced to synthesize a peptide containing one Asp and two Ser residues together with a polyketide unit. No hybrid polyketide/nonribosomal peptide compounds resembling the pks/nrps-2 product were found in our database search.
3.3.2. P. rumicis NBRC 108638T
One t1pks, two t3pks, three nrps and two pks/nrps gene clusters were specific to P. rumicis NBRC 108638T (Table 4). The t1pks-3 gene cluster resembled the pyrrolomycin BGC, with the same domain organization; therefore, it is likely responsible for pyrrolomycin synthesis. The products of the t3pks-2 and -3 gene clusters could not be predicted by this bioinformatic analysis. However, as the t3pks-2 gene cluster also encoded terpenoid-biosynthetic genes, we predicted its product to be a terpenoid with a polyketide moiety derived from the type-III PKS. The products of the nrps-5, -6 and -7 gene clusters were predicted to be tetrapeptides, as shown in Table 4. Similarly, the products of pks/nrps-4 and -5 were deduced to be a tetrapeptide and a pentapeptide, respectively, each with a polyketide moiety. No nonribosomal compounds like those of nrps-5, nrps-6, nrps-7, pks/nrps-4 and pks/nrps-5 were found in our database search.
3.3.3. P. houttuyneae NBRC 108639T
Two t1pks, one t3pks, five nrps and three pks/nrps gene clusters were specific to P. houttuyneae NBRC 108639T (Table 5). The t1pks-5 gene cluster encoded one single-module PKS whose polyketide backbone could not be predicted from its domain organization. The t1pks-6 gene cluster showed similarity to the deschlorothricin BGC. However, as their domain organizations differ, the product of t1pks-6 is likely not deschlorothricin itself but a deschlorothricin-like compound.
Since the t3pks-4 gene cluster did not show high amino acid sequence similarity to gene clusters with identified products, its product could not be predicted. The nrps-8, -9, -10 and -12 gene clusters also did not show high similarity to gene clusters whose products are identified. However, according to their domain organizations and/or the substrates of their A domains, their products were deduced to be a dipeptide, a nonapeptide, a pentapeptide and a pentapeptide, respectively, as shown in Table 5. The pks/nrps-7 gene cluster encoded ten NRPSs harboring multiple domains, forming 24 NRPS modules, together with one type-III PKS. Hence, it is predicted to synthesize a large peptide of 24 amino-acid residues with a polyketide moiety derived from the type-III PKS. In contrast, the pks/nrps-6 and -8 gene clusters encoded fewer NRPSs, and their products were predicted to be a tripeptide and a tetrapeptide, respectively, each with a moiety derived from a small PKS. None of the nonribosomal peptide and/or hybrid polyketide/nonribosomal peptide compounds shown as deduced products in Table 5 were found in our database search.

3.3.4. P. suffuscus NBRC 105367T
One t1pks, four nrps and four pks/nrps gene clusters were specific to P. suffuscus NBRC 105367T (Table 6). The t1pks-7 gene cluster encoded 16 PKS proteins comprising twelve modules. Since six DH-KR pairs, which yield C=C double bonds, are present, the product is likely to be a polyene polyketide. Nrps-13 was assigned as a BGC for a pentapeptide, as shown in Table 6. As the nrps-14 gene cluster was similar to the cephamycin BGC, we considered it to be a cephamycin BGC. The nrps-15 gene cluster encoded four NRPSs, two of which each contained a terminal TE domain. Although it is unclear which of the two TE domains is functional, we predicted the product to be DHB-Ser based on the ORFs Psuf_002170, Psuf_002160 and Psuf_002150. Such a unit is often observed in siderophores. Acting iteratively, this cluster may synthesize an enterobactin-like siderophore, since enterobactin is composed of three DHB-Ser units. The pks/nrps-9 gene cluster encoded one iterative PKS for enediyne synthesis and one type-III PKS in addition to 13 NRPSs comprising a total of six modules. The product would be a hexapeptide containing Ser, Cys and Pro residues, a polyketide component derived from the type-III PKS, and an enediyne moiety. Although the product of the pks/nrps-10 gene cluster is unclear, it is likely to be a polyketide with a thiazoline residue formed by cyclization of Cys. The pks/nrps-11 gene cluster was considered to be a chlorizidine BGC based on the similarity of their gene organizations. The pks/nrps-12 gene cluster encoded 22 proteins with a total of 13 PKS modules, plus one NRPS. In the PKS domain organization, four KR domains and four DH-KR domain pairs were present, suggesting that the product is a polyene compound with a moiety derived from Leu. The deduced products of nrps-13, pks/nrps-9, pks/nrps-10 and pks/nrps-12 did not match any compounds reported in our database search.
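The polyene predictions in this section (for example, six DH-KR pairs in t1pks-7) rest on the standard reductive-loop logic of type-I PKS extension modules: KR reduces the β-keto group to a hydroxyl, DH then forms a C=C bond, and ER saturates it. The sketch below encodes that logic under a simplifying assumption, namely that every listed domain is active (inactive KR, DH or ER domains are common in real clusters); the example modules are hypothetical and are not taken from Tables 3 to 6.

```python
# Simplified reductive-loop logic for a single type-I PKS extension module.
# Assumption: every domain listed is active.


def beta_carbon_outcome(domains: set[str]) -> str:
    """Return the expected state of the beta-carbon after one extension cycle."""
    if "KR" not in domains:
        return "keto"        # no reduction: beta-keto group retained
    if "DH" not in domains:
        return "hydroxyl"    # KR only: beta-hydroxy group
    if "ER" not in domains:
        return "alkene"      # KR + DH: C=C double bond
    return "methylene"       # KR + DH + ER: fully saturated


def count_outcomes(modules: list[set[str]]) -> dict[str, int]:
    """Tally beta-carbon outcomes over an assembly line."""
    counts: dict[str, int] = {}
    for module in modules:
        outcome = beta_carbon_outcome(module)
        counts[outcome] = counts.get(outcome, 0) + 1
    return counts


# Hypothetical four-module line, one module per outcome.
example_line = [
    {"KS", "AT", "ACP"},
    {"KS", "AT", "KR", "ACP"},
    {"KS", "AT", "DH", "KR", "ACP"},
    {"KS", "AT", "DH", "ER", "KR", "ACP"},
]
print(count_outcomes(example_line))
```

An assembly line with many "alkene" modules yields repeated C=C units in the backbone, which is the reasoning behind calling such products polyenes; post-PKS tailoring enzymes can still modify the final structure.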

Genomic Positions of the Gene Clusters
The genomic positions of the PKS and NRPS gene clusters are shown diagrammatically in Figure 2. Orthologous clusters present between/among the strains are connected by lines in the figure. All the strains harbored the t1pks-1, t3pks-1 and nrps-3 gene clusters. Pks/nrps-3 was present in all strains except P. rumicis NBRC 108638T. Nrps-4 and t1pks-4 were shared between P. rumicis NBRC 108638T and P. houttuyneae NBRC 108639T, whereas t3pks-4 and nrps-11 were shared between P. houttuyneae NBRC 108639T and P. suffuscus NBRC 105367T. The remaining 31 gene clusters were not shared between different species: five, eight, ten and eight were specific to P. flavus NBRC 107702T, P. rumicis NBRC 108638T, P. houttuyneae NBRC 108639T and P. suffuscus NBRC 105367T, respectively, as shown by closed circles in the figure.
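A short tally shows how the sharing pattern and the strain-specific counts fit together with the per-genome totals (10, 14, 18 and 14 clusters), the 56 clusters and the 40 distinct cluster types mentioned in the Discussion. The shared clusters are those listed in the section on gene clusters common between/among species, including nrps-2 shared by P. flavus and P. rumicis; the data layout and variable names are our own illustrative choices.

```python
# Shared (orthologous) clusters and the strains that carry them,
# plus the number of strain-specific clusters per strain.
STRAINS = ("P. flavus", "P. rumicis", "P. houttuyneae", "P. suffuscus")
ALL_FOUR = set(STRAINS)

SHARED = {
    "t1pks-1": ALL_FOUR,
    "t3pks-1": ALL_FOUR,
    "nrps-3": ALL_FOUR,
    "pks/nrps-3": {"P. flavus", "P. houttuyneae", "P. suffuscus"},
    "t1pks-4": {"P. rumicis", "P. houttuyneae"},
    "nrps-4": {"P. rumicis", "P. houttuyneae"},
    "t3pks-4": {"P. houttuyneae", "P. suffuscus"},
    "nrps-11": {"P. houttuyneae", "P. suffuscus"},
    "nrps-2": {"P. flavus", "P. rumicis"},
}
SPECIFIC = {"P. flavus": 5, "P. rumicis": 8, "P. houttuyneae": 10, "P. suffuscus": 8}

# Clusters per genome = strain-specific clusters + shared clusters carried by that strain.
per_strain = {
    s: SPECIFIC[s] + sum(s in carriers for carriers in SHARED.values())
    for s in STRAINS
}
total_clusters = sum(per_strain.values())              # every cluster copy in all genomes
distinct_types = len(SHARED) + sum(SPECIFIC.values())  # each ortholog group counted once

print(per_strain)
print(total_clusters, distinct_types)
```

The per-strain totals come out as 10, 14, 18 and 14, giving 56 clusters in all, and the nine shared groups plus 31 specific clusters give the 40 distinct types.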

Production of Unknown Compounds
The four strains were cultured on two kinds of agar medium.
15, 446.11, 659.11), enterobactin (669.14) and chlorizidines (441.94, 415.97), were not observed. In contrast, other ion peaks were observed, of which those listed in Table 7 are highlighted here. Ion peaks at m/z 656.31 eluting at 2.8 min (3) were observed in the culture extracts of all strains except P. rumicis NBRC 108638T, whereas ion peaks 1 and 2 were observed only in that of P. rumicis NBRC 108638T. Ion peaks 4 and 5 were specific to P. suffuscus NBRC 105367T and P. houttuyneae NBRC 108639T, respectively. A search of the Dictionary of Natural Products for reported compounds with these accurate mass values returned no significant hits, suggesting that these compounds are likely novel.
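The database comparisons above rely on matching accurate masses within a tolerance. The following minimal sketch computes a neutral monoisotopic mass from a molecular formula and the ppm error of an observed ion, using enterobactin (C30H27N3O15) because its mass, 669.14, is quoted above. The [M+H]+ adduct, the 5 ppm tolerance and the function names are assumptions for illustration; the tolerance actually used in the search is not stated.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes, and the proton mass.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}
PROTON = 1.007276466812


def monoisotopic_mass(formula: str) -> float:
    """Neutral monoisotopic mass of a formula such as 'C30H27N3O15' (C, H, N, O only)."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def is_match(observed_mz: float, theoretical_mz: float, tol_ppm: float = 5.0) -> bool:
    """True if the observed ion lies within the (assumed) ppm tolerance."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


# Enterobactin, C30H27N3O15, assumed to be detected as [M+H]+.
ENTEROBACTIN_M = monoisotopic_mass("C30H27N3O15")
ENTEROBACTIN_MH = ENTEROBACTIN_M + PROTON
print(f"M = {ENTEROBACTIN_M:.4f}, [M+H]+ = {ENTEROBACTIN_MH:.4f}")
```

The quoted value of 669.14 agrees with the neutral monoisotopic mass (669.1442), so the corresponding [M+H]+ ion would be expected near m/z 670.1515; an observed m/z of 670.1520 would fall within 1 ppm of it, whereas 670.16 would not pass a 5 ppm window.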


Discussion
In conventional screening for novel secondary metabolites, re-isolation of known compounds has been a persistent problem. This has prompted a shift in natural product discovery strategies toward new sources of producing microorganisms. Predicting products from smBGCs, such as PKS and NRPS gene clusters, is a powerful approach to reducing the frequency of re-isolating known compounds, although further investigations are needed.
Here, we used PacBio sequencing to obtain the whole genomes of four type strains of the genus Phytohabitans, which had not previously been studied by genome-based strategies, analyzed their PKS and NRPS gene clusters, and bioinformatically predicted the chemical structures of the products of these gene clusters. Fifty-six gene clusters were identified in the four strains, which are involved in the biosynthesis of 40 different types of polyketide and/or nonribosomal peptide compounds. Although analysis based on domain organizations and bioinformatic substrate prediction is not sufficient to conclude that products are identical, because of possible variants arising from the low substrate selectivity of A and AT domains, few gene clusters were shared between/among different species. Each strain harbored five to eleven specific gene clusters. Most of the gene clusters are not for known compounds, and their predicted chemical structures are novel. Among the 40 biosynthetic gene clusters, only six were identified as producing known products. These known compounds were not produced under our culture conditions. These BGCs may be cryptic under those conditions, and/or their productivity may be too low for detection in samples from small-scale cultures. Further investigations are needed to express these BGCs and/or increase production. Only nine putative metabolites were shared within the genus, as shown in Table 2, suggesting that many are specific to each species. Therefore, members of the genus Phytohabitans are considered an attractive source of novel and diverse secondary metabolites. To test this, we analyzed the culture extracts by LC-MS as a preliminary study. As expected, ion peaks corresponding to some putative novel compounds were observed.
We speculate that ion peaks 1 to 3 correspond to siderophores derived from nrps-3. The nrps-3 gene cluster is present in all four strains, but that in P. rumicis NBRC 108638T lacks the fifth module, so its product will be smaller than the others. Nrps-3 resembles the scabichelin BGC, and the exact mass of scabichelin is 647.36. Although the second amino acid residue of the nrps-3 product could not be predicted in this study, its exact mass should be close to that of scabichelin because they are similar siderophores. Thus, the compound of m/z 656.31 is a plausible product. Furthermore, P. rumicis NBRC 108638T, whose nrps-3 lacks the fifth module, did not produce the compound of m/z 656.31 but produced smaller ones, which can be accounted for by the absence of the fifth module. Unfortunately, it is not possible to determine the chemical structures of the final products, because PKS and NRPS assembly lines determine the chemical structures of the backbones [7] but not those of the final products, since the backbones are usually modified by other enzymes. It is unclear at present which gene clusters in P. suffuscus NBRC 105367T and P. houttuyneae NBRC 108639T synthesize the two putative novel compounds (4, 5). Except for siderophores, no prominent high-intensity ion peaks were observed for P. flavus NBRC 107702T and P. rumicis NBRC 108638T, possibly because of their poor growth under our culture conditions. Compared with typical type-I PKSs and NRPSs, some of those in the genus Phytohabitans appear to be split across many proteins. It is still unclear whether this is an artifact of the sequencing and/or assembly technologies. However, no obvious ORFs likely to be involved in the biosynthetic pathway, such as genes for accessory enzymes, were observed between such split PKS and NRPS genes. More reliable methods should be employed to confirm whether modular enzyme genes are often split in the genus Phytohabitans or whether this is due to technological artifacts.
During this study, a novel species, Phytohabitans kaempferiae, was reported; it is an endophytic actinomycete isolated from the leaf of Kaempferia larsenii [24]. Although its whole genome has yet to be sequenced, such an analysis would likely reveal further potential of the genus, because different species generally harbor specific PKS and NRPS pathways, as shown in this and our previous studies on actinomycetes [15][16][17].