Recent Advances in the Discovery and Biosynthetic Study of Eukaryotic RiPP Natural Products

Natural products have played indispensable roles in drug development and biomedical research. Ribosomally synthesized and post-translationally modified peptides (RiPPs) are a fast-expanding group of natural products, owing largely to genome mining efforts in recent years. Most RiPP natural products have been discovered in bacteria, yet many eukaryotic cyclic peptides have turned out to be of RiPP origin. This review presents recent advances in the discovery of eukaryotic RiPP natural products, the elucidation of their biosynthetic pathways, and the molecular basis of catalysis by their biosynthetic enzymes.


Common Features of RiPP Biosynthesis
Ribosomally synthesized and post-translationally modified peptides (RiPPs) are peptide natural products that are produced by the ribosome and then tailored by enzymes. As their name indicates, all RiPP natural products are encoded by structural genes and are initially synthesized as precursor peptides by the ribosome (Figure 1). In most RiPPs, the precursor peptide consists of a sequence-conserved N-terminal leader peptide and a hypervariable core sequence. Many eukaryotic precursor peptides, as described in this review, also have a C-terminal recognition sequence that is important for excision and cyclization [1]. In general, the precursor peptide is first synthesized by the ribosome. Then, the core peptide is subjected to post-translational modifications, many of which are guided by leader peptides and recognition sequences. Finally, the leader and recognition sequences are removed by proteolysis to generate the mature peptide. Notably, some post-translational modifications are independent of the leader and recognition sequences and are catalyzed after removal of the flanking sequences.
Because RiPP core peptides are directly translated from open reading frames (ORFs) in the genomes of the producing organisms, genome mining algorithms and toolkits have been developed to correlate a mature RiPP with its biosynthetic gene cluster (BGC) and to use that information to search for homologous BGCs. Conversely, by analyzing the sequence of a putative homologous BGC, the sequence and even the approximate structure of the corresponding mature RiPP can be predicted [2,3].

Designation of RiPP Families
Nisin, produced by Lactococcus lactis, is one of the longest-known RiPPs; it was first reported in the 1920s and has been used as a food preservative since the 1960s [1]. Its structure is characterized by the presence of lanthionine residues, which give the name lanthipeptides (for lanthionine-containing peptides) to this family of RiPPs. It was not until recently that the biosynthetic mechanisms for lanthionine

Amatoxins and Phallotoxins
Mushrooms in the genus Amanita account for most fatal mushroom poisonings [8]. Their toxicity is caused by a group of bicyclic peptides named amatoxins. Amatoxins are also biosynthesized by mushrooms in other, unrelated genera, such as Galerina, Lepiota, and Conocybe. They can cause liver failure and death by inhibiting ribonucleic acid (RNA) polymerase II [9]. Amanita mushrooms also produce a structurally related group of toxins, called phallotoxins, which are orally inactive but toxic when injected. Phallotoxins act by stabilizing F-actin and have been used to stain the cytoskeleton (Figure 2) [9]. Genome survey sequencing revealed that amatoxins and phallotoxins are biosynthesized via ribosomal pathways [8,10]. It was also shown that genes encoding precursor peptides for amatoxins and phallotoxins are prevalent in toxic mushrooms and form a large family, known as the MSDIN family after the first five conserved amino acid residues of the precursor peptides [11,12]. Members of the MSDIN family are characterized by a hypervariable core region flanked by conserved leader and recognition sequences. Moreover, the core is flanked by invariant proline residues that serve as proteolytic targets for a prolyl oligopeptidase (POP) named POPB (Figure 3). POPB is a member of the POP family of serine proteases. It differs from conventional POPs (such as POPA, which is also present in Amanita mushrooms) in that it catalyzes two nonprocessive reactions: hydrolysis of the leader peptide after the proline residue, and transpeptidation to macrocyclize the core peptide [13]. This two-step mechanism of POPB catalysis is also supported by kinetic and structural studies (Figure 4). The enzyme first hydrolyzes the N-terminal leader, removing 10 residues from the 35-residue precursor. The resulting 25-amino-acid peptide is conformationally trapped and forced to be released.
After dissociation from the enzyme, the 25-mer rearranges conformationally and is rebound by the enzyme. This process is possibly directed by the C-terminal follower peptide. Finally, the follower peptide is removed and the core peptide is macrocyclized in the active site of the same enzyme [14]. Owing to its unusual two-step mechanism, high substrate tolerance in the core region, and satisfactory kinetic efficiency, POPB has been exploited as a general catalyst for peptide macrocyclization [15].
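The two-step processing described above can be traced on a toy substrate. The Python sketch below is illustrative only: the 35-residue sequence is a placeholder chosen to match the stated lengths (a 10-residue leader ending in proline and a core ending in proline), and the function names are hypothetical rather than taken from any cited tool.

```python
# Toy model of POPB processing of an MSDIN-family precursor.
# PRECURSOR is an illustrative 35-residue placeholder, not a sequence quoted in the text.
PRECURSOR = "MSDINATRLPIWGIGCNPCVGDDVTTLLTRGEALC"
LEADER_LENGTH = 10  # residues removed in the first (hydrolysis) step, per the text


def remove_leader(precursor, leader_length=LEADER_LENGTH):
    """Step 1: hydrolysis after the proline that ends the leader."""
    leader = precursor[:leader_length]
    if not leader.endswith("P"):
        raise ValueError("leader is expected to end with a proline")
    return precursor[leader_length:]


def split_core_and_follower(intermediate):
    """Step 2: the core runs up to and including the next proline;
    everything after it is treated as the follower peptide."""
    end = intermediate.find("P")
    if end == -1:
        raise ValueError("no proline found to delimit the core")
    return intermediate[: end + 1], intermediate[end + 1:]


intermediate = remove_leader(PRECURSOR)
core, follower = split_core_and_follower(intermediate)
print(len(PRECURSOR), len(intermediate))  # 35 25
print(core, follower)
```

The sketch only tracks which residues end up in which fragment; it does not model the release and rebinding of the 25-mer between the two steps.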

Borosins
Omphalotin A, a compound with potent and selective nematotoxic activity, was isolated from the basidiomycete Omphalotus olearius. The structure of omphalotin A is characterized by a peptidic macrocycle with nine N-methylations on the amide backbone (Figure 5). It was postulated that omphalotin A was biosynthesized by a nonribosomal peptide synthetase (NRPS) pathway, because backbone N-methylation had never been observed in RiPP pathways. It was not until 2017 that two groups, working in parallel, confirmed that a RiPP biosynthetic gene cluster, oph, is responsible for producing omphalotin A [16,17]. Even more surprisingly, the precursor peptide of omphalotin A is not present as a stand-alone substrate for post-translational modifications, but is instead fused to the C-terminus of a protein with sequence homology to S-adenosylmethionine (SAM)-dependent methyltransferases [18,19]. The gene encoding this fusion protein was named ophA. Additional experiments showed that OphA autocatalytically methylates its own C-terminus sequentially in the N-to-C direction, followed by cleavage and cyclization by the prolyl oligopeptidase OphP to form omphalotin [16]. Van der Velden et al. [17] proposed the name "borosins" for this new family of RiPPs, after the ancient mythological symbol Ouroboros. The molecular mechanism of OphA automethylation was proposed based on structural studies by two groups in parallel (Figure 6) [20,21]. OphA acts as a homodimer in which each monomer resembles a ring. In the co-complex structure, the C-terminal core peptide of monomer A sits in the methyltransferase active site of monomer B (and vice versa), giving the dimer the appearance of two interlocked rings [20]. This structural arrangement brings the substrate into proximity with the active site and suggests an acid-base catalysis mechanism. The amide nitrogen is first deprotonated with the help of a basic amino acid residue (possibly arginine).
The resulting negative charge is stabilized by nearby tyrosine residues. Finally, the proximity of a reactive SAM molecule promotes the alkylation reaction [21].

Dikaritins
Ustiloxins
Ustiloxins are the first example of natural products that are biosynthesized by RiPP pathways in filamentous fungi. The study of the ustiloxin biosynthetic pathway represents the first example of complete RiPP gene cluster characterization in fungi [22]. Ustiloxin B was originally discovered in the plant pathogenic fungus Ustilaginoidea virens and is phytotoxic through inhibition of microtubule assembly [23]. The structure of ustiloxin B consists of a Tyr-Ala-Ile-Gly (YAIG) tetrapeptide and contains an unusual norvaline modification on the hydroxylated tyrosine residue (Figure 7). The biosynthetic origin of ustiloxin B remained unknown until the genome mining method MIDDAS-M was developed for the detection of natural product biosynthetic gene clusters in fungi [24,25]. MIDDAS-M stands for motif-independent de novo detection algorithm for secondary metabolite biosynthetic gene clusters. By scoring transcription levels of all putative gene clusters in Aspergillus flavus under different culture conditions, the MIDDAS-M method identified the gene cluster for ustiloxin B, named the ust cluster, which was later confirmed by knockout studies. In-depth sequence analysis of the cluster revealed that the precursor peptide UstA contains 16 repeats of the YAIG core, and that each repeated core is flanked by conserved ED and KR motifs, which are likely necessary for recognition by the post-translational modification enzymes (Figure 8). Moreover, the N-terminus of UstA is a signal peptide-like sequence that also contains a KR motif. The overall organization of UstA, and the fact that the KR motif is a known recognition site for Kex2 protease, a ubiquitous serine protease, strongly suggest that ustiloxin B is a RiPP and that ustA encodes the precursor peptide [22]. Subsequently, the biosynthetic gene cluster of ustiloxin B in its original host, U. virens, was also characterized and confirmed to be a RiPP cluster [26].
The entire biosynthetic pathway of ustiloxin B has been studied in detail by gene inactivation, heterologous expression, and in vitro functional reconstitution of the biosynthetic enzymes (Figure 9a) [27]. Gene disruption studies revealed that the three genes ustQ, ustYa, and ustYb (ustQYaYb) are essential for producing the first intermediate 2.
UstQ is a tyrosinase homolog. UstYa and UstYb are homologous to each other, contain the DUF3328 motif, and have no homology to enzymes of known function. Heterologous expression of ustQYaYb in Aspergillus oryzae gave 2 as the sole product. Based on these results, it was speculated that the UstA precursor is first digested into 16 trideca-/tetradecapeptides by Kex2 proteases before cyclization by UstQYaYb. However, the proteases that remove the N- and C-terminal sequences flanking the core are still unknown. UstM is a methyltransferase. Introduction of ustM into ustQYaYb transformants generated 3. UstF1 and UstF2 are Class B bifunctional flavoprotein monooxygenases (FMOs). Purified maltose binding protein (MBP)-tagged UstF1 and UstF2 were yellow and absorbed strongly at 450 nm, indicating binding of flavin adenine dinucleotide (FAD). Incubating 4 with UstF1 in the presence of nicotinamide adenine dinucleotide phosphate (NADPH) resulted in 5, which was transformed into an entgegen/zusammen (E/Z) mixture of 6 after incubation with UstF2 in the presence of NADPH. Treatment of 6 with 0.1% trifluoroacetic acid (TFA) afforded 8, a hydrate form of 7. The ustD gene shows homology to pyridoxal 5'-phosphate (PLP)-dependent enzymes. Incubating 8 with MBP-tagged UstD in the presence of PLP and aspartic acid generated ustiloxin B. The reaction mechanism of UstD was studied by incubating the enzyme with PLP and aspartic acid and treating the reaction mixture with dansyl chloride. This experiment generated dansylated alanine, indicating that UstD catalyzes decarboxylation of aspartate to form an enamine, which acts as a nucleophile and reacts with 7 to give 1 (Figure 9b). Given that ustYa/ustYb are located near the precursor peptide gene ustA, combined queries for homologs of ustYa/ustYb and ustA identified 94 homologous clusters in Aspergillus genome sequences.
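The repeat architecture of UstA and the proposed Kex2 digestion can be sketched in a few lines of Python. The precursor below is a synthetic stand-in, assuming a signal-like stretch ending in KR followed by 16 copies of a hypothetical 13-residue repeat unit in which the YAIG core is preceded by ED and followed by KR. The spacer residues, the signal stretch, and the helper name kex2_digest are invented for illustration and are not the real UstA sequence or an existing tool.

```python
import re

CORE = "YAIG"
# Hypothetical signal-like stretch ending in a KR motif (placeholder residues).
SIGNAL = "MKLILTLLVSGLCALAAPAAKR"
# Hypothetical repeat unit: spacer + ED + core + spacer + KR (13 residues).
REPEAT_UNIT = "SVED" + CORE + "SAL" + "KR"
N_REPEATS = 16  # number of core repeats stated in the text


def kex2_digest(protein):
    """Cleave on the C-terminal side of every Lys-Arg (KR) dipeptide."""
    fragments = []
    start = 0
    for match in re.finditer("KR", protein):
        fragments.append(protein[start:match.end()])
        start = match.end()
    if start < len(protein):
        fragments.append(protein[start:])
    return fragments


precursor = SIGNAL + REPEAT_UNIT * N_REPEATS
fragments = kex2_digest(precursor)
core_fragments = [f for f in fragments if CORE in f]

print(len(core_fragments))                        # 16
print(sorted({len(f) for f in core_fragments}))   # [13]
```

In this simplified model every core-containing fragment is a tridecapeptide; the real precursor is described as giving a mixture of trideca- and tetradecapeptides, which a single idealized repeat unit cannot reproduce.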

Asperipins
Guided by the 94 precursor peptide gene candidates found by querying ustYa/ustYb and ustA in combination, a new cyclic peptide, asperipin-2a, was isolated from Aspergillus flavus [28]. Although the gene clusters of asperipin-2a and the ustiloxins are highly homologous, the compounds have distinct structural characteristics. Asperipin-2a has a hexapeptide core sequence, FYYTGY, which forms a bicyclic structure connected by ether linkages between tyrosine side chains and β-carbons (Figure 10). The putative biosynthetic gene cluster for asperipin-2a is composed of only four genes: a precursor peptide gene aprA, a ustYa/ustYb homolog aprY, a transporter gene aprT, and an isoflavone reductase gene aprR. Heterologous expression of the asperipin-2a gene cluster in Aspergillus oryzae showed that aprY is essential for asperipin-2a biosynthesis and indicated a sequential oxidative macrocyclization function for AprY [29].

Phomopsins
Phomopsins are a group of cyclic hexapeptides produced by the plant pathogenic fungus Phomopsis leptostromiformis. The structures of phomopsins are characterized by a 13-membered macrocyclic ring formed by an ether linkage between tyrosine and isoleucine (Figure 11). They are potent antimitotic compounds that target the vinca domain of tubulin, causing liver disease in livestock fed on infected plants [30]. A RiPP gene cluster was confirmed to be responsible for producing phomopsins by analyzing the genome sequence of P. leptostromiformis ATCC 26115 [31]. The precursor peptide gene for phomopsins, phomA, is arranged in the same pattern as the ustiloxin B precursor peptide gene ustA. The N-terminus of PhomA is a signal peptide-like leader sequence, followed by eight repeats of the core peptide flanked by conserved KR motifs. Knockout studies showed that the tyrosinase PhomQ is essential for phomopsin biosynthesis and is likely involved in forming the cyclic scaffold. In vitro enzymatic assays revealed that the methyltransferase PhomM installs methyl groups on the N-terminal α-amino group. A search for PhomA, PhomQ, and PhomM homologs in the National Center for Biotechnology Information (NCBI) database identified 27 similar gene clusters, suggesting the presence of a family of fungal RiPP natural products. Because these compounds appear to be associated with strains of the subkingdom Dikarya, the name "dikaritins" was proposed for this new family of peptides. A global sequence similarity network was constructed for all of the putative proteins from the identified gene clusters, showing that these gene clusters contain a set of highly conserved proteins, including PhomA homologs, PhomQ homologs, PhomR-like zinc finger transcription-regulating proteins, and S41 family peptidases.
Noteworthy is the presence in all of the gene clusters of DUF3328 proteins, such as UstYa/UstYb and AprY, whose role in dikaritin biosynthesis remains to be elucidated [31].

Epichloëcyclins
Epichloëcyclins were discovered in grass endophytic fungi belonging to the genus Epichloë. MS/MS analyses indicated that their structures feature a heptapeptide ring formed by oxidative cyclization at the tyrosine residue, together with methylation of the lysine residue. Detailed structures of epichloëcyclins remain to be determined. The precursor peptide gene for epichloëcyclins was identified from fungal transcripts in endophyte-infected grasses and designated gigA (grass induced gene). GigA is composed of an N-terminal signal sequence, followed by four repeats of sequences containing the core peptide and conserved motifs, such as the KR recognition site for Kex2 protease. Epichloëcyclins are the first example of RiPP natural products found in a mutualistic symbiotic fungus, suggesting a possible bioactive role [32]. Whether epichloëcyclins belong to the dikaritin family or form their own family of RiPPs cannot be determined until the full biosynthetic gene cluster of epichloëcyclins is identified.

Cyclotides
Cyclotides are plant-derived RiPPs that are characterized by a head-to-tail cyclic peptide backbone and a signature cyclic cystine knot (CCK) motif [33]. They were discovered in plants of the Rubiaceae, Violaceae, Cucurbitaceae, and Fabaceae families [34–38]. Because of their insecticidal activities, cyclotides are thought to be plant defense agents. Their broad range of other biological activities, such as antiviral, antimicrobial, and cytotoxic activities, makes cyclotides attractive for pharmaceutical applications [1]. The precursors of cyclotides can be present as dedicated proteins, similar to other RiPPs from bacteria and fungi. However, it was found that cyclotide precursors in Clitoria ternatea (Fabaceae family) are embedded within an albumin precursor, indicating that RiPPs might be much more common than has been thought [36,37]. Cyclotide precursors consist of an endoplasmic reticulum (ER) signal domain, a pro-region (PRO), an N-terminal region (NTR), and one or more copies of the core sequence. The protease that removes the leader remains to be characterized. Butelase 1, an Asx-specific peptide ligase from the cyclotide-producing C. ternatea, was shown to be responsible for cyclotide backbone cyclization [39]. Butelase 1 has high sequence homology to asparaginyl endopeptidase (AEP) and indeed shows AEP activity. However, it recognizes the C-terminal Asn/Asp-His-Val (N/D-HV) sequence and is capable of cyclizing various peptides of plant and animal origin with high catalytic efficiency (Figure 12) [39]. A recent structural study revealed that the active site of butelase 1 differs only subtly from those of conventional AEPs, suggesting that its efficient macrocyclization activity may be attributed to its peptide binding region (Figure 13) [40]. A co-crystal structure of butelase 1 with its peptide substrate would help clarify the mechanism of macrocyclization.
Owing to its high promiscuity and fast kinetics, butelase 1 has been applied in protein labelling [41,42], chemoenzymatic synthesis of bacteriocins [43], generating cyclic peptides with non-native amino acids [44], decorating E. coli cell surfaces [45], making peptide dendrimers [46], and preparing C-to-C fusion proteins [47].
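The C-terminal recognition rule described above lends itself to a small sketch. The Python below assumes a deliberately simplified model in which a linear substrate ending in Asn/Asp-His-Val loses the His-Val dipeptide and the Asx residue is joined to the N-terminal residue. The example peptide and the function names are hypothetical, and cyclic products are compared through a rotation-independent canonical form because a head-to-tail ring has no fixed starting residue.

```python
import re

# C-terminal Asn/Asp-His-Val recognition sequence, as described in the text.
RECOGNITION = re.compile(r"[ND]HV$")


def cyclize(linear):
    """Return the ring sequence after loss of the C-terminal His-Val."""
    if not RECOGNITION.search(linear):
        raise ValueError("substrate lacks a C-terminal N/D-H-V motif")
    return linear[:-2]  # Asx stays in the ring; His-Val is released


def canonical_ring(ring):
    """Lexicographically smallest rotation, so rotated spellings compare equal."""
    return min(ring[i:] + ring[:i] for i in range(len(ring)))


substrate = "GLPVCGETCVGGTCNHV"  # hypothetical substrate, for illustration only
ring = cyclize(substrate)
rotated = ring[5:] + ring[:5]    # same ring written from a different start

print(ring)
print(canonical_ring(ring) == canonical_ring(rotated))  # True
```

Writing the ring in canonical form is a bookkeeping choice, not a statement about the enzyme; it simply avoids treating two spellings of the same macrocycle as different products.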

Orbitides
Orbitides are N-to-C cyclized plant peptides that do not contain disulfide bonds. They are produced by at least nine plant families: Annonaceae, Caryophyllaceae, Euphorbiaceae, Lamiaceae, Linaceae, Phytolaccaceae, Rutaceae, Schizandraceae, and Verbenaceae [1]. Similar to cyclotide precursors, orbitide precursor peptides also contain multiple copies of the core sequence, so that a single precursor can be processed into multiple cyclic peptides. The biosynthetic pathway of the orbitide segetalin A has been studied in detail (Figure 14). The 32-amino-acid precursor presegetalin A1 is first processed by the serine protease OLP1 to remove the N-terminal 13 residues. Then, the peptide cyclase PCY1 cleaves off the C-terminal 13 amino acids, with concomitant macrocyclization of the remaining six residues to form segetalin A [48]. PCY1 is a member of the S9A protease family, which includes the POP enzymes. Kinetic analysis showed that PCY1 has kcat values similar to, and KM values 5- to 10-fold higher than, those of butelase 1, which is involved in cyclotide macrocyclization. Crystal structures of PCY1 revealed its transamidation and cyclization mechanisms (Figure 15). Upon binding of the follower peptide, PCY1 is maintained in a closed state that excludes solvent from the active site, potentially limiting the competing hydrolysis reaction. A key residue, His659, sits on a mobile loop and plays two roles: activating the Ser nucleophile to form the acyl-enzyme intermediate, and deprotonating the α-amine of the substrate for transamidation [49]. Using this knowledge of the molecular basis of the PCY1 reaction, a three-residue C-terminal extension (F/I-Q-A/T) was designed to replace the native long recognition tail FQALDVQNASAPV, permitting PCY1 to work on synthetic substrates [50].
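The kinetic comparison between PCY1 and butelase 1 mentioned above can be restated in terms of catalytic efficiency, kcat/KM. The Python sketch below uses placeholder numbers, not measured values, purely to show the arithmetic: at equal kcat, a 5- to 10-fold higher KM corresponds to a 5- to 10-fold lower kcat/KM. The function and variable names are hypothetical.

```python
def catalytic_efficiency(kcat_per_s, km_molar):
    """Return kcat/KM in M^-1 s^-1."""
    if km_molar <= 0:
        raise ValueError("KM must be positive")
    return kcat_per_s / km_molar


# Placeholder values for a reference enzyme, chosen for illustration only.
reference_kcat = 1.0     # s^-1
reference_km = 1.0e-4    # M

reference = catalytic_efficiency(reference_kcat, reference_km)
ratios = {}
for fold in (5, 10):
    # Same kcat, KM raised by the given factor.
    other = catalytic_efficiency(reference_kcat, reference_km * fold)
    ratios[fold] = other / reference
    print(f"KM x{fold}: relative kcat/KM = {ratios[fold]:.2f}")
```

Because only the ratio matters here, the conclusion does not depend on the placeholder values chosen for the reference enzyme.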

Animal RiPPs
Marine snails, such as cone snails, are known to produce a variety of ribosomally synthesized and post-translationally modified peptide venoms. These venoms are produced by predatory cone snails, injected into the prey, and lead to paralysis. The best characterized marine snail peptides are conopeptides, also known as conotoxins. It was estimated that more than 500 species of predatory marine Conus snails are capable of producing conotoxins [51,52]. Together, these Conus species can produce as many as 70,000 structurally diverse conotoxins [53,54]. Many conotoxins act by targeting ion channels and have therefore been widely used as basic research tools in neuroscience. A number of conotoxins have shown therapeutic potential because of their unparalleled potency and selectivity against a wide range of receptors and ion channels [55]. For example, ziconotide, a calcium channel blocker isolated from Conus magus, was approved by the United States Food and Drug Administration (US FDA) in 2004 for the treatment of chronic pain. A detailed review of conopeptide discovery and biosynthesis was published in 2013, and more conopeptide structures have been characterized since then [1,56–58].
The precursor peptide sequences of conotoxins were studied by analyzing the transcriptomes of cone snails. Those studies revealed that conotoxin precursor transcript sequences consist of three regions: an ER signal peptide, a mature peptide region, and pre-/postpropeptide regions [59,60]. The ER signal peptide sequence is highly conserved, whereas the mature peptide region is highly diverse [60]. Types of conotoxin post-translational modifications include disulfide-bond formation, proline hydroxylation, O-glycosylation on serine or threonine residues, and glutamate γ-carboxylation [61,62]. A web-based database, ConoServer (conoserver.org), was established to record known conopeptide structures, classifications, post-translational modifications, and general statistics. Because conopeptide biosynthetic genes are not organized in clusters, biosynthetic studies of animal RiPPs are extremely challenging. The details of conotoxin biosynthetic pathways, and the mechanisms and molecular basis of their post-translational modifications, remain largely unexplored [55].

Discussion
Eukaryotic RiPP pathways have some special features compared with bacterial RiPP pathways. As described above, many fungal and plant RiPPs have C-terminal recognition sequences in their precursor peptides. Moreover, N-terminal signal sequences are also common in eukaryotic RiPP precursors. For example, in the case of cyclotides (Section 3.1), an ER signal sequence is present in their precursor peptides [63]. In addition, the core region of eukaryotic precursor peptides often has several repeats of the core sequence, flanked by conserved motifs. Although this arrangement is also found in cyanobactin biosynthesis, in which the cores are present as repetitive cassettes, it is not common in other bacterial RiPP pathways [64]. Even more surprisingly, some eukaryotic RiPP precursors are not encoded as stand-alone genes, but rather as fusion or chimeric proteins. For example, the omphalotin A precursor is fused to the C-terminus of a post-translationally modifying methyltransferase (Section 2.1.2), and some cyclotide precursors are embedded within an albumin precursor (Section 3.1). The presence of precursor peptides fused to other structural genes underscores the possibility that eukaryotic RiPP natural products are much more common than currently appreciated. Interestingly, the biosynthesis of many eukaryotic RiPPs involves an N-to-C macrocyclization catalyzed by proteases. For example, amatoxins are macrocyclized by POPB enzymes, cyclotides are formed by a head-to-tail cyclization catalyzed by butelase 1, and the N-to-C cyclization of orbitides is catalyzed by PCY1 proteases [65]. The resulting macrocyclic peptides are more resistant to protease degradation in physiological environments and thus have more potential to be developed into novel therapeutics.
RiPP natural products are promising candidates for developing novel therapeutics, as many RiPPs have shown significant biological activities and great engineering potential [7]. Studies of eukaryotic RiPP biosynthesis are more challenging than those of their bacterial counterparts, mainly because of the more complex genomic context. This challenge can be mitigated by the development of better computational tools for mining eukaryotic genomes [2,66,67]. As more RiPP natural products are discovered and more RiPP biosynthetic pathways are revealed, this fast-expanding group of natural products will continue to provide compounds for industrial applications and to inspire efforts to engineer the enzymatic machinery.
Funding: This study was funded by the start-up funding from both Lanzhou University and State Key Laboratory of Applied Organic Chemistry.