Diversity of Linear Non-Ribosomal Peptide in Biocontrol Fungi

Biocontrol fungi (BFs) play a key role in regulating pest populations. BFs produce multiple non-ribosomal peptides (NRPs) and other secondary metabolites that interact with pests, plants and microorganisms. NRPs, including linear and cyclic peptides (L-NRPs and C-NRPs), are small peptides that frequently contain unusual amino acids and other organic acids. They are biosynthesized in fungi by non-ribosomal peptide synthetases (NRPSs). Compared with C-NRPs, L-NRPs have simpler structures, consisting of a linear chain whose biosynthesis involves no cyclization. BFs mainly comprise entomopathogenic and mycoparasitic fungi, which are used in the field to control insect pests and phytopathogens, respectively. NRPs play an important role in the interactions of BFs with insects or phytopathogens. On the other hand, NRP residues may contaminate food through the activities of BFs in the environment. In recent decades, C-NRPs in BFs have been thoroughly reviewed, whereas L-NRPs have rarely been investigated. To better understand the kinds and potential problems of L-NRPs in BFs, this review lists the L-NRPs from entomopathogenic and mycoparasitic fungi, summarizes their sources, structures, activities and biosynthesis, and details their risks and utilization prospects.


Introduction
Biocontrol fungi (BFs) play an important role in the control of agricultural and forestry pests. BFs mainly include entomopathogenic and mycoparasitic fungi (EFs and MFs). Entomopathogenic fungi are used extensively in agriculture and medicine. Beauveria bassiana and Metarhizium anisopliae have been developed as commercial BFs to manage many insect pests worldwide; Cordyceps spp. have been used in traditional medicines in Asia for many years [1,2]. Mycoparasitic fungi such as Trichoderma spp. have been used to control soil-borne plant diseases at commercial scales [3,4]. BFs produce multiple secondary metabolites that interact with pests, plants and microorganisms, helping them adapt to their environments.
Secondary metabolites produced by BFs mainly include polyketides (PKs), terpenes and non-ribosomal peptides (NRPs). NRPs are synthesized by multidomain mega-enzymes named non-ribosomal peptide synthetases (NRPSs), without ribosomes or messenger RNAs. NRPSs assemble numerous NRPs with large structural and functional diversity, including more than 20 marketed drugs with antibacterial (penicillin, vancomycin), antitumor (bleomycin) and immunosuppressant (cyclosporine) activities [5]. Apart from the 20 protein amino acids, NRPs also contain rare amino acids and other organic acids. The N-termini of NRPs are often modified by fatty acids, heterocyclic compounds, or glycosylated or phosphorylated structures [6]. NRPs are divided into linear NRPs (L-NRPs) and cyclic NRPs (C-NRPs). Because they lack cyclization, L-NRPs have peptide chains composed of multiple amino acids, often modified by different fatty chains or non-protein amino acids. L-NRPs have antimicrobial, insecticidal, antiviral or anticancer properties. There are numerous studies and reviews on fungal non-ribosomal peptide synthetases and cyclic peptides [7 9]. However, little attention has been paid to L-NRPs from BFs. Given the agricultural and medical importance of BFs, it is necessary to investigate the source species, structures, activities and biosynthesis of BF L-NRPs, as well as their potential risks.
J. Fungi 2020, 6, x FOR PEER REVIEW

Peptaibol Compounds
Peptaibols are a special kind of L-NRP found in a variety of soil fungi. Most of them occur in Trichoderma spp., which have been used to control plant diseases [29]. To date, more than 500 peptaibols have been identified [30], and 35 types among them were identified after 2000 [31]. Peptaibols are rich in non-protein amino acids and are often acetylated at the N-terminus and hydroxylated at the C-terminus. They contain 5-20 aa residues that form an α-helical conformation [32,33]. Moreover, peptaibols form ion channels in lipid bilayer membranes. Peptaibols with long sequences (12-20 aa) form "barrel-stave" ion channels, while those with short sequences (5-11 aa) form "carpet" ion channels as dimers with their N-termini connected (Figure 5) [34]. Peptaibols can therefore disrupt the ion balance of cells, leading to cellular dysfunction. They not only have antibacterial and cytotoxic activities, but are also teratogenic to the larvae of some marine organisms [35,36]. In BFs, there are three peptaibols reported.
[Figure 5 caption, partially recovered: ... "carpet" ion channel model.]
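The length rule stated above (12-20 residues for "barrel-stave" channels, 5-11 residues for "carpet" channels) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the N-terminal acetyl cap (Ac) is assumed not to count as a residue while a C-terminal amino alcohol such as Pheol does, and the sample input is the unlabeled sequence string that appears in the Biosynthesis section of this text. Which peptaibol that string belongs to is not stated there, so it serves purely as sample data.

```python
from typing import List, Optional, Tuple


def parse_peptaibol(sequence: str) -> Tuple[Optional[str], List[str]]:
    """Split a hyphen-separated three-letter sequence into (N-cap, residues).

    Assumption: a leading "Ac" token is an N-terminal acetyl cap, not a residue.
    """
    tokens = [t.strip() for t in sequence.split("-") if t.strip()]
    if tokens and tokens[0] == "Ac":
        return "Ac", tokens[1:]
    return None, tokens


def channel_model(n_residues: int) -> str:
    """Apply the length ranges given in the text; other lengths stay unclassified."""
    if 12 <= n_residues <= 20:
        return "barrel-stave"
    if 5 <= n_residues <= 11:
        return "carpet"
    return "unclassified"


# Sample input only: the unlabeled sequence string found in this text.
SAMPLE = (
    "Ac-Phe-Aib-Ser-Aib-Iva-Leu-Gln-Gly-Aib-Aib-"
    "Ala-Ala-Aib-Pro-Iva-Aib-Aib-Gln-Pheol"
)

if __name__ == "__main__":
    cap, residues = parse_peptaibol(SAMPLE)
    print("N-cap:", cap)
    print("residues:", len(residues))
    print("Aib count:", residues.count("Aib"))
    print("channel model by length:", channel_model(len(residues)))
```

The classification uses chain length alone, as the text does; it says nothing about measured channel behavior, which depends on the peptide and the membrane.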

LP237
The entomopathogenic fungus Tolypocladium geodes (Beauveria geodes) produces LP237 with three analogs (Table 3) [39][40][41]. The highly helical structure of LP237 and the amphiphilic side chains of its amino acids form a "barrel-stave" ion channel in the membrane. Gln6, Gln7 and Gln10 lie on the same polar face of the helix and form the cavity of the ion channel, which accounts for the membrane-permeabilizing activity of the peptide [42]. LP237 F8 is cytotoxic to P388D1 mouse leukemia cells and to human tumor cells such as lung cancer A549, ovarian cancer OVCAR3, colon cancer SW620 and breast cancer MCF7. It acts synergistically with other anticancer peptides [42].
Table 3. Analogs of LP237.


ACV
ACV is a tripeptide formed by the condensation of L-aminoadipic acid, L-cysteine and L-valine (Figure 7). It is a biosynthetic precursor of the antibiotics penicillin and cephalosporin [44]. ACV has been isolated from Penicillium chrysogenum, Cephalosporium acremonium and Aspergillus nidulans. Interestingly, Penicillium chrysogenum, the most important ACV producer, not only increases plant resistance to pathogens [45], but also has insecticidal activity against Bactrocera oleae [46]. ACV is synthesized by ACV synthetase (ACVS), which occurs in both fungi and bacteria [47][48][49].


Harzianins
Harzianins are so named because they were first identified in extracts of Trichoderma harzianum. T. harzianum not only has a good inhibitory effect on plant pathogens, but can also be used to control mosquito pests [50]. To date, up to 15 harzianin analogs are known (Table 5). The HC type contains three kinks formed by Aib-Pro motifs. The structures are 3₁₀-helices, which embed in the lipid layer to form voltage-gated ion channels of the "barrel-stave" type and increase the hydrophobicity and permeability of the lipid bilayer [51,52]. The PCU4 type is similar to HC, but has a shorter chain [53]. Compared with the HC type, HB I lacks an Aib-Pro-Ala [54]. HK VI contains two Aib-Pro motifs and also adopts a 3₁₀-helical conformation [55]. HA V contains only one Aib-Pro, in which Pro forms the central hinge of the α-helical structure [56].


Trichorzins
Trichorzins are 18 aa peptaibols with up to 10 analogs found in T. harzianum and T. virens (Table 6). Trichorzin PAs, with six analogs found in T. harzianum, show higher activity against mycoplasma and spiroplasma [57,58]. Three TVB analogs were isolated from T. virens [59]. Trichorzin PAs have a polar C-terminal tryptophanol (Trpol) with affinity for the hydrophilic head groups of the phospholipids in the bilayer membrane, which is important for the construction of the voltage-gated ion channels of these "barrel-stave" peptaibols [60].
T. longibrachiatum produces longibrachin (LG), a peptaibol with 20 aa residues. Six LG analogs have been found. The A series, with four analogs (LG A I-IV), has a neutral Gln at the 18th residue, whereas in the B series (LG B II-III) this residue is replaced by an acidic Glu (Table 7) [64,65]. The negatively charged Glu side chain of LG B increases the oligomerization level of the ion channel and improves the transport of substances [66].
LGs cause deformities in Crassostrea gigas larvae and may be neurotoxic to Calliphora vomitoria, with an ED50 of 270 mg/kg [64,67]. They are also toxic to KB cells (human oral epidermoid cancer cells) [64].
LGs show antibacterial activity against mycoplasma and Gram-positive bacteria.

Trilongins
Trilongins have 13 analogs with 11 or 20 aa residues and are mainly found in T. longibrachiatum and T. atroviride (Table 9) [76,77]. The trilongin A series has 11 aa residues with an average molecular weight of 1175 Da, while the trilongin B and C series have 20 aa residues and molecular weights of 1936-1965 Da. Trilongins are toxic to mammals: they destroy the mitochondria of boar sperm cells and, remarkably, mixtures of long- and short-sequence trilongins are more toxic [77]. Trilongins form voltage-gated K⁺/Na⁺ ion channels; moreover, combinations of the A type with the B/C type act synergistically, keeping the ion channel open longer than a single type does [77].
Table 9. Analogs of trilongins.


Alamethicins
Trichoderma viride (NRRL 3199), a BF widely distributed in nature and used to control soil-borne plant diseases [84,85], produces alamethicins with two analogs, B30 and B50 (Table 11) [86]. Each analog has many derivatives, which lack the six N-terminal residues or the C-terminal phenylalaninol (Pheol), or in which Ala at the 6th residue is replaced by Aib, or Gln at the 7th and 19th residues by Glu. Alamethicins are rich in Aib and have two Pro residues, near the N-terminus and the C-terminus; the N-terminal part of the molecule forms a stable α-helix, and the C-terminal part exhibits a variable hydrogen bonding pattern [87]. Alamethicin is often used as a model voltage-gated ion channel for the passive diffusion of cations [88,89].

Hypomurocins
Hypomurocins have 13 analogs (A and B series) (Table 14), purified from Hypocrea muroiana, a BF that not only inhibits various plant diseases but also promotes plant growth [96]; the exception is hypomurocin B, which is found in Trichoderma harzianum [59,97]. Hypomurocin A has a mixed helical conformation containing α- and 3₁₀-helices, as well as type I and III β-turn structures that link the helices [98][99][100]. Hypomurocin B consists of 18 amino acid residues that form a 3₁₀-helical rather than an α-helical structure [101]. Hypomurocins inhibit Bacillus subtilis and cause hemolysis of rat erythrocytes; moreover, the activity of hypomurocin B is greater than that of hypomurocin A [96].

Peptaivirin
Peptaivirins are special peptaibols purified from Trichoderma spp. (KGT142). Peptaivirins have two analogs (Table 17), peptaivirins A and B, which strongly inhibit tobacco mosaic virus (TMV) infection [106]. Peptaivirins are rich in Aib and carry an acetylated phenylalanine at the N-terminus.

Biosynthesis of L-NRPs
Non-ribosomal peptides are synthesized by non-ribosomal peptide synthetases (NRPSs), multimodular enzymes that are among the largest found in nature. The modules consist of different domains with specific catalytic activities. The core domains of an NRPS are the adenylation domain (A domain, which recognizes and adenylates the substrate molecule), the thiolation domain (T domain, also known as the peptidyl carrier protein (PCP) domain) and the condensation domain (C domain, which catalyzes the binding of the corresponding monomer to the growing peptide) [8]. In addition to these core domains (A, T and C), NRPSs may have an epimerization domain (E domain), an N-methylation domain (M domain) and others that modify the peptide [104]. Finally, a thioesterase domain (TE) in bacterial NRPSs, or a similar condensation domain (CT domain) in fungal NRPSs, hydrolyzes or cyclizes the end of the target polypeptide [107,108].
NRPSs are divided into three categories, namely linear, iterative and nonlinear NRPSs (Figure 9). Linear NRPSs use C-A-T as the extension module, and the assembly yields linear or cyclic NRPs. Iterative NRPSs have multiple identical modules and yield oligopeptides or cyclic NRPs containing multiple residues of the same amino acid. Nonlinear NRPSs have other modules (X), and the C-A-T order is not required; they deviate completely from the standard domain organization, leading to unexpected products [109]. L-NRPs are mainly synthesized by linear NRPSs [109]. The number of modules determines the length of the peptide. The main difference from cyclic NRPs is whether the initiation substrates in the A domain carry free hydroxyl and amino groups; after hydrolysis in the TE domain, internal esterification or lactam hydrolysis will occur [6]. The biosynthesis of L-NRPs in biocontrol fungi is complex and few studies have been published, so we take ACV as an example to illustrate it.
Ac-Phe-Aib-Ser-Aib-Iva-Leu-Gln-Gly-Aib-Aib-Ala-Ala-Aib-Pro-Iva-Aib-Aib-Gln-Pheol

The PcbAB gene encoding ACV synthetase (ACVS) was cloned from P. chrysogenum; it measures 11,500 bp, with an open reading frame (ORF) of 11,376 bp coding for a protein of 3791 aa. The genes PcbAB, PcbC (encoding cyclase) and PcbDE (encoding penicillin acetyltransferase) form a cluster in a 17 kb DNA region that drives penicillin biosynthesis [110]. The ACVS genes of Cephalosporium acremonium and Aspergillus nidulans are similar to that of P. chrysogenum, with more than 60% similarity [111,112]. ACVS contains ten domains in three modules (M1, M2 and M3); M3 has the special E and TE domains, which carry out the epimerization of L-valine and the hydrolysis of ACV (Figure 10) [113]. During biosynthesis, the A domain of module M1 selects the suitable substrate, L-α-aminoadipic acid, and activates it to form an aminoacyl-AMP. This then binds to the sulfhydryl group of the T domain to form an aminoacyl-S-carrier complex, which is transferred to module M2 and combined with the activated cysteinyl-S-carrier to form a cysteinyl-aminoacyl-S-carrier complex. This in turn is transferred to M3 and condensed with the activated valinyl-S-carrier into a valinyl-cysteinyl-aminoacyl-S-carrier complex. Finally, through intramolecular nucleophilic attack in the TE domain, the L-NRP δ-(L-α-aminohexanedioyl)-L-cysteinyl-D-valine (ACV) is produced (Figure 10).
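The module-by-module assembly of ACV described above can be sketched as a small Python model. It is a minimal illustration, not a description of the enzyme mechanism. The text gives only the total of ten domains and places E and TE in M3, so the per-module split used here (A-T, C-A-T, C-A-T-E-TE) is an assumption, as are all class and function names. The ORF check likewise assumes that the 11,376 bp ORF includes one stop codon.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Module:
    name: str
    domains: Tuple[str, ...]  # domain order within the module (assumed split)
    substrate: str            # amino acid selected by the module's A domain


# Assumed layout: 2 + 3 + 5 = 10 domains, with E and TE in M3 as in the text.
ACVS_MODULES = (
    Module("M1", ("A", "T"), "L-alpha-aminoadipic acid"),
    Module("M2", ("C", "A", "T"), "L-cysteine"),
    Module("M3", ("C", "A", "T", "E", "TE"), "L-valine"),
)


def assemble(modules: Sequence[Module]) -> Tuple[List[str], bool]:
    """Walk a linear NRPS: one residue per module, in module order.

    Returns the residue list and whether the last module carries a TE domain
    (taken here to mean that the finished peptide is released).
    """
    chain: List[str] = []
    for module in modules:
        if "A" not in module.domains or "T" not in module.domains:
            raise ValueError(f"{module.name} lacks an A or T domain")
        residue = module.substrate
        # An E domain epimerizes the residue of its own module (L to D).
        if "E" in module.domains and residue.startswith("L-"):
            residue = "D-" + residue[2:]
        chain.append(residue)
    released = bool(modules) and "TE" in modules[-1].domains
    return chain, released


def protein_length_from_orf(orf_bp: int) -> int:
    """Codons in the ORF minus one stop codon (assumed to be included)."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return orf_bp // 3 - 1


if __name__ == "__main__":
    chain, released = assemble(ACVS_MODULES)
    print(" / ".join(chain), "| released:", released)
    print("domains:", sum(len(m.domains) for m in ACVS_MODULES))
    print("protein length (aa):", protein_length_from_orf(11376))
```

Under these assumptions the model returns the three residues in the order given in the text, with valine in the D configuration, and the ORF arithmetic agrees with the stated 3791 aa.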


Discussion
Although only 22 classes of L-NRPs have been found in BFs to date, BFs clearly possess an abundant diversity of L-NRPs. First, BF L-NRPs have diverse molecular structures: each class has multiple analogs and numerous derivatives with different configurations and conformations. Second, BF L-NRPs have multiple functions, with bioactivities that include antifungal, antibacterial, antiviral, insecticidal, acaricidal, nematicidal, herbicidal and anticancer effects. Finally, BF L-NRPs are diversely distributed: one BF species can have more than one class of L-NRPs and, conversely, the same L-NRP can occur in different BF species. For example, Trichoderma harzianum has at least two L-NRPs, harzianins and hypomurocin B, while Trichoderma longibrachiatum produces trichobrachins, trichogins, trilongins and trichokonins. Furthermore, efrapeptins are produced by Tolypocladium niveum, Tolypocladium geodes, Acremonium sp. and Metarhizium anisopliae, while trichokonins can be found in Trichoderma koningii, Trichoderma pseudokoningii and T. longibrachiatum.
Interestingly, more L-NRPs have been found in mycoparasitic fungi than in entomopathogenic fungi, particularly the common entomopathogens such as Beauveria, Metarhizium and Isaria. The main reason may be that L-NRPs are easily hydrolyzed [114]. Insects have many proteases, especially in their midguts, so L-NRPs secreted by entomopathogens into the insect body would soon be hydrolyzed, whereas C-NRPs are difficult to degrade in insects. By contrast, mycoparasitic fungi usually live in soil and interact with phytopathogens or other microorganisms in an environment with fewer proteases. The L-NRPs secreted by mycoparasites may therefore persist for longer, which favors their effects on surrounding microorganisms. The diversity of NRPs is ensured by NRPSs through different organizations of domains and modules. A domains with various structures can select different substrate amino acids or fatty acids, giving peptide chains of diverse composition. To adapt to their environments, BFs presumably acquire the most suitable NRPS genes at the lowest cost. Thus, the co-evolution of BFs with their target organisms may have led to fewer L-NRPs in entomopathogenic fungi than in mycoparasitic fungi.
NRPs attract much attention from researchers as drug resources, and BF L-NRPs likewise have potential as pesticides and medicines. For example, bleomycin has been used to treat cancers [5], and ACV has long attracted attention as a precursor of penicillin [52]. Although NRPs are not currently used in agriculture, further studies are valuable. More important, however, are the risks of L-NRPs in BFs. As many L-NRPs are toxic, they can endanger human health and non-target organisms once they enter the food chain during agricultural application. Although NRPs produced by BFs have little probability of entering the food chain [2,9,115], caution must be exercised. Adequate risk assessments need to be conducted before BFs are used.
In conclusion, 22 classes of L-NRPs have currently been found in BFs. They show abundant diversity in structure, function and distribution. NRPSs accomplish the biosynthesis of different L-NRPs through different compositions of domains and modules. Mycoparasitic fungi produce more L-NRPs than entomopathogenic fungi, possibly because co-evolution with their respective hosts has shaped the NRPSs of the two groups differently. BF L-NRPs have potential as pesticides and medicines. However, more attention needs to be paid to the risk of L-NRPs contaminating food and the environment.