Diversity of Nonribosomal Peptide Synthetases Involved in the Biosynthesis of Lipopeptide Biosurfactants

Lipopeptide biosurfactants (LPBSs) consist of a hydrophobic fatty acid portion linked to a hydrophilic peptide chain. With their complex and diverse structures, LPBSs exhibit various biological activities including surface activity as well as anti-cellular and anti-enzymatic activities. LPBSs are also involved in multi-cellular behaviors such as swarming motility and biofilm formation. Among the bacterial genera, Bacillus (Gram-positive) and Pseudomonas (Gram-negative) have received the most attention because they produce a wide range of effective LPBSs that are potentially useful for the agricultural, chemical, food, and pharmaceutical industries. The biosynthetic mechanisms and gene regulation systems of LPBSs have been extensively analyzed over the last decade. LPBSs are generally synthesized in a ribosome-independent manner by megaenzymes called nonribosomal peptide synthetases (NRPSs). Production of active-form NRPSs requires not only transcriptional induction and translation but also post-translational modification and assembly. The accumulated knowledge reveals the versatility and evolutionary lineage of the NRPS system. This review provides an overview of the structural and functional diversity of LPBSs and their different biosynthetic mechanisms in Bacillus and Pseudomonas, including both typical and unique systems. Finally, successful genetic engineering of NRPSs for creating novel lipopeptides is also discussed.

a module. The order of modules is usually co-linear to the product peptide sequences (Figure 1). Each module is composed of specific domains that are responsible for catalyzing different enzymatic activities.
The adenylation (A) domain is responsible for amino acid recognition and adenylation at the expense of ATP to form an acyl-adenylate intermediate. Then, the adenylated amino acid covalently binds to a phosphopantetheine carrier of the adjacent thiolation (T) domain, also called the peptidyl carrier protein (PCP) domain. Peptide bond formation between two consecutively bound amino acids is catalyzed by the condensation (C) domain. Modification domains, such as the epimerization (E) domain, which catalyzes the conversion of L-amino acids to D-isomers, are typically associated with the modules that incorporate D-amino acids. Lastly, cyclization and release of the product peptide are carried out by the C-terminal thioesterase (Te) domain that is associated with the termination module. A number of gene clusters encoding NRPSs for LPBS biosynthesis in both Bacillus and Pseudomonas have been cloned and characterized (Figure 1). These gene clusters show striking similarities in the modular architecture of their repetitive catalytic units and assembly-line mechanism. However, several unique features have also been identified.
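The assembly-line logic described above can be illustrated with a minimal sketch. The `Module` class, its field names, and the three-module synthetase below are hypothetical and invented for illustration only; the sketch assumes that every module carries C-, A-, and T-domains and that an E-domain always converts the bound residue to the D-form.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Module:
    substrate: str               # amino acid recognized by the A-domain
    has_e_domain: bool = False   # E-domain converts the bound L-residue to D
    has_te_domain: bool = False  # Te-domain marks the termination module

    def domains(self) -> str:
        """Domain order of this module, e.g. 'C-A-T-E'."""
        parts = ["C", "A", "T"]
        if self.has_e_domain:
            parts.append("E")
        if self.has_te_domain:
            parts.append("Te")
        return "-".join(parts)


def predicted_peptide(modules: Sequence[Module]) -> str:
    """Read the product off the module order (co-linearity rule)."""
    residues = []
    for module in modules:
        isomer = "D" if module.has_e_domain else "L"
        residues.append(f"{isomer}-{module.substrate}")
    return "-".join(residues)


# Hypothetical three-module synthetase, not taken from any real gene cluster.
toy_line = [
    Module("Glu"),
    Module("Leu", has_e_domain=True),
    Module("Val", has_te_domain=True),
]

print(" | ".join(m.domains() for m in toy_line))
print(predicted_peptide(toy_line))
```

The sketch deliberately ignores the exceptions discussed later in the text, such as dual C/E domains and direct incorporation of D-amino acids by A-domains.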

Bacillus
Most LPBSs from Bacillus can be classified into three families: surfactin, fengycin, and iturin [16]. Several Bacillus strains are able to produce all three families of LPBSs simultaneously [17][18][19][20]. In addition to these three families, several other lipopeptides have also been identified in Bacillus species (Table 1). The characteristics of these LPBS families and the corresponding NRPSs are as follows.

Surfactin and Lichenysin Synthetases
Surfactin and lichenysin are structurally related LPBSs produced by B. subtilis and B. licheniformis [3,7,21]. Several other forms of surfactin with amino acid variations at positions 2, 4, and 7 have been reported [22]. Surfactin is a strong surfactant that reduces the surface tension of water from 72 to 27 mN/m at a critical micelle concentration (CMC) of 25-220 mg/L, depending on the variant and the measurement conditions [3,23]. A surfactin-like compound termed lichenysin is at least 2-fold more efficient as a biosurfactant than surfactin, probably due to the replacement of Glu1 by Gln1 [6,23]. Surfactin was first identified as an inhibitor of fibrin clot formation. It also exhibits anti-microbial, anti-tumor, anti-viral, and hemolytic properties [24]. Surfactin is required for the biofilm formation of producing cells [25,26], swarming motility [27,28], and fruiting body formation [29]. However, surfactin also inhibits biofilm formation of other bacteria by interfering with attachment of the cells to surfaces [30].
Table 1 (fragment): Other; Antiadhesin; FA-β-OH-L-Asp-L-Leu-L-Leu-L-Val-L-Val-L-Glu-L-Leu; FA-X-Thr-X-Gly-X-Ala-X-Ser-X-His-X-Gln-X-Gln [45]
Table 1 (footnote): These LPBSs are usually a mixture of compounds with different lengths and types of fatty acid (FA), β-hydroxy FA (FA-β-OH), β-amino FA (FA-β-NH2), or guanidylated-β-OH FA (gFA-β-OH). The β-OH or β-NH2 group of FA forms an ester or peptide bond with the carboxyl group of the C-terminal amino acid. For fengycin, circulocins, fusaricidin, and kurstakin, the carboxyl group of the C-terminal amino acid is lactonised with the hydroxyl group of Tyr 3, Thr 1, Thr 1, and Ser 4, respectively.
Table (fragment): Other 14 CO-X-Gly-X-Ser-X-Thr-X-Leu-X-Leu-X-Ser-X-Leu-X-Leu/Val
The biosynthetic gene clusters of surfactin [69] and lichenysin [32], namely srfA and lic, are highly homologous and extend over 25 kb (Figure 1). They contain four open reading frames (ORFs), srfA-A/licA, srfA-B/licB, srfA-C/licC, and srfA-Te/lic-Te. The amino acid sequences of the first three ORFs are homologous to other NRPSs whereas the last ORF encodes a putative type II Te. SrfA-A/B/C and LicA/B/C are composed of three, three, and one module(s), respectively, and each ORF can be further subdivided into functional domains. SrfA and Lic bear six typical C-domains that catalyze amide bond formation. An additional C-domain, the N-acyl domain, is located at the N-terminus of the first module, suggesting that the first amino acid is initially N-acylated with a β-hydroxy fatty acid in this domain [32,70]. Recently, Kraas and coworkers (2010) have shown that the N-acyl domain in SrfA transfers CoA-activated 3-hydroxy fatty acid to the first T-domain where the N-terminal Glu is bound [71]. SrfA and Lic contain the conventional E-domains essential for the transformation of L-amino acids to D-amino acids and a C-terminal type I Te-domain that releases the final product. In addition to the C-terminal type I Te-domain, SrfA and Lic have an external type II Te protein, SrfA-Te/LicTe. Decreased production of surfactin (84%) is observed in the SrfA-Te mutant [72]. The external type II Te regenerates misprimed T-domains by removing short acyl chains from the 4'-phosphopantetheine cofactors, thereby restoring functional NRPSs [73]. Moreover, a study has suggested that the type II Te also hydrolyzes incorrectly loaded amino acids that are not processed by the nonribosomal machinery [74]. SrfA-Te also functions as the thioesterase/acyltransferase that supports and stimulates the formation of β-hydroxymyristoyl-glutamate, an initiation substrate of surfactin synthesis [75,76].

Fengycin Synthetase
Fengycin is also referred to as plipastatin when Tyr 3 and Tyr 9 are present in the L- and D-form, respectively. It is an anti-fungal antibiotic that inhibits filamentous fungi but is ineffective against yeast and bacteria. It is also capable of inhibiting phospholipase A2 and biofilm formation of several bacteria [4,37,77,78]. These types of lipodecapeptides are produced by various strains of Bacillus spp. and exhibit moderate surfactant activities [4,17,79]. Fengycin is expected to form a lactone between the hydroxyl group of L-Tyr 3

Bacillomycin, Iturin, and Mycosubtilin Synthetases
The iturin family comprises bacillomycin, iturin, and mycosubtilin, which are cyclic lipoheptapeptides linked by a β-amino acid residue. Members of this family have strong antibiotic activity, moderate surfactant activity, and enhanced swarming motility [17,38,81]. The NRPS gene clusters of bacillomycin D (bam/bmy), mycosubtilin (myc), and iturin A (itu) are each composed of four large ORFs [18,40,82,83]. bam and bmy are identical gene clusters found in B. subtilis AU195 and B. amyloliquefaciens FZB42, respectively. The genes encode multifunctional hybrid enzymes of a fatty acid synthase, an aminotransferase, and peptide synthetases. The first ORF (bmyD, ituD, or fenF) encodes malonyl-CoA transacylase. The second ORF (bmyA, ituA, or mycA) encodes acyl-CoA ligase, acyl carrier protein (ACP), β-ketoacyl synthetase, and aminotransferase domains before a conventional NRPS module. The MycA loading module activates free fatty acids through an acyl-adenylate intermediate and loads them onto the adjacent ACP1 domain [84]. Meanwhile, FenF shows broad acyl-substrate specificity and loads malonyl-CoA onto ACP2 in MycA [85]. The aminotransferase domain catalyzes the transfer of an amino group to the β-position of the growing acyl chain [86]. The resulting β-amino fatty thioester is then presumably passed on to the third and fourth ORFs, which encode four and two functional modules of a typical NRPS, respectively. Mycosubtilin and iturin A have almost the same structure except that the D-Ser 6 and L-Asn 7 residues in mycosubtilin are inverted to D-Asn 6 and L-Ser 7 in iturin A. Amino acid sequence homology between the two A-domains for Ser and Asn in the two synthetases is high, suggesting that an intragenic domain change occurred in either Myc or Itu synthetase to evolve a counterpart gene [82].

Fusaricidin Synthetase
Fusaricidin is a unique hexapeptide linked to a guanidylated β-hydroxy fatty acid. It has potent anti-fungal activity and is produced by Paenibacillus polymyxa PKB1 (formerly called Bacillus polymyxa). It is a candidate biocontrol agent for treating blackleg disease [44]. Fusaricidin synthetase (FusA) comprises six NRPS modules that are encoded by a single ORF. The second, fourth, and fifth modules of FusA incorporate D-amino acids and carry E-domains. However, no E-domain was detected in the sixth module, which would incorporate D-Ala [44].
To date, three different mechanisms have been reported for incorporation of D-amino acids in the LPBS products. In the typical NRPS system in Gram-positive bacilli, an E-domain responsible for epimerization of L-amino acids to the D-forms is located downstream of D-amino acid-incorporating modules. A second mechanism is the direct incorporation of D-amino acids by the respective A-domains. This system is found in eukaryotic fungal NRPS systems such as cyclosporine and HC-toxin [87,88]. A third system for incorporating D-amino acid is a novel type of C-domain with dual epimerization and condensation activities (C/E domain), which has recently been identified in several NRPSs in both actinomycete and Gram-negative pseudomonads [89].
FusA synthetase employs two different mechanisms for incorporation of D-amino acids. Conversion from L- to D-amino acid in three modules of FusA synthetase is mediated by conventional E-domains, whereas direct incorporation of D-Ala is found in the last module, which contains neither an E-domain nor a C/E domain [44]. Thus, horizontal gene transfer may have occurred between Gram-positive bacterial and fungal NRPS genes.

Pseudomonas
Pseudomonas also produces a variety of cyclic lipopeptides. Recently, lipopeptides of pseudomonads have been classified into six groups: viscosin, syringomycin, amphisin, putisolvin, tolaasin, and syringopeptin [90]. In addition to these six main LPBS groups, other lipopeptides have also been identified in Pseudomonas species, but no cyclic lipopeptides linked by a β-amino acid residue have been reported (Table 2).

Syringomycin Synthetase
Syringomycin is a phytotoxin and a key determinant of Pseudomonas syringae B301D virulence. It has moderate surfactant activity with a CMC of 1250 mg/L and minimum surface tension of 33 mN/m [91]. Syringomycin is synthesized by two NRPSs (SyrB1, SyrE) and three modifying protein systems (SyrB2, SyrC, SyrP). SyrB1 and SyrE do not follow the co-linearity rule and also lack E-domains. Eight modules for the first eight amino acids in SyrE are arranged in a line, but the ninth module (SyrB1), which is necessary for incorporation of the last amino acid (L-Thr 9 ), is located in the upstream region [51]. This observation suggests that absolute co-linearity is not essential for NRPS synthesis, which is similar to SrfA data [92]. L-Thr 9 is activated and loaded by SyrB1 and is then chlorinated to 4-Cl-L-Thr by the non-heme Fe(II) halogenase SyrB2 [93]. This intermediate is transferred from the T-domain of SyrB1 to SyrE by aminoacyltransferase SyrC to form the final product [94]. Hydroxylation of Asp at module 8 is catalyzed by SyrP, whose gene is located upstream of syrB1 [95]. Although three amino acid residues in syringomycin are in the D-form, the E-domain is not associated with the modules incorporating the respective D-amino acids. However, Balibar and coworkers (2005) demonstrated that SyrE contains unique dual C/E domains, which contribute to the conversion of L-amino acids to the D-form [89].

Syringopeptin Synthetase
P. syringae B301D also produces another class of lipodepsipeptide phytotoxins called syringopeptin. Syringopeptin contains a larger peptide moiety than syringomycin, with 22 or 25 amino acid residues, and is one of the largest LPBSs ever reported. Syringopeptin has a CMC of 820 mg/L and reduces surface tension to 40.2 mN/m [91]. Three NRPSs, SypA, SypB, and SypC, are involved in the biosynthesis of syringopeptin. The order and number of the modules are co-linear to the amino acid sequence of syringopeptin SP22. SypA/B/C represent the largest NRPSs among those reported for prokaryotes. Similar to Syr synthetase, no E-domain is present in Syp synthetase, despite the presence of several D-amino acids. In contrast to the Syr synthetase, SypC contains two unique C-terminal Te-domains predicted to catalyze the release and cyclization of syringopeptin [96].

Arthrofactin Synthetase
Arthrofactin is a cyclic lipoundecapeptide produced by Pseudomonas sp. MIS38, which was initially misidentified as Arthrobacter sp. [56], and belongs to the amphisin group. The molecule is cyclized through the formation of an ester bond between the carboxyl group of the C-terminal Asp and the β-hydroxyl group of D-allo-Thr [97]. Arthrofactin is one of the most effective cyclic LPBSs; it reduces the surface tension of water from 72 to 24 mN/m with a CMC of 13.5 mg/L [56]. Arthrofactin appears to be essential for swarming activity and inhibits initial attachment of the planktonic cells in biofilm formation [98]. Several arthrofactin-like compounds with remarkable biosurfactant and antifungal properties or an enzyme inhibitor have been reported from Pseudomonas spp. [9][10][11][12]57]. Like arthrofactin, amphisin is involved in swarming motility of Pseudomonas sp. DSS73 [99].
Biosynthesis of arthrofactin is catalyzed by the arthrofactin synthetase (Arf), which consists of three NRPS protein subunits, ArfA (234 kDa), ArfB (474 kDa), and ArfC (648 kDa), containing two, four, and five functional modules, respectively (Figure 2) [98]. An additional C-domain was identified in the first module of ArfA, suggesting that the first amino acid could be initially acylated with a fatty acid. Site-directed mutagenesis changing the histidine residue of the conserved core motif (HHXXXDG) to alanine impairs arthrofactin production [100]. This result suggests that the first C-domain is essential for lipopeptide biosynthesis. Indeed, the β-hydroxydecanoyl thioester may be coupled to the activated leucine by the action of this C-domain to yield β-hydroxydecanoyl-L-Leu as the initial intermediate. A phylogenetic tree showed that the first C-domain of Arf belongs to the N-acyl group, whose members use fatty acyl-CoA as their starter unit [70]. Although seven of the 11 amino acid residues in arthrofactin are in the D-form, Arf contains no E-domains, as is also the case for syringomycin and syringopeptin [98]. The A-domain of D-Leu 1 specifically recognizes only L-Leu in vitro. Based on these observations, we initially hypothesized that an external racemase may be responsible for incorporation of the D-amino acids in arthrofactin. Different amino acid sequences downstream of a conserved core motif [FFELGGHSLLA(V/M)] in the T-domains were expected to reflect the recognition by an external racemase. However, Balibar and coworkers later demonstrated that Arf contains unique dual C/E domains, which contribute to the conversion of L-amino acids to the D-form [89]. This novel C/E activity is cryptically embedded in the C-domain located downstream of the D-amino acid-incorporating modules. Dual C/E domains can be recognized by an elongated His motif (HHI/LXXXXGD). This feature was also identified in the Syr and Syp synthetases.
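The two His motifs quoted above lend themselves to a simple pattern search. The following is a minimal sketch, assuming that X stands for any residue and that the elongated motif should be tested first; the function name and the short test sequences are invented for illustration and are not taken from Arf or any other real synthetase.

```python
import re

# Conventional C-domain core motif: HHXXXDG (X = any residue).
C_MOTIF = re.compile(r"HH.{3}DG")
# Elongated His motif of dual C/E domains: HH(I/L)XXXXGD.
CE_MOTIF = re.compile(r"HH[IL].{4}GD")


def classify_c_domain(sequence: str) -> str:
    """Label a C-domain sequence by the His motif it contains."""
    sequence = sequence.upper()
    # Test the longer, more specific motif first.
    if CE_MOTIF.search(sequence):
        return "dual C/E"
    if C_MOTIF.search(sequence):
        return "conventional C"
    return "no His motif"


# Invented peptide fragments, for illustration only.
print(classify_c_domain("AAHHILFDGWSAA"))  # conventional motif present
print(classify_c_domain("AAHHLSYDAGDAA"))  # elongated motif present
print(classify_c_domain("AAGGSTAA"))       # neither motif
```

A motif match of this kind is only a first screen; assigning a domain as C/E in practice would also rest on alignment and phylogenetic evidence of the sort cited in the text.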
Another unique characteristic of Arf is the presence of C-terminal tandem Te-domains like syringopeptin. By site-directed mutagenesis, the first Te-domain (ArfC-Te1) was shown to be essential for the completion of macrocyclization and the release of the final product. The second Te-domain (ArfC-Te2) was suggested to be involved in the evolution of Arf to improve the macrocyclization efficiency [101]. Moreover, we found that the gene encoding putative ArfA/B/C exists in the genome sequence of Pseudomonas fluorescens Pf0-1 (YP_347943/YP_347944/YP_347945) [102]. Arf represents a novel NRPS architecture that features tandem Te-domains and dual C/E domains. Interestingly, another type of NRPS involved in biosynthesis of a siderophore, pyoverdine, was also identified in arthrofactin-producing Pseudomonas sp. MIS38. A gene encoding NRPS for the chromophore part of pyoverdine contains a conventional E-domain [102]. This observation suggests that different NRPS systems with dual C/E domains and a conventional E-domain are both functional in Pseudomonas spp.

Viscosin and Massetolide Synthetases
Viscosin and massetolide are structurally related lipononapeptides produced by P. fluorescens SBW25 and P. fluorescens SS101, respectively [46,48]. Viscosin has significant surfactant activity, reducing the surface tension of water to 28 mN/m with a CMC of 10-15 mg/L, and forms stable emulsions [46,103]. It also inhibits migration of a metastatic prostate cancer cell line without visible toxicity [103]. In addition, viscosin and massetolide are required for biofilm formation and swarming motility of Pseudomonas cells [46,48]. Viscosin/massetolide is synthesized by NRPS systems that are encoded by three large ORFs, termed viscA/massA, viscB/massB, and viscC/massC. The viscA/massA gene is not clustered with the latter genes, but is located at a different locus of the Pseudomonas genome. The distance between viscA/massA and the latter genes is more than 1.5 Mb. Analysis of the amino acid sequences revealed two modules in ViscA/MassA, four modules in ViscB/MassB, and three modules in ViscC/MassC. Each module bears A-, T-, and C-domains like other NRPSs. However, none of the five D-amino acid-incorporating modules possesses a cognate E-domain; instead, C/E domains similar to those of Arf are present. Tandem Te-domains were also identified in the last ViscC/MassC module and are likely to be functional for the biosynthesis of both lipopeptides, as was shown for the two Te-domains in Arf [101]. Similar to other NRPSs involved in lipopeptide biosynthesis, the N-terminal C-domain in the first module is highly similar to the N-acyl domain and is presumably involved in N-acylation of the first amino acid.

Orfamide Synthetase
Orfamide and its biosynthetic genes (ofaA/B/C) were discovered from the P. fluorescens Pf-5 genome using a genome isotope approach that employs a combination of genome sequence analysis and isotope-guided fractionation to identify the corresponding compounds [66]. Orfamide is a lipodecapeptide consisting of a β-hydroxy fatty acid linked to a 10-amino acid cyclic peptide in which five amino acids are in the D-form. Orfamide is essential for swarming activity and exhibits strong zoosporicidal activity, but it is not involved in biofilm formation [66]. Although orfamide is composed of 10 amino acids, its structure is most similar to the lipononapeptide viscosin, suggesting a common evolutionary lineage. It is also interesting that the gene structure, ofaABC, is rather similar to arfABC (Figure 1). Structural analysis of OfaA/B/C identified ten modules. The first NRPS, OfaA, consists of two modules with an N-acyl domain at its N-terminus. Four modules each were identified in the second NRPS, OfaB, and the last NRPS, OfaC. Similar to other Pseudomonas NRPSs, no cognate E-domains are found in Ofa modules. Although only five amino acid residues in orfamide are in the D-form, Ofa seems to contain a total of six dual C/E domains. This inconsistency suggests that the NRPS system is more complex than previously thought. Tandem Te-domains were found at the C-terminus of OfaC, similar to several NRPSs from other Pseudomonas species.

Putisolvin Synthetase
Putisolvin is an LPBS synthesized by P. putida PCL1445, which was isolated from soil heavily contaminated with polycyclic aromatic hydrocarbons. Putisolvin is a cyclic lipododecapeptide consisting of a 12-amino acid peptide linked to a hexanoic lipid, with an ester linkage between the ninth serine residue and the C-terminal carboxyl group [58]. Putisolvin inhibits biofilm formation of other bacteria and exhibits zoosporicidal and antifungal activities [58,104]. Three genes (psoA, psoB, and psoC) were identified and shown to encode NRPSs involved in putisolvin biosynthesis [105]. PsoA, PsoB, and PsoC contain two, seven, and three functional modules, respectively. The C-terminus of PsoC carries putative tandem Te-domains. Both domains harbor a highly conserved signature sequence (GXSXG) and the catalytic triad residues of Te-domains. Nine of the 12 amino acids are in the D-form, but no conventional E-domains were identified [105]. Analysis of specific sequence motifs in the T-domains suggested that the first nine T-domains in Pso synthetase are responsible for transferring D-amino acids [98]. Amino acid sequence analysis of the C-domains indicated that dual C/E domains are organized downstream of the first nine modules. Prediction of A-domain substrate specificity in the eleventh module indicates its preference for Val over Leu or Ile, which correlates well with the production ratios of putisolvin I and II [105].

Syringafactin Synthetase
Syringafactin is a novel linear lipooctapeptide produced by P. syringae pv. tomato DC3000. It contains an eight-amino acid linear peptide linked to a β-hydroxy fatty acid. The Val 4 residue can be substituted with Leu or Ile. Syringafactin shows surfactant activity and is essential for the swarming motility of the producing strain, but its contribution to the pathogenicity has not been tested [68]. Syringafactin biosynthetic genes were identified from mining the P. syringae pv. tomato DC3000 genome. The gene clusters syfA and syfB encode three and five NRPS modules, respectively. The N-acyl domain present in the initiating module of SyfA indicated that syringafactin would contain an N-terminal fatty acid chain. SyfB contains tandem Te-domains at the C-terminus. The D/L-configuration of each residue has not been determined. However, based on the location of the dual C/E domains that are typically located downstream of D-amino acid-incorporating modules, the structure of syringafactin should be fatty acyl-D-Leu 1 -D-Leu 2 -D-Gln 3 -Leu 4 -D-Thr 5 -Val 6 -D-Leu 7 -Leu 8 , which differs from a previous report [68]. The N-terminal C-domain of SyfA shows the highest level of amino acid sequence similarity with the N-terminal C-domain of ArfA. It seems likely that protein domains corresponding to the first three modules of the arthrofactin NRPS are absent in syringafactin NRPS. This observation suggests that the syringafactin NRPS system in P. syringae pv. tomato DC3000 evolved from the arthrofactin system, after which three modules of the arthrofactin NRPS were deleted, resulting in the fusion of the N-terminus of ArfA with a portion of ArfB. Importantly, the deleted modules include the module that incorporates the threonyl residue that forms the ester linkage involved in cyclization of arthrofactin. Indeed, the structure of syringafactin is reported to be a linear form.

Entolysin Synthetase
Entolysin is a cyclic lipotetradecapeptide produced by an entomopathogenic bacterium Pseudomonas entomophila. This bacterium is able to infect and effectively kill various insects, and it is closely related to the saprophytic soil bacterium P. putida. Entolysin has a relatively small cyclic peptide moiety in which a lactone ring is formed between the tenth and the last amino acid. Entolysin is required for swarming motility and exhibits hemolytic and surfactant activity as described for other lipopeptides, but it does not participate in the virulence of the producing strain for killing Drosophila [65]. Three genes encoding entolysin synthetases were identified (etlA, etlB, and etlC). The deduced amino acid sequences are similar to other NRPSs and closely related to Pso synthetase in P. putida PCL1445. The etlA gene is not physically linked with etlB and etlC in the P. entomophila genome. This organization has also been reported for the viscosin and massetolide gene clusters. EtlA, EtlB, and EtlC comprise two, eight, and four functional modules of NRPS, all of which correspond to the number of amino acid residues in the product peptide. These modules are composed of typical domains. However, no cognate E-domains have been identified in EtlA/B/C. Amino acid sequence analysis indicated that all of the C-domains but C12 and C13 could function as dual C/E domains. In addition, the first C-domain of EtlA is similar to the N-acyl domain of other lipopeptides, and tandem Te-domains were identified in the C-terminus of EtlC [65].
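The module counts quoted in the preceding sections can be tallied against the peptide lengths as a quick consistency check. The sketch below uses only numbers stated in the text; the dictionary layout and function names are illustrative choices, and surfactin, the iturin family, and syringopeptin are left out because the text does not give both figures for them.

```python
# product -> (residues in the peptide, modules per synthetase subunit)
SYNTHETASES = {
    "fusaricidin": (6, {"FusA": 6}),
    "syringomycin": (9, {"SyrE": 8, "SyrB1": 1}),
    "arthrofactin": (11, {"ArfA": 2, "ArfB": 4, "ArfC": 5}),
    "viscosin": (9, {"ViscA": 2, "ViscB": 4, "ViscC": 3}),
    "orfamide": (10, {"OfaA": 2, "OfaB": 4, "OfaC": 4}),
    "putisolvin": (12, {"PsoA": 2, "PsoB": 7, "PsoC": 3}),
    "syringafactin": (8, {"SyfA": 3, "SyfB": 5}),
    "entolysin": (14, {"EtlA": 2, "EtlB": 8, "EtlC": 4}),
}


def module_total(subunits):
    """Sum the modules over all subunits of one synthetase."""
    return sum(subunits.values())


def mismatches(table):
    """Products whose module total differs from the peptide length."""
    return [
        name
        for name, (residues, subunits) in table.items()
        if module_total(subunits) != residues
    ]


for name, (residues, subunits) in SYNTHETASES.items():
    print(f"{name}: {module_total(subunits)} modules, {residues} residues")
print("mismatches:", mismatches(SYNTHETASES))
```

Matching totals are necessary for co-linearity but not sufficient: syringomycin passes this count even though the module for its last residue (SyrB1) is encoded upstream of SyrE.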

Gene Regulation in Bacillus
Gene regulation of LPBSs produced by Bacillus spp. has been most intensively investigated in the surfactin biosynthesis system (Figure 3). Expression of srfA is controlled by several peptide pheromones, including ComX and Phr [106]. B. subtilis encodes eight Phr peptides (PhrA, PhrC [CSF], PhrE, PhrF, PhrG, PhrH, PhrI, and PhrK) and 11 aspartyl-phosphate phosphatase proteins (RapA to RapK). Each Phr peptide inhibits the activity of its cotranscribed Rap protein. RapC, RapF, and RapK act as negative regulators of srfA [107]. ComX interacts with the membrane-bound histidine kinase ComP, which autophosphorylates upon stimulation and then transfers its phosphate to an aspartate residue in the response regulator ComA. The phosphorylated ComA binds as a tetramer to ComA boxes (T/GCGG-N4-CCGCA) upstream of the srfA promoter and initiates transcription of srfA [108,109]. Recently, it was found that ComA binds to a degenerate tripartite sequence consisting of three recognition elements (RE). RE1 and RE2 contain the inverted repeats previously characterized as part of the ComA boxes. Meanwhile, RE3 is located downstream of RE1 and RE2 with a consensus sequence identical to that of RE1 [110]. In addition, mutation of three non-aspartate amino acids in the N-terminal portion of ComA decreases surfactin production [111]. These three amino acids may be involved in the phosphorylation mechanism. It was previously reported that glucose can stimulate the transcription of comA and consequently increases the expression of srfA [112,113]. Induction of srfA requires the oligopeptide permease Spo0K, which is involved in PhrC import [114]. On the other hand, expression of srfA is downregulated upon treatment with H2O2; this evidence led to identification of the H2O2 stress-responsive regulator PerR. PerR positively regulates srfA expression by binding to PerR boxes located upstream of the ComA boxes, and H2O2 inhibits the DNA-binding activity of PerR [115].
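The ComA box consensus given above (T/GCGG-N4-CCGCA) can be expressed as a pattern and searched for in a DNA sequence. The following is a minimal sketch, assuming that N4 means any four nucleotides and that only exact matches on the given strand are of interest; the function name and the test sequence are invented for illustration and do not come from the srfA promoter.

```python
import re

# T/GCGG-N4-CCGCA: T or G, then CGG, any four nucleotides, then CCGCA.
COMA_BOX = re.compile(r"[TG]CGG[ACGT]{4}CCGCA")


def find_coma_boxes(dna: str):
    """Return (start index, matched site) for each exact ComA box match."""
    dna = dna.upper()
    return [(m.start(), m.group()) for m in COMA_BOX.finditer(dna)]


# Invented sequence with one consensus site, for illustration only.
toy_promoter = "AATTGCGGATCACCGCATTTT"
print(find_coma_boxes(toy_promoter))
```

An exact-match search of this kind would miss the degenerate recognition elements described in the text, for which a scoring approach such as a position weight matrix would be a more natural fit.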
Furthermore, the chaperone subunit ClpX and the protease ClpP are required for the transcription of srfA at a step that follows ComP-dependent activation of ComA [116]. An additional transcription factor, DegU, also functions as a positive regulator of srfA transcription [117]. Overexpression of RapD, RapG, and RapH inhibits srfA transcription, and production of these Rap proteins is suppressed by RghR [118,119]. Mutation in sodA, which encodes superoxide dismutase, inhibits transcription of the comQXP quorum-sensing locus, thereby preventing srfA expression [120]. At high concentrations of amino acids such as Ile, Leu, and Val, CodY represses transcription of srfA by interacting specifically with the srfA promoter [121]. Like CodY, AbrB also negatively regulates srfA transcription. Expression of srfA is also repressed by the RNA polymerase-binding protein Spx [122]. On the other hand, the Spx-RNA polymerase interaction is required for positive transcriptional control of genes in response to thiol-oxidative stress [123]. Furthermore, 4'-phosphopantetheinyl transferase (Sfp/PPTase) is required for the activation of SrfA enzymes by converting the inactive apo-forms of the T-domains to the active holo-forms [124]. The acyltransferase SrfA-Te is also required in the initial step of transferring a hydroxy fatty acid to the first amino acid in the peptide. The surfactin self-resistance protein, YerP, is required for surfactin export [125].
Expression of the fengycin/plipastatin genes is positively controlled by DegQ, an enhancer of extracellular protease production [126]. degQ is a pleiotropic regulatory gene that controls the production of several hydrolytic enzymes [127]. Production of plipastatin is severely reduced in the degQ mutant, but no significant change is observed in surfactin production. An Sfp-like protein, Lpa-8, is also required for plipastatin production in B. subtilis YB8 [128]. Overexpression of degQ in B. subtilis 168 expressing lpa-8 yields a 10-fold increase in plipastatin production [126]. Recently, transcription analysis of fen in B. subtilis F29-3 demonstrated that RNA polymerase binds to an A- and T-rich sequence, called the UP element, which is located upstream of the fen promoter [129].
Gene regulation of the iturin family was first demonstrated in mycosubtilin-producing B. subtilis ATCC6633. Expression of the myc operon is independent of ComA, but still seems to be regulated via quorum sensing, as PhrC strongly stimulates expression. The sigma H factor, Spo0H, also influences expression of the myc operon, and addition of PhrC to the culture medium compensates for loss of Spo0H expression. Finally, the transition state regulator AbrB represses expression of myc, as deletion of abrB results in increased myc expression [130]. Further information regarding the gene regulation was obtained from the study of the bmy operon in B. amyloliquefaciens FZB42. Expression of bmy depends on a single sigma A factor-dependent promoter and is positively controlled by the small regulatory protein DegQ, similar to fen. Similar to srfA, the global regulators DegU and ComA are required for the full transcriptional activation of bmy. DegU plays a key role because it binds directly to two sites located upstream of the bmy promoter. Moreover, post-transcriptional regulation of bacillomycin production is also suggested for both DegU and a transmembrane protein, YczE [131]. As for other lipopeptide synthetases in Bacillus, an Sfp-like protein, Lpa-14, is required for iturin production [132]. Mutation of the Sfp-encoding gene simultaneously prevents B. amyloliquefaciens FZB42 from producing bacillomycin, fengycin, and surfactin [131].

Gene Regulation in Pseudomonas
Similar to ComP/ComA in Bacillus, a two-component system has been identified as the master transcriptional regulation system in pseudomonads (Figure 4). Typically, this system consists of the sensor kinase GacS and the response regulator GacA. GacS was first described in P. syringae pv. syringae B728a as an essential factor for lesion manifestation. Meanwhile, GacA was first identified as a global activator of antibiotic and cyanide production in P. fluorescens CHA0 [133]. It is proposed that upon interaction with the signal(s), GacS is activated by autophosphorylation and then GacA acts as a phosphoryl acceptor. After trans-phosphorylation, GacA activates transcription of the regulatory gene, which in turn controls the expression of target genes [133]. Based on bacterial two-hybrid analysis, the entire GacA molecule is necessary for GacA interaction with itself or GacS [134]. The GacS/GacA system positively controls the expression of genes required for the synthesis of lipopeptides (syringomycin, amphisin, putisolvin, and entolysin) because mutation in either gene impairs lipopeptide production [65,[135][136][137][138]. A quorum-sensing system that triggers GacA/GacS phosphorylation during high cell density is essential for biosynthesis of the lipopeptide putisolvin [139]. In Gram-negative bacteria, the quorum-sensing system largely relies on the interaction of the signaling molecules N-acyl homoserine lactones (AHLs), which are synthesized by the LuxI protein, with the transcriptional regulator LuxR. The quorum-sensing system in P. putida PCL1445 is composed of the LuxI homolog PpuI, the LuxR homolog PpuR, and RsaL. Expression of the genes ppuI and ppuR is required for the biosynthesis of AHLs, and mutation of these genes reduces putisolvin production. Meanwhile, overproduction of AHLs and putisolvin is observed in the rsaL mutant. This observation suggests that RsaL acts as a repressor of PpuI and PpuR.
In contrast, biosynthesis of the lipopeptides massetolide, amphisin, and syringomycin is not regulated by AHL-based quorum sensing [48]. Downstream of the Gac system, cognate LuxR-type transcriptional regulators, which bind to the operator of the NRPS genes, regulate the production of several lipopeptides.
The LuxR protein contains a DNA-binding helix-turn-helix motif in its C-terminal region. In Pseudomonas sp. MIS38, the LuxR-type transcription factor, ArfF, positively controls transcription of the gene arf [140]. A LuxR-type protein is also implicated in the biosynthesis of entolysin, putisolvin, and syringafactin, and mutation of its gene results in the loss of lipopeptide production [65,68,105,141]. Two LuxR-type proteins are involved in the biosynthesis of syringomycin, syringopeptin, and viscosin. SalA and SyrF are the LuxR-type proteins responsible for the production of two lipopeptides, syringomycin and syringopeptin, in P. syringae pv. syringae B301D. SalA is suggested to control transcription of SyrF, an apparent homolog of ArfF, and SyrF binds and trans-activates the target NRPS promoter [142]. Mutation of two LuxR-type proteins in P. fluorescens SBW25, ViscAR and ViscBCR, results in the reduction of NRPS gene transcription and loss of viscosin production [143]. Recently, heat shock proteins were shown to regulate biosynthesis of putisolvin and arthrofactin [140,141]. Mutation in the gene encoding DnaK, an HSP70 class heat shock protein, impairs putisolvin production in P. putida PCL1445. Together with DnaJ, DnaK regulates putisolvin synthesis at low temperature. Arthrofactin synthesis is eliminated by mutation of the gene encoding HtpG, an HSP90 class heat shock protein. However, normal expression of the arthrofactin synthetase genes is retained in the HtpG mutant. Thus, HtpG appears to be involved in the proper folding of positive transcription factors or in the assembly of the NRPS complex. The serine protease ClpP regulates massetolide biosynthesis in P. fluorescens SS101 via LuxR transcriptional regulators, and expression of ClpP is independent of regulation by the GacS/GacA system [144].
In contrast to surfactin synthesis, the chaperone subunit ClpX is not involved in the production of massetolide, and its gene is transcribed independently of clpP. Random mutagenesis in Pseudomonas sp. MIS38 also led to identification of SpoT as a new regulator of arthrofactin synthesis [140]. Mutation in the SpoT-encoding gene prevents MIS38 from producing arthrofactin. SpoT is a (p)ppGpp synthetase/hydrolase responsible for cellular metabolism during nutritional starvation [145]. Epistasis analysis revealed that spoT positively regulates arthrofactin biosynthesis through arfF and arfB. Post-translational modification of NRPS in pseudomonads is also catalyzed by PPTase [146-148].
Export of lipopeptides in pseudomonads requires ATP-binding cassette (ABC) transporter systems, and the ABC transporter genes are clustered together with the synthetase genes [48,65,68,98,105]. Export of syringomycin and syringopeptin requires two transporter systems known as SyrD and PseABC. SyrD is proposed to function as an ATP-driven efflux pump. Only trace quantities of both lipopeptides are produced by the syrD mutant, and the cells show significantly lower virulence [149,150]. On the other hand, the tripartite ABC-type efflux transporter PseABC is homologous to the resistance-nodulation-cell division (RND) efflux system. Mutation in each of the pseABC genes results in a 40 to 60% decrease in both syringomycin and syringopeptin production [151]. The transporter system for arthrofactin was recently characterized in Pseudomonas sp. MIS38 [152]. The genes encoding a putative periplasmic protein (ArfD) and a putative ABC transporter (ArfE) are located downstream of the arf gene cluster. Arthrofactin production is temporarily reduced in both mutants, but it eventually reaches a level similar to that of MIS38 after 12 h of cultivation. Furthermore, export of arthrofactin is almost completely blocked by ABC transporter inhibitors. This suggests that multiple ABC transporter systems can export arthrofactin and that accumulation of arthrofactin is toxic to the cells. Two genes that are predicted to encode homologs of the ABC transporters MacA and MacB in E. coli are located downstream of the putisolvin and entolysin synthetase genes [153]. Analysis of macA and macB mutations in P. putida PCL1445 shows a 70% decrease in putisolvin production [105]. Meanwhile, macA and macB mutations in P. entomophila result in an almost complete loss of entolysin production [65]. Therefore, we hypothesize that multiple ABC transporter systems generally play a role in the transport of LPBSs in pseudomonads.

Genetic Engineering of NRPS to Create Novel Products
A large number of LPBSs are synthesized by NRPSs that share a similar modular architecture of repetitive catalytic units and a similar assembly-line mechanism. However, recent genome analysis of the lipopeptide synthetases from both Bacillus and Pseudomonas strains revealed novel NRPS architectures that do not completely conform to the co-linearity rule, including multifunctional hybrid enzymes that can directly incorporate D-amino acid residues. In addition, most of the recently identified lipopeptide synthetases in Pseudomonas spp. lack the cognate E-domains, but contain a dual C/E domain and also feature tandem Te-domains. These variations demonstrate natural versatility in evolving complex pathways for lipopeptide biosynthesis and may allow for artificial alteration of the protein template with the aim of reprogramming it to create novel compounds with improved properties.
Recently, new strategies for engineering NRPSs have been developed. The first successful report examined exchange of the minimal module (A- and T-domains) of the last Leu 7 module within SrfA-C, a single-module enzyme with the simplest structure [14]. This minimal module was deleted and replaced with several bacterial and fungal A/T-domains by homologous recombination. Construction of the gene encoding hybrid SrfA leads to the production of surfactin variants that retain their activity (Table 3). Using the same strategy, A/T-domain exchange was applied to multi-modular SrfA. Swapping of the Leu 2 module results in an altered product whose peptide chain is shorter than that of wild-type [154]. However, this minimal module replacement resulted in a very low yield of variants. Therefore, Yakimov and coworkers (2000) developed a whole-module replacement within SrfA. A complete set of C/A/T-domains of the lichenysin A synthetase Gln 1 module was introduced in place of the Glu 1 module of SrfA. An altered product, surfactin [Gln 1], was produced by the recombinant B. subtilis at a level similar to that of wild-type surfactin, and it also exhibits stronger surface activity than surfactin [155]. SrfA could also be modified by translocation of the C-terminal Te-domain to the end of the internal domains, thus resulting in new linear surfactin analogs [156]. Furthermore, the use of a recombinant Te-domain to catalyze regiospecific cyclization of synthetic peptides or lipopeptides in vitro is a powerful approach for generating libraries of novel compounds with improved properties [157,158]. Another powerful approach for the genetic manipulation of NRPS templates is the directed mutation of the substrate specificity within the A-domain based on its specificity-conferring code [159]. Specificity-conferring codes of the Asp 5 module within SrfA were adapted for the recognition of Asn. The engineered B. subtilis produces the new lipoheptapeptide, surfactin [Asn 5] [13].
Along the same line, directed mutation of the Asp 7 A-domain to Asn 7 within CDA synthetase from Streptomyces coelicolor leads to production of the expected lipoundecapeptide containing Asn 7 and an unexpected linear lipohexapeptide intermediate [160]. A reduction or increase in the number of peptide residues is an alternative approach to generate structural diversity of lipopeptides and glycopeptides. Deletion of an entire Leu 2 module in SrfA causes secretion of the predicted surfactin variant with a smaller cyclic peptide. Furthermore, a novel lipohexapeptide is produced with a significantly high yield and enhanced antimicrobial activities [161,162]. Amino acid insertion by module extension has been performed within the vancomycin-type glycopeptide antibiotic, balhimycin. Insertion of an entire D-hydroxyphenylglycine module into the balhimycin assembly line between modules 4 and 5 results in an elongated octapeptide product [163]. C-terminally truncated hexa- to dipeptide metabolites are also detected. An alternative approach was examined based on the function of communication-mediating (COM) domains, which play an important role in intermolecular communication within the NRPS system [164]. Swapping of COM domains has been exploited within SrfA and allows for biosynthesis of surfactin variants with different peptide chain lengths [165]. Using plasmid and transposon mutation of the srfA operon at specific and random positions, various intermediates from lipodipeptides to lipohexapeptides were identified by whole-cell MALDI-TOF MS analysis [166]. Although several NRPSs involved in lipopeptide production in pseudomonads have also been characterized, there is limited information available for rational modification of their NRPS modules in vivo. Utilizing a complementation method, Ackerley and Lamont (2004) examined the possibility of engineering pyoverdine synthetase (PvdD) in P. aeruginosa PAO1 to generate novel pyoverdine analogs.
Introduction of the Thr-incorporating module from other species into the pvdD mutant restores pyoverdine production [167]. Recently, researchers at Cubist Pharmaceuticals, Inc. have shown how these approaches can also be used for the production of lipopeptide variants by Streptomyces spp. with improved antibiotic activities [168,169].

Table 3. Primary structure of engineered LPBSs produced by B. subtilis.
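
The engineering strategies described in this section can be illustrated with a small sketch that treats SrfA as an ordered list of modules and reads the peptide off that list under a strict co-linearity assumption. This is only a toy model, not a description of any published tool: the names `Module`, `SRFA`, `product`, `swap_specificity`, and `delete_module` are hypothetical, stereochemistry and the fatty acid are omitted, and the wild-type residue order (Glu-Leu-Leu-Val-Asp-Leu-Leu) is taken from the commonly reported surfactin sequence rather than from the text above, which names only the Glu 1, Leu 2, Asp 5, and Leu 7 modules. Real engineered synthetases may give low yields or truncated intermediates, as noted above, which this sketch does not capture.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Module:
    """One NRPS module; `residue` is the amino acid its A-domain selects."""
    position: int  # position in the wild-type assembly line (1-based)
    residue: str


# Assumed wild-type surfactin residue order (stereochemistry omitted).
SRFA: Tuple[Module, ...] = tuple(
    Module(i + 1, aa)
    for i, aa in enumerate(["Glu", "Leu", "Leu", "Val", "Asp", "Leu", "Leu"])
)


def product(modules: Tuple[Module, ...]) -> str:
    """Read the peptide off the module order (strict co-linearity assumed)."""
    return "-".join(m.residue for m in modules)


def swap_specificity(
    modules: Tuple[Module, ...], position: int, new_residue: str
) -> Tuple[Module, ...]:
    """Model a module replacement or an A-domain specificity change."""
    return tuple(
        replace(m, residue=new_residue) if m.position == position else m
        for m in modules
    )


def delete_module(modules: Tuple[Module, ...], position: int) -> Tuple[Module, ...]:
    """Model deletion of one entire module from the assembly line."""
    return tuple(m for m in modules if m.position != position)


if __name__ == "__main__":
    print("wild type      :", product(SRFA))
    print("surfactin[Gln1]:", product(swap_specificity(SRFA, 1, "Gln")))
    print("surfactin[Asn5]:", product(swap_specificity(SRFA, 5, "Asn")))
    print("Leu2 deletion  :", product(delete_module(SRFA, 2)))
```

Keeping the original module position on each `Module` after a deletion makes it easy to see which wild-type module a residue in a shortened variant came from.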

Conclusion
Gram-positive Bacillus and Gram-negative Pseudomonas strains produce a variety of lipopeptides with remarkable surface and biological activities. In contrast to the structural diversity of these lipopeptides, their biosynthetic mechanism is basically conserved. They are synthesized nonribosomally by a mega-peptide synthetase unit, NRPS, which is composed of several cooperating multifunctional modules, each capable of performing one cycle of peptide elongation. To become active, the synthetases are post-translationally modified by a PPTase and properly assembled by DnaJ/K and HtpG proteins. However, recent analysis of the lipopeptide synthetases suggests that there are several variants of NRPS architecture. Modification of NRPS by genetic engineering of the encoding genes is a promising method to produce useful variants. Accumulation of genetic information for lipopeptide synthetases should contribute to the design of biosurfactants with higher surface activity and/or novel features. Moreover, understanding of their biosynthetic pathways and genetic regulation mechanisms will facilitate not only elucidation of the evolution of nonribosomal peptide synthesis mechanisms, but also the development of cost-effective methods for large-scale production of useful lipopeptides.