Nostoc edaphicum CCNP1411 from the Baltic Sea—A New Producer of Nostocyclopeptides

Nostocyclopeptides (Ncps) constitute a small class of nonribosomal peptides, exclusively produced by cyanobacteria of the genus Nostoc. The peptides inhibit the organic anion transporters, OATP1B3 and OATP1B1, and prevent the transport of the toxic microcystins and nodularin into hepatocytes. So far, only three structural analogues, Ncp-A1, Ncp-A2 and Ncp-M1, and their linear forms have been identified in Nostoc strains as naturally produced cyanometabolites. In the current work, the whole genome sequence of the new Ncps producer, N. edaphicum CCNP1411 from the Baltic Sea, has been determined. The genome consists of a circular chromosome (7,733,505 bps) and five circular plasmids (from 44.5 kb to 264.8 kb). The nostocyclopeptide biosynthetic gene cluster (located between positions 7,609,981–7,643,289 bps of the chromosome) has been identified and characterized in silico. The LC-MS/MS analyses of N. edaphicum CCNP1411 cell extracts prepared in aqueous methanol revealed several products of the genes. Besides the known peptides, Ncp-A1 and Ncp-A2, six other compounds putatively characterized as new nostocyclopeptide analogues were detected. These include Ncp-E1 and E2 and their linear forms (Ncp-E1-L and E2-L), a cyclic Ncp-E3 and a linear Ncp-E4-L. Regardless of the extraction conditions, the cell contents of the linear nostocyclopeptides were found to be higher than those of the cyclic ones, suggesting a slow rate of the macrocyclization process.


Introduction
Secondary metabolites produced by cyanobacteria of the genus Nostoc (Nostocales) are characterized by a high variety of structures and biological activities [1–6]. On the basis of chemical structure, these compounds are mainly classified into peptides, polyketides, lipids, polysaccharides and alkaloids [7]. Abundantly produced cyanopeptides with anticancer, antimicrobial, antiviral and enzyme-inhibiting activity have attracted the attention of many research groups [6,8–12]. Some of the metabolites, such as nostocyclopeptides (Ncps) or cryptophycins, are exclusively produced by cyanobacteria of the genus Nostoc (Figure 1A). Ncps constitute a small class of nonribosomal peptides. Thus far, only three analogues of the compounds and their linear forms have been discovered. These include Ncp-A1 and Ncp-A2 detected in Nostoc sp. ATCC53789 isolated from a lichen collected at Arron Island in Scotland [13]. The same peptides were detected in Nostoc sp. ASN_M.
Ncps are composed of seven residues and a unique imino linkage formed between the C-terminal aldehyde and the N-terminal amine group of the conserved Tyr1 (Figure 1B) [13,16]. The presence of modified amino acid residues, e.g., 4-methylproline, homoserine and D-configured glutamine, indicated the nonribosomal biosynthetic pathway of the molecules. Genetic analysis of Nostoc sp. ATCC53789 revealed the presence of the 33-kb Ncp gene cluster composed of two genes, ncpA and ncpB, encoding the NcpA1-A3 and NcpB1-B4 modules. These proteins catalyze the activation and incorporation of Tyr, Gly, Gln, Ile and Ser into the Ncp structure [17]. They show high similarity to NosE and NosF, which take part in the biosynthesis of nostopeptolides in Nostoc sp. GSV224 [18]. The ncpFGCDE fragment of the Ncp gene cluster is involved in the synthesis of MePro (ncpCDE), transport (ncpF) and proteolysis (ncpG) of the peptides. The characteristic features of the Ncp enzymatic complex in Nostoc sp. ATCC53789 are the presence of the epimerase domain (NcpA3), responsible for the D-configuration of glutamine, and the unique reductase domain at the C-terminal end of NcpB, which catalyzes the reductive release of a linear peptide aldehyde [19,20].
The activity of Ncps has been explored [13], and their potential as antitoxins that inhibit the transport of hepatotoxic microcystin-LR and nodularin into rat hepatocytes through the organic anion transporter polypeptides OATP1B1/1B3 has been revealed [21]. As OATP1B3 is overexpressed in some malignant tumors (e.g., colon carcinomas) [22], Ncps, as inhibitors of this transporter protein, are suggested to be promising lead compounds for new drug development.
In our previous studies, Nostoc edaphicum CCNP1411 (Figure 1A) from the Baltic Sea was found to be a rich source of cyanopeptolins, nonribosomal peptides with potent inhibitory activity against serine proteases [6]. In the current work, the potential of the strain to produce other bioactive metabolites was explored. The whole-genome sequence of N. edaphicum CCNP1411 has been determined, and the nostocyclopeptide biosynthetic gene cluster has been identified in the strain and characterized in silico for the first time. Furthermore, new products of the Ncp gene cluster have been detected and their structures have been characterized by LC-MS/MS.

Analysis of N. edaphicum CCNP1411 Genome
Total DNA was isolated from N. edaphicum CCNP1411, and the whole genome sequence was determined. The identified replicons of the N. edaphicum CCNP1411 genome consist of a circular chromosome of 7,733,505 bps and five circular plasmids (Table 1). Within the total genome size of 8,316,316 bps (chromosome and plasmids, Figure 2), annotation distinguished a total of 6957 genes, of which 6458 potentially code for proteins (CDSs), 415 are classified as pseudo-genes and 84 code for non-translatable RNA molecules. Pseudo-genes can be divided into subcategories according to a shift in the coding frame (180), internal stop codons (77), incomplete sequence (228) or the occurrence of multiple problems (63). Genes coding for functional RNAs comprise those encoding ribosomal RNA (rRNA) (9), transfer RNA (tRNA) (71) and regulatory noncoding RNA (ncRNA) (4), all located on the chromosome. Out of the total coding and pseudo-gene sequences (6873), the vast majority (5846) start with the ATG codon, while GTG and TTG occur less frequently (561 and 217 times, respectively). The frequencies of stop codons were as follows: TAA (3455), TAG (1750), TGA (1526). Coding and pseudo-gene sequences are distributed almost equally between the leading and complementary strands, with 3408 and 3465 sequences, respectively.
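The annotation counts quoted above can be cross-checked with simple arithmetic. The snippet below is a minimal sketch that only re-uses the numbers given in the text; the variable names are illustrative, and the two derived values (combined plasmid size, and sequences not starting with ATG, GTG or TTG) are plain differences rather than figures reported by the authors.

```python
# Counts copied from the annotation summary in the text.
cds, pseudo_genes, rna_genes = 6458, 415, 84
total_genes = 6957
rna_by_type = {"rRNA": 9, "tRNA": 71, "ncRNA": 4}
strand_counts = {"leading": 3408, "complementary": 3465}
start_codons = {"ATG": 5846, "GTG": 561, "TTG": 217}
chromosome_bp, genome_bp = 7_733_505, 8_316_316

coding_and_pseudo = cds + pseudo_genes  # 6873 in the text

# Each entry is True when the reported numbers agree with each other.
checks = {
    "genes = CDSs + pseudo-genes + RNA genes": coding_and_pseudo + rna_genes == total_genes,
    "RNA genes = rRNA + tRNA + ncRNA": sum(rna_by_type.values()) == rna_genes,
    "strand counts add up to coding + pseudo": sum(strand_counts.values()) == coding_and_pseudo,
}

# Derived values (not stated in the text, obtained by subtraction only).
plasmid_bp = genome_bp - chromosome_bp
other_start = coding_and_pseudo - sum(start_codons.values())

for name, ok in checks.items():
    print(f"{name}: {'consistent' if ok else 'INCONSISTENT'}")
print(f"combined size of the five plasmids: {plasmid_bp} bp")
print(f"sequences with a start codon other than ATG/GTG/TTG: {other_start}")
```

The pseudo-gene subcategories are left out of the check on purpose, because a single pseudo-gene may plausibly fall into more than one category.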

Non-Ribosomal Peptide Synthetase (NRPS) Gene Cluster of Nostocyclopeptides
Having the whole genome sequence of N. edaphicum CCNP1411, we analyzed in detail the non-ribosomal peptide synthetase (NRPS) cluster containing potential genes coding for enzymes involved in the synthesis of nostocyclopeptides. To establish correct spans for the non-ribosomal peptide synthetases, 35 complete nucleotide sequence clusters derived from the phylum Cyanobacteria were aligned, resulting in hits scattered around positions 2,287,143-2,323,617 and 7,609,981-7,643,289 within the N. edaphicum CCNP1411 chromosome (7.7 Mbp) (Figure 2). This method of characterization showed the overall similarity of the selected spans to the micropeptin (cyanopeptolin) biosynthetic gene cluster [23] and the nostocyclopeptide biosynthetic gene cluster [17], respectively. To confirm these results, antiSMASH analysis was employed; it confirmed the previously defined NRPS spans and added two more regions, 1,213,069-1,258,319 and 5,735,625-5,780,238, similar to a small extent (12% and 30%, respectively) to the anabaenopeptin gene cluster [24]. For the purpose of this study, we focused on the putative nostocyclopeptide-producing non-ribosomal peptide synthetase. Annotation of the selected region revealed nine putative open reading frames (ORFs), transcribed in the reverse (7) and forward (2) direction. The identified cluster was arranged in a similar fashion to AY167420.1 (nostocyclopeptide biosynthetic gene cluster from Nostoc sp. ATCC 53789), with the exception of two ORFs (>170 bp) intersecting the operon (ncpFGCDE) that putatively encodes proteins involved in MePro assembly, efflux and hydrolysis of the products of the second putative operon, ncpAB (Figure 3).
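As a quick sanity check on the reported spans, the region lengths can be computed directly from the coordinates. The sketch below assumes 1-based, inclusive coordinates (the usual GenBank convention, which the text does not state explicitly); the `Region` class and the labels are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    label: str
    start: int  # first base of the span (assumed 1-based)
    end: int    # last base of the span (assumed inclusive)

    @property
    def length_bp(self) -> int:
        return self.end - self.start + 1


CHROMOSOME_BP = 7_733_505

# Spans quoted in the text; labels name the most similar known cluster.
REGIONS = [
    Region("micropeptin (cyanopeptolin)-like", 2_287_143, 2_323_617),
    Region("nostocyclopeptide-like", 7_609_981, 7_643_289),
    Region("anabaenopeptin-like (12%)", 1_213_069, 1_258_319),
    Region("anabaenopeptin-like (30%)", 5_735_625, 5_780_238),
]

for region in sorted(REGIONS, key=lambda r: r.start):
    share = 100 * region.length_bp / CHROMOSOME_BP
    print(f"{region.label}: {region.length_bp} bp ({share:.2f}% of the chromosome)")
```

Under this assumption the nostocyclopeptide-like span comes to about 33.3 kb, which agrees with the 33-kb Ncp gene cluster described for Nostoc sp. ATCC53789.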

Two sequences, ORF1 (HUN01_34350) (837 bp) and ORF2 (HUN01_34355) (1107 bp), embedded at the 3' end of the nostocyclopeptide gene cluster, resemble the nosF and nosE genes found in the nostopeptolide (nos) gene cluster [18], with 96% nucleotide sequence identity in both instances, and putatively encode a zinc-dependent long-chain dehydrogenase and a Δ1-pyrroline-5-carboxylic acid reductase. Further upstream, ORF3 (HUN01_34360) (798 bp) shows 98% homology to an unknown gene from the AF204805.2 gene cluster, previously suggested to be involved in 4-methylproline biosynthesis [17,25] because of its close proximity to the downstream genes responsible for this reaction, although no experimental evidence was presented. Alignment of the sequence of this putative protein showed sequence homology, to some extent, to 4'-phosphopantetheinyl transferase, crucial for PCP aminoacyl substrate binding (Figure 4) [26]. Moreover, the partially present adenylate-forming domain within ORF4 (HUN01_34365) (165 bp) belongs to the acyl- and aryl-CoA ligases family and may putatively engage a substrate for post-translational modification of the PCP domain.
Facing the same direction, ORF5 (HUN01_34370) (1605 bp), bearing a putative domain classified as a transpeptidase superfamily DD-carboxypeptidase, and ORF6 (HUN01_34375) (2010 bp), homologous to an ABC transporter ATP-binding protein/permease, may be engaged in the transport of the ncpAB peptide product [27]. For ORF7 (HUN01_34780), neither a Shine-Dalgarno (SD) sequence upstream of the translation start codon nor a TA-like signal ~12 nucleotides upstream could be found. The main part of the Ncp biosynthetic gene cluster is located on the forward strand and comprises two large genes whose nucleotide sequences are over 80% homologous to the ncpA and ncpB subunits of the ncp cluster in Nostoc sp. ATCC53789 [17]. Both genes code for proteins consisting of repetitive modules, each incorporating a single residue into the elongating peptide. ORF8 (HUN01_34785) (11,334 bp) encompasses three of these modules, whereas ORF9 (HUN01_34380) (14,157 bp) encodes four modules. The core of one NRPS module consists of three successive domains: condensation (C), adenylation (A) and peptidyl carrier protein (PCP). Moreover, adjacent to the coding spans of the extreme modules, two tailoring domains were found within the ORF8 and ORF9 genes (Figure 5).
Alignment of nucleotide sequences to the ncpAB operon revealed major differences in the consecutive NcpB3 and NcpB4 modules. Using the selected spans together with a conserved domain search allowed us to distinguish and compare the C, A and PCP amino-acid sequences (Figure 6). The intrinsic modules of the NRPS, with the exception of the NcpB3 adenylation domain sequence, were more than 91% homologous, whereas the extreme modules showed the biggest composition differences, ranging from 13-15% to 24% in the NcpB4 adenylation domain (Figure 6).
Mar. Drugs 2020, 18, 442
The determination of the whole genome sequence of N. edaphicum CCNP1411 allowed us to analyze the genes coding for enzymes involved in the synthesis of nostocyclopeptides. The general analysis demonstrated homology of the NRPS/PKS clusters of N. edaphicum CCNP1411 to systems occurring in other cyanobacteria, although with some differences. The non-ribosomal consensus code [28,29] allowed us to recognize and predict the substrate specificities of the NRPS adenylation domains: tyrosine (NcpA1), glycine (NcpA2) and glutamine (NcpA3) for NcpA, and isoleucine/valine (NcpB1), serine (NcpB2), 4-methylproline/proline (NcpB3) and phenylalanine/leucine/tyrosine (NcpB4) for NcpB (Table 2). This prediction was found to be in line with the structures of the Ncps detected in N. edaphicum CCNP1411.
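The statement that the predicted specificities agree with the detected structures can be illustrated with a small consistency check. This is only a sketch: the substrate sets are taken from the prediction quoted above, while the peptide sequences are reconstructed from the prose of this article (stereochemistry is ignored) and should be verified against Table 3. The function name and the negative control are hypothetical.

```python
# Predicted substrates per adenylation domain, in assembly-line order.
PREDICTED = {
    "NcpA1": {"Tyr"},
    "NcpA2": {"Gly"},
    "NcpA3": {"Gln"},
    "NcpB1": {"Ile", "Val"},
    "NcpB2": {"Ser"},
    "NcpB3": {"MePro", "Pro"},
    "NcpB4": {"Phe", "Leu", "Tyr"},
}
MODULE_ORDER = list(PREDICTED)  # dicts keep insertion order


def is_consistent(peptide):
    """True if every residue is among the substrates predicted for its module."""
    if len(peptide) != len(MODULE_ORDER):
        return False
    return all(res in PREDICTED[mod] for mod, res in zip(MODULE_ORDER, peptide))


# Sequences as described in the text (positions 1 to 7); check against Table 3.
PEPTIDES = {
    "Ncp-A1": ["Tyr", "Gly", "Gln", "Ile", "Ser", "MePro", "Leu"],
    "Ncp-A2": ["Tyr", "Gly", "Gln", "Ile", "Ser", "MePro", "Phe"],
    "Ncp-E2": ["Tyr", "Gly", "Gln", "Ile", "Ser", "Pro", "Leu"],
    "Ncp-E3": ["Tyr", "Gly", "Gln", "Val", "Ser", "MePro", "Leu"],
    # Hypothetical negative control: Ala is not predicted for NcpB1.
    "control": ["Tyr", "Gly", "Gln", "Ala", "Ser", "MePro", "Leu"],
}

for name, seq in PEPTIDES.items():
    print(name, "consistent" if is_consistent(seq) else "not consistent")
```

The hexapeptide Ncp-E4-L is shorter than the assembly line and is therefore reported as not matching by this strict check, which is the intended behaviour for a truncated product.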
To pass the elongating product on to the subsequent condensation domain, the studied synthetase utilizes PCP domains, subunits responsible for thiolation of nascent peptide intermediates, in which post-translational modification of a conserved serine residue shifts the state of the domain from inactive apo to active holo. Modification of this residue is carried out by a PPTase, which covalently attaches the 4'-phosphopantetheine arm of CoA to the PCP active site, enabling peptide intermediates to bind as reactive thioesters. The serine residue involved in the nucleophilic attack through its hydroxyl group was conserved in every module within the PCP domain and was predicted at the front of the second helix [30].
The stand-alone docking domain (D) (7,617,812-7,617,964 bp) found at the N-terminus of NcpA may be an essential component mediating interactions, recognition and specific association within NRPS subunits. A potential acceptor domain, identified by sequence homology of conserved residues to C-terminal communication-mediating (COM) donor domains, was found at the second helix of the NcpB4 PCP domain, encompassing the conserved serine residue within a potential binding sequence [31]. Moreover, this communication-mediating domain may putatively bind to the C-termini of the NcpB3 and NcpB4 condensation (C) domains, based on the conserved motif LLEGIV, found by sequence homology to the last five amino acids of C-terminal docking domains, which are key for their interactions [32]. Within the same β-hairpin, a group of charged residues (ExxxxxKxR) putatively determines the binding affinity of the N-terminal domain [33].
Two tailoring domains encoded at the 5' ends of the ncpA and ncpB genes were classified as an epimerization (E) domain (7,627,742-7,629,043 bp) and a reductase (R) domain (7,642,183-7,643,238 bp), respectively. The epimerization domain catalyzes the conversion of L-amino acids to D-amino acids, a reaction consistent with the D-stereochemistry of the glutamine residue of the peptide; His of the conserved HHxxxDG motif and Glu from the upstream EGHGRE motif (raceB) form the active site of the epimerization reaction [34]. A homologous conserved HHxxxDG motif is found in condensation (C) domains, where a similar reaction is catalyzed during peptide bond formation, putatively by the second His residue [35]. As in the ncp cluster [17], the motif of module NcpA1 is degenerate in two positions (HQIVGDL), with leucine instead of phenylalanine at the start of the helix. Site-directed mutagenesis of the second histidine abolished enzymatic activity [36], which might suggest that the NcpA1 condensation domain is inactive.
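A motif such as HHxxxDG, where x stands for any residue, maps naturally onto a regular expression. The following sketch shows how such a search and a position-by-position comparison could be done; the flanking residues in the example strings are invented for illustration, and only the HQIVGDL heptapeptide comes from the text.

```python
import re

CONSENSUS = "HHxxxDG"  # 'x' = any amino acid

# Translate the consensus into a regex: 'x' becomes '.', other letters stay literal.
MOTIF_RE = re.compile("".join("." if c == "x" else c for c in CONSENSUS))


def find_motif(protein):
    """Return (0-based start, matched text) for each non-overlapping motif hit."""
    return [(m.start(), m.group()) for m in MOTIF_RE.finditer(protein)]


def degenerate_positions(candidate):
    """1-based positions where a 7-residue candidate departs from the consensus."""
    return [
        i + 1
        for i, (cons, res) in enumerate(zip(CONSENSUS, candidate))
        if cons != "x" and cons != res
    ]


canonical = "AAHHILVDGVSAA"    # hypothetical fragment containing an intact motif
ncpa1_like = "AAHQIVGDLAA"     # hypothetical flanks around the NcpA1 HQIVGDL motif

print(find_motif(canonical))            # intact motif is found
print(find_motif(ncpa1_like))           # degenerate motif is not matched
print(degenerate_positions("HQIVGDL"))  # positions that differ from HHxxxDG
```

The comparison reports positions 2 and 7, in line with the statement that the NcpA1 motif is degenerate in two positions, including the second histidine.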
The reductase (R) domain found at the C-terminus of the NRPS was classified as an oxidoreductase. Despite a 15% discrepancy in domain composition compared to NcpB, the positions of the core catalytic triad Thr-Tyr-Lys and of the Rossmann-fold NAD(P)H nucleotide-binding motif GxxGxxG were not affected. The mechanism driving this chain release utilizes the NAD(P)H cofactor for the redox reaction converting the final moiety of the nascent peptide to an aldehyde or alcohol [37,38].

Structure Characterization of Ncps Produced by N. edaphicum CCNP1411
Thus far, only three Ncps, Ncp-A1, A2 and M1, and their linear aldehydes have been isolated as pure natural products of Nostoc strains [13,16]. Ncp-A3, with MePhe in the C-terminal position, was obtained through aberrant biosynthesis in a Nostoc sp. ATCC53789 culture supplemented with MePhe [13]. The linear aldehydes of Ncp-A1 and Ncp-A2, with Pro instead of MePro, were chemically synthesized and used to study the Ncp epimerization and macrocyclization equilibria [19,20]. In our work, ten Ncps, differing mainly in positions 4 and 7, were detected by LC-MS/MS in the N. edaphicum CCNP1411 cell extract (Table 3, Figure 1, Figure 7, Figure 8 and Figures S1-S7). These include five cyclic structures, four linear Ncp aldehydes and one linear hexapeptide Ncp. The putative structures of the six peptides found to be naturally produced by Nostoc for the first time are marked in bold in Table 3 (Ncp-E1, Ncp-E1-L, Ncp-E2, Ncp-E2-L, Ncp-E3 and Ncp-E4-L).
In addition to the heptapeptide Ncps, N. edaphicum CCNP1411 produces a small amount of the linear hexapeptide Ncp-E4-L, whose putative structure is Tyr+Gly+Gln+Ile+Ser+MePro (Table 3, Figure 9). This Ncp was detected only when a higher biomass of Nostoc was extracted. As the proposed amino acid sequence of this molecule is the same as the sequence of the first six residues in Ncp-A1 and Ncp-A2, the hexapeptide can be a precursor of the two Ncps. The other option is that the cell concentration of the Ncps is self-regulated and Ncp-E4-L is released through proteolytic cleavage of the final products. This hypothesis could be verified once the role of the Ncps for the producer is discovered. In the ncp gene cluster, the presence of ncpG, encoding the NcpG peptidase with high homology to enzymes hydrolyzing D-amino acid-containing peptides, was revealed by Becker et al. [17] and also confirmed in this study. Therefore, in-cell degradation of Ncps by the NcpG peptidase is possible, but it probably proceeds at D-Gln and gives products other than Ncp-E4-L.
The process of de novo structure elucidation was performed manually, based mainly on a series of b and y fragment ions produced by cleavage of the peptide bonds (Figures 7-9, Figures S1-S7), and on the presence of immonium ions (e.g., m/z 70 for Pro, 84 for MePro, 136 for Tyr) in the product ion mass spectra of the peptides. The structure characterization was additionally supported by the previously published MS/MS spectra of Ncps [14]. The fragment ions derived from the two C-terminal amino acids usually belonged to the most intense ions in the spectra, and in this study they facilitated the structure characterization. For example, in the product ion mass spectra of Ncp-A1 (Figure S1) and Ncp-E3 (Figure S7), ions at m/z 209 [MePro+Leu+H] and m/z 181 [MePro+Leu+H−CO] were present, while in the spectrum of Ncp-E2 (Figure S5), with Pro instead of MePro, the corresponding ions at m/z values 14 units lower, i.e., 195 and 167, were observed. The spectra of the linear Ncps contained the intense Tyr immonium ion at m/z 136. Based on the previously determined structures of Ncp-A1 and Ncp-A2 [13], we assumed that in Ncp-E2 the amino acids in positions 4 and 7 are Ile and Leu, respectively (Table 3; Figure S5). These two amino acids are difficult to distinguish by MS, and NMR analyses are required to confirm the structures of the Ncps. The presence of Val in position 4, instead of Ile, distinguishes Ncp-E3 from the other Ncps produced by N. edaphicum CCNP1411. As previously reported [17], and also confirmed in this study, the predicted substrates of the NcpB1 protein encoded by ncpB and involved in the incorporation of the residue in position 4 are Ile/Leu and Val. However, the domain preferentially activates Ile, which explains why only traces of Val-containing Ncps were detected in N. edaphicum CCNP1411.
Methylated Pro (MePro) in position 6 is quite conserved. MePro is a rare non-proteinogenic amino acid biosynthesized from Leu through the activity of the zinc-dependent long-chain dehydrogenases and the Δ1-pyrroline-5-carboxylic acid (P5C) reductase homologue encoded by the gene cassette ncpCDE [17,18,25]. The genes involved in the biosynthesis of MePro were found in 30 of the 116 tested cyanobacterial strains, the majority (80%) of which belonged to the genus Nostoc [39]. The new Ncp-E1 and Ncp-E2, detected in trace amounts, are the only Ncps produced by N. edaphicum CCNP1411 that contain Pro (Table 3). The presence of the m/z 84 ion in the fragmentation spectra of the two Ncps complicated the process of de novo structure elucidation. This ion corresponds to the immonium ion of MePro and could indicate the presence of this residue. However, the two ions at m/z 101 and 129, which together with the ion at m/z 84 are characteristic of Gln, suggested the presence of this amino acid in Ncp-E1 and Ncp-E2. The detailed characterization of the Ncp fragmentation pathways is presented in Figures 7-9 and in the Supplementary Materials (Figures S1-S7).
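The diagnostic low-mass ions mentioned above can be reproduced from elemental compositions, which also shows why m/z 84 is ambiguous at nominal mass. The sketch below uses standard monoisotopic masses and the usual definition of an immonium ion (residue minus CO plus a proton). Assigning m/z 129 to the protonated Gln residue and m/z 84 to the Gln immonium ion after loss of NH3 is the conventional interpretation and an assumption here, not something stated in the article.

```python
# Monoisotopic masses (u) of the elements and of the proton.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
PROTON = 1.00727647

# Elemental compositions of residues (amino acid minus H2O).
RESIDUES = {
    "Pro": {"C": 5, "H": 7, "N": 1, "O": 1},
    "MePro": {"C": 6, "H": 9, "N": 1, "O": 1},
    "Tyr": {"C": 9, "H": 9, "N": 1, "O": 2},
    "Gln": {"C": 5, "H": 8, "N": 2, "O": 2},
}


def formula_mass(formula):
    """Monoisotopic mass of an elemental composition."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


CO = formula_mass({"C": 1, "O": 1})
NH3 = formula_mass({"N": 1, "H": 3})


def immonium_mz(residue):
    """m/z of the singly charged immonium ion: residue - CO + H+."""
    return formula_mass(RESIDUES[residue]) - CO + PROTON


gln_related = {
    "Gln immonium": immonium_mz("Gln"),
    "Gln immonium - NH3": immonium_mz("Gln") - NH3,
    "[Gln residue + H]+": formula_mass(RESIDUES["Gln"]) + PROTON,
}

for name in ("Pro", "MePro", "Tyr"):
    print(f"{name} immonium: m/z {immonium_mz(name):.4f}")
for label, mz in gln_related.items():
    print(f"{label}: m/z {mz:.4f}")
```

Both the MePro immonium ion and the Gln-derived fragment round to m/z 84 and differ by only a few hundredths of a mass unit, so the accompanying ions at m/z 101 and 129 are what point to Gln rather than MePro.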

Production of Ncps by N. edaphicum CCNP1411
Apart from the structural analysis, we also attempted to determine the relative amounts of the individual Ncps produced by N. edaphicum CCNP1411. To exclude the effect of the extraction procedure on the amounts of the detected peptides, different solvents and pH values were applied. As linearization of Ncps during long storage of freeze-dried material had been suggested [16], both fresh and lyophilized biomass were analyzed. Regardless of the extraction procedure, Ncp-A2-L, with Phe at the C-terminus, was always found to be the main Ncp analogue (Figure 10A-D). In addition, when MePro- and Pro-containing peptides were compared separately, the peak intensity of the linear Ncps with Phe at the C-terminus (i.e., Ncp-A2-L and Ncp-E1-L) was higher than that of the Ncps with Leu. These results might indicate preferential incorporation of Phe into the synthesized peptide chain.
The study also showed that the cell contents of the linear Ncps are higher than those of the cyclic ones (Figure 10A-D and Figure S8). The release of Ncps from the synthetase as linear aldehydes is catalyzed by a reductase domain located in the C-terminal part of NcpB [17]. This reductive release triggers the spontaneous, enzyme-independent macrocyclization of the linear peptide [19,20]. The reaction leads to the formation of a stable imino bond between the C-terminal aldehyde and the N-terminal amine group of Tyr [19,20]. In N. edaphicum CCNP1411 cells, depending on the Ncp analogue, the analyzed material (fresh or lyophilized) and the extraction solvent, the cyclic Ncps constituted from less than 10% (Ncp-A2, fresh biomass) to over 90% of the linear peptide (Figure 10A-D and Figure S8). In the case of Ncp-A1, with MePro-Leu at the C-terminus, the contribution of the cyclic form was always the most significant, and at pH 8 it reached up to 91.7% of the linear peptide aldehyde (Ncp-A1-L) (Figure 10A-D and Figure S8). The cyclic analogues Ncp-E1 and Ncp-E3 were produced in trace amounts and their spectra were only sporadically recorded. It was shown that the macrocyclization of Ncps is determined by the geometry of the linear peptide aldehyde and that the conformation of D-Gln and Gly is crucial for the folding and formation of the imino bond [19]. As these two residues are present in all detected Ncps, other elements of the structure probably affect the cyclization process as well. We hypothesize that, due to steric hindrance, the cyclization of Ncp-A1 with Leu in the C-terminal position is easier than the cyclization of Ncp-A2 with Phe. As a consequence, the proportion of the cyclic Leu-containing Ncp-A1 to the linear form of the peptide is higher.
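The cyclic contribution quoted above is expressed relative to the linear aldehyde, i.e., as 100 × cyclic / linear. A minimal sketch of that calculation is given below; the peak areas are hypothetical values chosen only to reproduce the two extremes mentioned in the text, and they are not measurements from this study.

```python
def cyclic_percent_of_linear(cyclic_area: float, linear_area: float) -> float:
    """Cyclic peptide signal expressed as a percentage of the linear form."""
    if linear_area <= 0:
        raise ValueError("linear peak area must be positive")
    return 100.0 * cyclic_area / linear_area


# Hypothetical (cyclic, linear) peak areas in arbitrary units.
peak_areas = {
    "Ncp-A1 / Ncp-A1-L": (9.17e5, 1.00e6),
    "Ncp-A2 / Ncp-A2-L": (8.0e4, 1.00e6),
}

for pair, (cyclic, linear) in peak_areas.items():
    print(f"{pair}: {cyclic_percent_of_linear(cyclic, linear):.1f}% of the linear form")
```

Because peak intensities of different analogues may not respond identically in the mass spectrometer, such ratios are best read as relative indicators within one cyclic/linear pair rather than as absolute concentrations.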
Thus far, Ncp synthesis has been reported in only a few Nostoc strains, and the structural diversity of the peptides has been found to be low. Other classes of NRPs were detected in cyanobacteria representing different orders and genera, and within a single class of peptides numerous analogues were detected. For example, the number of naturally produced cyclic heptapeptide microcystins (MCs) is over 270 [40,41], and in a single cyanobacterial strain as many as 47 MC analogues were detected [40]. Cyanopeptolins are produced by many cyanobacterial taxa, and so far more than 190 structural analogues of the peptides have been discovered [41]. In this work, a cyanopeptolin gene cluster was identified in N. edaphicum CCNP1411; thirteen products of the genes were previously reported [6]. These peptides contain seven amino acids and a short fatty acid chain, and only one element of the structure, 3-amino-6-hydroxy-2-piperidone (Ahp), is conserved [6]. The structural diversity of NRPs is generated by frequent genetic recombination events and point mutations in the NRP gene cluster. The changes in gene sequences affect the structure and substrate specificity of the encoded enzymes. The tailoring enzymes can further modify the product, leading to even higher diversity of the synthesized peptides [42]. In the case of Ncps, both the number of producing organisms and the structural diversity of the peptides are limited. Ncp-M1, from Nostoc living in symbiosis with a gastropod [16], is the only Ncp with a structure distinctly different from Ncp-A1, Ncp-A2 and the other Ncps described in this work.
The diversity within one class of bioactive metabolites offers a good opportunity for structure-activity relationship studies, without the need to synthesize the variants. The studies are of paramount importance when the efficacy and safety of a drug candidate are optimized. Therefore, in our future work, when sufficient quantities of pure Ncps are isolated, the activity of individual analogues against different cellular targets will be tested and compared, in order to select the lead compound for further studies.

Isolation, Purification and Culturing of Nostoc CCNP1411
Nostoc strain CCNP1411 was isolated in 2010 from the Gulf of Gdańsk, southern Baltic Sea, by Dr. Justyna Kobos. Based on the 16S rRNA sequence (GenBank accession number KJ161445) and morphological features, such as the shape of trichomes, cell size (4.56 ± 0.30 µm wide and 4.12 ± 0.72 µm long) and lack of akinetes [43,44], the strain was classified as N. edaphicum. Purification of the strain was carried out by multiple transfers to liquid and solid (1% bacterial agar) Z8 medium supplemented with NaCl to obtain a salinity of 7.3 [45]. To establish the strain as a monoculture, free from accompanying heterotrophic bacteria, a third-generation cephalosporin, ceftriaxone (100 µg/mL) (Pol-Aura, Olsztyn, Poland), was used. In addition, the purity of the culture was regularly tested by inoculation on LA agar (solid LB medium with 1.5% agar) [46] and on Columbia agar + 5% sheep blood (BTL Ltd., Łódź, Poland), a highly nutritious medium recommended for fastidious bacteria. Cyanobacterial cultures were grown in liquid Z8 medium (100 mL) at 22 ± 1 °C under continuous light of 5-10 µmol photons m⁻² s⁻¹. After three weeks of growth, the cyanobacterial biomass was harvested by passing the culture through a nylon net (mesh size 25 µm) and then freeze-dried before further processing.

Isolation and Sequencing of Genomic DNA
Genomic DNA of N. edaphicum CCNP1411 was isolated using the SDS/phenol method as described previously [47,48]. DNA quality control was performed by measuring the absorbance at 260/230 nm, template concentration was determined using a Qubit fluorimeter (Thermo Fisher Scientific, Waltham, MA, USA), and DNA integrity was analyzed by 0.8% agarose gel electrophoresis and by PFGE using a Biorad CHEF-III instrument (BioRad, Hercules, CA, USA).
The long reads were obtained using the GridION sequencer (Oxford Nanopore Technologies, Oxford, UK). Prior to long-read library preparation, genomic DNA was sheared into 30 kb fragments using a 26 G needle, followed by size selection on a Bluepippin instrument (Sage Science, Beverly, MA, USA). DNA fragments above 20 kb were recovered using a PAC30 kb cassette. Then, 5 µg of the recovered DNA was taken for 1D library construction using the SQK-LSK109 kit, and 0.5 µg of the final library was loaded into an R9.4.1 flowcell and sequenced on MinION sequencer.

Genome and NRPS Alignment
The genome assembly was annotated using the NCBI Prokaryotic Genome Annotation Pipeline [52], and the annotation was refined with prokka [53] using an additionally curated database comprising sequences of the order Nostocales selected from the NCBI non-redundant and refseq_genomes (280 positions) databases, enriched with 35 NRPS/PKS clusters selected from the phylum Cyanobacteria. Circular maps of the N. edaphicum CCNP1411 genome were created with the CGView Comparison Tool [54], with additional GC skew and GC content analyses.
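For readers unfamiliar with the two tracks plotted on such circular maps, GC content is the fraction of G and C bases in a window, and GC skew is commonly defined as (G − C)/(G + C). The sketch below illustrates the calculation on non-overlapping windows. It is not the CGView Comparison Tool implementation; the function name, window size and toy sequence are assumptions made for illustration.

```python
def gc_windows(seq: str, window: int, step: int):
    """Yield (start, gc_content, gc_skew) for sliding windows over seq.

    gc_content = (G + C) / window length
    gc_skew    = (G - C) / (G + C), or 0.0 if the window has no G or C
    """
    seq = seq.upper()
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        gc_content = (g + c) / window
        gc_skew = (g - c) / (g + c) if (g + c) else 0.0
        yield start, gc_content, gc_skew


# Toy sequence (hypothetical): a G-rich window followed by a C-rich window.
toy = "GGGGCCATAT" + "CCCCGGATAT"
profile = list(gc_windows(toy, window=10, step=10))
```

In the toy example both windows have the same GC content, but the skew changes sign between them, which is the kind of signal used to locate strand-composition switches on a chromosome map.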
The span selected as a potential NRPS cluster was confirmed with BLASTn, BLASTp [55] and antiSMASH [56]. Start codons of ORFs within the putative cluster were verified by the presence of ribosome binding sites 4-12 nucleotides upstream of the start codon. A schematic BLASTn comparison of the ORFs with those of the related synthetases, AY167420.1 and CP026681.1, was visualized with the EasyFig program (http://mjsull.github.io/Easyfig/files.html). Annotated regions of the NRPS span were subjected to an NCBI Conserved Domain Database search [57] against the CDD v3.18 database, with an e-value threshold of 10⁻³, to identify evolutionarily conserved protein domains and motifs. Recognized motifs were selected using samtools v.1.9 and subjected to protein structure and function prediction with I-TASSER [58]; the results were compared with literature reports and the PKS/NRPS Analysis Web-site prediction [59], and reevaluated using the MEGA X suite [60]. The amino-acid sequence was visualized with the BOXSHADE 3.2 program (https://embnet.vital-it.ch/software/BOX_form.html). Ligand binding and active sites of the domains were determined using COFACTOR and COACH, which are part of the I-TASSER analyses, and confirmed by MUSCLE amino-acid alignment in MEGA X.
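The start-codon check described above can be illustrated with a short sketch that looks for a Shine-Dalgarno-like motif in the window 4-12 nucleotides upstream of a candidate start codon. The motif list, the function name and the reading of "4-12 nucleotides upstream" as a window that must fully contain the motif are assumptions made for illustration; the source does not state which motifs or which tool were used for this step.

```python
# Assumed Shine-Dalgarno-like core motifs (hypothetical choice for this sketch).
SD_LIKE_MOTIFS = ("GGAGG", "AGGAG", "GGAG", "GAGG")


def has_rbs(seq: str, start: int, near: int = 4, far: int = 12,
            motifs=SD_LIKE_MOTIFS) -> bool:
    """Return True if any motif lies fully inside the upstream window.

    `start` is the 0-based index of the first base of the start codon.
    The window covers positions -far .. -near relative to the start codon,
    where -1 is the base immediately upstream of it.
    """
    lo = max(start - far, 0)
    hi = max(start - near + 1, 0)  # exclusive slice end
    window = seq[lo:hi].upper()
    return any(m in window for m in motifs)


# Toy sequences (hypothetical), each with ATG as the candidate start codon.
with_rbs = "TTTT" + "AGGAG" + "TTTTTT" + "ATGAAA"   # motif ends 7 nt upstream
too_far = "AGGAG" + "T" * 15 + "ATGAAA"              # motif outside the window
too_close = "T" * 10 + "AGGAG" + "ATGAAA"            # motif directly at the start codon
```

For example, `has_rbs(with_rbs, with_rbs.index("ATG"))` accepts the first candidate, whereas the other two are rejected because the motif falls outside the 4-12 nucleotide window.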

Data Deposition
Genomic sequences generated and analyzed in this study were deposited in the GenBank database under BioProject number: PRJNA638531.

Extraction and LC-MS/MS Analysis
For LC-MS/MS analyses of Ncps, the lyophilized biomass (10 mg DW) of N. edaphicum CCNP1411 was homogenized by grinding with mortar and pestle, and extracted in 1 mL of MilliQ water, 20% methanol (pH 3.5, 6.0 and 8.0) or 50% methanol in water. The pH was adjusted with 0.5 M HCl and 1.0 M NaOH. In addition, the fresh material (500 mg FW) was extracted in 20% methanol in water. The samples were vortexed for 15 min and centrifuged at 14,000 rpm for 10 min at 4 °C. The collected supernatants were analyzed directly with the LC-MS/MS system.
The LC-MS/MS was carried out on an Agilent 1200 HPLC (Agilent Technologies, Waldbronn, Germany) coupled to a hybrid triple quadrupole/linear ion trap mass spectrometer QTRAP5500 (Applied Biosystems MDS Sciex, Concord, ON, Canada). The separation was achieved on a Zorbax Eclipse XDB-C18 column (4.6 mm ID × 150 mm, 5 µm; Agilent Technologies, Santa Clara, CA, USA). The extract components were separated by gradient elution from 10% to 100% B (acetonitrile with 0.1% formic acid) over 25 min, at a flow rate of 0.6 mL/min. As solvent A, 5% acetonitrile in MilliQ water with 0.1% formic acid was used. The mass spectrometer was operated in positive mode, with turbo ion source (5.5 kV; 550 °C). An information-dependent acquisition method at the following settings was used: for ions within the m/z range 500-1100 and signal intensity above the threshold of 500,000 cps the MS/MS spectra were acquired within the m/z range 50-1000, at a collision energy of 60 eV and declustering potential of 80 eV. Data were acquired with the Analyst® Software (version 1.7 Applied Biosystems, Concord, ON, Canada).
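The precursor selection rule of the information-dependent acquisition method can be summarized as a simple filter: an ion triggers MS/MS acquisition when its m/z lies within 500-1100 and its intensity exceeds 500,000 cps. The sketch below encodes only these two criteria from the text. The class and function names and the example ions are hypothetical, and any further criteria applied by the instrument software are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Precursor:
    mz: float             # mass-to-charge ratio of the precursor ion
    intensity_cps: float  # signal intensity in counts per second


def triggers_msms(p: Precursor,
                  mz_range=(500.0, 1100.0),
                  threshold_cps: float = 500_000.0) -> bool:
    """Apply the two selection criteria stated in the method description."""
    in_range = mz_range[0] <= p.mz <= mz_range[1]
    return in_range and p.intensity_cps > threshold_cps


# Hypothetical survey-scan ions, NOT values from the paper.
survey = [
    Precursor(mz=750.0, intensity_cps=2.0e6),   # in range, above threshold
    Precursor(mz=750.0, intensity_cps=1.0e5),   # in range, too weak
    Precursor(mz=1200.0, intensity_cps=5.0e6),  # intense, but outside m/z range
]
selected = [p for p in survey if triggers_msms(p)]
```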

Conclusions
Genes coding for subunits of the non-ribosomal peptide synthetase in the nostocyclopeptide-producing N. edaphicum CCNP1411 differed in nucleotide composition from the previously described ncp cluster of Nostoc sp. ATCC53789. Although the analysis of fragments of genes coding for active sites and ligand binding sites of the conserved protein domains derived from N. edaphicum CCNP1411 and Nostoc sp. ATCC53789 indicated identical amino-acid compositions, residues within adenylation domains and substrate binding sites differed between the compared sequences. This finding may highlight sites prone to mutations within regions responsible for structure and substrate stability. Analysis of ncp gene products in N. edaphicum CCNP1411 led to the detection of new nostocyclopeptide analogues. However, modifications in their structure were minor and limited to three positions of the heptapeptides. Although the naturally produced nostocyclopeptides were previously described as cyclic structures, in N. edaphicum CCNP1411 they are mainly present as linear peptide aldehydes, indicating a slow cyclization process.