A Data-Independent Methodology for the Structural Characterization of Microcystins and Anabaenopeptins Leading to the Identification of Four New Congeners

Toxin-producing cyanobacteria are responsible for the presence of hundreds of bioactive compounds in aquatic environments undergoing increasing eutrophication. The identification of cyanotoxins is still an emerging field because of the great diversity of potential congeners, and high-resolution mass spectrometry (HRMS) has the potential to deepen this knowledge in aquatic environments. In this study, high-throughput and sensitive on-line solid-phase extraction ultra-high performance liquid chromatography (SPE-UHPLC) coupled to HRMS was applied in a data-independent acquisition (DIA) workflow for the suspect screening of cyanopeptides, including the microcystin and anabaenopeptin toxin classes. Eleven uncommon cyanopeptides were unambiguously characterized with a workflow based on extensive analysis of their fragmentation patterns. The method also allowed the characterization of four previously unknown cyanotoxins ([Leu1, Ser7]MC-HtyR, [Asp3]MC-RHar, AP731, and AP803). The quantification of 17 common cyanotoxins, together with the semi-quantification of the characterized uncommon cyanopeptides, resulted in the identification of 23 different cyanotoxins in 12 lakes in Canada, the United Kingdom and France. Compound concentrations varied between 39 and 41,000 ng L−1. To our knowledge, this is the first DIA method applied to the simultaneous suspect screening of two families of cyanopeptides. Moreover, this study shows the great diversity of cyanotoxins in lake water cyanobacterial blooms, a growing concern in aquatic systems.


Introduction
Eutrophication of natural water sources is closely linked to the appearance of massive, episodic proliferations of cyanobacteria. Not all of these prokaryotic organisms carry and express the genes for toxin production, but about 40 of the 150 cyanobacterial genera do possess these genes [1]. For more than two decades, microcystins (MCs) have been the most extensively studied family of cyanopeptides. This dominance was driven by tragic incidents, such as the one in a Brazilian hospital in 1996, where 52 patients undergoing dialysis died of liver failure caused by water contaminated with MCs [2]. Following this, the World Health Organization (WHO) proposed a guideline value for MC-LR in drinking water (1 µg L−1), which the US EPA extended to MC-LR equivalents to include more congeners and other cyanotoxins [3,4]. However, several other families of cyanopeptides have long been identified alongside MCs in common cyanobacteria, e.g., Microcystis sp. The dominant families among them include cyanopeptolins, anabaenopeptins (APs), aerucyclamides, aeruginosins and microginins [5]. Still, the high diversity of congeners produced by each family and the limited knowledge of the factors and mechanisms driving their production greatly complicate their study.
The potential toxicity of cyanopeptides depends critically on the structure of each variant, but it remains poorly understood and poorly documented [5]. MCs are hepatotoxic and readily accumulate in the liver because they bind specifically to protein phosphatases 1 and 2A. This binding inhibits the phosphatases and disrupts cellular homeostasis, which, in the most acute cases, leads to liver necrosis; MC exposure has also been linked to colorectal and liver cancer [6]. Although bioactive, APs have so far been considered non-toxic. Nevertheless, a few studies suggest that some AP congeners, such as AP-A, may inhibit proteases and protein phosphatases [7]. Moreover, AP-B and AP-F induce cyanobacterial lysis, ultimately affecting the bioavailability of other cell-bound cyanotoxins [8]. Accordingly, much work remains to identify these cyanopeptides unambiguously and to assess their potential toxicity.
Cyanopeptides are cyclic or linear non-ribosomal peptides; each family has a characteristic substructure and several variable amino acid positions. These variations of the core structure multiply the number of possible combinations, which explains the large variety of potential congeners: to date, more than 500 cyanopeptides, including 240 MCs and 96 APs, have been identified [7,9,10]. More specifically, MCs are cyclic heptapeptides (Figure 1) with a characteristic β-amino acid moiety named Adda (3-amino-9-methoxy-2,6,8-trimethyl-10-phenyldeca-4,6-dienoic acid) and two positions with the highest monomer variability (X and Z). APs are cyclic peptides bearing a characteristic ureido linkage (Figure 1); their general structure is AA1-CO-[Lys-AA3-AA4-MeAA5-AA6], where AA denotes a variable amino acid residue and the brackets enclose the cyclic part [9]. Based on the amino acid combinations identified so far for these two families, an extensive list of candidate amino acids per variable site can be compiled to enumerate all theoretical cyanopeptide combinations [11]. Such an enumeration yields far more congeners than have been reported, although most of these variants probably do not occur naturally, because some amino acids are rare in the known structures.
High-resolution mass spectrometry (HRMS), combined with exact mass measurement, databases and software packages, is an increasingly effective tool for identifying suspect and unknown compounds without certified standards, in cases where target analysis is unfeasible. Suspect and non-target screening are the two main strategies for the exhaustive search for known and unknown compounds when almost no reference material is available.
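The combinatorial reasoning about MC and AP structures rests on a simple mass rule, sketched below in Python under our own assumptions: an MC, being a head-to-tail cyclic heptapeptide, is taken to have a neutral monoisotopic mass equal to the sum of its seven residue masses, and an AP is taken as the sum of its six residue masses plus CO2 (the ureido carbonyl plus the water kept by the free carboxyl of AA1, minus the two hydrogens lost when the urea bond forms). The residue table and helper names are illustrative, not the study's databases.

```python
import re

# Monoisotopic atomic masses (u)
ATOMIC_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.00307401,
               "O": 15.99491462, "S": 31.97207117}

# Residue formulas (free amino acid minus H2O); illustrative subset only
RESIDUES = {
    "Ala": "C3H5NO", "Leu": "C6H11NO", "MeAsp": "C5H7NO3", "Arg": "C6H12N4O",
    "Adda": "C20H27NO2", "Glu": "C5H7NO3", "Mdha": "C4H5NO",
    "Tyr": "C9H9NO2", "Lys": "C6H12N2O", "Val": "C5H9NO",
    "Hty": "C10H11NO2", "MeAla": "C4H7NO", "Phe": "C9H9NO",
}


def formula_mass(formula: str) -> float:
    """Monoisotopic mass of an elemental formula such as 'C6H11NO'."""
    return sum(ATOMIC_MASS[element] * (int(count) if count else 1)
               for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula))


def microcystin_mass(residues: list[str]) -> float:
    """Cyclic heptapeptide: neutral mass = sum of the seven residue masses."""
    if len(residues) != 7:
        raise ValueError("an MC has seven residues")
    return sum(formula_mass(RESIDUES[r]) for r in residues)


def anabaenopeptin_mass(residues: list[str]) -> float:
    """AA1-CO-[Lys-AA3-AA4-MeAA5-AA6]: six residue masses + CO2."""
    if len(residues) != 6:
        raise ValueError("an AP has six residues")
    return sum(formula_mass(RESIDUES[r]) for r in residues) + formula_mass("CO2")


mc_lr = microcystin_mass(["Ala", "Leu", "MeAsp", "Arg", "Adda", "Glu", "Mdha"])
ap_a = anabaenopeptin_mass(["Tyr", "Lys", "Val", "Hty", "MeAla", "Phe"])
print(f"MC-LR {mc_lr:.4f}  AP-A {ap_a:.4f}")  # MC-LR 994.5488  AP-A 843.4167
```

Both values agree with the monoisotopic masses of the elemental formulas of MC-LR (C49H74N10O12) and AP-A (C44H57N7O10), which supports the two summation rules for these reference compounds.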
In recent years, the use of these screening techniques in the environmental field has greatly increased, particularly for the non-target analysis of pharmaceuticals, pesticides and hormones in surface and treated water [12,13]. Conversely, only a few authors have investigated the presence of cyanotoxins in surface water with this type of analysis [11,14–16]. Isobaric interferences and co-eluting substances can be major challenges for compound identification, even with HRMS. Moreover, accurate mass alone is insufficient to confirm a structure, e.g., to determine the degradation by-products or metabolites of a compound of interest. A non-target screening method should therefore include several confirmatory elements that increase confidence in the identification, such as accurate mass (m/z), mass defect, isotopic pattern, charge states, adducts and fragmentation pattern [17]. Suspect screening depends on suspect lists, which is both its strength and its weakness. It relies on some of the information listed above, but a major drawback is the lack of data in online libraries for some small-molecule families, e.g., cyanotoxins, that would allow formal identification [17]. Nonetheless, building dedicated in-house databases is a promising route to the unambiguous identification of known and unknown cyanopeptides and to the study of less-known congeners.
Several analytical strategies have been used to identify new cyanopeptide structures. Historically, nuclear magnetic resonance (NMR), sometimes combined with mass spectrometry (MS), was the method of choice for elucidating new cyanopeptide structures, but it has mainly been applied to cyanobacterial cultures and blooms, where cyanotoxins are typically found at higher concentrations and the matrices are less complex [18–21]. In environmental samples, toxin concentrations are too low for this technique. Therefore, MS-based methods capable of unambiguous identification are widely used. Matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) systems are used for accurate and simultaneous identification and quantification in many complex matrices [7,22–25]. For higher sensitivity and selectivity, liquid chromatography (LC) coupled to HRMS is increasingly used for quantitative and qualitative analysis. LC coupled to several MS analyzers has proven effective in identifying unknown cyanopeptide congeners: tandem mass spectrometry [26,27], Q-Trap [7,28], Q-TOF [11,15,29], Orbitrap™ [14,30,31] and FTICR [32]. Fragmentation of precursor ions is key to unambiguous identification, because the constituent amino acids and peptides can be recognized from their specific fragmentation spectra. Very few studies have proposed suspect screening methods for the analysis of MCs and APs in freshwater samples; most strategies are based on non-target screening, for which databases may not be necessary [11,15].
Most studies use data-dependent acquisition (DDA), in which a wide full scan (FS) is followed by fragmentation (MS/MS) of selected precursors: masses from an inclusion list of exact masses trigger the fragmentation of the most intense matching precursors, e.g., the top 10 [33]. DDA is very specific and useful for non-target screening, but it lacks speed when suspect lists become too large to manage and can be limited by the duty cycle of the instrument, leading to the loss of MS/MS data for less intense precursors. Data-independent acquisition (DIA) instead fragments all precursor ions within a selected m/z window. DIA is very useful when the fragmentation patterns of suspect compounds are known, but the resulting MS/MS data are highly complex and can be difficult to interpret because of co-eluting sample components or matrix interferences [33].
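The contrast between the two acquisition modes can be illustrated with a toy model: in DDA only the N most intense precursors of a survey scan are selected for MS/MS, whereas in DIA every precursor inside an isolation window is co-fragmented. This is a hypothetical sketch: the m/z values are rounded [M+H]+ ions of compounds discussed in this study plus one made-up matrix ion (612.35), and the intensities, the top-N value and the window width are invented for illustration.

```python
# Hypothetical survey scan: (precursor m/z, intensity)
survey = [(995.556, 9.0e6), (844.424, 2.5e6), (612.35, 1.1e6),
          (804.436, 3.0e4), (732.393, 1.2e4), (1105.592, 8.0e3)]


def dda_selection(scan, top_n):
    """DDA-style choice: fragment only the top_n most intense precursors."""
    ranked = sorted(scan, key=lambda peak: peak[1], reverse=True)
    return sorted(mz for mz, _ in ranked[:top_n])


def dia_selection(scan, low, high, width):
    """DIA-style choice: every precursor in [low, high) is fragmented,
    together with whatever else falls in the same isolation window."""
    windows = {}
    for mz, _ in scan:
        if low <= mz < high:
            windows.setdefault(int((mz - low) // width), []).append(mz)
    return windows


selected = dda_selection(survey, top_n=3)
missed = sorted(mz for mz, _ in survey if mz not in selected)
print("DDA fragments:", selected)
print("DDA misses:", missed)
print("DIA windows:", dia_selection(survey, 300, 1400, 50))
```

With a top-3 rule, the three weakest ions, which here are the suspect cyanopeptides, never reach MS/MS, whereas DIA fragments all of them; the price is that ions such as 844.424 and 804.436 share the m/z 800–850 window, so their fragments end up in the same spectrum.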
In this context, a suspect screening strategy based on a DIA experiment was developed, together with a generated list of candidates, for the unambiguous identification of uncommon MC and AP congeners. The study focused on MCs and APs because they were the two main groups of compounds found in the sampling region (southern Quebec lakes, Canada) [34]. The DIA-based method combined automated solid-phase extraction with ultra-high performance liquid chromatography, heated electrospray ionization and detection by a Q-Orbitrap™ (SPE-UHPLC-HRMS) [34]. This integrated strategy allowed sensitive, high-throughput analysis with minimal sample treatment for the unambiguous identification of MCs and APs in freshwater. Two in-house databases of theoretical masses were built, comprising 660,960 MCs and 61,152 APs, based on all combinations of the amino acids reported to date for each family. To report evidence of uncommon cyanopeptides, the analytical protocols were optimized and an identification strategy based on levels of confidence was applied [17,33,35]. The optimized workflow, which led to the identification and characterization of MCs and APs, some of which had not been reported before, is discussed in detail. Finally, common cyanotoxins were quantified in field samples, and suspected MCs and APs were semi-quantified without certified standards to estimate their environmental concentrations. To our knowledge, this is the first report of a DIA-based suspect screening strategy for the simultaneous identification of known and unknown MCs and APs, together with the characterization of new cyanopeptides.

HRMS Parameters for Suspect Screening via DIA
The UHPLC-HRMS parameters for suspect screening were optimized for the analysis of MCs and APs, the two families most frequently found in toxic algal blooms at the selected freshwater sampling sites [34]. The FS acquisition window was set between m/z 300 and m/z 1400 to include both [M+H]+ and [M+2H]2+ ions. This range includes relatively low masses, which makes the mass spectra more complex because of matrix-related compounds [11]. Processing the raw data with Compound Discoverer 3.0 software (CD) (Thermo Fisher Scientific, San Jose, CA, USA) significantly reduced the complexity of the dataset through database matching and adduct targeting. Appropriate chromatographic and mass spectrometric settings also helped reduce the complexity of the mass spectral data.
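The choice of the m/z 300–1400 window can be checked by computing the protonated ions of a candidate. The sketch below assumes protonated species only ([M+zH]z+) and uses the neutral monoisotopic mass of MC-LR (994.5488, formula C49H74N10O12) together with a hypothetical heavier congener of 1432.0 Da; the function names are ours.

```python
PROTON = 1.00727646  # proton mass (u)


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of the [M+zH]z+ ion."""
    return (neutral_mass + charge * PROTON) / charge


def ions_in_window(neutral_mass, low=300.0, high=1400.0, charges=(1, 2)):
    """Return {charge: m/z} for the protonated ions that fall inside the FS window."""
    result = {}
    for z in charges:
        value = mz(neutral_mass, z)
        if low <= value <= high:
            result[z] = round(value, 4)
    return result


print(ions_in_window(994.5488))  # MC-LR: {1: 995.5561, 2: 498.2817}
print(ions_in_window(1432.0))    # hypothetical congener: {2: 717.0073}
```

For the heavier compound, only the doubly charged ion falls inside the window, which illustrates why [M+2H]2+ ions were included in the FS range.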
An optimal fragmentation energy in the higher-energy C-trap dissociation (HCD) cell is needed to fragment each compound appropriately, i.e., to obtain the best signal-to-noise (S/N) ratio for characteristic fragment ions while retaining a small residual intensity of the parent ion (about 10% of its intensity in the FS spectrum). Both parent and product ions are used in the identification step. MCs and APs have similar structures, but the energy needed for fragmentation varies with the amino acid composition of the core structure and with the ion form. Collision energies were therefore optimized manually for MCs and APs individually by direct infusion of certified standards into the ion source. As a result, stepped normalized collision energies (NCE) of 10, 20 and 30 were applied to fragment all suspect compounds optimally.
The DIA mode, which produces complex mass spectra, has been described in numerous studies [17,33,36–38]. Here, several consecutive mass isolation windows (selected by the quadrupole) were included in the DIA strategy. In an all ion fragmentation (AIF) scan, all precursor ions across the whole mass range are fragmented together, which explains the complexity of the resulting fragmentation spectra [17,36]. The isolation windows used here were deliberately narrower than those typically reported in the literature to reduce the complexity of the acquired data: fragmenting narrower windows significantly reduces the amount of data per spectrum, which facilitates data interpretation and compound identification [36]. From this standpoint, the number of isolation windows should be as high as possible, but the duty cycle may then become too long to acquire enough data points per chromatographic peak for adequate analytical results [38]. The optimization of the number of isolation windows is presented in Figure 2 for MC-LR and AP-A: three widths were tested, 25, 50 and 100 m/z, corresponding to 44, 22 and 11 isolation windows, respectively, over the m/z 300–1400 FS acquisition range. Narrowing the windows to 25 m/z (44 windows) improved the quality of the fragmentation spectra, but the duty cycle of the mass spectrometer then no longer allowed enough points per peak (around 10) for a quality identification analysis. Conversely, identification was easier with 22 or more isolation windows than with 11. The best results were therefore obtained with 50 m/z isolation windows, which gave the best compromise between fragmentation spectra quality and the number of acquisition points per peak [38].
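The trade-off between window width and points per peak can be made explicit with a simple cycle-time model. The window arithmetic (25, 50 and 100 m/z widths giving 44, 22 and 11 windows over m/z 300–1400) follows the text; the model itself (one FS scan plus one MS/MS scan per window in each cycle, with no overhead) and the peak width and scan times are hypothetical values chosen only to show the trend, not the instrument settings of this study.

```python
import math


def isolation_windows(low, high, width):
    """Split [low, high] into consecutive, equal-width isolation windows."""
    n = math.ceil((high - low) / width)
    return [(low + i * width, min(low + (i + 1) * width, high)) for i in range(n)]


def points_per_peak(peak_width_s, n_windows, fs_time_s, msms_time_s):
    """Data points across a peak when each cycle = 1 FS scan + n_windows MS/MS scans."""
    cycle_time_s = fs_time_s + n_windows * msms_time_s
    return peak_width_s / cycle_time_s


# Hypothetical timings: 12 s chromatographic peak, 0.10 s FS scan, 0.05 s per DIA scan
for width in (25, 50, 100):
    windows = isolation_windows(300, 1400, width)
    points = points_per_peak(12.0, len(windows), 0.10, 0.05)
    print(f"{width:>3} m/z -> {len(windows)} windows, ~{points:.1f} points per peak")
```

With these assumed timings, the 25 m/z setting drops to about five points per peak while 50 m/z reaches about ten, which mirrors the compromise described above.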

Building in-House Databases
To build in-house databases containing all theoretical MC and AP exact masses, exhaustive lists of the amino acids found at each position of the peptide chains of these two families were compiled from the literature, based on the structures of all known MCs and APs [5,7,9–11,28]. These lists are presented in Supplementary Information Figure S1. An Excel® macro was then written to generate in silico all amino acid combinations from these lists, yielding 660,960 and 61,152 theoretical exact masses for MCs and APs, respectively. For each combination, the exact masses of the amino acids were summed to obtain the monoisotopic molecular mass of the theoretical compound. These in silico generated lists of exact masses were then used as databases for raw data processing in CD. However, the lists are massive and contain many duplicate values. To facilitate automated raw data processing, all duplicates were therefore removed, leaving 8709 and 8815 unique exact masses for MCs and APs, respectively. Finally, an Excel® file was built from the full lists, giving the detailed combination for each theoretical compound. These combination lists could then be used to identify potential new amino acid combinations, and thus new cyanopeptides, to be validated by manual interpretation of the MS/MS spectra, as explained in Section 2.4.
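The enumeration and de-duplication steps can be sketched in a few lines of Python. This is not the study's Excel® macro: the per-position lists below are a hypothetical, heavily truncated subset (the real lists are in Figure S1), masses are computed from residue formulas assuming that a cyclic MC mass equals the sum of its residue masses, and duplicates are collapsed by rounding to five decimals while every combination is kept under its mass, in the spirit of the combination table described above.

```python
import itertools
import re

ATOMIC_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}


def formula_mass(formula):
    """Monoisotopic mass of an elemental formula such as 'C6H11NO'."""
    return sum(ATOMIC_MASS[el] * (int(n) if n else 1)
               for el, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula))


# Hypothetical, truncated residue lists for MC positions 1-7 (residue formulas)
MC_POSITIONS = [
    {"Ala": "C3H5NO", "Leu": "C6H11NO", "Ser": "C3H5NO2"},      # AA1
    {"Leu": "C6H11NO", "Ile": "C6H11NO", "Arg": "C6H12N4O"},    # AA2 (X)
    {"Asp": "C4H5NO3", "MeAsp": "C5H7NO3"},                     # AA3
    {"Arg": "C6H12N4O", "Har": "C7H14N4O"},                     # AA4 (Z)
    {"Adda": "C20H27NO2", "DMAdda": "C19H25NO2"},               # AA5
    {"Glu": "C5H7NO3", "GluOMe": "C6H9NO3"},                    # AA6
    {"Mdha": "C4H5NO", "Dhb": "C4H5NO", "Ser": "C3H5NO2"},      # AA7
]


def build_database(positions, decimals=5):
    """Enumerate every combination, sum residue masses and group combinations
    by rounded mass: returns {unique mass: [combinations]}."""
    database = {}
    for combo in itertools.product(*(p.items() for p in positions)):
        names = tuple(name for name, _ in combo)
        mass = round(sum(formula_mass(f) for _, f in combo), decimals)
        database.setdefault(mass, []).append(names)
    return database


db = build_database(MC_POSITIONS)
n_combinations = sum(len(v) for v in db.values())
print(n_combinations, "combinations ->", len(db), "unique masses")
```

In this toy example, 24 of the 432 combinations share the mass of MC-LR (Leu/Ile at X, Mdha/Dhb at position 7, and CH2 shifts among positions 3 to 6), which illustrates why the full lists collapse to far fewer unique masses and why MS/MS spectra are needed to tell such candidates apart.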

First Feature Selection with Compound Discoverer
Using the workflow described above in CD, different lists of features were generated from the FS data of the DIA experiment. In addition to the in-house lists of all theoretical MC and AP exact masses, which were matched against the FS data, searches of other online databases (ChemSpider and mzCloud™) were included in the data-processing workflow to assess the total number of features that could be identified in the selected environmental samples. After blank subtraction, CD selected between about 1600 and 6000 features in each of the twelve lakes studied (see Table S1 for more details on the samples). These numbers are consistent with most untargeted studies, and lists of this size are tedious to interpret when searching for features relevant to one's research [39–41]. This is where selection criteria were useful to narrow down the lists while searching for new cyanopeptides. First, matching exact masses against the in-house databases reduced the lists to between 12 and 116 MC and AP features, which would be identified as level 5 compounds (exact mass only) according to the identification confidence levels proposed in several studies [17,33] (Table S2). The lists were then narrowed down by selecting features with an appropriate isotopic pattern, adducts, retention time (RT), molecular formula and standard deviation (SD), resulting in lists of 3 to 51 features identified with level 3 confidence (tentative candidates by chemical class) [33]. Distinctive fragments were then used to strengthen the identification: as described in Section 4.4, these fragments are common to all MC or AP congeners and were searched for in the MS/MS spectra of the DIA experiment.
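Matching a feature list against a sorted mass list within a ppm tolerance is the core of the first exact-mass filter. The sketch below is our illustration, not Compound Discoverer's implementation: the 5 ppm tolerance matches the mass accuracy window used for the MC candidates in this study, the database holds [M+H]+ values we computed from structures discussed here (AP731, AP803, AP-A and MC-LR), and the feature list, including the 612.35 matrix ion, is hypothetical.

```python
import bisect


def ppm_error(observed, theoretical):
    return (observed - theoretical) / theoretical * 1e6


def match_features(features, database, tol_ppm=5.0):
    """Return {feature m/z: [database m/z within tol_ppm]}.
    `database` must be sorted in ascending order."""
    hits = {}
    for mz in features:
        delta = mz * tol_ppm * 1e-6
        lo = bisect.bisect_left(database, mz - delta)
        hi = bisect.bisect_right(database, mz + delta)
        matches = [m for m in database[lo:hi] if abs(ppm_error(mz, m)) <= tol_ppm]
        if matches:
            hits[mz] = matches
    return hits


# [M+H]+ of AP731, AP803, AP-A and MC-LR, computed from their proposed structures
DATABASE = sorted([732.39267, 804.43580, 844.42397, 995.55604])
features = [612.35, 732.39224, 804.43535, 995.558]  # hypothetical feature list
for mz, matches in match_features(features, DATABASE).items():
    print(mz, matches, [round(ppm_error(mz, m), 2) for m in matches])
```

Binary search limits each lookup to the few database entries inside the tolerance window, so the search stays fast even with several thousand unique masses, as in the de-duplicated lists described above.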
After this feature selection, the lists contained between 0 and 17 features, depending on the compound family and sample. These features were then taken to the last identification step, a manual study of the MS/MS spectra aimed at the structural identification of each feature as a potential or confirmed compound. In short, the initial lists of candidate features were reduced about 10-fold. Although this first selection was performed automatically by CD, apart from the search for specific fragments, the exercise showed the importance of applying rigorous criteria to identify features unambiguously before confirming their structure. Such criteria lead to a level 2 (probable structure by spectrum match) or level 1 (structure confirmed with a reference standard) characterization [17,33].
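The tiered narrowing described in the last two paragraphs can be summarized as a small decision function. This is a sketch under our own assumptions: the evidence flags are hypothetical names, each flag is assumed to be set only once the checks it names have passed, and only the levels quoted in the text (5, 3, 2 and 1) are modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Evidence:
    """Hypothetical evidence flags collected for one feature."""
    exact_mass: bool = False          # within tolerance of an in-house database mass
    class_evidence: bool = False      # isotope pattern, adducts, RT, formula, diagnostic fragments
    msms_structure: bool = False      # structure supported by manual MS/MS interpretation
    reference_standard: bool = False  # matched against a certified standard


def confidence_level(ev: Evidence) -> Optional[int]:
    """Identification level (1 = highest confidence), checking the strongest evidence first."""
    if ev.reference_standard:
        return 1
    if ev.msms_structure:
        return 2
    if ev.class_evidence:
        return 3
    if ev.exact_mass:
        return 5
    return None


print(confidence_level(Evidence(exact_mass=True)))         # 5
print(confidence_level(Evidence(True, True)))              # 3
print(confidence_level(Evidence(True, True, True)))        # 2
print(confidence_level(Evidence(True, True, True, True)))  # 1
```

In the study, the automated CD steps take features up to level 3; only the manual MS/MS interpretation and, where available, certified standards move them to levels 2 and 1.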

Confirmation of Suspects Using MS/MS Spectra
For an exhaustive identification of the suspect compounds and to confirm their identity from fragmentation patterns, samples were re-analyzed in parallel reaction monitoring (PRM) scan mode, using inclusion lists that contained, for each sample, the last selected MC and AP features at their retention times. The PRM scan mode generated higher-quality, more specific MS/MS spectra that were easier to interpret. In parallel, a theoretical list of fragments was built from the literature for MCs and APs, including the amino acid combinations most often encountered in fragmentation spectra, to help interpret these often-complex spectra [7,26,29,32,42]. The identification workflow presented below is based on characteristic fragments of individual amino acids or of combinations of several amino acids found in the MS/MS spectra. The strategy is to narrow down the number of candidates by identifying the amino acids one by one until the structure can be confirmed.

Microcystins Structures Elucidation
Before identifying unknown MC candidates, the workflow was tested by elucidating MC-LR found in sample no. 3, which was confirmed with a certified standard (Table 1 and Figure S2). This example confirmed that the list of specific fragment ions agreed with the mass spectra. The structure elucidation workflow is illustrated below with a first example, m/z 1105.59150, found in sample no. 12. For this exact mass, 482 different MC combinations were found within the 5 ppm mass accuracy range (Table S3). In other words, this exact mass corresponds to a large number of combinations identified at level 5, which is why MS/MS interpretation is needed to confirm the compound structure. In the structure elucidation workflow, a first fragment ion, m/z 163.11229, was used to confirm the presence of any form of the Adda moiety; this fragment ion is present for all MC structures. A second specific fragment ion was then used to identify the form of the Adda moiety according to the following masses: m/z 135.08099 for Adda and (6Z)Adda, m/z 121.06534 for DMAdda and m/z 163.07591 for ADMAdda. For m/z 1105.59150, the specific fragment ions m/z 163.11150 and 135.08049 were identified in the fragmentation spectra of sample no. 12 (Figure 3; see Table 1 for fragment ion identification details). These fragment ions demonstrate the presence of Adda or (6Z)Adda, an isomer with weaker biological activity [43]. Subsequently, the amino acid at position 6 (AA6) can be identified from the fragment ion [Adda-134+AA6-NH3+H]+ [...] (Table 1), and these last two amino acids were identified as Leu and Ser, respectively. A final round of data mining in the fragmentation spectra identifies as many additional fragment ions as possible to strengthen the structural characterization.
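The two-step Adda check can be written as a ppm-tolerance lookup over an MS/MS peak list. The diagnostic m/z values and the 5 ppm tolerance are taken from the text; the function names and the reduced peak list (only the two reported fragments) are illustrative.

```python
ADDA_MARKER = 163.11229  # fragment used to confirm any form of Adda
ADDA_FORMS = {           # second fragment that distinguishes the Adda form
    "Adda/(6Z)Adda": 135.08099,
    "DMAdda": 121.06534,
    "ADMAdda": 163.07591,
}


def ppm(observed, theoretical):
    return abs(observed - theoretical) / theoretical * 1e6


def has_ion(peaks, target, tol_ppm=5.0):
    """True if any peak lies within tol_ppm of the target m/z."""
    return any(ppm(p, target) <= tol_ppm for p in peaks)


def adda_forms(peaks, tol_ppm=5.0):
    """Adda forms supported by an MS/MS peak list; [] if the Adda marker is absent."""
    if not has_ion(peaks, ADDA_MARKER, tol_ppm):
        return []
    return [name for name, target in ADDA_FORMS.items() if has_ion(peaks, target, tol_ppm)]


# The two fragment ions reported for m/z 1105.59150 in sample no. 12
print(adda_forms([135.08049, 163.11150]))  # ['Adda/(6Z)Adda']
```

The observed ion at m/z 163.11150 lies about 4.8 ppm from the theoretical 163.11229, close to the edge of the 5 ppm window, so the tolerance setting matters for such checks.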
Finally, two candidate structures were retained for this feature because of the possible presence of either Adda or (6Z)Adda, and the identification was set at level 2, since confirmation at level 1 would require a certified standard. We identified this compound as [Leu1, Ser7]MC-HtyR, which, to our knowledge, is reported here for the first time [10]. For each amino acid identification step, the potential amino acid combinations were listed (Table S3): when only Adda had been identified, 300 potential MCs matched the exact mass of this MC, and the table shows which candidates remained once all the amino acids had been identified. This identification process was applied to all features confirmed as MCs, as described in Table 1. The other features identified as MCs corresponded to m/z 1009.57104, 1071.55340, 1085.56928 and 1038.57291. Each of these features was associated with six potential compounds because of two variable sites (Adda or (6Z)Adda, and Mdha, Dhb or (Z)Dhb at position AA7). Considering the abundance of each amino acid, the compounds were identified as [GluOMe6]MC-LR in sample no. 3 (Figure S3) [...] (Figure S6). The first three had already been identified in previous studies, whereas the last is also a previously unreported cyanotoxin [32]. However, certified standards would be needed to confirm the identification of these MCs.
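As a consistency check, the sketch below recomputes the [M+H]+ of [Leu1, Ser7]MC-HtyR from residue formulas and compares it with the observed m/z 1105.59150, then enumerates the isomeric variants that leave six candidates per feature for the other MCs. It assumes, as MC nomenclature implies, that positions not named in the compound name carry the canonical residues (MeAsp3, Adda5, Glu6), and that a cyclic MC mass is the sum of its residue masses; the helper names are ours.

```python
import itertools
import re

ATOMIC_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
PROTON = 1.00727646  # proton mass (u)


def formula_mass(formula):
    """Monoisotopic mass of an elemental formula such as 'C6H11NO'."""
    return sum(ATOMIC_MASS[el] * (int(n) if n else 1)
               for el, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula))


# Residue formulas for [Leu1, Ser7]MC-HtyR (unnamed positions assumed canonical)
structure = [
    "C6H11NO",    # Leu1
    "C10H11NO2",  # Hty2 (X)
    "C5H7NO3",    # MeAsp3 (assumed)
    "C6H12N4O",   # Arg4 (Z)
    "C20H27NO2",  # Adda5 (or its isomer (6Z)Adda, same formula)
    "C5H7NO3",    # Glu6 (assumed)
    "C3H5NO2",    # Ser7
]
theoretical = sum(formula_mass(f) for f in structure) + PROTON
error_ppm = (1105.59150 - theoretical) / theoretical * 1e6
print(f"[M+H]+ {theoretical:.5f}, error {error_ppm:.1f} ppm")  # [M+H]+ 1105.59282, error -1.2 ppm

# Isomeric variants that exact mass alone cannot separate for the other MC features
variants = list(itertools.product(["Adda", "(6Z)Adda"], ["Mdha", "Dhb", "(Z)Dhb"]))
print(len(variants), "candidate structures per feature")  # 6 candidate structures per feature
```

Agreement within a few ppm is necessary but not sufficient, which is why the fragment-by-fragment interpretation above remains the deciding step.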

Anabaenopeptins Structures Elucidation
A second identification workflow, for APs, was tested by elucidating AP-A found in sample no. 5, which was confirmed with a certified standard (Table 2 and Figure S7). This example confirmed that the list of specific fragment ions agreed with the mass spectra. The identification workflow used to elucidate the structures of the APs found in the lake water samples is detailed in the next paragraph.
To identify potential APs, the first fragment ion used to narrow down the feature list was m/z 84.08136, an immonium fragment ion of lysine. Using this fragment ion alone, the feature lists were reduced significantly, to 0 to 8 possible candidates identified by exact mass alone (Table S4). For the feature found at m/z 804.43535, only 11 potential combinations were found in the AP list (Table S4) [...] (Table 2), which was found to be Leu or Ile. With the list narrowed to eight possible combinations, the theoretical fragment ion list was used directly to identify all the other amino acids, and a new AP, named AP803, was identified with the structure (Ile or Leu)1-CO-Lys2-Met3-Leu4-MeIle5-Met(O)6 according to its fragmentation spectra (Figure 4 and Table 2). Another new AP, found in sample no. 11, was identified at m/z 732.39224 from its fragmentation spectra (Figure S8 and Table 2). In this case, AP731 was the only candidate in the combination list, leading to a single elucidated structure, Phe1-CO-Lys2-Val3-Leu4-MeGly5-AcSer6. Finally, four known APs were identified without certified standards (Table 2): AP-C in sample no. 11 (Figure S9), AP-F in samples no. 5 and 11 (Figure S10), ferintoic acid A in sample no. 12 (Figure S11), and oscillamide Y in samples no. 5 and 11 (Figure S12). For AP-F and oscillamide Y, two possible structures were found for each mass from the fragmentation spectra and the candidate list. Because these two APs are abundant in toxic cyanobacterial blooms [7,19,44], they were identified as such; however, they were not confirmed with certified standards, so the identification is considered level 2 [17,33].
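The proposed AP803 and AP731 structures can be checked against the observed precursor ions. The sketch below is our own and assumes that an AP's neutral mass equals the sum of its six residue masses plus CO2 (ureido carbonyl plus the water of the free AA1 carboxyl, minus the two hydrogens lost in urea formation), plus a proton for [M+H]+. It also assumes MeGly is N-methylglycine, AcSer is O-acetylserine and Met(O) is methionine sulfoxide; Leu stands in for position 1 of AP803 because Leu and Ile are isobaric.

```python
import re

ATOMIC_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.00307401,
               "O": 15.99491462, "S": 31.97207117}
PROTON = 1.00727646  # proton mass (u)

# Residue formulas (assumed identities, see lead-in)
RESIDUES = {"Leu": "C6H11NO", "Lys": "C6H12N2O", "Met": "C5H9NOS",
            "MeIle": "C7H13NO", "Met(O)": "C5H9NO2S", "Phe": "C9H9NO",
            "Val": "C5H9NO", "MeGly": "C3H5NO", "AcSer": "C5H7NO3"}


def formula_mass(formula):
    """Monoisotopic mass of an elemental formula such as 'C5H9NOS'."""
    return sum(ATOMIC_MASS[el] * (int(n) if n else 1)
               for el, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula))


def ap_protonated_mass(residues):
    """[M+H]+ of AA1-CO-[Lys-AA3-AA4-MeAA5-AA6]: six residues + CO2 + proton."""
    return sum(formula_mass(RESIDUES[r]) for r in residues) + formula_mass("CO2") + PROTON


candidates = {
    "AP803": (["Leu", "Lys", "Met", "Leu", "MeIle", "Met(O)"], 804.43535),
    "AP731": (["Phe", "Lys", "Val", "Leu", "MeGly", "AcSer"], 732.39224),
}
errors = {}
for name, (residues, observed) in candidates.items():
    theoretical = ap_protonated_mass(residues)
    errors[name] = (observed - theoretical) / theoretical * 1e6
    print(f"{name}: theoretical {theoretical:.5f}, observed {observed}, {errors[name]:+.2f} ppm")
```

Both proposed structures fall within about 0.6 ppm of the observed ions, well inside the 5 ppm window, although such agreement cannot by itself distinguish isobaric combinations.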
Toxins 2019, 11, x FOR PEER REVIEW

Quantification and Semi-Quantification
The twelve samples from different locations in Canada, the United Kingdom and France underwent a quantitative analysis of 17 known cyanotoxins, namely anatoxin-a (ANA-a), homoanatoxin-a (HANA-a), cylindrospermopsin (CYN), twelve MCs ([Asp3]MC-LR, [Asp3]MC-RR, MC-LR, -RR, -YR, -LA, -LY, -LW, -LF, -WR, -HtyR and -HilR), AP-A and AP-B, according to a previously published method [34]. Twelve cyanotoxins were detected in 11 lakes, and the results are shown in Table 3. MC concentrations varied between 39 and 41,000 ng L−1, with MC-LR the most abundant congener, found in 67% of the samples. However, [Asp3]MC-RR and MC-RR predominated in the two European samples (samples no. 11 and 12), with the highest concentrations being 41,000 and 5700 ng L−1, and MC-LA also predominated in three samples (1, 2 and 4), with concentrations between 364 and 1165 ng L−1. In addition, AP-A and AP-B were found in half of the samples at concentrations from 95 up to 6000 ng L−1. These two APs also predominated in two samples (5 and 6), and their ubiquity is consistent with previous studies [5,34,45,46]. Finally, CYN was found in sample no. 8 at a low concentration (153 ng L−1); its presence is rather uncommon and may be linked to the evolution of cyanobacterial species and strains in relation to eutrophication and other ecosystem stressors [47,48]. The high diversity of cyanotoxins in these lakes points to a potentially high diversity of toxic cyanobacterial strains.
Table 3. Cyanotoxins detected in lakes from Canada, the United Kingdom and France. Concentrations are reported in ng L−1 with the standard deviation of duplicate analyses (ND: analyte not detected). * Indicative values: concentration between the method detection limit (MDL) and the method quantification limit (MQL), as previously reported by Roy-Lachapelle et al. (2019) [34]. Only analytes with results > MDL are presented.
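The reporting rule in the Table 3 caption (values below the MDL are not reported, values between the MDL and MQL are flagged as indicative, and results are given with the standard deviation of duplicates) can be sketched as follows. The function name is ours, and the MDL and MQL in the example are hypothetical; the actual limits are those reported in [34].

```python
from statistics import mean, stdev


def report(duplicates, mdl, mql):
    """Format duplicate results (ng/L) following the Table 3 rule:
    mean < MDL -> 'ND'; MDL <= mean < MQL -> indicative value flagged with '*'."""
    value = mean(duplicates)
    if value < mdl:
        return "ND"
    text = f"{value:.0f} ± {stdev(duplicates):.0f}"
    return f"{text} *" if value < mql else text


# Hypothetical limits: MDL = 10 ng/L, MQL = 30 ng/L
for dup in ([4, 6], [20, 22], [150, 156]):
    print(dup, "->", report(dup, mdl=10, mql=30))
```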
Furthermore, 11 uncommon cyanotoxins were found in 42% of the samples (samples no. 2, 5, 9, 11 and 12), including four unreported MCs and APs. These compounds were semi-quantified to estimate their concentration levels in the samples (Table 4). For each semi-quantified compound, a reference material was chosen based on similarities in structure and physico-chemical properties. Concentrations varied between 57 and 1035 ng L−1. These MC levels fall below, at or above the drinking water recommendations for MC-LR equivalents proposed by the World Health Organization (WHO) (1 µg L−1) and by the U.S. EPA (0.3 to 1.6 µg L−1 for 10 days), which serve here as a first comparison for toxicity [4,49]. However, little to no information is available on the bioactivity and toxicity of these new compounds. This makes it difficult to assess the public health and environmental risks posed by this diversity of cyanotoxins in lake water [5]. Compounds of lower toxicity are also worth studying, since the toxicology of most of these compounds is still poorly understood, and their cumulative and synergistic effects are therefore also unknown. Future work could deepen the understanding of cyanotoxin toxicity in two ways. Samples contaminated by a variety of known and unknown cyanotoxins could be studied to understand the impact of a complex cyanobacterial bloom as a whole. In addition, the bioactivity of less known and unknown, but sometimes abundant, cyanotoxins could be studied individually.
Table 4. MCs and APs identified in samples, with semi-quantified concentration levels reported in ng L−1 with the standard deviation of duplicate analyses (ND: analyte not detected).
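Semi-quantification of this kind rests on the assumption that an uncommon congener responds like the certified standard it most closely resembles. The minimal sketch below illustrates that idea by reusing the surrogate's calibration line and summarizing duplicate analyses as a mean ± standard deviation. The compound names (`suspect_MC_1`, `suspect_AP_1`), surrogate assignments and calibration values are hypothetical and are not those behind Table 4.

```python
"""Hypothetical sketch: semi-quantification with a surrogate standard.

Assumption: each suspect has the same response factor as the certified
standard chosen as its surrogate. Names and numbers are placeholders.
"""
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass(frozen=True)
class Calibration:
    slope: float      # area ratio per ng/L
    intercept: float  # area ratio at 0 ng/L


# Placeholder calibrations for two certified reference materials.
CALIBRATIONS = {
    "MC-LR": Calibration(slope=0.0020, intercept=0.010),
    "AP-A": Calibration(slope=0.0035, intercept=0.005),
}

# Hypothetical surrogate assignment, chosen by structural and
# physico-chemical similarity (illustrative, not the study's choices).
SURROGATE = {
    "suspect_MC_1": "MC-LR",
    "suspect_AP_1": "AP-A",
}


def semi_quantify(compound, area_ratio):
    """Estimate a concentration (ng/L) from the surrogate's calibration line."""
    cal = CALIBRATIONS[SURROGATE[compound]]
    return (area_ratio - cal.intercept) / cal.slope


def duplicate_summary(compound, area_ratios):
    """Mean and standard deviation (ng/L) over duplicate analyses."""
    values = [semi_quantify(compound, r) for r in area_ratios]
    return mean(values), stdev(values)


if __name__ == "__main__":
    m, sd = duplicate_summary("suspect_AP_1", [0.355, 0.365])
    print(f"suspect_AP_1: {m:.0f} ± {sd:.0f} ng/L")
```

Any difference in ionization efficiency between a suspect and its surrogate carries straight into the estimate. This is why such values are only semi-quantitative.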

Conclusions
In this study, a new suspect screening strategy based on a DIA experiment was developed for the unambiguous identification of uncommon microcystin and anabaenopeptin congeners. The DIA-based method was developed with an automated on-line SPE system coupled to UHPLC with heated electrospray ionization and Q-Orbitrap™ detection. It allowed a sensitive, high-throughput analysis with minimal sample treatment, covering both the target screening of 17 cyanotoxins and the suspect screening of MCs and APs in freshwater samples. A structure-based methodology supported by fragmentation spectra led to the characterization of 11 uncommon cyanotoxins. These included two MCs ([Leu1, Ser7]MC-HtyR and [Asp3]MC-RHar) and two APs (AP731 and AP803) that have not yet been reported in the literature. They were found in five of the twelve surface water samples from lakes in Canada, the United Kingdom and France. These cyanotoxins were subsequently semi-quantified at concentrations between 57 and 1035 ng L−1. Twelve targeted cyanotoxins were found in 11 lakes, with concentrations ranging from 39 to 41,000 ng L−1. Overall, a high diversity of cyanotoxins and concentrations was observed. This highlights how much work is still required to discover cyanotoxins and understand their impact on the environment. Finally, to our knowledge, this is the first report of a suspect screening strategy based on a DIA experiment for the simultaneous identification and characterization of known and unknown MCs and APs. A DIA experiment provides more information in the fragmentation spectra than other common acquisition methods. It also makes it possible to quantify suspect compounds directly from the full-scan (FS) acquisition. Suspect screening methods can be time-consuming for routine analysis when compounds are unknown, but they can be very powerful for identifying new structures.
In addition, by targeting known and specific fragments, the developed method could be used with available HRMS instruments to characterize field cyanobacterial blooms, identifying uncommon cyanotoxins after a routine quantitative analysis. The method could eventually be applied to field samples and cultures and extended to other families of cyanopeptides. This would cover a larger range of cyanotoxins and ultimately allow a more accurate characterization of toxic algal blooms.

Sample Collection, Preparation and Quantification
Surface water sampling was conducted by the ATRAPP (Algal Blooms, Treatment, Risk Assessment, Prediction and Prevention through Genomics) research initiative, co-financed by Genome Quebec and Genome Canada. The samples were collected in the photic zone of several lakes in Canada, the United Kingdom and France that are under surveillance because of the occurrence of toxic algal blooms (Table S1). At each sampling location, a duplicate set of samples was collected in 125 mL amber polyethylene terephthalate glycol-modified (PETG) bottles (Thermo Scientific™ Nalgene™, Waltham, MA, USA), previously rinsed three times with surface water from the site [34]. The bottles were then filled to the brim, sealed, stored at −20 °C until shipment and sent to the laboratory within 3 days. Upon reception at the laboratory, the samples underwent three freeze-thaw cycles to lyse the cells and release the cyanotoxins. The samples were then filtered through 25 mm diameter, 0.2 µm pore size Acrodisc GH Polypro (GHP) filters (Waters, Milford, MA, USA) [34]. A volume of 1450 µL of each filtered sample was transferred into 2 mL amber glass vials and kept at −20 °C until analysis. For all optimization experiments, analytes were spiked into a water matrix consisting of analyte-free lake water sampled before the harmful algal bloom season, or matrix-matched water. Five replicates were spiked at a mid-level concentration of the linearity range (200 ng L−1). Prior to quantitative analysis, the internal standards were added at a final concentration of 300 ng L−1. Samples underwent a quantitative analysis to monitor 17 known cyanotoxins according to a previously published method [34]. These were ANA-a, HANA-a, CYN, the MCs [Asp3]-LR, [Asp3]-RR, -LR, -RR, -YR, -LA, -LY, -LW, -LF, -WR, -HtyR and -HilR, and AP-A and AP-B. Samples with the most interesting results (e.g., high cyanotoxin concentrations or the presence of less common congeners) were selected for further suspect screening analysis.
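Spiked replicates like the five prepared at 200 ng L−1 are commonly summarized as a mean recovery and a relative standard deviation (RSD). The sketch below shows only that arithmetic. The function name and replicate values are invented for illustration, and the acceptance criteria actually applied in the study are not specified in this section.

```python
"""Hypothetical sketch: recovery and precision of spiked replicates."""
from statistics import mean, stdev

SPIKE_NG_L = 200.0  # mid-level spike of the linearity range, as in the text


def recovery_and_rsd(measured_ng_l, spiked_ng_l=SPIKE_NG_L):
    """Return (mean recovery in %, RSD in %) for replicate measurements."""
    avg = mean(measured_ng_l)
    recovery = 100.0 * avg / spiked_ng_l
    rsd = 100.0 * stdev(measured_ng_l) / avg
    return recovery, rsd


if __name__ == "__main__":
    replicates = [190.0, 200.0, 210.0, 195.0, 205.0]  # invented values
    rec, rsd = recovery_and_rsd(replicates)
    print(f"recovery {rec:.1f}%, RSD {rsd:.1f}%")
```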

Instrumental Conditions
A Thermo Scientific Dionex UltiMate™ 3000 RS pump and column compartment were used for chromatographic separation. The Dionex UltiMate™ 3000 pump was coupled to the system used for on-line solid phase extraction (SPE), and both were controlled by Chromeleon 7.2 software (Thermo Fisher Scientific, Waltham, MA, USA, and Dionex Softron GmbH, part of Thermo Fisher Scientific, Germering, Germany). A PAL system RTC autosampler (Zwingen, Switzerland) was used for injection. A Hypersil Gold column (20 × 2 mm, 12 µm particle size, 175 Å pore size) was used for on-line SPE. Chromatographic separation was performed on a Hypersil Gold column (100 × 2.1 mm, 1.9 µm particle size, 175 Å pore size) kept at 55 °C. Samples were analyzed with a Q-Exactive mass spectrometer controlled by Xcalibur 3.0 software (Thermo Fisher Scientific, Waltham, MA, USA). The instrument was calibrated in positive mode every 7 days by direct infusion of an LTQ Velos ESI Positive Ion Calibration Solution (Pierce Biotechnology Inc., Rockford, IL, USA), a mixture of caffeine, Met-Arg-Phe-Ala (MRFA) and Ultramark 1621, to reach a mass accuracy within 5 ppm. Mass accuracy for all target compounds remained within 5 ppm during the 7 days following calibration.
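The 5 ppm criterion refers to the relative difference between measured and theoretical m/z, scaled by 10^6. The minimal sketch below uses the theoretical [M+H]+ of MC-LR (C49H74N10O12), computed from its monoisotopic mass plus a proton. The two "observed" m/z values are hypothetical.

```python
"""Sketch: mass accuracy in ppm, checked against a 5 ppm tolerance."""

PROTON = 1.007276  # proton mass (Da)
MC_LR_MONO = 994.548767  # monoisotopic mass of MC-LR, C49H74N10O12 (Da)


def ppm_error(measured_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(measured_mz, theoretical_mz, tol_ppm=5.0):
    """True if the absolute ppm error does not exceed tol_ppm."""
    return abs(ppm_error(measured_mz, theoretical_mz)) <= tol_ppm


if __name__ == "__main__":
    mh = MC_LR_MONO + PROTON  # theoretical [M+H]+, about m/z 995.5560
    for observed in (995.5580, 995.5620):  # hypothetical observed values
        print(observed, round(ppm_error(observed, mh), 2), within_tolerance(observed, mh))
```

At m/z 995.556, 5 ppm corresponds to about 0.005 m/z units. A drift of only a few mDa therefore already exceeds the tolerance.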

On-Line Solid Phase Extraction and Chromatographic Conditions
On-line SPE and chromatographic conditions were adapted from a previous quantitative method [34]. Briefly, 1 mL of sample was injected and loaded from the injection loop onto the SPE column at 1 mL min−1. After the loading step, a washing volume of 0.5 mL was passed through the column. The pre-concentration column was finally back-flushed with MeOH. The eluting analytes were then transferred by the analytical pump gradient directly to the analytical column. Chromatographic separation was performed with water (A) and acetonitrile (B), with 0.1% formic acid added, at a flow rate of 525 µL min−1. A total chromatographic run of 8 min was used for the first screening step of the samples (quantitative analysis). The run was extended to 30 min to improve chromatographic separation for the suspect screening method in DIA mode. These chromatographic parameters were also applied to the calibration curve and quality control standards used for semi-quantification (see Supporting Information Figure S13 for more details).

HRMS Conditions
All details of the HRMS conditions for quantitative analysis are presented in a previously published method [34]. The same ionization parameters were used for the suspect screening acquisition method (see Table S5 for more details). For the DIA runs, each cycle consisted of one FS at a resolving power of 35,000 full width at half maximum (FWHM) at m/z 200. The scan range of m/z 300 to 1400 was chosen to include the singly and doubly charged ions of the MCs and APs on the suspect lists. The FS event was followed by 22 isolation windows acquired at a resolving power of 17,500 FWHM at m/z 200. The isolation window width was set at m/z 50, a value optimized to limit potential co-fragmented ions while still acquiring enough points per chromatographic peak [38]. NCE values of 10, 20 and 30 were applied to ensure optimal fragmentation of suspect ions.
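Twenty-two windows of m/z 50 add up to exactly the m/z 1100 span of the full scan (m/z 300 to 1400). The sketch below assumes that the windows tile this range end to end without overlap, an arrangement the text does not state explicitly. It generates the windows and finds the window in which a given precursor is co-fragmented. The function names are hypothetical.

```python
"""Hypothetical sketch of a contiguous DIA isolation-window scheme.

Assumption: the 22 windows of m/z 50 tile the full-scan range m/z 300-1400
end to end, without overlap; the actual window placement may differ.
"""


def dia_windows(start=300.0, stop=1400.0, width=50.0):
    """Return (low, high) m/z bounds of contiguous isolation windows."""
    n = round((stop - start) / width)
    return [(start + i * width, start + (i + 1) * width) for i in range(n)]


def window_index(precursor_mz, windows):
    """Index of the window with low <= m/z < high, or None if outside the range."""
    for i, (low, high) in enumerate(windows):
        if low <= precursor_mz < high:
            return i
    return None


if __name__ == "__main__":
    windows = dia_windows()
    print(len(windows), windows[0], windows[-1])
    # [M+H]+ of MC-LR (about m/z 995.556) falls in window 13, i.e. m/z 950-1000.
    idx = window_index(995.556, windows)
    print(idx, windows[idx])
```

Every precursor inside a window is fragmented together with the others in that window. Narrowing the windows therefore improves specificity but adds scans per cycle, which is the trade-off the m/z 50 width was optimized for.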

Suspect Screening Using DIA Methodology
The FS data were first processed using Compound Discoverer 3.0 (Thermo Fisher Scientific, Waltham, MA, USA). The workflow was built to search for unknown compounds against in-house databases that included all suspected cyanopeptides. MCs and APs were processed separately with the same workflow. Each class had its own database list, built from the possible combinations of the potential amino acids in the molecules (Figure S1) [5,7,10,11,29]. Data processing comprised the following steps:
spectrum selection with a retention time filter between 4 min (lower limit) and 15 min (upper limit);
peak integration;
retention time alignment;
unknown compound detection;
isotope and adduct peak grouping (H+, Na+, K+);
unknown compound grouping and feature merging;
blank subtraction using uncontaminated lake water samples.
The grouped compounds were then searched against the in-house databases with a mass tolerance of 5 ppm and a retention time tolerance of 0.05 min. These databases were constructed separately for MCs and APs in Excel® spreadsheets. They contained the masses of all possible congener combinations, minus duplicates, giving 8,709 individual masses for MCs and 8,815 for APs. Afterwards, a composition prediction was performed, including minimum and maximum element counts (MCs: C 39  Cl), and the maximum includes all possible elements present in all congeners. Finally, only the compounds detected in duplicate with a coefficient of variation of the signal intensity below 30% were retained for the later steps.
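The database construction, the 5 ppm mass search and the 30% CV filter can be illustrated with a deliberately tiny model. This sketch assumes that only MC positions 2 and 4 vary, over four residues (Ala, Leu, Arg, Tyr), on a fixed core derived from the MC-LR monoisotopic mass. The study's databases covered far more positions and residues. Retention time matching is omitted, and all names are hypothetical.

```python
"""Hypothetical, heavily simplified sketch of a combinatorial MC suspect list.

Assumptions: only positions 2 and 4 vary, over four residues; CORE is the
MC-LR monoisotopic mass minus one Leu and one Arg residue. The study's
databases covered many more positions and residues (Figure S1).
"""
from itertools import product
from statistics import mean, stdev

PROTON = 1.007276
MC_LR_MONO = 994.548767  # C49H74N10O12
RESIDUE = {  # monoisotopic residue masses (Da)
    "A": 71.037114,
    "L": 113.084064,
    "R": 156.101111,
    "Y": 163.063329,
}
CORE = MC_LR_MONO - RESIDUE["L"] - RESIDUE["R"]


def mc_suspect_list():
    """Map each unique [M+H]+ m/z to the X/Z combinations that produce it."""
    suspects = {}
    for x, z in product(sorted(RESIDUE), repeat=2):
        mz = round(CORE + RESIDUE[x] + RESIDUE[z] + PROTON, 5)
        # Isobaric combinations such as LR and RL share one database entry.
        suspects.setdefault(mz, []).append(f"MC-{x}{z}")
    return suspects


def match(feature_mz, suspects, tol_ppm=5.0):
    """Names of all suspects whose m/z lies within tol_ppm of the feature."""
    hits = []
    for mz, names in suspects.items():
        if abs(feature_mz - mz) / mz * 1e6 <= tol_ppm:
            hits.extend(names)
    return hits


def passes_cv(duplicate_intensities, max_cv_percent=30.0):
    """Keep a feature only if its duplicate intensities have a CV below the limit."""
    cv = 100.0 * stdev(duplicate_intensities) / mean(duplicate_intensities)
    return cv < max_cv_percent


if __name__ == "__main__":
    suspects = mc_suspect_list()
    print(len(suspects), "unique masses from", 4 * 4, "combinations")
    print(match(995.5580, suspects), passes_cv([1.0e6, 1.1e6]))
```

Collapsing isobaric combinations into a single mass entry mirrors the "minus duplicates" step. A mass hit still leaves several candidate sequences, which is why fragmentation spectra are needed to tell them apart.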
The subsequent data treatment was performed using Xcalibur 3.0 software (Thermo Fisher Scientific, Waltham, MA, USA). MCs and APs both yield a few distinctive fragments. These were searched for in the MS/MS spectra from the DIA acquisition to narrow down the feature list. For MCs, the Adda moiety was the specific marker, with two fragments found simultaneously in an MS/MS spectrum. The first is common to all congeners (m/z 163.11229, [Adda − 134 − NH3 + H]+). The second is specific to the form of Adda (m/z 135.08099 for Adda and (6Z)Adda, m/z 121.06534 for DMAdda, and m/z 163.07591 for ADMAdda). For APs, the m/z 84.08136 fragment was used as a marker. It corresponds to the immonium ion of lysine, an amino acid found in all APs. Although its mass is low, this fragment is rarely found in environmental samples when lysine is not present [50]. Finally, the samples were analyzed a second time in PRM scan mode with an inclusion list containing the exact masses from the final feature list. Structural characterization relied on the product ions, whose assignments were matched with the amino acid combinations in the suspect lists generated for MCs and APs.
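The marker-based filtering described above can be expressed as a simple rule on a peak list. An MS/MS spectrum is flagged as an MC candidate when it contains the common Adda fragment together with at least one form-specific fragment. It is flagged as an AP candidate when the lysine fragment is present. In the sketch below, the marker m/z values come from the text. The 5 ppm fragment tolerance, the plain list-of-m/z input and the function names are assumptions.

```python
"""Hypothetical sketch: flagging MC and AP candidates from diagnostic fragments.

Marker m/z values are taken from the text; the 5 ppm fragment tolerance
and the peak-list input format are assumptions.
"""

ADDA_COMMON = 163.11229  # fragment common to all MC congeners
ADDA_FORMS = {  # fragment specific to the form of Adda
    "Adda/(6Z)Adda": 135.08099,
    "DMAdda": 121.06534,
    "ADMAdda": 163.07591,
}
LYSINE_MARKER = 84.08136  # lysine-derived fragment used for APs


def has_peak(peaks_mz, target_mz, tol_ppm=5.0):
    """True if any peak lies within tol_ppm of target_mz."""
    return any(abs(mz - target_mz) / target_mz * 1e6 <= tol_ppm for mz in peaks_mz)


def classify(peaks_mz, tol_ppm=5.0):
    """Return candidate labels for one MS/MS spectrum (list of fragment m/z)."""
    labels = []
    # MCs require the common Adda fragment AND one form-specific fragment.
    if has_peak(peaks_mz, ADDA_COMMON, tol_ppm):
        forms = [name for name, mz in ADDA_FORMS.items() if has_peak(peaks_mz, mz, tol_ppm)]
        if forms:
            labels.append("MC candidate (" + ", ".join(forms) + ")")
    if has_peak(peaks_mz, LYSINE_MARKER, tol_ppm):
        labels.append("AP candidate")
    return labels


if __name__ == "__main__":
    print(classify([135.0810, 163.1124, 213.0870]))  # invented peak list
    print(classify([84.0814, 101.1073]))             # invented peak list
```

Note that m/z 163.11229 (common fragment) and m/z 163.07591 (ADMAdda) differ by more than 200 ppm. A high-resolution tolerance therefore keeps these two nominally isobaric markers apart.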
Supplementary Materials: The following are available online at http://www.mdpi.com/2072-6651/11/11/619/s1, Figure S1: Configuration of amino acids (AA) in MCs and APs, Table S1: Details on samples with sampling date and region of sampling in Canada and Europe, Table S2: Confidence of identification by levels and number of features obtained at each step of identification using Compound Discoverer 3.0 software, Figure S2: Chromatogram, isotopic pattern and fragmentation spectra of MC-LR with RT at 8.79 min, Table S3: Number of MCs combinations using fragmentation spectra and identification of amino acids, Figure S3: Chromatogram, isotopic pattern and fragmentation spectra of feature m/z 1009.57104 identified as [GluOMe 6 ]MC-LR with RT at 11.95 min, Figure S4 Figure S7: Chromatogram, isotopic pattern and fragmentation spectra of AP-A with RT at 7.40 min, Table S4: Number of AP combinations at each level of identification using fragmentation spectra and identification of amino acid, Figure S8: Chromatogram, isotopic pattern and fragmentation spectra of feature m/z 732.39224 identified as AP731 with RT at 6.53 min, Figure S9: Chromatogram, isotopic pattern and fragmentation spectra of feature m/z 809.45396 identified as AP-C with RT at 7.31 min, Figure S10: Chromatogram, isotopic pattern and fragmentation spectra of feature m/z 851.47649 identified as AP-F with RT at 7.05 min, Figure S11: Chromatogram, isotopic pattern and fragmentation spectra of feature m/z 867.4376 identified as ferintoic acid A with RT at 7.72 min, Figure S12: Chromatogram, isotopic pattern and fragmentation spectra of feature m/z 858.43789 identified as oscillamide Y with RT at 7.58 min, Figure S13: Details on the on-line SPE -UHPLC chromatographic gradient program for quantification (a) and suspect screening methods (b), Table S5: Ionization and HRMS acquisition parameters.