Cyanobacteria blooms are a growing concern for many lake stakeholders in New England, particularly in New Hampshire, and the need to monitor them has increased in many lake communities. In contrast to traditional water quality sampling, monitoring cyanobacteria blooms using a UAS allows the user to assess the entire waterbody, reduces sample analysis and processing times, and increases sampler safety. This study investigated the use of a very high spatial resolution multispectral camera flown on a DJI Matrice 300 RTK to capture reflectance values centered around ten different wavelengths of light over lakes known to experience cyanobacteria blooms.
4.1. Explanation and Interpretation of Results
Multivariate classification and regression analyses were conducted on paired datasets of water quality and UAS spectral data. Most importantly, classifying a sample as above or below the New Hampshire state threshold for cyanobacteria cell concentration yielded an overall accuracy of 93%, classifying chlorophyll-a concentration as above or below 10 μg/L yielded an overall accuracy of 87%, and classifying phycocyanin concentration as above or below 54.8% yielded an overall accuracy of 92%. This process could therefore help lake stakeholders make informed management decisions regarding closures of certain use areas of the waterbody throughout the bloom season. In the random forest results, the lower overall accuracy for chlorophyll-a could be attributed to the method by which chlorophyll-a was extracted. During this laboratory procedure, water samples were filtered through a 47 mm filter with a pore size of 0.7 µm (Appendix B). This was the only analysis to include a filtration step. Although unlikely, it is possible that some dissolved chlorophyll-a smaller than 0.7 µm passed through the filter rather than being trapped by it, and so was not measured in the hot ethanol and fluorescence portion of the analysis. It is also possible that chlorophyll-a from organisms other than cyanobacteria, including green algae or plant cells, was captured on the filters, as chlorophyll-a is not exclusive to cyanobacteria.
Multivariate regressions proved difficult with this dataset due to both structural multicollinearity and data multicollinearity in the spectral data. Simple regression was the preferred method in similar studies. Variation in cell concentrations, chlorophyll-a concentrations, and phycocyanin concentrations can be poorly explained by individual spectral features (Table 7). Simple regressions were calculated for the top ten percent of the most important features (determined from the average feature importance scores from the random forest algorithm). As shown in Table 7, the reflectance data from the Blue 475 wavelength and the NGBDI_4 and NGRDI_4 indices were the three most important features for classifying cell concentrations into “High” and “Low” categories based on the UAS spectral data. The Blue 475 band is part of the NGBDI_4 equation, which also contains the Green 560 band (Appendix C). The Green 560 band is also found in the NGRDI_4 equation along with the Red 668 band. However, the Green 560 band alone was not within the top 10% of the most important features. The NGRDI_4, Green 531, and FLHblue_2 features were the most important for classifying chlorophyll-a concentrations into high and low categories based on the spectral data. The FLHblue_2 equation uses the Blue 444, Green 531, Red 650, and Red 668 bands. Lastly, the Blue 475, NGBDI_4, and CI_2 features were most important for classifying phycocyanin concentrations into high and low categories based on the spectral data. The CI_2 index contains data from the Red 650 and Red 668 bands, in addition to the red edge at 705 nm. Four of the seventy-eight features were in the top 10% of the most important features for all three water quality parameters: Blue 475, NGRDI_4, CI_2, and Green 531 (Table 7).
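The feature selection step described above can be sketched as follows. This is only an illustration: the importance scores are invented, the averaging over repeated runs is an assumption about how the "average feature importance scores" were obtained, and rounding the 10% cutoff up (so that 78 features give 8) is also an assumption, since the text does not state how the cutoff was rounded.

```python
import math


def top_features(importance_runs, fraction=0.10):
    """Average each feature's importance across runs and return the top share.

    importance_runs maps a feature name to a list of importance scores,
    one per random forest run (hypothetical structure).
    """
    averaged = {
        name: sum(scores) / len(scores)
        for name, scores in importance_runs.items()
    }
    # Assumption: the cutoff is rounded up; the paper does not say.
    keep = max(1, math.ceil(len(averaged) * fraction))
    ranked = sorted(averaged, key=averaged.get, reverse=True)
    return ranked[:keep]


# Invented scores for five of the features named in the text.
runs = {
    "Blue 475": [0.09, 0.08, 0.10],
    "NGBDI_4": [0.07, 0.08, 0.06],
    "NGRDI_4": [0.06, 0.05, 0.07],
    "Green 560": [0.01, 0.02, 0.01],
    "CI_2": [0.04, 0.05, 0.04],
}
print(top_features(runs, fraction=0.4))
```

With the full set of seventy-eight features and the default fraction, this sketch would keep eight features.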
Identifying which spectral features were most important for studying cyanobacteria concentrations with surrogate water quality parameters can guide emerging studies. Based on these findings, future studies should use sensors capable of capturing imagery from or near 444 nm,
475 nm, 531 nm,
560 nm, 650 nm,
668 nm, and 705 nm; and especially, those in bold. The MicaSense RedEdge-MX Dual Camera Imaging System senses wavelengths comparable to the Landsat 8 and Sentinel 2 satellites [
51,
52]. Important similarities between the MicaSense sensor and the Sentinel 2 (S2) satellite include the 475 band (490 S2), the 560 band (560 S2), and the 668 band (664 S2).
The collection and processing of the water quality samples took over 4.5 times longer than the collection and processing of the UAS data. The three most time-consuming components were determining cell concentrations, determining chlorophyll-a concentrations, and physically collecting each water quality sample by canoe. Other studies have evaluated the applicability and reliability of using phycocyanin as an indicator for cyanobacteria rather than relying on time-intensive cell counting for cell concentration [
10,
29,
30,
53,
54]. This study showed significant relationships between cell and phycocyanin concentrations at French Pond, Keyser Pond, Silver Lake, Tucker Pond, and for the entire dataset. Using fluorometry to measure phycocyanin, rather than counting cells to determine cell concentrations or filtering and analyzing samples to measure chlorophyll-a concentrations, would drastically improve the speed at which analyses could be made and results shared with communities. In the time it took to collect the water samples alone, the entire UAS methodology could have been completed (Figure 6). This time comparison does not include tasks shared by both processes, including travel time to each lake, communication with lake residents and stakeholders, and data analysis.
4.2. Limitations
Five of the six lakes included in this study experienced cyanobacteria blooms in 2022 according to the NHDES Harmful Algal Bloom Monitoring Program (
Table 1). However, samples showing bloom-forming conditions were only collected at Keyser Pond, Showell Pond, and Silver Lake. No samples indicating bloom conditions were collected from French Pond, Greenwood Pond, or Tucker Pond, for the following reasons:
French Pond did not have any cyanobacteria blooms during the 2022 field season; thus no “High” cell concentration samples were collected.
The cyanobacteria bloom present at Greenwood Pond in 2022 was very short-lived. By the time sampling was conducted, the bloom had subsided.
At Tucker Pond, a pixelated surface bloom of Woronichinia and Microcystis was present in small groupings along the southern and southwestern shores. Notes from lake residents indicated the bloom only appeared to span roughly 15 feet into the lake. On the day of sampling, the concentrations of cyanobacteria were found to be below the state of New Hampshire’s advisory threshold of 70,000 cells/mL.
Like the Microcystis and Woronichinia bloom in Tucker Pond, Silver Lake’s bloom of primarily Microcystis and Dolichospermum proved difficult to capture. The ribbon of high cyanobacteria concentrations was isolated to the northern shores along the state park beach area, extending roughly 20 feet out into the lake at most. As a result, only a small fraction of the total samples collected throughout the lake surpassed the state threshold. During the peak of the first advisory on 6/29/22, collected samples ranged from 1150 to over 3.4 million cells/mL because the bloom was unevenly distributed throughout the lake.
Samples above the state threshold were collected at Keyser Pond, Showell Pond, and Silver Lake in 2022. Keyser Pond and Showell Pond contained very homogeneous blooms of primarily Chrysosporum with some Planktothrix in 2022, which turned each waterbody a greenish-brown color throughout. These blooms decreased water clarity, measured with a Secchi disk and view scope, to less than 1 m. Cell concentrations ranged from 70,256 to 182,640 cells/mL at Keyser Pond, and from 320,400 to 650,250 cells/mL at Showell Pond for the samples collected during the advisories. Additionally, two samples were collected at Keyser Pond from clumps of Planktothrix that had broken off the benthic mat and floated to the surface after the advisory had lifted and ambient cyanobacteria cell concentrations had fallen beneath the state threshold. These two samples, collected on 8/16/22, contained cell concentrations exceeding 1 million and 4 million cells/mL.
In addition to lake-specific analyses, this study identified the need for a more detailed analysis at a species-specific level. Different species of cyanobacteria alter lakes in varied ways. As seen in Keyser and Showell Ponds, blooms of Chrysosporum turned each waterbody a greenish-brown color throughout. This color alteration extended from the surface down into the water column. Planktothrix appeared in specks within the water column and as free-floating clumps that had detached from the benthic substrate. Microcystis was predominantly found at Silver Lake in surface scum isolated to a single section of the waterbody, often forming early in the morning and mixing into the water column as the wind and solar angle increased. This scum repeatedly formed and dissipated quickly, which caused the advisory to last for months and made the bloom elusive to the UAS.
Additional limitations arose in the image collection and processing phases of the UAS methodology. Because of the limited battery life of the UAS and the lack of a motorized boat, smaller lakes were targeted for this study. Lakes without islands or hidden coves were selected so that the UAS pilot in command could maintain line of sight and to make canoeing to sites easier. Image processing in Agisoft Metashape proved difficult for waterbodies. Traditionally, tie points are used to properly align overlapping photos. However, the software struggles to identify tie points over a homogeneous water surface, which creates holes in the output (Figure 7). The solution to this problem was to fly the UAS higher so that more photos included the lake edge. Although not a perfect solution, this worked well enough for the purposes of this study. This limitation would hinder work on wide or very large lakes. Any in-lake features such as islands, floating docks, or moored boats would help to build tie points over this homogeneous surface. This challenge has also been reported by many other researchers [
17,
24,
31,
32,
33,
55,
56,
57]. As a result, ten water quality sampling points that fell within the “holes” in the reflectance data were excluded from the analyses relating UAS spectral data to water quality parameters. Another limitation of using UAS for environmental monitoring is the weather. The DJI Matrice 300 RTK could only safely fly on days when the wind speed (including gusts) was less than 8 mph (3.6 m/s) and there was no chance of rain in the immediate forecast. Wind proved to be the more difficult constraint but was generally at its lowest early in the morning. This timing was also beneficial because sun angle and glint were at their lowest then, although some edges of the waterbodies were in shadow from shoreline vegetation on sunny days.
4.3. Relation to Similar Studies
Unlike other studies, this study involved multiple visits to various lakes with different dominant species of cyanobacteria from May to September in 2022. The repeat visits allowed sampling to be conducted during various stages of the bloom cycle, ambient weather conditions, and seasonal changes in other water quality parameters, including total suspended solids and emergent/submerged aquatic vegetation cover (not measured). Without this replication, it is difficult to draw usable conclusions from one flight over one lake on one day with only a handful of collected water samples for analysis.
However, many researchers have used less rigorous field designs to look for correlations between chlorophyll-a concentrations and spectral indices collected by a UAS-mounted sensor. These studies include correlations to general “algae” [
22,
24,
32], cyanobacteria [
26], and a toxin associated with some species of cyanobacteria, microcystin [
23]. A variety of sensors including a Parrot Sequoia multispectral sensor, a Canon ELPH 110HS, and a modified digital camera were used [
21,
24,
32]. Two studies used the MicaSense RedEdge sensor [
22,
23]. A variety of indices were built for the chlorophyll-a regressions. R² values ranged from 0.004 to 0.88 depending on the index used. The most common index was the NDVI or a modified blue NDVI (BNDVI). The NDVI_3 and BNDVI_3 regressions to chlorophyll-a in this study produced R² values of 0.50 and 0.66, respectively, but were not among the most important features for the random forest classification. In the referenced studies, these indices produced R² values of 0.15 and 0.16 [
23], 0.51 [
32], 0.70 [
22], 0.77 to 0.87 [
21] and 0.88 [
24]. These regressions were also represented in various forms including linear, logarithmic, and polynomial.
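To make the comparison between regression forms concrete, the sketch below computes R² for a linear fit and for a logarithmic fit of the same paired data. It is a minimal illustration under stated assumptions: the data are synthetic, the function names are hypothetical, and the logarithmic form is taken to be y = a + b·ln(x) with a strictly positive predictor, which may not match the form used in every study cited here.

```python
import math


def r_squared(x, y):
    """R^2 of the ordinary least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def r_squared_log(x, y):
    """R^2 of the logarithmic fit y = a + b*ln(x); every x must be > 0."""
    return r_squared([math.log(xi) for xi in x], y)


# Synthetic paired data that follow a logarithmic curve exactly.
predictor = [1.0, 2.0, 4.0, 8.0, 16.0]
response = [2.0 + 3.0 * math.log(v) for v in predictor]

print(round(r_squared(predictor, response), 3))      # linear form
print(round(r_squared_log(predictor, response), 3))  # logarithmic form
```

Because the same data can give quite different R² values under different functional forms, values reported across studies are only directly comparable when the form of the fit is the same.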
The logarithmic R² values found in this study (Table 7) are comparable to those found in Sharp et al., 2021 [
26]. However, the indices that showed the best correlations for the lakes studied in New Hampshire were not similar to those used in the referenced studies. The NGRDI did not produce a significant linear relationship with chlorophyll-a in Kim et al., 2021, though it produced one of the highest R² values for chlorophyll-a concentrations among the most important features for classification in this study [
24]. The difference might be attributed to one being a linear and the other a logarithmic line of best fit, or to the different wavelengths of light used in the equations as dictated by each sensor’s capabilities, although both follow the form NGRDI = (green − red)/(green + red). García-Fernández et al., 2021 used the NGBDI to assess the quality of grape plants for wine production using a UAS-mounted RGB sensor [
43]. Although that was not an aquatic study, it used the NGBDI to assess alterations to growth due to water stress. In this study, the index proved to be very important for determining the presence of cyanobacteria-associated parameters, likely because it uses data from the blue and green portions of the electromagnetic spectrum (
Table 7).
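The two normalized difference indices discussed above can be computed from band reflectances as in the following sketch. The NGRDI form is the one given in the text. The NGBDI form, (green − blue)/(green + blue), is an assumption based on the common definition and on the statement that NGBDI_4 uses the Blue 475 and Green 560 bands; the exact equations are in Appendix C. The reflectance values and function names are hypothetical.

```python
def normalized_difference(band_a, band_b):
    """Return (band_a - band_b) / (band_a + band_b) for two reflectances."""
    total = band_a + band_b
    if total == 0:
        raise ValueError("band sum is zero; the index is undefined")
    return (band_a - band_b) / total


def ngrdi(green, red):
    # As stated in the text: NGRDI = (green - red) / (green + red).
    return normalized_difference(green, red)


def ngbdi(green, blue):
    # Assumed form: NGBDI = (green - blue) / (green + blue).
    return normalized_difference(green, blue)


# Hypothetical surface reflectances for one pixel.
pixel = {"blue_475": 0.02, "green_560": 0.06, "red_668": 0.03}

print(round(ngrdi(pixel["green_560"], pixel["red_668"]), 3))
print(round(ngbdi(pixel["green_560"], pixel["blue_475"]), 3))
```

Swapping in a different green or red band (for example, 531 nm instead of 560 nm) changes the index value even though the formula is unchanged, which is one way the same named index can behave differently across sensors.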
The use of phycocyanin concentration for assessing cyanobacteria blooms is gaining momentum [27]. This study adds to the evidence from Almuhtaram et al., 2018 and Bertone et al., 2018 for the relationship between cyanobacteria cell concentration and phycocyanin concentration [
53,
54]. Few papers have discussed connecting phycocyanin concentration to UAS spectral features [
26,
34,
35]. Sharp et al., 2021 studied a single cyanobacteria bloom in California, USA, over the summer of 2019 [
26]. Sharp and colleagues included four visits to the lake for measurements of chlorophyll-a and phycocyanin paired with the overpass of the Sentinel-3a satellite. One visit corresponded with a small UAS (sUAS) mission over 12 sites in one portion of the lake when the dominant taxa were
Dolichospermum,
Gloeotrichia, and
Microcystis, but a CI was not recreated due to the limited wavelengths of the UAS sensor used. The sUAS was only used to map chlorophyll-a concentrations throughout the study area using a band ratio relationship from a previously published paper designed for a different waterbody. Pyo et al., 2018 studied the relationship between phycocyanin and chlorophyll-a to hyperspectral imagery using modeled simulations [
35]. They produced R² values between 0.55 and 0.75 for their two- and three-band ratios when plotted against estimated phycocyanin, and between 0.25 and 0.56 for chlorophyll-a. J. M. Ahn et al., 2021 also used a UAS-mounted hyperspectral sensor to form a correlation with phycocyanin concentrations (R² = 0.85) using a “generic algorithm” [
34].
Using a multivariate classification algorithm rather than only simple regressions substantially improved the overall success of this study. Although less common than simple regressions, machine learning models relating remotely sensed data to water quality data have produced high accuracies in many recently published studies. Surface sediment classification using an object-based classification method from UAS multispectral data in tidal flats produced an overall classification accuracy of 72.8% [58]. Scientists mapping percent cover of emergent vegetation in freshwater waterbodies of California (USA) achieved an overall accuracy of 82% using the random forest classification algorithm [
59]. In addition, researchers studying two lakes in China classified general water quality into three classes based on designated uses. Using a convolutional neural network with four convolutional layers, overall accuracies reached 92.5% within their study [
60].
From a public safety standpoint for recreational waters, it can be argued that the primary goal is knowing whether cyanobacteria are above or below regulatory thresholds. Only then would distinguishing between cell concentrations, e.g., between 25,000 cells/mL and 45,000 cells/mL, be useful. In other words, because a classification approach produces high accuracy, stakeholders can know whether the waterbody is safe for the designated use. With the random forest algorithm, this study found very high overall and user’s accuracies, from 87.4 to 92.9%, for the three water quality parameters with UAS multispectral data.
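As an illustration of how overall and user’s accuracies are derived for a threshold classification, the sketch below labels measured cell concentrations against the 70,000 cells/mL advisory threshold and compares them with a set of predicted labels. The predicted labels are invented, the pairing of samples is hypothetical, and treating a value exactly at the threshold as “High” is an assumption. User’s accuracy is computed in the usual remote sensing sense: of the samples classified into a class, the share that truly belong to it.

```python
THRESHOLD_CELLS_PER_ML = 70_000  # state advisory threshold cited in the text


def label(cells_per_ml):
    # Assumption: a value at or above the threshold counts as "High".
    return "High" if cells_per_ml >= THRESHOLD_CELLS_PER_ML else "Low"


def accuracies(reference, predicted, classes=("High", "Low")):
    """Return (overall accuracy, {class: user's accuracy})."""
    pairs = list(zip(reference, predicted))
    overall = sum(r == p for r, p in pairs) / len(pairs)
    users = {}
    for cls in classes:
        # Reference labels of every sample the classifier assigned to cls.
        assigned = [r for r, p in pairs if p == cls]
        users[cls] = (
            sum(r == cls for r in assigned) / len(assigned)
            if assigned
            else float("nan")
        )
    return overall, users


# Illustrative concentrations (cells/mL) with invented classifier output.
measured = [1150, 45_000, 70_256, 182_640, 320_400, 25_000]
predicted = ["Low", "Low", "Low", "High", "High", "Low"]

reference = [label(v) for v in measured]
overall, users = accuracies(reference, predicted)
print(round(overall, 3), {k: round(v, 3) for k, v in users.items()})
```

In this toy case the one misclassified sample sits just above the threshold, which is where a binary classification is most likely to disagree with the laboratory count.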