Spatio-Temporal Variation of Cyanobacteria Blooms in Taihu Lake Using Multiple Remote Sensing Indices and Machine Learning

In view of the ecological threat posed by cyanobacteria blooms in Taihu Lake (China), this paper presents a study of cyanobacteria bloom extent based on MODIS data using the quantum particle swarm optimization–random forest (QPSO-RF) machine learning algorithm. Multiple remote sensing indices that represent the characteristics of the primary underlying surface types in Taihu Lake were selected as inputs. The proposed method performed best among the methods compared, with an F1 score of 0.91–0.98. Based on this method, the spatio-temporal variation of cyanobacteria blooms in Taihu Lake was analyzed. During 2010–2022, the average area of cyanobacteria blooms in Taihu Lake increased slightly. Severe-scale cyanobacteria blooms occurred in 2015–2019. Cyanobacteria blooms were normally concentrated from May to November; the longest duration occurred in 2017, lasting eight months. Spatially, cyanobacteria blooms were mainly identified in the northwestern part of Taihu Lake, with an average occurrence frequency of about 10.0%. The blooms often began to grow in the northwestern part of the lake and then spread to the Center of the Lake, and they also dissipated earliest in the northwestern part of the lake. Our study is also beneficial for long-time-series monitoring of cyanobacteria blooms in other similar large lakes.


Introduction
Cyanobacteria blooms can cause the deterioration of water quality, endanger human health, and destroy the balance of lake ecosystems [1]. Taihu Lake is the water source for one of the most developed regions in China, and its water quality has been constantly threatened by frequent cyanobacteria blooms since the 1980s [2]. In 2007, a cyanobacteria bloom in Taihu Lake triggered a severe water supply crisis. After more than ten years of sustained, high-investment treatment, eutrophication has still not declined significantly, and cyanobacteria blooms still occur with high frequency. In 2017 in particular, the bloom situation was seriously aggravated, which severely challenged the drinking water safety of residents in the surrounding cities [3]. Some studies have found that meteorological conditions have a considerable influence on the occurrence of cyanobacterial blooms in Taihu Lake at multiple time scales, causing blooms to occur at different times each year [4,5]. Before 2007, the largest cyanobacterial bloom in Taihu Lake generally occurred between June and October, when temperatures were high [6]. After 2007, however, the largest bloom may occur in any month between May and November, as long as water temperatures and habitats are suitable [3]. Overall, the severity of cyanobacterial blooms in winter is lower than in the other three seasons. It is therefore necessary to grasp the occurrence patterns of cyanobacteria blooms quickly and accurately, which is significant for monitoring, controlling, and preventing them.
Because cyanobacteria blooms migrate and replicate rapidly, in situ measurements are usually not suitable for observing their spatio-temporal variations [7]. Remote sensing has often been used to monitor cyanobacteria blooms at a regional scale owing to its high efficiency and broad coverage. In recent years, a series of remote sensing spectral indices have been developed from image data to estimate chlorophyll-a (Chla) and recognize cyanobacteria [8–17]. Based on these indices, threshold and machine learning methods have been used to extract cyanobacteria blooms. The application of threshold methods is limited by the selection of thresholds, which relies on subjective or statistical experience. A series of studies have shown that machine learning techniques, including decision trees, random forest (RF), support vector machine (SVM), neural networks, and gradient boosting decision trees (GBDTs), can extract cyanobacteria well [18–21]. As a convenient model, RF offers quick speed, high accuracy, and strong stability [22]. It is widely used to classify land cover and has also been applied to the extraction of cyanobacteria [23].
The performance of RF methods is greatly affected by their model parameters, and improper parameter selection can lead to underfitting or increased computation time. This problem occurs not only in the classification of cyanobacteria but also in other applications, such as land cover classification and crop classification [24,25]. Commonly used parameter-tuning methods include grid search and random search; the former has high accuracy but takes a long time over an extensive range of data, while the latter runs fast but is only suitable for low-dimensional spaces or rough searches [26]. Intelligent optimization algorithms can make the random forest method less likely to fall into a local optimum during the search and more likely to find the global optimal solution. Particle swarm optimization (PSO) is a data mining optimization method that adjusts meaningful features from the input dataset [27]. However, PSO algorithms have limitations, such as low convergence rates and a tendency to fall into local optima in high-dimensional spaces. Quantum particle swarm optimization (QPSO) can deal with these problems. QPSO has the advantages of few input parameters, strong search ability, and easy implementation [28], and it performs well in solving various optimization problems. In recent decades, QPSO has been applied successfully in image research [29–31]. Some studies have used quantum PSO-random forest (QPSO-RF) to predict borehole spontaneous combustion and lung diseases [32,33]. In short, the RF algorithm is suitable for classification but sensitive to parameter selection, while the QPSO algorithm can optimize the RF algorithm toward the global optimal solution during training and avoid premature convergence. Combining the two can achieve more accurate classification and may perform well in cyanobacterial bloom monitoring.
Remote sensing spectral indices are generally affected by external conditions such as water transparency, cyanobacteria bloom concentration, and atmospheric effects. At the same time, the accuracy of RF is greatly affected by its model parameters, so parameter selection is crucial [34]. Selecting factors that are inappropriate for the regional environment may reduce extraction accuracy, resulting in errors in cyanobacteria bloom identification. To reveal the spatial and temporal variation of cyanobacteria in Taihu Lake with high accuracy, this paper develops a QPSO-RF method that combines five indices as input data: the Normalized Difference Water Index (NDWI) [35], Water Adjusted Vegetation Index (WAVI) [36], Turbid Water Index (TWI) [7], Floating Algae Index (FAI) [17], and Surface Algal Bloom Index (SABI) [37]. We call it the Water-Vegetation-Algae QPSO-RF (WVA-QPSO-RF) method. The turbid water, aquatic vegetation, and floating biomass of Taihu Lake were considered when selecting the remote sensing input factors, so that they represent the characteristics of the primary underlying surface types in the lake. Because meteorological, ecological, and environmental conditions in Taihu Lake vary from month to month, this paper established an individual random forest model for each month to reduce uncertainty and obtain higher accuracy. The method is also compared with other methods. Based on this method, the spatial and temporal variations of cyanobacteria in Taihu Lake during 2010–2022 are identified. Our study should benefit research on the occurrence and development mechanisms of cyanobacteria blooms in Taihu Lake and similar large lakes worldwide.

Study Area
Taihu Lake is located between 30°55′40″–31°32′58″N and 119°52′32″–120°36′10″E, with a total area of 2427.8 km², a water area of 2338.1 km², and a shoreline of 393.2 km. It is the third-largest freshwater lake in China [38]. Taihu Lake has a subtropical monsoon climate, with an average annual temperature of 17 °C, an annual precipitation of 1100–1150 mm, an average annual runoff of 7.5 × 10¹⁰ m³, and a storage capacity of 4.4 × 10¹⁰ m³. Referring to the sub-district boundaries of the Taihu Basin Authority of the Ministry of Water Resources, we divided Taihu Lake into several sections to help describe the spatial distribution of the blooms (Figure 1): Zhushan Lake (I), Meiliang Lake (II), Gonghu Lake (III), East Taihu Lake (shaded fill), Center of the Lake, Southern Lake, and Western Lake. The study area in this paper is the entire lake except for East Taihu Lake, which is a hydrophyte distribution area. Hydrophytes can significantly change reflectance characteristics and interfere with cyanobacteria identification [39–41].

Data

MODIS Data
The Moderate-Resolution Imaging Spectroradiometer (MODIS) is a low- to medium-resolution sensor with 36 bands (0.405–14.385 µm), spatial resolutions of 250 m, 500 m, and 1 km, and a temporal resolution of 1 day [42]. The MODIS band information used in this paper is shown in Table 1. From 2010 to 2022, 693 images with a cloud-free lake area were manually selected from a total of 4748 MYD09 images. The selected data are listed in Supplementary Material Table S1.

Sentinel-2 Data
The Sentinel-2 satellites provide multispectral imagery of the world with high spatial and temporal resolution, in 13 bands with resolutions of 10 m, 20 m, and 60 m [43]. Given its relatively high spatial resolution, Sentinel-2 imagery was selected as the validation reference for extracting the cyanobacteria-affected water surface from the MODIS data [44]. The nearest-neighbor resampling algorithm was used to unify the spatial resolution of band 3, band 4, and band 8 of the Sentinel-2 images, which were then combined into pseudo-color images. These pseudo-color images were visually interpreted to determine whether each MODIS image pixel was clear water or cyanobacteria bloom. To make the samples representative of every month, this paper established an individual random forest model for each month using the available data. Table 2 presents the Sentinel-2 and MODIS data used for building the models.

WVA-QPSO-RF Method
QPSO-RF is an integrated algorithm that selects the best features for classification [33]. The QPSO algorithm is a bionic swarm intelligence method that finds the global optimal solution through collaboration and information sharing among the individuals in a group [28]. First, a subset of candidate features is selected and the RF classifier is iteratively trained to optimize the objective function. The iterative search and training then continue in order to obtain a better objective function value than that found with the previously selected feature subset. We selected appropriate predictors for QPSO-RF based on the characteristics of the study area. RF-based models perform better when the input factors represent the different types of underlying surface cover, as shown in many other applications of RF, such as the downscaling of land surface temperature and the classification of underlying surface covers [45–47].
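The search loop described above can be illustrated with a minimal sketch of the commonly published QPSO position update (mean of personal bests, a local attractor between personal and global best, and a logarithmic random step). This is not necessarily the variant used in this paper: the function name `qpso_minimize`, the linear schedule of the contraction-expansion coefficient, and the toy `sphere` objective are all assumptions for illustration. In a QPSO-RF setting the objective would instead be something like the validation error of an RF trained with the candidate settings.

```python
import math
import random


def qpso_minimize(objective, bounds, n_particles=20, n_iter=100, seed=0):
    """Minimize `objective` over the box `bounds` with a basic QPSO loop.

    Returns (best_position, best_value, history); history[k] is the best
    objective value known after k iterations.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the search box.
    x = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    pbest = [p[:] for p in x]
    pbest_val = [objective(p) for p in x]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]

    for it in range(n_iter):
        # Assumed schedule: coefficient decreases linearly from 1.0 to 0.5.
        beta = 1.0 - 0.5 * it / max(n_iter - 1, 1)
        # Mean of all personal best positions ("mbest").
        mbest = [sum(p[d] for p in pbest) / n_particles for d in range(dim)]
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                phi = rng.random()
                u = 1.0 - rng.random()  # in (0, 1], so log(1/u) is finite
                # Local attractor between personal best and global best.
                attractor = phi * pbest[i][d] + (1.0 - phi) * gbest[d]
                step = beta * abs(mbest[d] - x[i][d]) * math.log(1.0 / u)
                new = attractor + step if rng.random() < 0.5 else attractor - step
                x[i][d] = min(max(new, lo), hi)  # clip to the search box
            val = objective(x[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = x[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = x[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history


def sphere(p):
    """Toy objective standing in for an RF validation error."""
    return sum(v * v for v in p)


best, best_val, hist = qpso_minimize(sphere, [(-5.0, 5.0), (-5.0, 5.0)])
```

Because the global best is only replaced by strictly better values, `hist` never increases; how fast it decreases depends on the objective and on the assumed coefficient schedule.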
Our method combines five indices: the NDWI identifies water bodies [35], the WAVI captures aquatic vegetation features [36], the TWI avoids interference from high-turbidity waters [7], the FAI allows accurate and comprehensive extraction of cyanobacteria [17], and the SABI detects water-floating biomass [37]. These input factors were chosen to account for the various water conditions in Taihu Lake, making the resulting model more applicable to the lake. To mitigate concerns about the validity of this approach, the samples were evenly split between classes (5400 cyanobacteria and 5400 clear water points in twelve images), sufficient in number, spatially distributed across all parts of the study area, and temporally distributed across several input scenes over thirteen years.
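As a rough illustration of how such indices are computed per pixel, the sketch below implements three of the five in their commonly cited forms: NDWI as the green/NIR normalized difference, SABI as (NIR − red)/(blue + green), and FAI as NIR reflectance minus a linear red-to-SWIR baseline. The exact formulations and MODIS bands used in this paper are given in the cited references and Table 1, so the formulas, the wavelengths (645, 859, and 1240 nm), and the reflectance values here are assumptions; WAVI and TWI are omitted rather than guessed.

```python
def ndwi(green, nir):
    """Normalized difference water index (green/NIR form, assumed)."""
    return (green - nir) / (green + nir)


def sabi(blue, green, red, nir):
    """Surface algal bloom index in its commonly cited form (assumed)."""
    return (nir - red) / (blue + green)


def fai(red, nir, swir, wl_red=645.0, wl_nir=859.0, wl_swir=1240.0):
    """Floating algae index: NIR minus a red-to-SWIR linear baseline.

    The default wavelengths (nm) are typical MODIS band centers and are
    an assumption, not values taken from this paper.
    """
    baseline = red + (swir - red) * (wl_nir - wl_red) / (wl_swir - wl_red)
    return nir - baseline


# Hypothetical reflectances for a clear water pixel and a bloom pixel.
water = {"blue": 0.05, "green": 0.06, "red": 0.04, "nir": 0.02, "swir": 0.01}
bloom = {"blue": 0.04, "green": 0.06, "red": 0.02, "nir": 0.12, "swir": 0.01}

water_ndwi = ndwi(water["green"], water["nir"])
bloom_ndwi = ndwi(bloom["green"], bloom["nir"])
bloom_fai = fai(bloom["red"], bloom["nir"], bloom["swir"])
bloom_sabi = sabi(bloom["blue"], bloom["green"], bloom["red"], bloom["nir"])
```

With these made-up values, the water pixel has a positive NDWI while the bloom pixel has a negative NDWI and a positive FAI, which is the kind of contrast a classifier can exploit.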
To train the models, twelve pairs of MODIS and Sentinel-2 images were selected, one for each month. Random points were generated on the low-resolution (MODIS) image, and the high-resolution Sentinel-2 pseudo-color image acquired close in time on the same day was visually interpreted to determine the class of the pixel at each random point. From the random points, 900 sample points (450 cyanobacteria and 450 clear water points) were manually selected for each image so that the two classes were balanced. The index values at 600 training sample points (300 cyanobacteria and 300 clear water points) of each image were used to train the model on the MODIS data. The trained model was then used to extract cyanobacteria over the entire image. Finally, the remaining 300 sample points (150 cyanobacteria and 150 clear water points) of each image were used as validation data to verify the accuracy of the extraction results. In this way, we constructed twelve models, one representing each month.
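The per-image sample bookkeeping above (900 points, 600 for training and 300 for validation, balanced by class) can be sketched as a simple stratified split. The paper does not state how the points were assigned to the two sets, so the random assignment, the function name `split_samples`, and the placeholder feature tuples below are assumptions.

```python
import random


def split_samples(points, n_train_per_class=300, seed=0):
    """Split (features, label) pairs into balanced training/validation sets.

    label 1 = cyanobacteria bloom, label 0 = clear water.
    """
    rng = random.Random(seed)
    train, valid = [], []
    for label in (0, 1):
        group = [p for p in points if p[1] == label]
        rng.shuffle(group)  # random assignment is an assumption
        train.extend(group[:n_train_per_class])
        valid.extend(group[n_train_per_class:])
    return train, valid


# Placeholder points for one monthly image: 450 bloom and 450 clear water.
points = [((i,), 1) for i in range(450)] + [((i,), 0) for i in range(450)]
train, valid = split_samples(points)
```

Repeating this for each of the twelve monthly images gives the 5400 + 5400 points mentioned earlier.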
The random forest model was built with the following equation.
Model = RF_train(NDWI, WAVI, TWI, FAI, SABI, Value)
Result = RF_class(NDWI, WAVI, TWI, FAI, SABI, Model)
To evaluate our method, the NDVI threshold method, NDWI threshold method, FAI threshold method [9,17,35], RF (NDVI & NDWI) method, Anomalous Behavior Detection-RF (ABD-RF) method [48], WVA-RF method, and WVA-QPSO-RF-A method were also used to extract cyanobacteria for comparison with our model. The WVA-RF method used the same remote sensing indices as our method, but with a plain RF model. The WVA-QPSO-RF-A method used the same remote sensing indices and QPSO-RF method as our method, but with a single model covering all months. The ABD-RF method used five remote sensing indices (NDWI, NDVI, SABI, FAI, and SA_red−NIR) as input data to build the model.

Trend Analysis
This paper used the Theil-Sen method [49] and the Mann-Kendall method [50,51] to explore the inter-annual trends of cyanobacteria blooms. The Theil-Sen method was used to calculate the trend values, and the Mann-Kendall method was used to determine the significance of the trend. Trends were then classified as significant increase, slight increase, no variation, slight decrease, or significant decrease. The Theil-Sen method is less influenced by outliers in the study of long-term series variation, but it cannot by itself judge the significance of a trend. The Mann-Kendall method does not require the samples to follow a specific distribution and is not influenced by a small number of outliers, so it can be used to test the significance of the trend of the series. Combining the two methods provides robustness against data errors and a solid statistical basis for the significance test, making the results more reliable [52].
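A minimal sketch of this two-step procedure for one equally spaced annual series is given below: the Theil-Sen slope is the median of all pairwise slopes, and the Mann-Kendall Z statistic is computed from the sign sum S with the usual normal approximation (without tie correction). The significance threshold |Z| > 1.96, the zero-slope tolerance, the function names, and the example values are assumptions for illustration; the paper's actual class thresholds follow [52].

```python
import math
from itertools import combinations
from statistics import median


def theil_sen_slope(y):
    """Median of pairwise slopes for equally spaced observations."""
    return median((y[j] - y[i]) / (j - i)
                  for i, j in combinations(range(len(y)), 2))


def mann_kendall_z(y):
    """Mann-Kendall Z statistic (normal approximation, no tie correction)."""
    n = len(y)
    s = sum((y[j] > y[i]) - (y[j] < y[i])
            for i, j in combinations(range(n), 2))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0


def classify_trend(y, z_crit=1.96, eps=1e-9):
    """Combine slope sign and significance into one of five classes."""
    slope = theil_sen_slope(y)
    if abs(slope) <= eps:
        return "no variation"
    direction = "increase" if slope > 0 else "decrease"
    strength = "significant" if abs(mann_kendall_z(y)) > z_crit else "slight"
    return f"{strength} {direction}"


# Made-up annual values for a single pixel, not data from this study.
series = [3.0, 3.5, 3.2, 4.1, 4.8, 5.5, 6.9, 8.0]
trend_class = classify_trend(series)
```

Applying `classify_trend` to every pixel's annual series would yield a five-class trend map of the kind described in the text.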

Precision Evaluation Index
To evaluate the accuracy of the WVA-QPSO-RF method in extracting cyanobacteria, this paper compared it with the NDVI, NDWI, and FAI threshold methods and with the RF (NDVI & NDWI), ABD-RF, WVA-RF, and WVA-QPSO-RF-A methods. The extraction results were evaluated using the confusion matrix to calculate three indices: precision, recall, and F1 score. Precision is judged from the prediction results: it is the proportion of correct predictions among the samples predicted as positive. Recall is judged from the actual samples: it is the proportion of actual positive samples that were predicted correctly. The F1 score is the harmonic mean of precision and recall.
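The formulas for precision, recall, and F1 score are as follows, where TP is the number of cyanobacteria samples correctly classified as cyanobacteria, FP is the number of clear water samples misclassified as cyanobacteria, and FN is the number of cyanobacteria samples misclassified as clear water.
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = 2 × Precision × Recall / (Precision + Recall)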
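For concreteness, the three indices can be computed from labelled validation points as in the sketch below, using the standard confusion-matrix definitions with cyanobacteria as the positive class. The function name and the eight example labels are hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 score for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = cyanobacteria, 0 = clear water.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Here three of the four predicted blooms are correct and three of the four actual blooms are found, so all three indices equal 0.75.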

Occurrence Frequency of Cyanobacteria Blooms
The occurrence frequency was used to explore the spatial variation patterns of cyanobacteria in Taihu Lake. The occurrence frequency of cyanobacteria blooms was calculated by the following equation.
F = d/D × 100%
where F represents the occurrence frequency of cyanobacteria blooms in Taihu Lake, d represents the number of days on which cyanobacteria blooms occurred in each region of Taihu Lake, and D represents the total number of days counted. The probability of a cyanobacteria bloom occurring in each pixel within a year is the inter-annual cyanobacteria bloom occurrence frequency. The probability of a cyanobacteria bloom occurring in a pixel within a month is the intra-year cyanobacteria bloom occurrence frequency.

Comparison of Cyanobacteria Extraction Methods
In this paper, visual recognition of cyanobacteria based on Sentinel-2 data was used as the reference. The accuracy of the WVA-QPSO-RF method was evaluated by comparing its result with those of the NDVI threshold method, NDWI threshold method, FAI threshold method, RF (NDVI & NDWI) method, ABD-RF method, WVA-RF method, and WVA-QPSO-RF-A method on 3 June 2019 (Figure 2).
The eight methods differed significantly in the classification of Zhushan Lake (I), Southern Lake, Western Lake, and the Center of the Lake (in color frames). The NDVI, NDWI, and FAI threshold methods all failed to identify some slight cyanobacterial blooms (in purple, yellow, and grey frames). The RF (NDVI & NDWI) method identified some water as cyanobacteria (in grey, red, yellow, and purple frames). The ABD-RF method identified some water as cyanobacteria (in grey and red frames) and some slight cyanobacteria as water (in purple frames). The WVA-RF method identified some cyanobacteria as water (in grey and yellow frames). The WVA-QPSO-RF-A method identified some water as cyanobacteria (in five color frames). Compared with the other seven methods, the WVA-QPSO-RF method better distinguished clear water from cyanobacteria in Taihu Lake: not only were the severe cyanobacteria blooms extracted accurately, but the slight cyanobacteria blooms in the Center of the Lake were also identified.
To further demonstrate the performance of the method in this paper, evaluations were conducted for each month. Table 3 shows the accuracy of the eight methods. Overall, the accuracy of the three RF methods was higher than that of the three threshold methods. The WVA-QPSO-RF method had the highest F1 score, 0.91–0.98, indicating high accuracy. Therefore, the WVA-QPSO-RF method extracts cyanobacteria better than the other methods.
Based on the WVA-QPSO-RF method, the cyanobacteria in Taihu Lake were extracted for 2010–2022. The temporal evolution of cyanobacteria coverage in the study area, calculated from all cloud-free MODIS images with the WVA-QPSO-RF model, is shown in Figure 3. The trend line shows that the coverage of cyanobacteria blooms increased slightly during 2010–2022. The coverage decreased from 2010 to 2011, generally increased during 2011–2017, then decreased slightly, and was more stable from 2020 to 2022, returning to the levels of a decade earlier. In 2017, the coverage of cyanobacteria blooms was broader than in the other years, whereas it was relatively small in 2011. The broadest coverage of cyanobacteria blooms occurred in May 2017, at about 62.3%.

Changes in the Spatial Areas of Cyanobacteria
As shown in Figure 4, from December to April, the average monthly coverage of cyanobacteria in Taihu Lake was low, below 5.0%. The average coverage then increased, rising especially fast from April to May, and peaked at 9.7% in June. From May to November, the average and maximum monthly coverage of cyanobacteria in Taihu Lake remained high. It then gradually decreased in December.
As shown in Figure 5, during 2010–2014, the average annual coverage of cyanobacteria in Taihu Lake was below 5.0%; it then increased and peaked at 11.1% in 2017. Later, the average annual coverage declined and was below 5.0% in 2020–2022. The maximum coverage of cyanobacteria occurred in 2017, at 62.3%.

Changes in the Occurrence Frequency of Cyanobacteria Blooms
The scales of cyanobacteria blooms in Taihu Lake were graded with reference to Liu [53]. There are four scales of cyanobacteria blooms in Taihu Lake: small-scale, medium-scale, large-scale, and severe large-scale blooms. The available data from 2010 to 2022 were used to count the number of days per year on which each bloom scale occurred. The occurrence frequency of each bloom scale in each year was then calculated to obtain Figure 6.
A total of 693 days of cyanobacteria area results were selected, with the four bloom scales (small, medium, large, and severe) accounting for 63.4%, 29.2%, 6.5%, and 1.0% of the days, respectively. From 2010 to 2015, small-scale cyanobacteria blooms dominated and no severe-scale blooms occurred. Severe-scale cyanobacteria blooms began in 2016 and also occurred in 2017 and 2019. As seen from Figure 7, 2017 was the year with the greatest cyanobacteria cover and the highest proportion of large-scale and severe large-scale blooms, consistent with the actual situation [3,14]. Although there was no severe-scale cyanobacteria bloom in 2018, the proportion of medium-scale blooms was high. The scale of cyanobacteria blooms then decreased in 2020–2022.
The duration of large-scale cyanobacteria blooms in Taihu Lake from 2010 to 2022 is shown in Figure 7. Most of the large-scale cyanobacteria blooms started in April or May and ended in November or December. The shortest blooms occurred in 2011 and 2021 and lasted one month. The longest occurred in 2017 and lasted eight months.

Spatial Variation Patterns
The duration of large-scale cyanobacteria blooms in Taihu Lake from 2010 to 2022 is shown in Figure 7.It can be seen that most of the large-scale cyanobacteria blooms in Taihu Lake from 2010 to 2022 started in April and May, and ended in November and December.The shortest blooms occurred in 2011 and 2021, which lasted one month.The most prolonged extended duration occurred in 2017, which lasted eight months.The duration of large-scale cyanobacteria blooms in Taihu Lake from 2010 to 2022 is shown in Figure 7.It can be seen that most of the large-scale cyanobacteria blooms in Taihu Lake from 2010 to 2022 started in April and May, and ended in November and December.The shortest blooms occurred in 2011 and 2021, which lasted one month.The most prolonged extended duration occurred in 2017, which lasted eight months.

1. Spatial distribution of average cyanobacteria bloom occurrence frequency from 2010 to 2022
As shown in Figure 8, cyanobacteria blooms were relatively common in the Western Lake and the coastal areas of Taihu Lake from 2010 to 2022. The average occurrence frequency in the northwestern part of Taihu Lake was high, at about 10.0%, and exceeded 20.0% in some places. The occurrence frequency in Zhushan Lake (I), Meiliang Lake (II), and the coastal areas of Gonghu Lake (III) was also high, averaging more than 5.0%. Blooms in the Center of the Lake and the Southern Lake were relatively slight, with an occurrence frequency of almost zero.
To further reveal the spatial variation of cyanobacteria blooms in Taihu Lake during 2010-2022, the Theil-Sen method was combined with the Mann-Kendall method. The results were classified into five types of change with reference to previous studies [52]. Following the temporal analysis earlier in this paper, the trend was analyzed in two phases. As shown in Figure 9a, during 2010-2017 more areas showed a mild increase in cyanobacteria blooms, concentrated in Zhushan Lake (I), Meiliang Lake (II), and the Western Lake. The few areas of mild decrease were in the coastal area of Gonghu Lake (III). As shown in Figure 9b, during 2017-2022 more areas showed a mild decrease in cyanobacteria blooms, especially in the Western Lake and Zhushan Lake (I), indicating that Taihu Lake's eutrophication improved over these six years.
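The per-pixel trend classification described above can be illustrated with a minimal sketch. It assumes one occurrence-frequency value per year for a pixel, unit spacing between years, and no tie correction in the Mann-Kendall variance. The five class names, the significance threshold `z_crit`, and the stability tolerance `eps` are illustrative assumptions, not the values used in this paper or in [52].

```python
import math
from itertools import combinations
from statistics import median


def theil_sen_slope(values):
    """Median of the slopes between every pair of time steps (unit spacing assumed)."""
    slopes = [(values[j] - values[i]) / (j - i)
              for i, j in combinations(range(len(values)), 2)]
    return median(slopes)


def mann_kendall_z(values):
    """Mann-Kendall Z statistic, without a tie correction (a simplification)."""
    n = len(values)
    # S counts increasing pairs minus decreasing pairs.
    s = sum((values[j] > values[i]) - (values[j] < values[i])
            for i, j in combinations(range(n), 2))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0


def classify_trend(values, z_crit=1.96, eps=1e-4):
    """Assign one of five hypothetical trend classes to a yearly series."""
    slope = theil_sen_slope(values)
    if abs(slope) < eps:
        return "stable"
    significant = abs(mann_kendall_z(values)) > z_crit
    if slope > 0:
        return "significant increase" if significant else "mild increase"
    return "significant decrease" if significant else "mild decrease"
```

Applying `classify_trend` to the yearly series of every pixel, separately for 2010-2017 and 2017-2022, would yield two maps comparable in structure to Figure 9a,b. The Theil-Sen slope sets the direction of change, and the Mann-Kendall test only decides whether that change is labeled significant or mild.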

2. Changes in the spatial distribution of cyanobacteria bloom occurrence frequency during 2010-2022
As shown in Figure 10, cyanobacteria blooms were frequent in the coastal areas of Taihu Lake every year. The occurrence frequency in Zhushan Lake (I) and the Western Lake was high, sometimes exceeding 40.0%. In 2015-2019, bloom coverage in Taihu Lake was more severe. In 2015 and 2016, blooms occurred in most areas of Taihu Lake, and 2017 was the most widespread year, with blooms covering almost the entire lake. In 2020-2022, blooms in the Center of the Lake and the three northern arms of the lake decreased, especially in the coastal areas of Gonghu Lake (III), where blooms had been severe in most earlier years.

3. Changes in the spatial distribution of cyanobacteria bloom occurrence frequency during the year
As shown in Figure 11, the occurrence frequency of cyanobacteria blooms was low throughout the lake from December to April. In May, blooms began growing in the Western Lake and the three northern arms of the lake, and then spread to the Center of the Lake. From May to November, the occurrence frequency was high, and blooms sometimes covered the whole lake. In October, the bloom in the northwestern part of the lake began to dissipate. Over the year, the Western Lake showed a high occurrence frequency, the Southern Lake a relatively low one, and the highest occurrence frequency was found in Zhushan Lake (I).
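For orientation, the occurrence frequency mapped in Figures 8, 10, and 11 can be sketched as a per-pixel ratio. The definition below (share of valid, cloud-free observations in which a pixel is classified as bloom, in percent) is an assumption made for illustration, since this section does not restate the formula. The function name and the nested-list mask layout are likewise hypothetical.

```python
def occurrence_frequency(bloom_masks, valid_masks):
    """Per-pixel percentage of valid observations classified as bloom.

    bloom_masks, valid_masks: lists of 2-D grids (one grid per scene) holding
    truthy/falsy values. Pixels that are never valid are returned as None.
    """
    rows, cols = len(bloom_masks[0]), len(bloom_masks[0][0])
    freq = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count only scenes in which the pixel was observed (e.g., cloud-free).
            valid = sum(1 for v in valid_masks if v[r][c])
            bloom = sum(1 for b, v in zip(bloom_masks, valid_masks)
                        if v[r][c] and b[r][c])
            if valid:
                freq[r][c] = 100.0 * bloom / valid
    return freq
```

Passing all scenes of 2010-2022 would give a map in the spirit of Figure 8, while passing only the scenes of one year or one calendar month would correspond to the panels of Figures 10 and 11, respectively.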

Analysis of Cyanobacteria Bloom during 2010-2020
During 2010-2020, the frequency and spread of cyanobacterial blooms increased. In terms of time, the duration of large-scale blooms increased, and in some years, such as 2013 and 2017, blooms even occurred during the winter [6]. In terms of location, blooms could be found in most areas of Taihu Lake. Since 2015, a large number of lakes worldwide have experienced large-scale, high-intensity growth of cyanobacteria [54]. A possible reason is that rapid variations in meteorological conditions change the water temperature and thus affect the occurrence of cyanobacterial blooms [4,5]. Specifically, the intensity of blooms in Taihu Lake is positively correlated with the average daily water temperature in winter and early spring, as well as with the effective cumulative temperature in the same period [55]. In 2017, an abnormally high water temperature during winter and early spring contributed to the high intensity of the blooms, and it was the year when blooms covered almost the entire lake.

During 2010-2020, cyanobacteria blooms were more severe in the northwestern part of Taihu Lake. This part of the lake borders the Yixing, Wujin, and Wuxi city areas, which are economically developed, densely industrialized, and intensively farmed. A great deal of industrial wastewater, domestic wastewater, and agricultural pollution is discharged into Taihu Lake from these areas [56]. As a result, this part of the lake has higher concentrations of nutrients such as nitrogen and phosphorus, serious eutrophication problems, and vigorous algal growth [57,58].

Applicability of WVA-QPSO-RF Method in Other Lakes
This study aimed to extract cyanobacteria blooms, and five input factors (NDWI, TWI, FAI, WAVI, and SABI) were selected to train our model. NDWI is suitable for clear water, TWI is suitable for turbid water, FAI is used to extract cyanobacteria, WAVI is used to detect aquatic vegetation, and SABI detects water-floating biomass. Waters in Taihu Lake are turbid all year [59], and large amounts of suspended sediment may be misidentified as cyanobacteria, so the TWI index was added to avoid interference in such high-turbidity waters. In Taihu Lake, the Southern Lake has more aquatic vegetation, lower turbidity, and few cyanobacteria, and aquatic vegetation can easily be misidentified as cyanobacteria [60]. The NDWI, FAI, and SABI indices can effectively distinguish lake water from cyanobacteria, but they are not effective in distinguishing cyanobacteria from aquatic vegetation [8], so the WAVI index was added to monitor aquatic vegetation. The model in this paper was proposed for the shallow, eutrophic Taihu Lake and is also suitable for lakes with similar environments, such as Chaohu Lake [61]. However, whether it applies to other types of cyanobacteria-growing lakes remains to be discussed.
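As a brief illustration of how such input factors are derived from band reflectances, the sketch below computes two of the five indices in their commonly published forms: NDWI as the normalized difference of the green and near-infrared bands, and FAI as the near-infrared reflectance minus a linear baseline between the red and shortwave-infrared bands. The MODIS center wavelengths (645, 859, and 1240 nm) and the function names are assumptions for illustration, and the exact band choices and formulations used in this paper may differ. TWI, WAVI, and SABI are omitted here.

```python
def ndwi(green, nir):
    """Normalized difference water index from green and NIR reflectance."""
    total = green + nir
    # Guard against division by zero for fully dark pixels.
    return (green - nir) / total if total != 0 else 0.0


def fai(red, nir, swir, wl_red=645.0, wl_nir=859.0, wl_swir=1240.0):
    """Floating algae index: NIR reflectance above a red-SWIR linear baseline.

    The default wavelengths (nm) are assumed MODIS band centers.
    """
    baseline = red + (swir - red) * (wl_nir - wl_red) / (wl_swir - wl_red)
    return nir - baseline
```

With this formulation, a pixel with a strong near-infrared peak, such as surface scum, yields a positive FAI, whereas open water, whose reflectance declines from red to near-infrared, yields a value near or below zero.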
For lakes with different underlying surface covers, certain input factors need to be replaced according to the actual conditions of the lake. For example, the TWI index should not be used in deep lakes or lakes less affected by turbidity [7], such as Lake Erie (in the United States and Canada), Erhai Lake (in southwestern China), and Fuxian Lake (in southwestern China) [53,62,63]. For lakes with both macrophytes and cyanobacteria, training samples need to be added to distinguish them. When the existing indices in our method cannot characterize the macrophytes in the lake, other corresponding remote sensing indices and training samples are needed. For example, Cuiping Lake (in northern China) contains cyanobacteria and much submerged vegetation [64]. Since the existing indices in our model cannot identify submerged vegetation, the submerged vegetation index (SAVI) can be added to our model as an input factor [65]. Then, samples of submerged vegetation can be added to reconstruct the model.
Furthermore, some lakes have high water inflow and rapid water exchange, and the large discharge carries away cyanobacteria and is not conducive to their growth. Large cyanobacteria blooms are usually rare in these lakes, such as Schlachtensee (in northeastern Germany), Onondaga Lake (in the northeastern United States), Poyang Lake (in eastern China), and Dongting Lake (in central China) [65-67]. Since the FAI index is used to detect surface cyanobacteria scum, it may not be applicable if the cyanobacteria concentration is so low that the cells cannot form surface scum and instead mix into the water column [18]. However, the concentration below which cyanobacteria cannot form surface scum is difficult to determine, and the applicability of our model to such lakes needs further study.

Figure 1. Location of study area.

Figure 3. Temporal variation of the cyanobacteria bloom area in Taihu Lake during 2010-2022 using the WVA-QPSO-RF model; the red line is the trend line.

Figure 4. Monthly coverage of cyanobacteria blooms in Taihu Lake (x-axis: month of the year).

Figure 5. Annual coverage of cyanobacteria blooms in Taihu Lake.

Figure 6. Proportion of various scales of cyanobacteria blooms in Taihu Lake from 2010 to 2022.

Figure 7. Duration of the large-scale cyanobacteria blooms in Taihu Lake.

Remote Sens. 2024, 16, x FOR PEER REVIEW

Figure 8. Spatial map of average cyanobacteria bloom occurrence frequency in Taihu Lake from 2010 to 2022.

Figure 9. Trend distribution of cyanobacteria blooms in Taihu Lake during 2010-2017 (a) and 2017-2022 (b).

Figure 11. Spatial distributions of the occurrence frequency of cyanobacteria blooms in Taihu Lake during the year.

Table 1. Subset of bands of MODIS data used in this study.

Table 2. Usage dates of Sentinel-2 data and MODIS data.

Table 3. Accuracy evaluation of cyanobacteria extraction results based on MODIS data using eight methods.