A Meta-Analysis on Harmful Algal Bloom (HAB) Detection and Monitoring: A Remote Sensing Perspective

Abstract: Algae serve as a food source for a wide range of aquatic species; however, a high concentration of inorganic nutrients under favorable conditions can result in the development of harmful algal blooms (HABs). Many studies have addressed HAB detection and monitoring; however, no global scale meta-analysis has specifically explored remote sensing-based HAB monitoring. Therefore, this manuscript elucidates and visualizes spatiotemporal trends in HAB detection and monitoring using remote sensing methods and discusses future insights through a meta-analysis of 420 journal articles. The results indicate an increase in the quantity of published articles, which has facilitated the analysis of sensors, software, and HAB proxy estimation methods. The comparison across multiple studies highlighted the need for a standardized reporting method for HAB proxy estimation. Research gaps include: (1) atmospheric correction methods, particularly for turbid waters, (2) the use of analytical-based models, (3) the application of machine learning algorithms, (4) the generation of harmonized virtual constellation and data fusion for increased spatial and temporal resolutions, and (5) the use of cloud-computing platforms for large scale HAB detection and monitoring. The planned hyperspectral satellites will aid in filling these gaps to some extent. Overall, this review provides a snapshot of spatiotemporal trends in HAB monitoring to assist in decision making for future studies.


Introduction
Algae serve as a food source for a wide range of aquatic species; however, nutrient abundance under favorable conditions can result in the development of harmful algal blooms (HABs) [1][2][3][4][5]. The causes of the increased nutrient abundance that encourages HAB development are mostly associated with the effects of human disturbance [3,6,7]. These include chemicals from agriculture, sewage, and urban run-off [8][9][10][11]. In addition to nutrient availability, the HAB growth rate depends on multiple factors [2,4], such as water temperature [3,12], upwelling [13], and wind-induced mixing [14,15]. An algal bloom typically lasts from days to several months, although an individual phytoplankton cell lives only a few days [16].
HABs are generally comprised of various phytoplankton species, such as cyanobacteria, diatoms, and dinoflagellates [1,17]. Generally, HABs are classified as toxic or nontoxic [18], though both have the potential to cause harm; dinoflagellates account for 75% of toxic HAB species [3]. Toxic HABs can have adverse effects on coastal and marine ecosystems and freshwater reservoirs [19,20]. These can enter food webs and cause seafood and shellfish poisoning [21,22] or contaminate freshwater supplies [1,23]. Unfortunately, a bloom does not have to produce toxins to be harmful [24] as nontoxic HABs can cause for turbid waters, there is also the presence of CDOM and anthropogenic particles [76].
Chlorophyll-a is an important indicator of the trophic state of water, which is used to classify water bodies from oligotrophic to eutrophic based on the amount of biological productivity. For oligotrophic waters, the sun-induced fluorescence peak for chlorophyll-a is around 680 nm [77,78], whereas for eutrophic waters the fluorescence peak shifts to 710 nm and the absorption peak is around 665 nm [79].
Remotely sensed data can be collected through multiple platforms. Ground-based data collection includes handheld spectrometers such as the commonly used analytical spectral devices (ASD) field spectrometers [49,80,81]. These data can act as both calibration and validation for HAB proxy estimation methods. Due to the limited spatiotemporal coverage of handheld devices, studies have focused on airborne and space-borne platforms. Many HAB monitoring studies use space-borne imagery acquired from commonly used sensors, such as the sea-viewing wide field-of-view sensor (SeaWiFS), medium resolution imaging spectrometer (MERIS), and moderate resolution imaging spectroradiometer (MODIS). However, there are challenges with this type of imagery such as insufficient spatial resolution and bias resulting from atmospheric correction [82,83]. Although some commercial satellites capture data with high spatial resolution, their application is often not cost-effective [66,84]. Recently, there has been an increasing trend in the use of airborne data, such as unmanned aerial vehicle (UAV) data due to their low cost, flexibility, and high data quality. UAV sensors can be deployed quickly in response to HAB outbreaks [83,85], facilitating concurrent water chemical sampling, which can be difficult to align with satellite overpass due to cloud cover. Although UAV data are not affected by the atmosphere as compared to satellite data, the platform is impractical for large regions of interest. For continuous HAB monitoring of larger areas, geostationary satellites, such as the Geostationary Ocean Color Imager (GOCI) are being used [86]. In cases of intense HAB outbreak, researchers have also used synthetic aperture radar (SAR) data with multispectral satellite data [87,88]. The readers are directed towards the published literature [76] for details of commonly used airborne, space-borne, and UAV platforms and on-board sensors for monitoring HABs.
A growing number of HAB studies in freshwater [79,89,90], coastal [61,70,91], and open waters [92][93][94] are being reported across the globe. The countries represented include China [51,95,96], USA [53,61,97], Japan [98], Australia [80], Brazil [99], Canada [100,101], India [102,103], Italy [104] and Korea [62]. Furthermore, earth observation data is also being used to develop operational HAB-monitoring programs. An example of such a program is the "Cyanobacteria Assessment Network (CyAN)" [105], which uses data from the Sentinel-3 Ocean and Land Color Imager (OLCI) for monitoring water quality and generating alerts. This application is used to monitor over 2000 freshwater bodies across the USA. Similarly, another project, "Applied Simulations and Integrated Modelling for the Understanding of Harmful Algal Blooms (Azimuth)", is used for Atlantic Europe [106]. Azimuth focuses on predicting HAB transport by combining current marine conditions and satellite imagery. For the waters of Florida throughout the Gulf of Mexico, another tool known as HABscope [107] is used. This uses smart phones in the field to collect videos that are then processed using neural networks to identify red tide. For global monitoring of cyanobacterial blooms, an application called "CyanoTRACKER" was built using a cloud-based platform [108]. Another cloud-based platform, Google Earth Engine (GEE), was used to develop "AlgaeMAP" [109] to monitor algal blooms in Latin America using Sentinel-2 data. In addition, several review articles have been published providing a consolidated summary on the use of remote sensing for water quality parameter estimation, including chlorophyll-a, turbidity, Secchi disk depth (SDD), and water temperature. Gholizadeh [76] focused on multiple water quality parameters and covered the airborne and space-borne remote sensing satellites extensively.
Others have published location-based review papers on water quality parameters, for example, studies focused on China [110,111], Africa [112], or inland waters only [74]. Matthews published a review paper on regression-based HAB proxy estimation for inland and coastal waters [113] and Topp [114] published a meta-review highlighting the research trends for inland water quality using remote sensing. HAB-monitoring literature includes reviews on ocean-color methods [115,116], cyanobacterial blooms [47,68,117,118], a potential framework for monitoring HABs [119] and the role of UAVs in monitoring HABs [83,85]. However, there is not yet a meta-review that utilizes statistical tools to summarize studies that detect and monitor HABs using earth observation data. This review aims to help researchers in this field make informed decisions regarding the selection of a suitable method for detecting and monitoring HABs according to their specific objectives. The present article elucidates and visualizes overall trends in HAB detection and monitoring along with highlighting future insights. The scope of this paper covers all HAB proxies, including chlorophyll-a and phycocyanin-based estimation. We conducted a meta-analysis of 420 journal articles to explore the spatiotemporal trends in published articles, the distribution of journals for published articles, temporal trends of the articles, HAB proxies, validation methods, ancillary data and software for commonly used sensors, as well as the relation between study type and data resolution. The limitations and implications of the studies are discussed, followed by an outlook highlighting knowledge gaps and future directions.

Materials and Methods
Two databases, Web of Science and Scopus, were used to find papers published up to 31 December 2020, using a carefully constructed search query ("A" AND "B" AND "C") (Table 1). The article selection followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [120], the details of which are given in Figure 1. The keywords in the first and second columns of Table 1 were searched within the title field, while the third column was searched as a topic. The initial search resulted in a total of 1292 journal articles, which was reduced to 717 after removing duplicates. Filters were applied on both databases to select journal articles published in English. Furthermore, only original articles were selected; hence, no review article was considered for the analysis. After that, articles were manually screened based on a review of the title and abstract to select articles related to monitoring HABs using remotely sensed imagery. To keep the scope focused and the workload manageable, articles related to phytoplankton size class, primary productivity, and phytoplankton structural types were excluded. The final result was a selection of 438 articles for full length screening. The following attributes were extracted: first author, journal, publication date, study area, spatial coverage, temporal coverage, sensor name and characteristics, HAB proxy and estimation methods, preprocessing details and software used, and the reported accuracy. Based on the above-mentioned attributes, a total of 420 journal articles were included in this meta-analysis.
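The screening counts reported above can be tabulated to show how many records were removed at each step. The following is a minimal sketch that uses only the four totals given in the text; the stage labels and the function name are illustrative and are not taken from Figure 1.

```python
# Record counts reported in the text for each screening stage.
stages = [
    ("records identified (Web of Science + Scopus)", 1292),
    ("after duplicate removal", 717),
    ("selected for full length screening", 438),
    ("included in the meta-analysis", 420),
]


def screening_losses(stages):
    """Return (label, records remaining, records removed at this step)."""
    rows = []
    previous = None
    for label, count in stages:
        removed = 0 if previous is None else previous - count
        rows.append((label, count, removed))
        previous = count
    return rows


for label, count, removed in screening_losses(stages):
    print(f"{label:48s} {count:5d}  (removed: {removed})")
```

Under these totals, 575 records were removed as duplicates, 279 during title and abstract screening, and 18 during full length screening.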

Results
This section articulates the findings of the detailed analysis of the 420 selected articles to derive the spatiotemporal trends in the attributes mentioned in the previous section. We start with outlining the trends in the general publication characteristics such as the number of articles published, journals included and the keywords, and then the review focuses on trends in HABs' proxy, sensor characteristics, processing and software used. Figure 2 shows the number of articles published in 2-year brackets and their cumulative distribution. This figure illustrates the exponentially increasing trend in HAB-based monitoring studies over the past decade. Based on the selected pool of 420 papers, the first study that used remotely sensed imagery for monitoring HABs dates back to 1974. There were relatively few studies from 1974 to the beginning years of the 21st century, partially due to the lack of freely available remotely sensed imagery [121]. In later years, it was observable that the number of studies using multiyear data surpassed those using single-year data. By using multiple satellite images, the monitoring, and extraction of the temporal trends of HABs was made easier as compared to using ground-based sampling solely. The number of published journal articles increased each year in the last decade, except for in 2016. The published articles considered in this meta-analysis were sourced from 117 journals across a wide range of disciplines that highlight the diversity of research backgrounds relevant to HAB detection and monitoring. Journals that published more than six articles and contributed more than 1.5% of the total articles are represented in Figure 3. The greatest contribution was from the first three journals, all of which are related to the field of remote sensing. The top contributor was the International Journal of Remote Sensing (12.2%) followed by Remote Sensing of Environment (10.8%) and Remote Sensing (9.8%). 
It is noteworthy that the top 13 journals accounted for 54.6% of the total article count. As seen from Figure 3, the disciplines of the contributing journals are diverse, including topics related to remote sensing, marine ecosystems, geophysical research, oceanography, and biology.

General Characteristics of Published Articles
Not only has the quantity of articles increased, but the quality has also increased as can be seen in Table 2. The top articles based on normalized citations are presented along with author name, year of publication, journal name, journal's impact factor and the article's total citations. It can be seen that 4 out of 10 articles are from Remote Sensing of Environment, which has the highest impact factor (10.164). It is also interesting to note that the top article was only published a year ago and five articles were published in the last five years. All the journals are well known in their respective fields, which indicates good quality. Despite the diversity in the journals' genres, the geographical distribution of HAB studies is not similarly diverse. Almost one-third of the studies are conducted in open waters as shown in the pie chart presented in Figure 4. Studies from only 43 countries were identified in the database; among those, the United States of America (USA) (17.6%) and China (16.2%) have the largest contribution. Studies based in China, the USA and the oceanic region account for almost 59% of the total, leaving only 41% to be divided among 41 other contributing countries. As depicted in Figure 4, the highest number of studies on a continental scale are from Asia (34.28%), followed by North America (20.23%) and Europe (10.48%), with fewer studies from South America (4.76%), Africa (1.9%) and Australia (1.67%). Only two studies covered areas from around the globe. The keywords in the published literature also emphasize the research diversity of HAB monitoring-related studies as observed in Figure 5. For this purpose, a text mining analysis was used to draw a network diagram showing the co-occurrence of various keywords using VOSviewer [127,128]. The color of the keyword determines which cluster it belongs to, and the width of the lines depicts the link strength between two keywords.
Out of 2669 keywords, only those that had fifteen or more occurrences were considered, reducing the total number to 81 keywords. The diversity in keywords is an indication of the various applications linked with HAB monitoring and also of the diverse research approaches applied to the same problem. The size of the circle represents the number of occurrences of a keyword, and Figure 5 indicates that the most frequently used keywords are "chlorophyll", "remote sensing", "satellite imagery" and "phytoplankton". The different clusters represent different study approaches. For example, "lakes" and "oceanography" belong to different clusters. Hence, the link between "lakes" and "turbidity" is stronger than the link between "ocean color" and "turbidity". Other such instances can also be determined from this network diagram to better understand the occurrences and link strength for each keyword.
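The occurrence threshold and link strength described above can be illustrated with a small counting routine. This is a minimal sketch and not the algorithm implemented in VOSviewer: it assumes that link strength is simply the number of articles in which two retained keywords appear together, and the toy keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def keyword_network(articles, min_occurrences):
    """Count keyword occurrences and pairwise co-occurrences.

    articles: list of keyword lists, one list per article.
    Returns (occurrences of retained keywords, co-occurrence count per pair).
    """
    occurrences = Counter()
    for keywords in articles:
        occurrences.update(set(keywords))  # count each keyword once per article
    kept = {k for k, n in occurrences.items() if n >= min_occurrences}

    links = Counter()
    for keywords in articles:
        present = sorted(set(keywords) & kept)
        for a, b in combinations(present, 2):
            links[(a, b)] += 1  # pairs are stored in alphabetical order
    return {k: occurrences[k] for k in kept}, links


# Hypothetical keyword lists for four articles (illustration only).
toy_articles = [
    ["chlorophyll", "remote sensing", "lakes", "turbidity"],
    ["chlorophyll", "lakes", "turbidity"],
    ["chlorophyll", "ocean color", "remote sensing"],
    ["ocean color", "turbidity"],
]
counts, links = keyword_network(toy_articles, min_occurrences=2)
print(links[("lakes", "turbidity")], links[("ocean color", "turbidity")])
```

In this toy input, "lakes" and "turbidity" co-occur in two articles while "ocean color" and "turbidity" co-occur in one, which mirrors the kind of comparison read from the network diagram.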

Methods for HAB Detection and Monitoring
This section discusses patterns and trends in the scale of studies, HAB proxies, methods for HABs proxy estimation, and accuracy assessment.

Scale of the Study
In terms of scale, the studies were divided into three categories, namely spatial, spatiotemporal, and time series. The spatial group represents studies that perform analysis at a single point in time, whereas spatiotemporal covers studies using data from multiple times, which might range from a few months to multiple years. Time series is used to represent studies that use the value from a single pixel (original or average) for temporal trend analysis, for example, to extract phenological information to detect algal bloom start and end date. Figure 6 summarizes the number of studies within each of these three categories for different types of study area (freshwater, coastal, and open water). Figure 6 shows that most of the studies performed spatiotemporal analysis (n = 268) followed by spatial only (n = 112) and lastly time series (n = 41). The bar graphs indicate that the time series analysis was mostly carried out for open waters. Conversely, the studies at spatiotemporal and spatial scale mostly covered freshwater and coastal areas. It is noteworthy that the number of spatial and spatiotemporal scale studies for open water is lower than for freshwater and coastal areas. Furthermore, the numbers of freshwater and coastal studies are similar at each of the three scales.

HAB Proxies
HABs are detected using various proxies in the form of band reflectance, multiband indices, phycocyanin concentration (PC), and chlorophyll-a. The most common proxy used for HAB detection and monitoring was chlorophyll-a (Figure 7). As can be seen from Figure 7, 80% of the studies used chlorophyll-a as an indicator for detecting and monitoring HABs. Of the remainder, 7% of the studies used indices such as normalized difference chlorophyll index (NDCI), floating algal index (FAI) and cyanobacterial index (CI), while some studies (6%) only used false color images using band reflectance without quantitative analysis. Other indicators include fluorescence line height (FLH), which was used in 3% of the studies, and cell density (2%). Lastly, PC was also used in 2% of the studies. The reason behind chlorophyll-a being the most frequently used indicator is that it can be easily detected in the visible spectrum as compared to the other pigment, i.e., phycocyanin.
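As an illustration of a multiband index, the sketch below computes NDCI as a normalized difference of two reflectance values. The band placement (a red band near 665 nm and a red-edge band near 708 nm) follows the commonly cited definition of NDCI and is an assumption here, since the text does not give the formula; the reflectance values are hypothetical.

```python
def ndci(r_red, r_red_edge):
    """Normalized difference chlorophyll index from two reflectances.

    r_red:      reflectance in the red band (assumed near 665 nm)
    r_red_edge: reflectance in the red-edge band (assumed near 708 nm)
    """
    denominator = r_red_edge + r_red
    if denominator == 0:
        return float("nan")  # undefined when both reflectances are zero
    return (r_red_edge - r_red) / denominator


# Hypothetical (red, red-edge) reflectance pairs for three pixels.
pixels = [(0.020, 0.030), (0.025, 0.025), (0.030, 0.020)]
for red, red_edge in pixels:
    print(f"red={red:.3f} red_edge={red_edge:.3f} NDCI={ndci(red, red_edge):+.2f}")
```

Higher values indicate a stronger red-edge reflectance relative to the red absorption feature; any threshold used to flag a bloom would need to be calibrated for the sensor and water body in question.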

HAB Proxy Estimation Methods
Not only was there a variation in the HAB proxies used but also in the methodology adopted for estimating the proxy parameters. The pie chart presented in Figure 8 illustrates the six major categories of methods used to extract HAB proxies: regression, index thresholding, spectral shape, analytical methods, machine learning and classification. As can be seen in Figure 8, regression approaches dominated the HAB analysis (67%). Although machine learning methods have proved advantageous in many other applications of remote sensing, they were only used in 5% of the HAB studies reviewed. Very few studies (4%) used classification (such as dividing the area into high algal bloom, low algal bloom, or no bloom) for detecting HABs.
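The two ends of this spectrum, regression and classification, can be sketched together: an ordinary least-squares line relating in situ chlorophyll-a to a spectral predictor, followed by thresholding of the predicted values into bloom classes. This is a minimal sketch under stated assumptions; the calibration numbers, the predictor, and the two thresholds are hypothetical and are not taken from any reviewed study.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def classify_bloom(chl, low_threshold, high_threshold):
    """Assign one of three classes from a chlorophyll-a estimate."""
    if chl >= high_threshold:
        return "high algal bloom"
    if chl >= low_threshold:
        return "low algal bloom"
    return "no bloom"


# Hypothetical calibration data: spectral predictor vs. in situ chlorophyll-a.
band_ratio = [1.0, 2.0, 3.0, 4.0]
chl_in_situ = [5.0, 15.0, 25.0, 35.0]
slope, intercept = fit_line(band_ratio, chl_in_situ)

# Hypothetical class thresholds, in the same units as chl_in_situ.
for ratio in (0.8, 2.0, 4.0):
    estimate = slope * ratio + intercept
    print(ratio, round(estimate, 1), classify_bloom(estimate, 10.0, 30.0))
```

Many regression-based studies use transformed variables or nonlinear forms; the straight line here only shows the calibration step in its simplest form.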

Validation for HAB Proxy Estimation Methods
The studies used numerous measures to validate the methods used for HAB proxy estimation, which made it challenging to directly compare studies. Figure 9 highlights the most frequently reported validation measures: R-squared (R²), root mean square error (RMSE), bias, mean absolute percentage error (MAPE), overall accuracy, and Kappa coefficient. Note that these counts are not mutually exclusive, since a study could report more than one measure. Since the majority of studies were regression based, R² was the most frequently reported measure (n = 210) followed by RMSE (n = 125). For classification or index thresholding studies, the most frequent measure reported was overall accuracy (n = 20), though some classification studies also reported Kappa coefficient (n = 8). Variants of the measures reported in Figure 9 also appeared in the literature, such as log RMSE, normalized RMSE, and unbiased RMSE, as well as several MAPE variants, mean absolute error (MAE) and mean absolute percentage difference (MAPD). It is important to note that only 263 studies reported any quantitative validation. The other studies included visual validation or validation from another source such as previous studies or high-resolution imagery.
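For reference, the measures named above can be written out explicitly. The sketch below uses one common definition of each; definitions do vary between studies (for example, R² as the coefficient of determination versus the squared correlation coefficient, or bias as predicted minus observed versus the reverse), which is part of the comparability problem. The function names and sample values are illustrative.

```python
import math


def regression_metrics(observed, predicted):
    """R² (coefficient of determination), RMSE, bias, and MAPE in percent."""
    n = len(observed)
    residuals = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "bias": sum(residuals) / n,  # predicted minus observed, averaged
        "mape": 100 * sum(abs(r) / abs(o) for r, o in zip(residuals, observed)) / n,
    }


def accuracy_and_kappa(reference, predicted):
    """Overall accuracy and Cohen's Kappa for two label sequences."""
    n = len(reference)
    labels = set(reference) | set(predicted)
    p_observed = sum(r == p for r, p in zip(reference, predicted)) / n
    p_expected = sum(
        (reference.count(label) / n) * (predicted.count(label) / n)
        for label in labels
    )
    if p_expected == 1:
        return p_observed, float("nan")  # Kappa undefined for a single class
    return p_observed, (p_observed - p_expected) / (1 - p_expected)


# Hypothetical values for illustration only.
print(regression_metrics([2.0, 4.0, 6.0, 8.0], [3.0, 3.0, 7.0, 7.0]))
print(accuracy_and_kappa(["bloom", "bloom", "none", "none"],
                         ["bloom", "none", "none", "none"]))
```

Reporting which definition was used, alongside the sample size, would make values such as these directly comparable between studies.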

Sensors for HAB Monitoring
Out of the 420 papers selected for review, two applied ready-to-use products; hence, 418 studies were used in the analysis of sensor trends. The results indicated that in terms of platform, the highest number of studies were satellite based (n = 407), with a few airborne (n = 4) and UAV based studies (n = 7) included. The spectral composition was also analyzed, with the highest number of studies using multispectral imagery only (n = 407), followed by hyperspectral (n = 3) and lastly SAR (n = 1). A few studies fused data types: five studies combined SAR and multispectral data, whereas the hyperspectral-multispectral and hyperspectral-SAR combinations were each used by one study.
Given the dominance of multispectral sensors, further analysis was carried out within that group and the distribution of these frequently used sensors, including single and combined use, is shown in Table 3. The largest number of studies (n = 162) utilized MODIS onboard the Terra or Aqua satellites, followed by SeaWiFS (93) and MERIS (81). Landsat-based sensors were also frequently used, in particular Operational Land Imager (OLI) (51), Enhanced Thematic Mapper Plus (ETM+) (17), and Thematic Mapper (TM) (28). Although the Sentinel satellites were relatively recently launched, 32 studies used the Multispectral Instrument (MSI) onboard Sentinel-2 and 18 studies used data from OLCI onboard Sentinel-3. Table 3 also shows the common data combinations such as TM with ETM+ or OLI, and ETM+ and OLI in combination with MSI, though the most frequently used combination was MODIS, MERIS and SeaWiFS. In addition to those mentioned in Table 3, other sensors included Coastal Zone Color Scanner (CZCS), GOCI, Ocean Color Monitor (OCM), and Visible Infrared Imaging Radiometer Suite (VIIRS). A small number of studies also used high spatial resolution commercial imagery, mostly for validation purposes. The number of studies alone does not provide a complete picture; hence, the temporal progression of frequently used sensors over the last two decades was also analyzed (Figure 10). The first reported MODIS study was in 2003, with the number of studies increasing exponentially since then. Other sensors showed increases in the associated number of studies, but growth was steadier compared to that of MODIS. The decommissioning of the MERIS sensor in 2012 increased public access to data, hence study rates increased at that point. Similarly, the number of studies using Landsat data was relatively low until 2008, when the data became available at no cost. Figure 10 highlights the rapid increase in the application of newer sensors such as OLI and MSI.
Many Landsat studies after 2013 used OLI, ETM+ and TM together for longer temporal analysis. It is also noteworthy that OLCI onboard Sentinel-3 contributed to 4.3% (n = 18) of studies despite being the most recently launched sensor among those displayed in Figure 10.

Sensor Resolutions
We also analyzed the spatial resolution of sensors used in HAB studies against the type of study area, i.e., freshwater, coastal, or open water. The freshwater type includes lakes, rivers, and reservoirs, coastal areas include near shore areas such as bays, estuaries, and lagoons, and open water includes sea and ocean. For analysis purposes, satellite resolution was divided into five categories: high, moderate to high, moderate to coarse, coarse, and very coarse. The number of studies associated with each study area type (freshwater, coastal and open water) and the corresponding resolution is indicated in Figure 11. As expected, high-resolution satellites such as WorldView and RapidEye and UAVs were mostly used for coastal and freshwater study areas, which typically have smaller geographical coverage. Moderate to high-resolution data, such as Landsat and Sentinel sensors, were most commonly used for freshwater-based studies. The moderate to coarse-resolution data types, such as MODIS, were more often used for freshwater and coastal studies as compared to open waters. Lastly coarse and very coarse-resolution data such as products derived from MODIS and SeaWiFS were predominantly used for coastal and open waters.

Processing Levels of Imagery
The processing levels of most satellite data are typically divided into four categories: (1) level 0: raw data without any preprocessing carried out, (2) level 1: geometrically and radiometrically corrected top of atmosphere reflectance, (3) level 2: surface reflectance with atmospheric correction applied, (4) level 3: derived products at coarser spatial resolution. For HAB studies, the level 3 data applied is mostly chlorophyll-a values, which are often merged from SeaWiFS, MODIS, and MERIS data. Figure 12 shows the number of studies using different processing levels of remotely sensed imagery based on the 299 studies that mentioned this information. The number of studies using raw or minimally processed data (levels 0 and 1, respectively) was almost double that of the more highly processed datasets. This may be because not all standardized techniques and parameters can be applied to every type of study area.

Ancillary Data
There were various types of ancillary data used in the studies for both extracting HAB proxies and finding their relationship to other parameters. For our analysis, we grouped these data into seven categories: water quality parameters, sea surface parameters, wind parameters, meteorological data, radiation parameters, hydrological data, and ocean and bathymetric data. The details of these categories are shown in Table 4. The ancillary data categories were analyzed against the type of study area, i.e., freshwater, coastal, or open water, as presented in Figure 13. Only 144 studies reported the use of ancillary data, with the most frequently used categories being "water quality parameters" and "sea surface temperature", each with 61 studies. The analysis across study area types showed that water quality parameters were most frequently used for freshwater studies, while sea surface parameters were required in coastal and open waters. Wind parameters were mostly used for coastal and open waters, whereas meteorological and hydrological data were largely used for freshwater study areas.
Table 4 (fragment). Ocean and bathymetric data: North Atlantic oscillation index, bathymetric maps, geostrophic current.
Figure 13. The types of ancillary data used for different study area types.

Software Environments Used
The distribution of studies across the various software environment categories is depicted in Figure 14. Figure 14a shows the software environments categorized into five types: data-specific, image processing, programming, geographic information system (GIS) based, and statistical analysis. Data-specific environments are those that are designed specifically for a certain type of data, including Basic Toolbox for Envisat AATSR and MERIS (BEAM), SeaWiFS Data Analysis System (SeaDAS), Sentinel Application Platform (SNAP), GOCI Data Processing System (GDPS), and Pix4Dmapper (used for UAV-based data processing). Figure 14b shows that SeaDAS was used in the highest number of studies (n = 94), followed by BEAM (n = 30), SNAP (n = 13), GDPS (n = 4) and lastly Pix4Dmapper (n = 2). The different image processing environments shown in Figure 14c support a large variety of data. In this category, the highest number of studies used Environment for Visualizing Images (ENVI) (n = 38), followed by ERDAS Imagine (n = 11) and eCognition (n = 1). Figure 14d shows the variety of programming-based environments including MATLAB (n = 17), R (n = 14), GEE (n = 6) and Python (n = 4). Among the GIS-based environments in Figure 14e, the highest number of studies (n = 20) used ArcGIS, followed by QGIS (n = 6) and GRASS GIS (n = 1). The statistical analysis-based software environments shown in Figure 14f contributed the least, with Statistical Package for the Social Sciences (SPSS) used in six studies, Timesat for time series generation used by two studies, and only one study reporting use of Microsoft Excel.
We also analyzed the distribution of the most frequently used software environments (BEAM, SeaDAS, MATLAB, ENVI, ArcGIS, and R) across sensors (Figure 15). Figure 15 shows that MERIS data was mostly processed by BEAM, while MODIS and SeaWiFS data were generally processed through SeaDAS. Although Landsat sensors (TM/ETM+/OLI) were processed and visualized using multiple environments, the highest number of studies used ENVI. Some tools, e.g., MATLAB, ArcGIS, and R, were used for multiple sensor data types, while data from some sensors, e.g., Sentinel MSI and OLCI, were distributed among multiple environments.

General Characteristics
This paper presents a comprehensive meta-analysis of 420 articles for monitoring and detecting HABs. As seen in Figure 2, the number of related studies has increased in recent years attracting researchers from multiple domains and scientific backgrounds, which is evident from the variety of journals presented in Figure 3.
Various studies describe the advantage of utilizing earth observation for efficient HAB monitoring [46,59]. However, the study areas derived from the analyzed papers were limited to only 43 countries, most frequently China and the USA. The increase in the number of studies from China is associated with a HAB outbreak in Lake Taihu in 2007 [51,96,129]. In the USA, the greatest number of inland water studies focused on the Great Lakes [81,130], while Florida Bay [61,131,132] was the most reported coastal water. Apart from China and USA, studies conducted in the remaining 41 countries accounted for only around 40% of the total studies. Among those, there were only six studies that used earth observation for monitoring HABs in Australia and New Zealand [80,[133][134][135][136][137]; yet monitoring HABs in that region is important with large scale algal blooms in Australia's coastal areas since 1997 [80] that are largely associated with nutrient run off from agriculture land [137]. The monitoring and quantifying of phytoplankton abundance in open waters globally is required for understanding ecosystem dynamics and the effects of climate change, yet there are clear research gaps [138]. For example, there are few studies conducted in New Zealand marine environments despite being within the fourth largest "Exclusive Economic Zone" [139]. Similarly, in South Asia there are numerous studies of the coastal areas of the Arabian Sea touching the Indian border [140][141][142] and of the Persian Gulf [143][144][145], yet there are no studies based on the coastal waters of Pakistan which lies in between the above-mentioned study areas. The literature also suggests increasing HABs around global freshwaters [23]; however, the studies reviewed were mostly focused on China and USA as mentioned above and very few studies were conducted over other areas such as Lake Chad [146], Italian lakes [147] and New Zealand lakes [134]. 
It is important to study HABs at both regional and global level; earth observation data can provide a synoptic view to support continuous monitoring.

Reference Data
Reference data for monitoring HABs primarily comes from two sources. One is data directly or indirectly collected from water samples for HAB proxies such as chlorophyll-a or phycocyanin. The second is in situ spectral measurements collected to calibrate and validate airborne or space-borne imagery. Some studies use high-resolution satellite imagery for validation [148]. However, not every study presented details about reference data, which created difficulties in comparing accuracy across studies. The parameters of interest for in situ data for HAB detection and monitoring typically included the source of data collection, time period, number of stations, number of samples, laboratory analysis, in situ reflectance instrument, and data split ratio for calibration and validation. The typical methods for obtaining reference data in terms of water quality parameters were based on (a) field campaigns [93], (b) data obtained from government agencies [132], and (c) existing databases from multiple sources [149]. In situ radiometry data was collected using various instruments. For example, Satlantic's "SeaWiFS Profiling Multichannel Radiometer (SPMR)" was used to collect ocean color and calibrate models for SeaWiFS at 13 spectral channels: 412 nm, 443 nm, 456 nm, 490 nm, 510 nm, 532 nm, 560 nm, 620 nm, 665 nm, 683 nm, 705 nm, 779 nm, and 865 nm [150]. The successor to SPMR is SeaBird's "Hyper Profiler II" [151], which also provides information about the color of the ocean by measuring upwelling and downwelling radiance. Other SeaBird instruments include OCR, which can operate at 4 or 7 wavelengths [152], and HyperSAS, which can operate between 350 nm and 800 nm [153]. Studies also used TriOS RAMSES, which is a hyperspectral radiometer operating from 320 nm to 950 nm with 190 spectral bands [154]. The sensors from Ocean Optics operate from 200 nm to 1100 nm at 1.5 nm spectral resolution [155].
Researchers have also used the GER 1500, which operates between 350 nm and 1050 nm and has 512 spectral channels [156]. The most frequently used in situ radiometers are the ASD field spectrometers [157], which operate between 350 nm and 2500 nm with more than 2000 spectral channels. The accuracy of the reference data depends upon the accuracy and spectral resolution of the in situ radiometers. For further information about in situ radiometry, readers are directed to the published literature [158].
For HAB proxy data, many studies described the laboratory analysis in detail and provided the overall number of samples or stations included along with the time period [159]. However, information about the sampling technique was missing from the majority of studies [129,160]. The description of the division of data into training and testing samples was also missing [161], which made it difficult to compare accuracy across studies with a consistent metric. For the matchup between ground-based HAB proxies and satellite-derived values, the studies used varying criteria in both the spatial and temporal domains. In the spatial domain, the matchup window ranged from a single pixel [162][163][164] to a 16 × 16 pixel average [165]. However, most of the studies used a median of 3 × 3 pixels [102,[166][167][168][169][170][171][172][173]. In the temporal domain, the acceptable time difference varied from 30 min [137] to 12 days [174], with the most frequent choice being less than one day [79,124,165,[175][176][177][178][179][180][181][182]. Similarly, other reported parameters also varied across studies, which hindered direct comparison of accuracy assessments. A similar problem was observed in another meta-analysis related to wetland mapping [183]. Researchers are encouraged to comprehensively describe information related to reference data collection and processing since associated errors will propagate throughout the processing workflow.
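The matchup criteria described above can be made concrete with a minimal Python sketch. It assumes a 2-D array of satellite-derived values with cloud or flagged pixels stored as NaN, and it uses the 3 × 3 median window and the one-day time difference reported as the most common choices; the function names `window_median` and `within_time_window` and the sample values are hypothetical and do not come from any reviewed study.

```python
from datetime import datetime, timedelta

import numpy as np


def window_median(image, row, col, size=3):
    """Median of the valid (non-NaN) pixels in a window centred on (row, col).

    `size` is assumed to be odd; the window is clipped at the image edges.
    """
    half = size // 2
    r0, r1 = max(row - half, 0), min(row + half + 1, image.shape[0])
    c0, c1 = max(col - half, 0), min(col + half + 1, image.shape[1])
    window = image[r0:r1, c0:c1]
    valid = window[~np.isnan(window)]
    if valid.size == 0:
        return float("nan")  # no usable pixel around the station
    return float(np.median(valid))


def within_time_window(sample_time, overpass_time, max_diff=timedelta(days=1)):
    """True if the in situ sample and the overpass are close enough in time."""
    return abs(sample_time - overpass_time) <= max_diff


# Hypothetical example: a 5 x 5 chlorophyll-a grid with one cloudy pixel.
chl = np.arange(25, dtype=float).reshape(5, 5)
chl[2, 2] = np.nan
value = window_median(chl, 2, 2)
ok = within_time_window(datetime(2020, 7, 1, 10, 30), datetime(2020, 7, 1, 18, 0))
```

Reporting both parameters (window size and maximum time difference) alongside the results is what allows matchups from different studies to be compared.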

HAB Proxy Estimation Methods
Pigments such as chlorophyll-a and phycocyanin are used as HAB proxies since they are optically active components of water and have a definite spectral signature [184]. The literature shows various methods for estimating HAB proxies, the simplest of which are spectral-based methods such as index thresholding and band ratioing. For example, the availability of high spatial resolution UAV imagery has supported the detection of coastal algal blooms with high accuracy using a simple ratio between the red and blue bands. Many studies have utilized the reflectance and absorption values of chlorophyll-a at 700 nm and 670 nm, respectively [185]. A significant correlation between chlorophyll-a concentrations and the ratio of spectral values near 700 nm and 670 nm has been reported in the literature for a wide range of chlorophyll values [186][187][188]. Although these band ratios are effective and easy to implement, they are only suitable for phytoplankton-dominated waters because turbid waters are influenced by other constituents [189]. Apart from band ratios, multiple indices have also been applied for HAB detection and monitoring. For example, FAI has been successfully implemented for both UAV [190] and satellite imagery [51,191]. Another example is NDCI, which is appropriate for use even when ground reference data is not available [192], making it particularly applicable for coastal waters [193]. Multiple studies have shown the potential of using NDCI for HAB detection and monitoring [192,[194][195][196][197]. Similar to band ratios are spectral shape algorithms that are also based on the measured reflectance value. However, these differ from band ratios in that they use distinct reflectance or absorption features by calculating features such as slope instead of using absolute values [74]. 
The most commonly used spectral shape method is the FLH, which measures the chlorophyll-a fluorescence [198] and has been useful in multiple studies for HAB detection and monitoring [64,137,197,[199][200][201]. Another spectral shape-based index known as MERIS maximum chlorophyll index (MCI) has been shown to be versatile in performance and is good for chlorophyll values ranging in moderate to high concentrations [202][203][204][205][206]. For distinguishing cyanobacteria, studies have used another spectral shape-based method known as the CI [207]. This spectral shape method has been used to quantify cyanobacterial magnitude successfully [123,[208][209][210][211]. Among the many indices, a comparative study found that CI, MCI and FLH performed best [197]. These spectral shape methods generally perform better than the simple band ratio methods as the latter are affected by other constituents of water. However, these spectral shape methods require data with high spectral resolution, which is a limitation when it comes to satellite imagery [74] as only few sensors such as MODIS and MERIS have suitable spectral bands for these algorithms [211]. Furthermore, MERIS and OLCI are the only sensors that carry a spectral band at 620 nm wavelength, which is important for phycocyanin detection [43].
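To illustrate the difference between band ratios, normalized-difference indices, and spectral shape methods, the following Python sketch implements one generic form of each. It is a simplified illustration, not the implementation used in any reviewed study: the function names are hypothetical, the band positions and reflectance values in the example are illustrative assumptions, and the exact bands depend on the sensor. The spectral shape function computes the height of a central band above a straight baseline drawn between two neighbouring bands, which is the general construction behind line-height indices such as FLH and MCI.

```python
import numpy as np


def band_ratio(r_num, r_den):
    """Simple ratio of two reflectance bands (e.g. near 700 nm over near 670 nm)."""
    return np.asarray(r_num, dtype=float) / np.asarray(r_den, dtype=float)


def normalized_difference(r_a, r_b):
    """Normalized difference (r_a - r_b) / (r_a + r_b), the form used by NDCI-style indices."""
    r_a = np.asarray(r_a, dtype=float)
    r_b = np.asarray(r_b, dtype=float)
    return (r_a - r_b) / (r_a + r_b)


def line_height(r_left, r_centre, r_right, wl_left, wl_centre, wl_right):
    """Height of the centre band above a linear baseline between the outer bands."""
    r_left = np.asarray(r_left, dtype=float)
    r_centre = np.asarray(r_centre, dtype=float)
    r_right = np.asarray(r_right, dtype=float)
    fraction = (wl_centre - wl_left) / (wl_right - wl_left)
    baseline = r_left + (r_right - r_left) * fraction
    return r_centre - baseline


# Hypothetical reflectances for one pixel; wavelengths (nm) are illustrative only.
ratio = band_ratio(0.03, 0.01)
ndci_like = normalized_difference(0.03, 0.01)
peak = line_height(0.010, 0.030, 0.020, 665.0, 681.0, 709.0)
```

Because the line height depends on the shape of the spectrum around the peak rather than on absolute reflectance, a uniform offset added to all three bands leaves it unchanged, whereas the same offset changes a band ratio.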
The most frequently used HAB proxy estimation methods are empirical approaches such as regression models that aim to derive a relationship between the HAB proxies and spectral values from earth observation data. These models require ample in situ data to develop the relationship, which can be limiting [74]. These models include linear- [59,[212][213][214], exponential- [113,215], polynomial- [216][217][218][219] and logarithmic-based [220,221] regression. Among empirical based models, NASA's ocean color (OC) algorithms [222,223] were the most frequently used for chlorophyll-a estimation such as OC2 [142,224,225], OC3 [60,226,227], OC4 [46,92,133,228], and OC5 [229,230]. In terms of nonlinear regression, a small number of studies used machine learning methods such as support vector machine (SVM) and neural networks [62,231]. SVM is a nonparametric technique, which is advantageous because it does not assume any underlying distribution, as is required in advance when using other statistical techniques [232]. Different kinds of neural networks were used in the reviewed studies, such as multilayer perceptron incorporating values from different spectral bands and band ratios to classify chlorophyll concentration [102,233]. Another type of neural network known as mixture density networks (MDN) was also used, of which the output parametrizes several Gaussians [52]. Although empirical-based methods are easy to implement and no prior understanding of water−light interaction is needed, these models are developed for a specific water body only and hence are not transferable [74]. However, because these models take into account the characteristics of a specific water body, they tend to provide better results as compared to other spectral based models [234]. To manage the complex nature of turbid waters, researchers have developed bio-optical models based on the radiative transfer model. 
These methods include the quasi-analytical algorithm (QAA) [235][236][237], Garver-Siegel-Maritorena (GSM) [60,140,238], and HYDROPT [239]. QAA was proposed in [240] and is easy to implement as it does not require the calculation of absorption coefficients. Analytical methods such as GSM are sensitive to input reflectance values [60]. Due to the complexity of applying these models, researchers developed tools such as the "Bio-Optical Model-Based tool for Estimating water quality and bottom properties from Remote sensing images (BOMBER)" [241] and "Water Color Simulator (WASI)" [242].
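The empirical approach described above can be sketched in Python as a polynomial regression in log space, which is the general form of band-ratio ocean color algorithms: the logarithm of chlorophyll-a is modelled as a polynomial in the logarithm of a blue-to-green reflectance ratio. The sketch below makes several assumptions: the function names are hypothetical, the coefficients are fitted to synthetic data rather than taken from any operational algorithm, and the polynomial degree is a free choice.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def log_max_band_ratio(blue_bands, green):
    """log10 of the largest blue reflectance divided by the green reflectance."""
    blue_max = np.maximum.reduce([np.asarray(b, dtype=float) for b in blue_bands])
    return np.log10(blue_max / np.asarray(green, dtype=float))


def fit_log_polynomial(x, chl, degree=1):
    """Least-squares fit of log10(chl) = a0 + a1*x + ...; returns a0..an."""
    return P.polyfit(np.asarray(x, dtype=float), np.log10(chl), degree)


def predict_chl(x, coeffs):
    """Evaluate the fitted polynomial and convert back from log10 space."""
    return 10.0 ** P.polyval(np.asarray(x, dtype=float), coeffs)


# Synthetic calibration data generated from known (hypothetical) coefficients.
x_cal = np.linspace(-0.5, 0.5, 20)
chl_cal = 10.0 ** (0.2 - 1.5 * x_cal)
coeffs = fit_log_polynomial(x_cal, chl_cal, degree=1)
chl_at_zero = predict_chl(0.0, coeffs)
```

Since the coefficients are estimated from local in situ data, a model calibrated this way inherits the optical characteristics of the calibration water body, which is consistent with the limited transferability noted above.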
The accuracy of HAB proxy estimation methods cannot be compared directly as there are multiple factors involved such as data collection methodology, satellite imagery used, preprocessing steps, and the number of samples used for calibration and validation. Furthermore, the accuracy measures differed substantially across the studies, which hindered the objective comparison of results. Classification or thresholding-based studies reported errors in the form of overall accuracy [89,243] or simply by presence and absence counts [62,244]. Approximately 62% of the studies reported accuracies in one form or the other, among which 79.5% included the validation R² value and only 7.6% reported overall accuracy. However, some studies compared various methods for the same study area and thus could be compared. For index-based methods, the studies showed that ASAR performed the best when compared with other spectral based methods [74,195,197,245], and regression models tended to outperform spectral based methods [195]. There were variations within the regression-based models, e.g., some studies showed that models using four bands performed better [246], while others found exponential and quadratic models had higher accuracy [195,247,248]. Studies also compared ocean color models and found OC5 performed best, especially for turbid waters [58,228,230,249]. For other ocean color models there were varying results: some studies showed that OC2 performed well [250,251], while other studies in the Atlantic region found OC4 performed better than OC2 [252]. These variations could be due to atmospheric correction effects [253], varying sensors [254] and different geographical locations [252]. The result of the comparison between empirical and analytical methods was inconclusive as studies showed analytical methods outperforming [161,255], empirical methods outperforming [74] and both methods showing similar results [134]. Hence, the results vary on a case-by-case basis. 
However, an increasing trend in accuracy was observed with the utilization of machine learning methods compared to conventional methods [74,231].
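Since the reviewed studies report accuracy in different forms, a short Python sketch of the three measures mentioned above may help clarify what each one captures. The definitions below are standard ones, but they are an assumption with respect to the reviewed studies, which do not always state how their values were computed; in particular, R² is taken here as the coefficient of determination, whereas some studies may report the squared correlation coefficient instead. Function names and sample values are hypothetical.

```python
import numpy as np


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error, in the units of the HAB proxy."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))


def overall_accuracy(observed_labels, predicted_labels):
    """Fraction of samples whose class (e.g. bloom / no bloom) is predicted correctly."""
    observed_labels = np.asarray(observed_labels)
    predicted_labels = np.asarray(predicted_labels)
    return float(np.mean(observed_labels == predicted_labels))


# Hypothetical validation set: in situ chlorophyll-a versus model estimates.
obs = [1.0, 2.0, 3.0, 4.0]
est = [2.0, 3.0, 4.0, 5.0]
r2_value = r_squared(obs, est)
rmse_value = rmse(obs, est)
oa_value = overall_accuracy([1, 0, 1, 1], [1, 0, 0, 1])
```

In this example the estimates are perfectly correlated with the observations yet carry a constant bias, so the coefficient of determination is low while a squared correlation would equal one, which illustrates why the definition of the reported metric matters for cross-study comparison.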

Sensors for HAB Monitoring
In this review, sensors onboard multiple platforms were included; however, the number of studies using airborne or UAV platforms was very low compared to satellite-based platforms. This could be associated with the free availability of satellite-based imagery and its continuous spatiotemporal monitoring over larger scales.
Although satellite-based imagery has its advantages, the use of UAV for remote sensing applications is increasing rapidly as they are capable of leveraging the capabilities of both airborne and space-borne remote sensing systems [256]. Several published articles indicate the utilization of UAVs for agriculture, forestry, and photogrammetry [232]. Recently, UAVs have also been used for algal bloom mapping [84,85]. This study identified seven studies based on UAV and four based on airborne data, although for some of these, the UAV or airborne data were used for validation only [257]. In terms of spectral specification, most of the studies used multispectral sensors whereas a few studies also used hyperspectral and SAR imagery, the details of which are discussed below. The multispectral data used in 407 studies mostly included visible to shortwave infrared regions of the electromagnetic spectrum. Optical data was used mostly in HAB monitoring for estimating chlorophyll-a [52]. The details for specific spectral bands and band ratios used for chlorophyll-a detection are given in the published literature [66].
Among the selected 420 published articles examined in this review, the most frequently used satellite imagery was acquired from MODIS, SeaWiFS, and MERIS, followed by TM, ETM+, and OLI. Conversely, a water quality parameter review paper based in China found that the Landsat-based sensors were the most frequently used datasets for chlorophyll-a mapping, rather than MODIS [110]. This is likely explained by the fact that 52% of the studies in their review paper were based on inland lakes, where our review also found that high-to-moderate resolution sensors, such as those on Landsat, were frequently applied. The revisit time of MODIS, along with its moderate spatial resolution, is the key contributor to its popularity for HAB monitoring, especially in coastal and oceanic waters, where it captures short- and long-term dynamics. MODIS imagery was also frequently used for time series analysis, mostly in conjunction with SeaWiFS, MERIS, and VIIRS datasets [227,258,259]. SeaWiFS onboard OrbView-2 was used in 93 studies, most of which used the standard mapped images offered at a resolution of 9 km [260]. As noted, these studies often combined SeaWiFS data with MODIS and MERIS data for longer temporal analysis.
In this analysis, studies have used data from TM onboard Landsat 5 [261,262], ETM+ onboard Landsat 7 [90,263,264], and OLI onboard Landsat 8 [265][266][267]. There were very few studies where TM data was used alone [268]; rather, it was mostly used in conjunction with ETM+ or OLI [59,216,269]. A spatial resolution of 15 m (panchromatic) and 30 m (visible to infrared) for ETM+ and OLI is helpful for detecting and monitoring HABs, particularly in the case of inland waters [59,90,122]. The spectral bands are similar for both sensors except for the addition of the coastal aerosol and cirrus bands for OLI. However, the 16-day revisit time of Landsat can be problematic as most blooms are present for less than a week, which is likely what has driven MODIS to be more popular.
The more recently launched European Sentinel series was also seen in the studies, including both MSI onboard Sentinel-2 and OLCI onboard Sentinel-3. This meta-analysis identified 32 articles using MSI, mostly in combination with sensors onboard the Landsat satellites and with OLCI. OLCI continues the heritage of MERIS onboard EnviSAT [270]. The spatial resolution for Sentinel-3 OLCI is the same as MERIS, i.e., 300 m. Although the number of spectral bands is greater (21 instead of 15), the additional bands in the mid infrared region are not used for HAB monitoring as they are used to detect oxygen and water vapor. Likely due to free data availability, we observed a rapid increase in the use of MSI and OLCI data for HAB monitoring (Figure 10). Although recently launched, a total of 18 studies utilized data from OLCI in this meta-analysis.
There are multiple commercial satellites with very high spatial resolution, but the high cost has led to very few studies using those for HAB monitoring studies. One example used 6.5 m RapidEye imagery for a case study in Korea and showed promising results that can be improved further using precise field data [271]. Only five studies used hyperspectral imagery, and these were mostly taken from a UAV or airborne platform [246], [272] with the exception of one study using HJ-1 [273]. This could be due to lack of freely available hyperspectral imagery, or the extensive data preprocessing associated with hyperspectral imagery. In this analysis, seven studies used SAR imagery, mostly in combination with multispectral imagery. The studies used X-band COSMO-Skymed and C band advanced SAR (ASAR) data [87,274] for monitoring algal blooms in various areas. The presence of cyanobacteria causes the water surface to become smooth, which is easily detectable in SAR imagery because smooth surfaces have low backscatter relative to other surfaces thus appearing darker. Some studies have also used Sentinel-1 SAR data along with Landsat and MODIS [269,275].

Processing Levels of Remotely Sensed Imagery
As seen in Figure 12, raw or minimally processed images were used in HAB studies in greater numbers compared to processed data. The reason behind this may be that the processed images use standard procedures that may not be applicable to every study area. Therefore, researchers prefer to carry out preprocessing appropriate to their study areas, which may include geometric correction, radiometric calibration, data mosaicking and clipping, and atmospheric correction. For atmospheric correction, researchers used available models such as 6S [181], dark object subtraction [113], Fast Line-of-sight Atmospheric Analysis of Hypercubes (FLAASH) [276], or developed customized methods [277]. The most frequently used method was dark object subtraction, which works under the assumption that the water does not reflect in the near infra-red (NIR) region. This works well in calm and clear water but is not suitable for turbid waters due to reflection from particulate matter [278]. Therefore, for turbid waters such as coastal and inland waters, another approach was developed based on short-wave infra-red (SWIR) bands [279]. Multiple atmospheric correction algorithms have been developed for inland and coastal waters [280][281][282]. Recently, atmospheric correction methods using water turbidity have also been proposed [283]. Results show that they perform better than the conventional techniques. Another study demonstrated the use of neural networks for atmospheric correction [284], finding significant improvement compared to aerosol robotic network-ocean color (AERONET-OC). Studies also compared various atmospheric correction techniques for HAB applications [278,285,286]. However, there is no standard atmospheric correction method indicated in the literature that can be applied to all study areas. While atmospheric correction is highly recommended, it can also cause removal of valid pixels due to their low brightness value. 
While using the minimally processed images, researchers also carried out sensor specific corrections such as "bow-tie" correction for MODIS [287], "smile" correction for MERIS [87] and scan line correction for some ETM+ images [90]. A problem may arise in the case of level 3 images where an HAB proxy, such as chlorophyll-a, is derived based on a global model [288] instead of a local model. After atmospheric correction, images were generally checked using quality flags. The quality flags can include land pixel, cloud pixel, ice pixel, high top of atmosphere radiance, low water-leaving radiance, stray light, sun glint and atmospheric correction failure [289].
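Two of the preprocessing steps described above, dark object subtraction and quality-flag screening, can be sketched in a few lines of Python. This is a deliberately simplified illustration under stated assumptions: the dark object value is taken as a low percentile of the band rather than derived from a specific dark target, the flag bit positions are hypothetical and do not correspond to any particular product, and the function names are invented for this example.

```python
import numpy as np


def dark_object_subtraction(band, percentile=1.0):
    """Subtract the darkest-pixel value (a low percentile) from a band.

    Results are clipped at zero; pixels darker than the dark object lose
    their signal, one way valid low-brightness pixels can be removed.
    """
    band = np.asarray(band, dtype=float)
    dark_value = np.nanpercentile(band, percentile)
    return np.clip(band - dark_value, 0.0, None)


def mask_flagged_pixels(band, flags, bad_bits_mask):
    """Set pixels to NaN when any of the selected quality-flag bits is raised."""
    band = np.asarray(band, dtype=float)
    flagged = (np.asarray(flags) & bad_bits_mask) != 0
    return np.where(flagged, np.nan, band)


# Hypothetical 2 x 2 NIR band and flag layer (bit 0 = land, bit 1 = cloud).
nir = np.array([[0.05, 0.10], [0.07, 0.20]])
flags = np.array([[0, 1], [2, 4]])
corrected = dark_object_subtraction(nir, percentile=0.0)
screened = mask_flagged_pixels(corrected, flags, bad_bits_mask=0b011)
```

Applying the flag mask after the correction, as done here, mirrors the order described in the text, where images are checked with quality flags once atmospheric correction has been performed.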

Remote Sensing Data Resolutions
One of the considerations related to satellite-based remotely sensed data is the resolution, specifically spatial, spectral, and temporal. As most of the studies included in this meta-analysis consisted of multispectral imagery, the discussion focuses more on spatial and temporal resolutions. Temporal resolution is very important for HAB detection and monitoring as algal bloom events are generally short-lived and an adequate temporal resolution is needed to monitor the associated rapid changes. A limited number of observations during the bloom seasons can cause uncertainties in blooming patterns [290]. Some researchers applied a time series of images aggregated at a defined interval (e.g., 8 or 16 days), generally comprising chlorophyll-a anomaly values [291]. However, cloud cover can reduce the effective temporal resolution, especially in tropical areas. This limitation can be mitigated through the use of geostationary satellites, which have more frequent temporal scanning (eight times per day) compared to polar-orbiting satellites. One of the payloads onboard the Communication, Ocean and Meteorological Satellite (COMS) used for HAB monitoring is GOCI [292]. Although the spatial resolution is moderately coarse (500 m), frequent data capture makes it useful for HAB detection and monitoring by generating ocean products containing chlorophyll-a and other water quality parameters.
As seen in the above example, there is often a trade-off when choosing to use high temporal or high spatial resolution data. One option to mitigate this challenge is by using multiple datasets, for example combining data from comparable sensors, such as OLI and MSI, to increase temporal resolution. This approach is especially popular in the case of sensors with high spatial resolution and longer revisit times [145,293]. Another option is to use multiple images from the same sensor (e.g., Landsat or MODIS) to understand the phenology and seasonal variations of algal bloom [294]. Few HAB detection studies were based on single-date imagery [90], and several studies emphasized the importance of multitemporal satellite data [125,295]. Even when imagery was from a single year, the data were often obtained at multiple dates to observe the HABs over time and space [173], [296]. Given the availability of MODIS, HAB monitoring has frequently combined MODIS data with SeaWiFS or MERIS imagery [258,297]. Along with temporal seasonal analysis, time series analysis was also used in the form of wavelet techniques [143,298,299] and for observation of phytoplankton phenology in the form of bloom start and end date [148,300,301].
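The temporal aggregation mentioned in this section (e.g., 8-day composites of chlorophyll-a and anomalies relative to a reference value) can be sketched in Python as follows. The sketch assumes a daily image stack of shape (time, rows, cols) in which cloud-covered pixels are stored as NaN; the function names, the fixed 8-day window, and the use of a simple difference from a climatological value as the anomaly are illustrative assumptions rather than the procedure of any specific study.

```python
import numpy as np


def temporal_composite(stack, window=8):
    """Average consecutive blocks of `window` daily images, ignoring NaN pixels.

    Pixels with no valid observation in a block remain NaN in the composite.
    """
    stack = np.asarray(stack, dtype=float)
    composites = []
    for start in range(0, stack.shape[0], window):
        block = stack[start:start + window]
        valid = ~np.isnan(block)
        count = valid.sum(axis=0)
        total = np.where(valid, block, 0.0).sum(axis=0)
        mean = np.where(count > 0, total / np.maximum(count, 1), np.nan)
        composites.append(mean)
    return np.stack(composites)


def anomaly(composites, climatology):
    """Difference between each composite and a reference (climatological) value."""
    return np.asarray(composites, dtype=float) - climatology


# Hypothetical 16-day stack for a 1 x 2 pixel scene.
daily = np.full((16, 1, 2), np.nan)
daily[:8, 0, 0] = 1.0    # pixel 0: clear throughout the first period
daily[8:, 0, 0] = 3.0    # pixel 0: higher values in the second period
daily[8:12, 0, 1] = 2.0  # pixel 1: cloudy except for four days
composite = temporal_composite(daily, window=8)
chl_anomaly = anomaly(composite, climatology=2.0)
```

The second pixel shows how compositing recovers a value from a few clear days, while a period with no clear observation at all still leaves a gap in the series.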

Ancillary Data
Studies have shown a variety of ancillary data used in HAB detection and monitoring studies to understand the underlying parameters of aquatic environment and thereby accurately estimate HAB proxies, as listed in Section 3.6. A large number of studies that used ancillary data acquired HAB proxies using MODIS and SeaWiFS data. Among those, SST was the most frequently obtained parameter, which was generally obtained from sources such as METEOSAT [302], and the Advanced Very High Resolution Radiometer (AVHRR) [94,303,304]. METEOSAT provides an efficient procedure for declouding data, thus increasing the number of observations possible in coastal upwelling areas prone to cloud-coverage [302]. AVHRR provides SST through three thermal infrared channels [305]. Most studies used the AVHRR daily, 8-day composite, and monthly composite nighttime radiation products to avoid the influence of daily surface heating [94]. Some studies also calculated monthly SST anomalies instead of using mean SST data [306]. Other important sea surface parameters are SSH and SSHA [141,307] where SSHA is used to describe and quantify mesoscale features [307]. These sea surface-related parameters are used to define the boundaries of biogeographic regions so that seasonal variations can be observed [288,308,309]. Furthermore, these are used to understand the causes of HABs by developing a relation between HAB proxies and sea surface data [310] in order to better understand the ecosystem [311].
Researchers have also demonstrated the value of including water quality parameters such as TSM [132,228], turbidity [59], SDD [262], and CDOM [248]. These water quality parameters are particularly needed to accurately detect chlorophyll-a in turbid waters such as in reservoirs and coastal areas [311,312]. Specifically, the absorption by CDOM greatly affects the spectral response from phytoplankton, especially near the 440 nm region [66]. Furthermore, the water-leaving radiance from turbid waters is affected by the optical properties of particles suspended or dissolved in the water [74,313].
Studies have incorporated meteorological parameters to better understand the factors controlling HAB generation. These include wind stress data from QuikSCAT [303] or government archives [302], wind velocity and direction [311,314], and precipitation [315]. Wind vector data is mostly used in the case of oceanic waters to understand HAB dispersion due to high wind speed. A small number of freshwater studies also incorporated hydrological data such as river inflow and river discharge [91,316,317]. The causes of HABs are not the focus of this manuscript but are comprehensively described in the literature [2,5,6,23,29,[38][39][40].

Software Environments Used
A detailed analysis of the 420 articles reviewed found that SeaDAS was the most frequently used package for estimating HAB proxies. This is not surprising, given that MODIS and SeaWiFS data are the most common datasets for HAB modeling and are typically processed using the SeaDAS package. Sensor-specific processing packages such as BEAM and SeaDAS were used in many studies (n = 124), mostly for atmospheric correction. Among the other image processing and geospatial analysis software, ENVI was used most frequently. Although no study attributed a specific reason for this, the number of studies using ENVI may be higher because it provides greater user input flexibility as compared to similar software, thus allowing users more options for customization. Most of the studies involving MSI and OLCI data were processed using the SNAP toolbox, as can be expected given the purposeful design of SNAP for use with Sentinel data. Among the scripting software, no meaningful difference was observed in the use of R and Python (open source) versus MATLAB (commercial), although it could be observed from other meta-analysis studies that there is an inclination towards open-source software [232]. Furthermore, despite the growing use of GEE over the last several years, only seven studies implemented GEE for HAB monitoring. Although a meta-analysis of GEE [318] showed 62 studies related to water research, only one study had a focus on HABs. TIMESAT provides free tools for the reconstruction of time series data and, consequently, the extraction of phenological information based on user-defined input parameters. However, the literature reports some limitations associated with this software, such as the provision of only threshold-based methods for the extraction of seasonal metrics [319], and very few studies have used TIMESAT for time series analysis of phytoplankton blooms [64,320]. This difference in numbers is also well reflected in the timeline as seen in Figure 16. 
The highest number of studies used SeaDAS, which was the first software reported in the literature. Similarly, MATLAB, ENVI, BEAM, ArcGIS, and R were used in many studies, which reflects their long-term availability. Studies are now using open access software and are shifting towards cloud computing platforms.

Challenges and Future Directions
Undoubtedly, remote sensing is a promising approach for the continuous monitoring and mapping of Earth's dynamics [321], however, there are still limitations and challenges. First, data contamination, especially in the form of cloud cover and atmospheric influence, introduces a lot of uncertainty. This uncertainty is not only limited to space-borne sensors but also to UAV and airborne sensors. Currently, atmospheric correction for UAV imagery is not a standard practice because these effects can be ignored under optimal flight conditions [83]. However, while atmospheric effects decrease with lower sensor altitude, atmospheric correction is recommended for accurate detection and monitoring of HABs. A review on UAVs for HAB [85] showed that only one study used atmospheric correction before processing [322]. Some studies using airborne sensors have adopted approaches that were not affected by atmospheric correction [323]. Atmospheric correction for UAVs and airborne sensors is an ongoing area of research.
The studies included in this review predominantly performed atmospheric correction, but model accuracies for some of the studies were quite low for commonly used sensors [211,301]. In addition, in tropical areas, persistent cloud cover can affect results generated from optical imagery. One solution to this problem is the utilization of geostationary satellites such as COMS, as mentioned in Section 4.6. Their coarse spatial resolution has historically been a problem, but the new generation of geostationary satellites such as Himawari-8 and have reported good results for monitoring HABs [292,324]. To increase the temporal resolution, studies have also used OLI and MSI harmonized data [325]. In terms of larger temporal scale or time series data, Landsat sensors can provide data for 50 years [326] and with the recent launch of Landsat 9 [327] the series will continue providing data at 15 m and 30 m spatial resolution with the OLI-2 sensor. Landsat 9 will be eight days out of phase with Landsat 8, hence increasing the temporal resolution of the series [328]. With an increased radiometric resolution of 14-bit, OLI-2 will also help differentiate intensity levels better than Landsat 8. Furthermore, the additional coastal band will help measure chlorophyll content, which is an indicator of phytoplankton [329].
The reviewed studies mostly focus on larger study areas in open waters [330,331], coastal areas [91], and large lakes [100,197,294,332]. Very few studies focus on smaller lakes (<10 km² area) [177,333], which are important as they are often a source of freshwater. Moderate to coarse resolution sensors such as MODIS are unsuitable for monitoring small lakes and reservoirs. However, with the availability of Sentinel-2 and Landsat 8, the number of studies focusing on inland waters has increased. Still, there is a need to incorporate high spatial resolution satellites into the studies. For that purpose, recently launched sensors such as China's Gaofen series (8 m resolution) [334] and Planet's CubeSats provide new opportunities for HAB mapping and monitoring, particularly for smaller extents [335]. PlanetScope data from CubeSats provides daily observation at high spatial resolution (3 m) and opens a new horizon for monitoring HABs. However, attaining high spectral resolution along with the increased spatial resolution remains a problem. Even with moderately high spatial resolution sensors such as those onboard Landsat, there can be limitations such as the lack of red edge bands. Nevertheless, studies have shown the utility of the sensors onboard Landsat despite their limited spectral resolution [59,102,172,[336][337][338][339]. OLI has also been used in conjunction with MSI to provide good results [181,251]. Although hyperspectral sensors can provide the needed spectral resolution for HAB detection, broad coverage of data from airborne or UAV based hyperspectral instruments can be costly. Therefore, the field will benefit from two upcoming space-based hyperspectral satellite missions from the National Aeronautics and Space Administration (NASA). One of them is "Plankton, Aerosol, Cloud, Ocean Ecosystem" (PACE) [340], which is planned for launch in 2023. 
PACE will carry the "Ocean Color Instrument (OCI)" [341] for monitoring ocean color, detecting chlorophyll for characterizing phytoplankton abundance, and will aid in understanding the complex ocean ecosystems. OCI is a hyperspectral imaging radiometer operating between 340 nm to 890 nm, with seven additional bands in the SWIR region. The second planned NASA mission, the "Surface Biology and Geology (SBG)," is still in its initial phase of development [342]. This will also be a hyperspectral sensor with a focus on monitoring the physiology and health of both inland and coastal waters. These hyperspectral sensors will also help to identify HAB species and classify various phytoplankton groups. Such a sensor will be valuable in identifying the cell density of HABs as most of the studies focus on chlorophyll-a detection. It is noteworthy here that these upcoming sensors will also aid in the detection of phycoerythrin, which is a phycobilin pigment [43] with an absorption peak between 540-570 nm [343]. Previous studies have used fluorescence [344][345][346] and absorption features at 550 nm [347,348], which show potential for detecting phycoerythrin through remote sensing methods.
Over the last decade, machine learning methods have made their way into satellite image processing by achieving higher accuracy compared to conventional methods [349][350][351]. For HAB monitoring and mapping, studies have used SVM [336], gradient boosting [172,352], random forest [352], and convolutional neural network [231]. Machine learning methods such as random forest have proven advantageous because they can deal with multisource data and are not affected by the outliers [232]. However, only recently have studies begun to utilize machine learning for HAB monitoring and detection. Recent studies have also used long short-term memory (LSTM) for time series analysis for HAB proxies [74]. However, the assessment of the full capacity of these methods should be a future avenue in the field of HAB monitoring and mapping. Another area for future development is a move toward bio-optical models, which are not affected by environmental factors and do not vary spatially. Bio-optical models will be helpful in monitoring and predicting the three-dimensional movement of HABs, by monitoring the vertical movement of HABs along with horizontal.
There is also an ongoing need for a standardized framework for performing HAB monitoring, including sample selection, spatiotemporal analysis of algal blooms, and accuracy assessment [119]. The differences in the reported methodologies, processing steps, and accuracy metrics make it difficult to quantitatively compare studies and choose a suitable method for a particular application. For example, models best suited for calm waters cannot be applied to turbid waters, due to multiple contributing factors. Researchers across domains such as environmental science and image processing need to collaborate and standardize the process to make data more accessible and comparable.
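As an illustration of what uniform accuracy reporting could look like, the following minimal sketch computes the metrics most often reported in the reviewed studies: R² and RMSE for regression-based models, and overall accuracy and the kappa coefficient for classification-based methods. The function names and the sample values are hypothetical and are not taken from any reviewed study. R² is assumed here to be the coefficient of determination (1 − SS_res/SS_tot); some studies may instead report the squared Pearson correlation, which is itself one source of the comparability problem described above.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error, in the units of the HAB proxy."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def overall_accuracy(reference, predicted):
    """Fraction of samples whose predicted class matches the reference class."""
    return sum(r == p for r, p in zip(reference, predicted)) / len(reference)


def cohen_kappa(reference, predicted):
    """Cohen's kappa: agreement corrected for the agreement expected by chance."""
    n = len(reference)
    p_observed = overall_accuracy(reference, predicted)
    labels = set(reference) | set(predicted)
    p_chance = sum(
        (reference.count(c) / n) * (predicted.count(c) / n) for c in labels
    )
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical in situ vs. modeled chlorophyll-a concentrations (mg/m^3).
chl_in_situ = [2.0, 4.0, 6.0, 8.0]
chl_modeled = [2.5, 3.5, 6.5, 7.5]
print(r_squared(chl_in_situ, chl_modeled), rmse(chl_in_situ, chl_modeled))

# Hypothetical bloom / no-bloom labels for a classification-based method.
reference_labels = ["bloom", "bloom", "clear", "clear"]
predicted_labels = ["bloom", "clear", "clear", "clear"]
print(overall_accuracy(reference_labels, predicted_labels),
      cohen_kappa(reference_labels, predicted_labels))
```

Reporting all applicable metrics from one shared implementation, together with the sample size and the units of the proxy, is one way that results from different studies could be placed on a common footing.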
Somewhat surprisingly, programming-based environments were used for monitoring HABs considerably less often than traditional packages with a graphical user interface (GUI). While cloud computing platforms like GEE are widely used for mapping and monitoring environmental variables [295,318,353,354], only seven studies in this meta-analysis were found to have used GEE. Leveraging cloud computing platforms like GEE for monitoring and mapping HABs should be explored as a future direction.
The scope of this paper was limited to the meta-analysis of studies on HAB monitoring and detection, based on the parameters described in Section 2 of this paper. Future work should include the analysis of observations from multiple data sources to understand the relationship between HABs and environmental factors, hydrological factors, other water quality parameters, and oceanography.

Conclusions
This study carried out a meta-analysis of 420 peer-reviewed journal articles on HAB detection and monitoring using Earth observation-based remote sensing. The following conclusions can be drawn from this study:

• The number of published studies shows an increasing trend, especially in the USA and China. These studies were published in a wide range of journals, indicating the diverse backgrounds of the researchers. Furthermore, evaluation metrics such as the number of citations and journal impact factor showed that the quality of HAB-related studies has increased along with the quantity. However, the studies were not evenly distributed around the globe; hence, there is a need to evaluate other potentially at-risk aquatic bodies to obtain a coherent picture of HABs.

• The most frequently used multispectral sensors were MODIS, SeaWiFS, MERIS, and the Landsat-based sensors TM/ETM+/OLI. Although the Sentinel series was launched only recently, a significant number of studies have already used MSI and OLCI datasets. About 75% of the studies were conducted over a longer temporal scale, indicating the importance of continuous monitoring of HABs. These studies utilized multiple sensors to generate a continuous temporal dataset.

• Data with various resolutions were used for different types of study areas. As a generalization, high spatial resolution data are more common for smaller study areas such as inland waters, and low spatial resolution data are used for larger study areas such as coastal bodies and open waters. A tradeoff between resolutions is generally observed; therefore, a virtual constellation of satellites and data fusion are recommended to fill the data gaps and improve accuracy. Geostationary satellites have proven useful for monitoring HABs because of their higher temporal resolution. Experiments with CubeSats such as PlanetScope have also revealed their potential for HAB detection and monitoring. However, further studies are required in order to draw a concrete conclusion.

• Among the data processing levels, level 1 was most frequently used, as it maximizes the flexibility to customize the preprocessing steps. Studies used multiple atmospheric correction methods; however, no standard model was observed. Hence, a suitable atmospheric correction model, especially for turbid waters with low HAB proxy concentrations, is still an active area of research.

• In terms of HAB proxies, studies mostly modeled chlorophyll using airborne and spaceborne data. Although phycocyanin has distinct spectral features, the lack of phycocyanin measurements in routine water quality monitoring projects makes it difficult to calibrate models against remotely sensed Earth observation data. There is also a need to focus more on using FLH and cell densities to discriminate different phytoplankton groups.

• There are multiple sources of ancillary data that are used in conjunction with HAB proxies. The most frequently used, particularly in the case of oceanic waters, were SST, wind vector, and SSHA data. These ancillary data help in understanding the seasonal variations of HABs and the potential causes behind those variations. For coastal and reservoir waters, ancillary data such as TSM, turbidity, and hydrological parameters were used. These data help in understanding the sources and movements of sediments that cause turbidity in water. Further research is needed to understand the relation between turbidity and the generation of HABs.

• There are various estimation methods for HAB proxies, among which the regression-based methods outperform the spectral-based methods. However, the performance comparison of regression models with analytical models was inconclusive, as performance varied on a case-by-case basis; further research is therefore needed. The models' performance depends on multiple factors such as the quantity and quality of ground-based data, in situ reflectance, preprocessing steps, sensor specifications, and the type of water body. There is a need for standardized reporting of methodology to allow direct comparisons between studies. Furthermore, the variety of accuracy assessment metrics also limits cross-study comparison. In terms of validation, regression-based models generally used R² and RMSE, while overall accuracy and the kappa coefficient were used for classification-based methods.

• The most frequently used processing software was SeaDAS. Generally, GUI-based environments were used more than programming-based ones. However, over the last few years, more studies have used cloud-computing platforms such as GEE for broad-scale HAB detection and monitoring. GEE is also being used to develop applications for real-time HAB detection.

• For future work, there is still a need to utilize machine learning algorithms not only for detection but also for time series analysis. The utilization of analytical methods along with satellite imagery is still an open area of research.

• The upcoming satellites such as Landsat 9 and NASA's hyperspectral satellites will open multiple avenues of research for HAB monitoring and detection. These datasets will help in species discrimination and can be useful for the detection of the vertical movement of HABs.
This manuscript conducted a meta-analysis of 420 journal articles to extract trends in HAB monitoring and detection using remotely sensed Earth observation data. Future work should focus on understanding the relationship between HABs and environmental factors, hydrological factors, other water quality parameters, and oceanography using remotely sensed Earth observation data.

Conflicts of Interest: The authors declare no conflict of interest.