Mapping and forecasting onsets of harmful algal blooms using MODIS data over coastal waters surrounding Charlotte County , Florida

Over the past two decades, persistent occurrences of harmful algal blooms (HAB; Karenia brevis) have been reported in Charlotte County, southwestern Florida. We developed data-driven models that rely on spatiotemporal remote sensing and field data to identify factors controlling HAB propagation, provide a same-day distribution (nowcasting), and forecast their occurrences up to three days in advance. We constructed multivariate regression models using historical HAB occurrences (213 events reported from January 2010 to October 2017) compiled by the Florida Fish and Wildlife Conservation Commission and validated the models against a subset (20%) of the reported historical events. The models were designed to specifically capture the onset of the HABs instead of those that developed days earlier and continued thereafter. A prototype of an early warning system was developed through a threefold exercise. The first step involved the automatic downloading and processing of daily Moderate Resolution Imaging Spectroradiometer (MODIS) Aqua products using SeaDAS ocean color processing software to extract temporal and spatial variations of remote sensing-based variables over the study area. The second step involved the development of a multivariate regression model for same-day mapping of HABs and similar subsequent models for forecasting HAB occurrences one, two, and three days in advance. Eleven remote sensing variables and two non-remote sensing variables were used as inputs for the generated models. In the third and final step, model outputs (same-day and forecasted distribution of HABs) were posted automatically on a web-based GIS (http://www.esrs.wmich.edu/webmap/bloom/). Our findings include the following: (1) the variables most indicative of the timing of bloom propagation are bathymetry, euphotic depth, wind direction, SST, chlorophyll-a [OC3M] and distance from the river mouth, and (2) the model predictions were 90% successful for same-day mapping and 65%, 72% and 71% for the one-, twoand three-day advance predictions, respectively. The adopted methodologies are reliable, dependent on readily available remote sensing data sets, and cost-effective and thus could potentially be used to map and forecast algal bloom occurrences in data-scarce regions.


Introduction
An increase in agricultural activities introduces nutrients into water bodies and may adversely affect the biodiversity and habitats of aquatic ecosystems.One of the major sources of such nutrients are nitrogen-based fertilizers [1] that are widely used to increase agricultural productivity.These nonpoint sources of nitrogen through fertilization were found to be the predominant sources of overall nitrogen quantities in the Gulf of Mexico [2], where the study area (Charlotte County) is located.The introduction of nutrients increases the productivity of aquatic systems and enhances the growth of harmful algal blooms (HABs) which, in turn, produce toxins causing detrimental health effects [3] to humans and ecosystems [4].Karenia brevis (K.brevis), formerly known as Gymnodinium breve and Ptychodiscus brevis, is the most predominant HAB species in the Gulf of Mexico [5][6][7], and its adverse socioeconomic impacts on the region have been investigated in previous studies [8].These impacts include but are not limited to adverse effects to human health, marine life, tourism, and recreational activities [3,4,8].
Earlier efforts to map or forecast HAB occurrences examined the distribution of HABs in relation to a wide range of causal parameters, such as wind-driven water exchanges [9], temperature [10], relative abundance of protozoans that feed on algae (e.g., Mesodinium species) [11], cell distribution through oceanic currents [12], and hydrodynamic variables (e.g., current pathways, rate and volume of flow, upwelling and downwelling pulses) [13].Such parameters were subsequently used to conduct same-day mappings of bloom occurrences, to model onsets of blooms [14][15][16] and to forecast seasonal algal bloom occurrences [12].These investigations and mapping efforts provided the basis for the development of early warning systems based on (i) solid-phase adsorption toxin tracking [17], (ii) real-time field monitoring of chlorophyll and dissolved oxygen [18], and (iii) Moderate Resolution Imaging Spectroradiometer (MODIS)-derived fluorescence data to detect and monitor algal blooms [19][20][21].The latter (fluorescence) was found to be sensitive to chlorophyll-a concentrations [22][23][24][25].The development and operation of the overwhelming majority of these monitoring and forecasting systems require continuous current and archival field data (e.g., nutrient concentration in surface water).Unfortunately, such datasets are not present for many of the coastal areas where HAB monitoring and/or forecasting systems are needed.This study addresses this potential problem.Although our methodology does require continuous records of present and archival data, it instead utilizes readily available, global remote sensing datasets in the public domain.Additionally, limited field data, where available, are utilized.
Earlier studies that utilized remote sensing datasets in identifying and mapping the distribution of HABs focused on a limited number of ecological variables.Examples include utilization of a single ecological variable (e.g., chlorophyll-a concentration) [19,[26][27][28][29][30], two variables such as chlorophyll-a concentration and sea surface temperature (SST) [31][32][33][34][35][36][37] or chlorophyll and primary productivity [38], and three variables (chlorophyll, SST and wind) [39,40].A review article by Shen et al. [41] indicated that most of the remote sensing-based detection techniques of HABs were restricted to three parameters or less and these limited number of parameters do not fully constrain ecosystem model parameters [41,42].Although more robust field-based HAB detection and early warning systems are in place in some areas [13], those systems are absent in many other locations where there is a need to monitor and predict HAB occurrences.Their absence could be related to the extensive resources needed to construct and maintain monitoring networks, to support the continuous sampling and analysis (geochemical, biological, and physical) of the investigated water bodies.In this study, we develop methodologies that utilize a large number of remote sensing-based water quality parameters together with optical properties that are extracted from readily available remote sensing datasets to map HAB occurrences and predict their distribution.
The study area is in the Charlotte County, Florida; it incorporates the county's coastal areas (width: 15 to 30 km) and nearby estuaries (Figure 1).Like many other coastal areas within the Gulf of Mexico, the study area has been subjected to persistent HAB outbreaks that pose serious environmental challenges to the county's tourism and fishery industries [8].Unfortunately, the study area lacks continuous field-based monitoring of water quality as it is challenging to cover large geographic areas with limited resources [43].The primary goals of this study involved identifying the factor(s) controlling HAB occurrences in the study area, developing same-day mapping and predictive models for HAB occurrences by utilizing daily remote sensing data, disseminating our findings, and automating the process.

Materials and Methods
We accomplished the goals described above by developing multivariate linear regression statistical models, distributing our findings via a web-based interface and utilizing a geographic information system (GIS) framework for automation purposes.Data-driven models that rely on historical remote sensing and corresponding field data were developed to identify factors controlling the algal blooms and to forecast their occurrences.An inventory was compiled for the reported (dates and locations) HABs in the coastal waters surrounding Charlotte County by the Florida Fish and Wildlife Conservation Commission's Fish and Wildlife Research Institute (FWRI: http://myfwc.com/research/redtide/monitoring/database/),and a database was generated for remote sensing datasets that were acquired during the reported HAB occurrences.The compiled satellite and field data covered the period between January 2010 and October 2017 in which 213 HAB events were reported.The workflow involved three major steps: (1) downloading and processing of daily MODIS data; (2) developing multivariate regression models based on historical HAB occurrences; and (3) using the model for same-day mapping and forecasting HAB, automating the process, and publishing the findings (Figure 2).

Materials and Methods
We accomplished the goals described above by developing multivariate linear regression statistical models, distributing our findings via a web-based interface and utilizing a geographic information system (GIS) framework for automation purposes.Data-driven models that rely on historical remote sensing and corresponding field data were developed to identify factors controlling the algal blooms and to forecast their occurrences.An inventory was compiled for the reported (dates and locations) HABs in the coastal waters surrounding Charlotte County by the Florida Fish and Wildlife Conservation Commission's Fish and Wildlife Research Institute (FWRI: http://myfwc.com/research/redtide/monitoring/database/), and a database was generated for remote sensing datasets that were acquired during the reported HAB occurrences.The compiled satellite and field data covered the period between January 2010 and October 2017 in which 213 HAB events were reported.The workflow involved three major steps: (1) downloading and processing of daily MODIS data; (2) developing multivariate regression models based on historical HAB occurrences; and (3) using the model for same-day mapping and forecasting HAB, automating the process, and publishing the findings (Figure 2).

Step 1
The first step involved the identification of temporal ocean color products and spatial variables that could control, or correlate with, the distribution of algal blooms in general and/or the HAB in the study area (in our case K. brevis).The selection of these variables was largely based on reported findings from similar settings elsewhere and, to a lesser extent, on our observations.This step involved automatic downloading and processing of daily ocean color data products acquired by the National Aeronautics and Space Administration (NASA) MODIS Aqua satellite.NASA's ocean color processing website (https://oceancolor.gsfc.nasa.gov/)provides an option for periodical data download for specified regions via a free data subscription service.We specified southwestern Florida as a region of interest, Aqua MODIS as a source of data and daily data as a download option.The automatic data download was scheduled using the task scheduling programs available within the Linux environment.The downloaded Level 0 data was processed to Level 1 and later to Level 2 using SeaDAS (NASA, Greenbelt, MD, USA, version 7.4) Ocean Color Science Software (OCSSW).Level 1 data has the radiometric and geometric calibrations applied and the ocean data products were extracted during the Level 2 processing.The applications of these calibrations correct for differences in acquisition geometry for the scenes although minor variation in ocean color products is unavoidable [44].The OCSSW software was used to extract relevant temporal variables as shown in Table 1.The table shows the input, output, processor and parameters specified in the command line operator in Linux environment to enable unattended data extraction.A total of 13 ocean color products were extracted from the downloaded MODIS products.These products include: euphotic depth, ocean chlorophyll three-band algorithm for MODIS (chlorophyll-a OC3M), chlorophyll-a Generalized Inherent Optical Property (GIOP), chlorophyll-a Garver-Siegel-Maritorena (GSM), fluorescence line height (FLH), a diffuse attenuation coefficient for downwelling

Step 1
The first step involved the identification of temporal ocean color products and spatial variables that could control, or correlate with, the distribution of algal blooms in general and/or the HAB in the study area (in our case K. brevis).The selection of these variables was largely based on reported findings from similar settings elsewhere and, to a lesser extent, on our observations.This step involved automatic downloading and processing of daily ocean color data products acquired by the National Aeronautics and Space Administration (NASA) MODIS Aqua satellite.NASA's ocean color processing website (https://oceancolor.gsfc.nasa.gov/)provides an option for periodical data download for specified regions via a free data subscription service.We specified southwestern Florida as a region of interest, Aqua MODIS as a source of data and daily data as a download option.The automatic data download was scheduled using the task scheduling programs available within the Linux environment.The downloaded Level 0 data was processed to Level 1 and later to Level 2 using SeaDAS (NASA, Greenbelt, MD, USA, version 7.4) Ocean Color Science Software (OCSSW).Level 1 data has the radiometric and geometric calibrations applied and the ocean data products were extracted during the Level 2 processing.The applications of these calibrations correct for differences in acquisition geometry for the scenes although minor variation in ocean color products is unavoidable [44].The OCSSW software was used to extract relevant temporal variables as shown in Table 1.The table shows the input, output, processor and parameters specified in the command line operator in Linux environment to enable unattended data extraction.A total of 13 ocean color products were extracted from the downloaded MODIS products.These products include: euphotic depth, ocean chlorophyll three-band algorithm for MODIS (chlorophyll-a OC3M), chlorophyll-a Generalized Inherent Optical Property (GIOP), chlorophyll-a Garver-Siegel-Maritorena (GSM), fluorescence line height (FLH), a diffuse attenuation coefficient for downwelling irradiance at 490 nm (Kd_490), particulate backscattering coefficient at 547 nm (bbp_547_giop), turbidity index, sea surface temperature (SST), wind direction, wind speed, chromophoric dissolved organic matter (CDOM) index [45] and Secchi disk depth (Zsd morel, based on Morel version) [46,47].Additional spatially relevant variables were considered, as well.Our preliminary inspection of these products revealed large and rapid variations in chlorophyll-a content, SST, the attenuation coefficient, and euphotic depth in proximity to the shoreline and to the freshwater outlets (river mouth; Figure 3), thus suggesting that bathymetry and distance from the river mouth should be incorporated in the model's development.Uniformly spaced grid points were used to extract the values from products of different resolutions and subsequent processing was done on the same grid to achieve computational efficiency.The collected ocean color products were later checked for consistency and significance.Discontinuous data were not considered.For example, the data for CDOM index was found to be discontinuous and patchy over the investigated period (2010 to 2017) and was thus omitted from the list of potential variables considered for model development.
An exploratory stepwise linear regression was conducted to identify the determinant and significant variables, as well as the optimum combination of the variables.Spatial Statistics extension in ArcGIS together with Minitab software were used for these analyses.The significance of the variables was investigated using the p-value and R-square value.Variables that were found to be highly correlated (redundant) and insignificant were omitted.The variables that contributed to the multicollinearity (redundant variables) were identified using the extracted Variance Inflation Factor (VIF) [48] values.A variable with a VIF value exceeding 7.5 was considered redundant with the second highest VIF value.In cases where multiple variables were identified as being redundant, the significant variables were retained and the insignificant ones were omitted.Using water clarity measurements as an example, Secchi disk depth was found to be redundant with euphotic depth, and the former was found to be less significant and was dropped.Following the omission of redundant variables, the multivariate regression was run again to make sure the R-square value and model's significance did not decrease.The overall target of this iterative exercise was to obtain the highest R-square value with a minimal number of significant variables.Only 13 of the initial 15 variables were considered for model construction.The spatial and temporal variables included in the model are explained below.

Euphotic Depth (m)
The euphotic depth represents the depth at which 1% of the light incident on the ocean's surface can reach [47,49,50].This depth provides a measure of the depth where light penetrates, nutrients and algae diminish, and productivity decreases [51].Water bodies with low euphotic depths generally have a high nutrient content, are more productive and eutrophic [52], and provide favorable conditions for HAB development [53].The euphotic depth was calculated using the technique documented in a previous study [47].The wind direction and speed can affect the distribution of algal blooms in three major ways: (1) prevailing wind directions create ocean currents and water exchanges that transport HAB cells [9,54] and biotoxins [9]; (2) wind and bathymetry guide the location of nutrient upwelling facilitating the concentration of the algae [13]; and (3) winds can transfer the aerosols [21] promoting the growth of toxic phytoplankton [55].The wind direction and wind speed were calculated using a reflectance model based on the Cox-Munk wave-slope distribution [56].
The concentration of chlorophyll-a provides direct measurements of the growth of the algae in aquatic environments [57].Three different types of algorithms were used to compute the chlorophyll-a content: chlorophyll-a OC3M (ocean chlorophyll three-band algorithm for MODIS [58]), chlorophyll-a GSM (Garver-Siegel-Maritorena [59]) and chlorophyll-a GIOP (Generalized Inherent Optical Property [60]).These algorithms use different sets of reflectance bands to estimate phytoplankton biomass [61] and these have been validated with field observations in different parts of the world [62][63][64][65].An increased chlorophyll-a concentration has been taken as a strong indicator of HAB distribution [66,67], and chlorophyll-a OC3M data has been used for detecting HAB along the west coast of Florida [20].Three types of chlorophyll-a measurements were considered in this study as they were found to be correlated with algal cell count during the exploratory multivariate regression.These products were not redundant to each other suggesting that the algorithms were either picking up unique spectral signatures exhibited by chlorophyll-a in the optically complex estuarine environment, or it may be the results of uncertainties in the algorithms [68].

Diffuse Attenuation Coefficient
The Diffuse Attenuation Coefficient for downwelling irradiance at 490 nm (Kd_490; m-1) measures the attenuation of the light (blue to green) for turbid water [69,70].A study in the Bohai Sea [71] showed that the attenuation coefficient can be used as a proxy for the growth of phytoplankton in turbid coastal waters given that the blue to green light attenuation positively correlates with scattering particles (e.g., HABs).A high correlation between chlorophyll-a concentration and diffuse attenuation coefficient was observed under harmful red tide conditions in the Persian Gulf using MODIS data [72].In another study done in the coastal waters of India, HABs were detected using satellite derived chlorophyll-a and diffuse attenuation coefficient images and were also validated through in situ measurement [37].The diffuse attenuation coefficient was calculated using the technique described in a previous study [70].

Turbidity Index
The turbidity index provides a measure for the clarity of the water through the scattering of light caused by suspended particles [73,74].Spatial and temporal variations of turbidity in water bodies has been successfully used to identify phytoplankton blooms [75,76].Although a turbidity index is not a direct indicator of HAB occurrences, it has been successfully used to estimate the severity of a HAB once it was independently detected [77].The turbidity index was calculated using procedures described in a previous study [78].
2.1.6.Particulate Backscattering Coefficient at 547 nm This is the backscattering coefficient of particles at 547 nm.The backscattering coefficient as determined by satellite sensors and in situ measurements has been used in the past to identify the distribution of HABs.A research study [79] employed satellite-based and underwater glider measurements of the backscattering coefficient at 547 nm to detect K. brevis blooms in the Gulf of Mexico and verified their findings by in situ observations.A backscattering coefficient at 551 nm extracted from a Visible Infrared Imaging Radiometer Suite (VIIRS) sensor, which is analogous to the MODIS backscattering coefficient at 547 nm, was used in conjunction with fluorescence data to detect the K. brevis bloom at the West Florida shelf [80].In the same area, in situ measurements of the backscattering coefficient at 551 nm and chlorophyll-a data were successfully used to detect a K. brevis bloom [81].The backscatter coefficient of particles at 547 nm was calculated using an algorithm available in the literature [82,83].
2.1.7.Sea Surface Temperature ( • C) SST influences phytoplankton productivity in multiple ways: (i) individual biological species including algal blooms thrive under different and specific temperature regimes, and (ii) the availability and solubility of many biochemical materials needed for their growth and development is temperature dependent [54,84].Many studies have shown a correlation between SST and algal bloom distributions in the Mediterranean Basin [85], Kuwait Bay [86,87] and on a global scale [88].The productivity of K. brevis increases in the fall and early spring at the west Florida shelf primarily because of the ideal temperature conditions during these times [19].Increased SST was found to be conducive to HAB development in the coastal waters of Oman [89] and in Gulf of Mexico [90].

Fluorescence Line Height (FLH)
Fluorescence line height (FLH) provides the relative measure of radiance leaving the sea surface in the chlorophyll fluorescence emission band [91].It has been successfully used in the detection of chlorophyll-a in several studies [22,23,25] including the one in southwestern Florida [19].A review of previous studies shows a positive correlation between the MODIS-derived fluorescence and chlorophyll-a concentration in ocean waters with algal blooms [91,92].More recently, an in situ FLH measurement was done in conjunction with the backscattering coefficient to map the distribution of K. brevis in the Gulf of Mexico [79].Similarly, FLH derived from VIIRS was used to detect K. brevis blooms at the West Florida Shelf [80].
Although other pigments (chlorophyll-b, chlorophyll-c, phycoerythrin and carotenoids) are common in HAB, chlorophyll-a estimate is the first choice in oceanography because of the practical reasons [93,94].It has been difficult to attain the detection limit of phycoerythrin using MODIS, instead there has been more efforts on the absorption bands at 495 nm and 545 nm [95].MODIS provides fluorescence band (676 nm) to derive FLH primarily for HAB detection [41,96].FLH has been successfully used to detect K. brevis bloom in the Gulf of Mexico [19,29].The concentration of K. brevis was found to have direct correlation with FLH in the Charlotte harbor in Florida [19].

Bathymetry (m)
Shelf properties, including bottom topography, influence the distribution of HAB in many ways [97].For example, water stratification, which is controlled in part by bottom topography, inhibits productivity [98], whereas the vertical mixing and added nutrient supply in shallow waters can enhance the primary productivity in coastal ecosystems [98].Our study site, and the continental shelf systems and coastal areas in general, are considered to be vulnerable to HAB occurrences due to the accumulation of biomass [99].Bathymetric data acquired from United States Geological Survey (https://coastal.er.usgs.gov/flash/bathy-entireFLSH.html) were used as one of the spatial variables.enhance the primary productivity in coastal ecosystems [98].Our study site, and the continental shelf systems and coastal areas in general, are considered to be vulnerable to HAB occurrences due to the accumulation of biomass [99].Bathymetric data acquired from United States Geological Survey (https://coastal.er.usgs.gov/flash/bathy-entireFLSH.html) were used as one of the spatial variables.

Step 2
The logarithm of K. brevis cell counts (base 10) in samples as analyzed by FWRI was used as the response variable because the growth of the algae takes place exponentially.Measurements were largely performed in response to reported K. brevis blooms around Charlotte County, Lee County to the south, and Sarasota County to the north.The cell count responses were lumped into three groups:

Step 2
The logarithm of K. brevis cell counts (base 10) in samples as analyzed by FWRI was used as the response variable because the growth of the algae takes place exponentially.Measurements were largely performed in response to reported K. brevis blooms around Charlotte County, Lee County to the south, and Sarasota County to the north.The cell count responses were lumped into three groups: (i) no bloom (cell count ≤ 300 per/L), (ii) low concentration (cell count > 300 and <10,000 per L) and (iii) high concentration (cell count ≥ 10,000 per L).We adopted the threshold values used by the Harmful Algal Bloom Observation System (https://habsos.noaa.gov/) in categorizing cell count to facilitate comparisons with NOAA's observations.During the investigated period (2010 to 2017), 128 blooms were reported with cell counts higher than 300 per L. Each of the input variables was normalized to the −1 to +1 range because the inputs displayed large variations in range and magnitude.Such variations, if not accounted for, could affect model outputs.For each inventoried location, we extracted the values of the normalized input variables.
Four linear regression models were constructed: (i) same-day; (ii) one day in advance; (iii) two days in advance; and (iv) three days in advance.For each of the models, data were divided into training (80%) and testing (20%) datasets.The training data were used to develop the regression, and the accuracy assessment was done on the testing datasets.For the same-day model, the regression was conducted on 80% of the reported HABs occurrences (102 unique-day bloom events) against the variables on bloom days (days when blooms were reported).For the one day in advance model, the regression was conducted for the response versus the variables acquired one day in advance of the bloom day.Similarly, for the two-day and three-day in advance models, the response was regressed against the variables acquired two and three days in advance, respectively.Each of the four models had its individual datasets (response and variables) for regression and validation.Predictive models (ii, iii, and iv) were designed to capture the onset of the HABs in contrast to those that developed days earlier and continued in the following days.To this end, a bloom reported on day n was excluded from the one-day advance dataset if another was reported in the same location in day n−1 .Similarly, a bloom reported on day n was excluded from the two-day dataset if another was reported in the same location in day n−1 or day n−2 and from the three-day dataset if a bloom was reported on day n−1 , day n−2 or days n−3 .The multivariate regression model was developed for each group of the data.For any new satellite data for any specific day, a respective regression was used to predict the HAB on the same-day and one, two and three days in advance of the potential HAB occurrence.

Step 3
The generated regression equations were utilized for same-day mapping and one-, two-and three-day advance predictions of HAB.The regression models were developed for three bloom lag periods and applied to the collective set of variables.The results (same-day mapping and one-, twoand three-day advance prediction of HAB) are published on our website (http://www.esrs.wmich.edu/webmap/bloom/) using the ArcGIS server and ArcGIS API for JavaScript.The MODIS data for every day is acquired at ~4 pm, is made available for download on NASA's website at ~5 pm, and is processed for HAB occurrences and published on our website at ~10 pm.This process was coded in Python 2.7 to allow the program to run automatically at the same time every day.

Results
The prediction was done in two phases: (a) nowcasting and (b) one, two and three days in advance forecasting.The model outputs are provided in Tables 2 and 3. Table 2 lists the selected variables with their relative significance (in percentage) for the same-day and the one-, two-and three-day predictions.Table 3 provides the multivariate regression coefficients for each of the selected variables for the same-day and the one-, two-and three-day predictions.The sign (±) in front of the coefficient for each variable indicates the nature (positive/negative) of the relationship between the variable in question and the response.The assessment of the performance of the four models is presented in Figure 4.The accuracy of the same day was the highest (90.5%), and the accuracies of the one-day, two-day, and three-day prediction models were assessed at 65.6%, 72.1%, and 71.9%, respectively.The prediction accuracies were calculated based on the three categories of the cell count that were pre-established instead of using binary value (presence and absence of the bloom) as an indicator of the success.In order to obtain stringent prediction criteria, we integrated locational accuracy as a part of verification process as we were using cell count data with spatial information provided by FWRI.Chlorophyll-a (GSM) ( Chlorophyll-a (GSM) ( Chlorophyll-a (GSM) (  Figure 4. Assessment of the accuracy of the generated multivariate models (same-day mapping and one-, two-and three-day advanced predictions).The points on the diagonal line represent the bloom events that were observed and also predicted.The points on the vertical axis represents the algal bloom events that were observed but not predicted by the model.The points on the horizontal axis are the algal bloom events that were predicted but not observed.The accuracy of each prediction models are given in parenthesis.

Discussion
The information provided in Tables 2 and 3 can be used to interpret the nature of the relationship between HAB occurrences and the individual variables.The information can also be used to determine the directionality (negative or positive) of the relationship and evaluate the comparative significance.For same-day mapping (or nowcasting), the bathymetry, euphotic depth, wind Assessment of the accuracy of the generated multivariate models (same-day mapping and one-, two-and three-day advanced predictions).The points on the diagonal line represent the bloom events that were observed and also predicted.The points on the vertical axis represents the algal bloom events that were observed but not predicted by the model.The points on the horizontal axis are the algal bloom events that were predicted but not observed.The accuracy of each prediction models are given in parenthesis.

Discussion
The information provided in Tables 2 and 3 can be used to interpret the nature of the relationship between HAB occurrences and the individual variables.The information can also be used to determine the directionality (negative or positive) of the relationship and evaluate the comparative significance.For same-day mapping (or nowcasting), the bathymetry, euphotic depth, wind direction, chlorophyll-a (OC3M, and wind speed were found to have a 78% contribution to the response variable as presented in Table 2.For the one-day forecasting, bathymetry, SST, wind direction, chlorophyll-a (OC3M), and diffuse attenuation coefficient (KD_490) were found to have a 65% contribution to the response variable.For the two-day forecasting, euphotic depth, chlorophyll-a (OC3M), distance to river mouth, diffuse attenuation coefficient (KD_490) and SST were found to have 69% contribution to the response variables.The euphotic depth, distance to river mouth, chlorophyll-a (OC3M), wind direction and SST had a 67% contribution to the response variable for the three-day forecasting.
A 1:1 correspondence in the ranking and significance of variables in the four models should not be expected given that the variables could have varying lag time effects on HAB development.For example, a study [104] found a positive correlation between algal bloom events and nitrate and ammonium concentrations as early as five days prior to the bloom.Similarly, a study [105] found a positive correlation between HAB occurrences and temperature and aerosols particle distribution, which are the air-borne sources of phosphate, iron and trace elements in the East China Sea.Higher concentrations of phosphorous and iron above the threshold did not correlate with the HAB events because these are limiting nutrients for HAB growth.The increase in concentration of nitrogen, however, correlated with the HAB concentration.The lag time between the spike in the nitrogen concentration in the aerosols and HAB event was two days.Similarly, in the coastal waters of Charlotte County, one can attribute the significance and high ranking of some variables (distance to river mouth, and chlorophyll-a) in our two-and three-day predictive models to the presence of a two day lag time for nutrients in rivers to reach the coastal waters and induce HABs.
Inspection of Tables 2 and 3 reveals differences in the rankings and significance of the variables between the four solutions.However, there are multiple variables that appear to be significant (significance ≥ 5%) for three or more of the four solutions.These include chlorophyll-a, Euphotic depth, SST, wind direction, Chlorophyll-a OC3M, distance to river mouth and turbidity index.Other variables appear to be consistently less important (significance < 5%) in 3 or more of the model outputs.These include bathymetry, wind speed, chlorophyll-a GIOP, fluorescence line height, diffuse attenuation coefficient (Kd_490), chlorophyll-a GSM and particulate backscattering coefficient (bbp_547_giop).
Additional spatial and temporal relations are inferred from Tables 1 and 2. Shallow bathymetry seems to be a significant factor for same-day predictions.The association of HABs with shallow bathymetry was inferred from the −ve sign of the coefficient for the bathymetry variable for the same-day prediction.Similarly, the association of HABs with increasing euphotic depth and turbidity is inferred from the +ve sign for the coefficient for these two variables (euphotic depth and turbidity index) in each of the four models.The chlorophyll-a (OC3M) content and wind direction show a positive correlation with bloom occurrences for all lag times, but the significance and rank varies for the investigated models.SST seems to be less significant on the day of the bloom compared to the one-day, two-days and three-days advanced predictions.Blooms occur at cooler SST as indicated by the −ve coefficients.For same-day predictions, the shorter the distance from the river mouth, the more likely HABs will develop as evidenced by the −ve coefficient value (Table 3).We suspect that the above-mentioned observations (rankings and significance of variables and the spatial and temporal relations) are largely of local nature, and thus comparisons with findings from published works are not straightforward.Moreover, such comparisons are also hampered by the paucity of similar applications that involve a large number of variables.
Despite the local and empirical nature of regression models, we favored this method over the analytical and semi-analytical solutions given the lack of continuous field-based datasets that are required for the application of these analytical approaches.The multivariate regression method was also favored over other statistical approaches (e.g., artificial neural networks, principal component analysis) that do not provide insights into the nature and the contributions of the factors controlling HAB occurrences.Although other techniques may provide better one time prediction than multivariate regression, this is the only technique supported in ArcGIS server environment that allows publication of results in the present form via web-based GIS.This is the reason why an operational algal bloom early warning system are nonexistent that utilize other prediction techniques (e.g., machine learning, hydrogeological modeling, principal component analysis) and are based on satellite-derived variables on a daily basis.
There are several limitations with the applied methodology.There can be differences between the time a bloom was reported and the time it was captured by the satellite imagery.The vertical and lateral movement of the algae can also occur in the water column due to changes in temperature, stratification and other factors.In field controlled and natural environments, a previous study [106] showed a decrease in fluorescence before reaching the maximum value under natural photosynthetically available radiation, while another study [107] reported diurnal variations in algae populations in the surface and in the mid-column.Similar diurnal variations were reported for K. brevis in the West Florida Shelf [108].Our approach does not consider this, but it assumes that the algae lie on the surface and are stationary.With the current temporal and spatial resolution of MODIS, diurnal physiological and ecological variations are not captured in the analysis.This could also be the reason why the significance of independent variables is fluctuating in different lag days (Table 2) in our model leading to low prediction accuracy one day prior to the bloom compared to other days as shown in Figure 4.Because of these practical limitations, physiological responses of the algae cannot be understood using the satellite data alone.
The developed models apply constant lag times of one, two or three days for all of the variables.Ideal models should instead apply lag times that produce an optimum target response.These lag times will undoubtedly differ from one variable to another, and such an application will enhance the predictability of the model.Unfortunately, this enhancement is difficult to achieve as that it requires a continuous acquisition of satellite imagery over an extended period, and such daily acquisitions are halted on cloudy days.Due to the weather dependency of the temporal variables, it is difficult to develop models that include multiple variables with varying time lags.Additionally, with the coarse spatial resolution of MODIS derived variables (e.g., SST: 1000 m; turbidity index: 500), the predictability of HAB detection is limited.The option to downscale satellite derived data is also impractical due to the lack of real time measurements at a finer scale.In this context, we are utilizing proxies (e.g., euphotic depth and turbidity indices) that can account for the nutrient content in the aquatic system despite its resolution.The predictability of the developed models could be improved with the inclusion of daily measurements of nitrate and phosphate in the water that can be used in the model.The current problem of discontinuous and coarse resolution of the MODIS data could be addressed locally if it was replaced by Unmanned Aerial Vehicles (UAVs) datasets.If we were to use UAV-generated datasets, the investigated area will have to be narrowed down to the immediate coastal waters to reduce the operational costs and to simplify the logistics involved in permitting flights.UAVs acquire high-resolution images devoid of atmospheric influences.These new data acquisition systems could increase the accuracy, predictability and replicability of our model in Charlotte County and elsewhere in the world.These methods would enable the construction of robust models that account for varying lag times and produce high-resolution (spatial and temporal) prediction.Although, the current MODIS spectral resolution is not perfect for HAB detection, it allows daily prediction until other option such as UAV is pursued in this area.The adopted empirical methodology could be applicable to many other coastal areas worldwide, yet it is to be expected that different sets of regression relationships should be developed for the individual areas to represent the local conditions that affect HAB occurrences.

Conclusions
The study focused on developing an early warning system for K. brevis-related HABs off the coast of South Florida.We used historical field HAB data from 2010 to 2017 to develop a multivariate regression and determine the significance of the variables for different prediction scenarios.The prediction system involved the same-day nowcasting method and forecasting for one, two and three days in advance of the onset of the bloom.The same-day nowcasting provided 90% accuracy, whereas the one, two and three days in advance forecasting provided 65%, 72% and 71% accuracies, respectively.The investigation took advantage of ocean color data to develop methodologies and procedures that may enhance decision-making processes, improve citizens' quality of life, and strengthen the local economy.Even though this project focuses on the K. brevis related HAB in Charlotte County and its surrounding neighbors, the model can be replicated for other species and can be applied in other areas.The prediction system can be utilized to plan uses of coastal waters for recreational purposes and other environmental services.Monitoring the extent and intensity of HABs could be used to improve the environmental and socioeconomic status of this area and develop long-term environmental programs and policies.This monitoring and early warning system for HABs could provide benefits, in Charlotte County to the public, policy makers, and the scientific community and could assist local agencies in developing solutions and plans to mitigate HABs.
Remote Sens. 2018, 10, x FOR PEER REVIEW 3 of 19 the factor(s) controlling HAB occurrences in the study area, developing same-day mapping and predictive models for HAB occurrences by utilizing daily remote sensing data, disseminating our findings, and automating the process.

Figure 1 .
Figure 1. Figure showing the study area, which covers coastal waters (width: 15-30 km) surrounding Charlotte County in Florida.The study area also covers the brackish water within the estuarine systems where freshwater and seawater mix.

Figure 1 .
Figure 1. Figure showing the study area, which covers coastal waters (width: 15-30 km) surrounding Charlotte County in Florida.The study area also covers the brackish water within the estuarine systems where freshwater and seawater mix.

Figure 2 .
Figure 2. Three-step workflow established for harmful algal blooms mapping and forecasting.

Figure 2 .
Figure 2. Three-step workflow established for harmful algal blooms mapping and forecasting.

2. 1 . 10 .
Distance from the River Mouth (m) Riverine organics are major sources of nutrients for the West Florida Shelf of the Gulf of Mexico [100].The riverine discharge provides high nutrient loads [101] that largely control the phytoplankton population and eutrophication around the river discharge locations and adjoining estuarine systems [102,103].The distance from the mouth of the river was computed using the Euclidean Distance function in ArcGIS (Environmental Systems Research Institute, Redlands, CA, USA, version 10.5).Remote Sens. 2018, 10, x FOR PEER REVIEW 8 of 19

2. 1 . 10 .
Distance from the River Mouth (m) Riverine organics are major sources of nutrients for the West Florida Shelf of the Gulf of Mexico[100].The riverine discharge provides high nutrient loads [101] that largely control the phytoplankton population and eutrophication around the river discharge locations and adjoining estuarine systems[102,103].The distance from the mouth of the river was computed using the Euclidean Distance function in ArcGIS (Environmental Systems Research Institute, Redlands, CA, USA, version 10.5).

Figure 3 .
Figure 3. Mean values for the significant variables including chlorophyll-a (OC3M), SST, diffuse attenuation coefficient, and euphotic depth calculated from MODIS products acquired throughout the period 2010 to 2017 over the study area.The distance from river mouth and bathymetry data are also shown.

Figure 3 .
Figure 3. Mean values for the significant variables including chlorophyll-a (OC3M), SST, diffuse attenuation coefficient, and euphotic depth calculated from MODIS products acquired throughout the period 2010 to 2017 over the study area.The distance from river mouth and bathymetry data are also shown.

Figure 4 .
Figure 4. Assessment of the accuracy of the generated multivariate models (same-day mapping and one-, two-and three-day advanced predictions).The points on the diagonal line represent the bloom events that were observed and also predicted.The points on the vertical axis represents the algal bloom events that were observed but not predicted by the model.The points on the horizontal axis are the algal bloom events that were predicted but not observed.The accuracy of each prediction models are given in parenthesis.

Table 1 .
Overview of the inputs, outputs, processor and relevant parameters applied in SeaDAS OCSSW to extract level 2 products.

Table 2 .
Selected variables with their relative significance (in percentage) for same-day nowcasting, and one-, two-and three-day predictions.

Table 3 .
Multivariate regression coefficients for each variable in predicting HABs for same-day mapping, and one-, two-and three-day advanced predictions.