An EOF-Based Algorithm to Estimate Chlorophyll a Concentrations in Taihu Lake from MODIS Land-Band Measurements : Implications for Near Real-Time Applications and Forecasting Models

For near real-time water applications, the Moderate Resolution Imaging Spectroradiometers (MODIS) on Terra and Aqua are currently the only satellite instruments that can provide well-calibrated top-of-atmosphere (TOA) radiance data over the global aquatic environments. However, TOA radiance data in the MODIS ocean bands over turbid atmosphere in east China often saturate, leaving only four land bands to use. In this study, an approach based on Empirical Orthogonal Function (EOF) analysis has been developed and validated to estimate chlorophyll a concentrations (Chla, μg/L) in surface waters of Taihu Lake, the third largest freshwater lake in China. The EOF approach analyzed the spectral variance of normalized Rayleigh-corrected reflectance (Rrc) data at 469, 555, 645, and 859 nm, and subsequently related that variance to Chla using 28 concurrent MODIS and field measurements. This empirical algorithm was then validated using another 30 independent concurrent MODIS and field measurements. Image analysis and radiative transfer simulations indicated that the algorithm appeared to be tolerant to aerosol perturbations, with unbiased RMS uncertainties of <80% for Chla ranging between 3 and 100 μg/L. Application OPEN ACCESS Remote Sens. 2014, 6 10695 of the algorithm to a total of 853 MODIS images between 2000 and 2013 under cloud-free conditions revealed spatial distribution patterns and seasonal changes that are consistent to previous findings based on floating algae mats. The current study can provide additional quantitative estimates of Chla that can be assimilated in an existing forecast model, which showed improved performance over the use of a previous Chla algorithm. However, the empirical nature, relatively large uncertainties, and limited number of spectral bands all point to the need of further improvement in data availability and accuracy with future satellite sensors.


Introduction
A number of studies have shown increasing trends of phytoplankton blooms in coastal and inland waters around the world (e.g., [1]), often caused by excessive nutrients and other pollutants derived from agriculture, urbanization, and industries.In the southern Caribbean, increased nutrient concentrations around coral reefs were linked to land-based runoff [2].Similarly, recurrent phytoplankton blooms in the Gulf of California were attributed to agriculture irrigation and runoff [3].On the west Florida shelf, increased nutrient inputs from coastal runoff were thought to be at least one reason leading to the increased occurrences of blooms of the toxic dinoflagellate Karenia brevis between the 1950s and the 2000s [4].In coastal waters of the Bohai Sea, Yellow Sea, and East China Sea, the number and size of toxic algae blooms have increased since 1998 [5], likely due to coastal eutrophication.The increased frequency and severity of macroalgae blooms of Ulva prolifera in these waters were linked to the increased coastal aquaculture [6,7].In Taihu Lake of east China, blooms of the cyanobacterium Microcystis aeruginosa were found to increase in both frequency and duration, accompanied with increases in total nitrogen and phosphorus [8,9].
Accurate assessment of the bloom conditions in coastal and inland waters, in terms of the concentrations of the phytoplankton chlorophyll a pigment (Chla in μg/L), provides information on the eutrophication state and thus can help implement management plans and strategies [10].Timely delivery of Chla distribution information is also important for management actions including closure of shellfish beds in response to toxic blooms, or physical removal of certain algae in the case of cyanobacterial blooms (e.g., [11][12][13]).The ability to predict Chla distributions, on the other hand, is critical in helping make proactive short-term and long-term management decisions such as reducing nutrient discharges through coordinated regulations [14,15].
Unfortunately, none of the above three tasks (quantification, delivery, and prediction) is straightforward.First, traditional ship-borne measurements of water quality parameters (including Chla) are regarded as the most accurate, yet they often lack sufficient spatial and/or temporal coverage.Satellite remote sensing, on the other hand, provides frequent and synoptic measurements, but the accuracy of the satellite-based data products for the optically complex coastal and inland waters are often questionable.Recent algorithm development efforts have led to significant progress in improving the Chla data product accuracy using either spectral bands in the green and red [16][17][18][19][20][21][22][23][24][25], neural-networks [26],or empirical orthogonal function (EOF) approaches [26,27].However, most of these studies are based on in situ measured reflectance data.While in situ data provides the basis for algorithm development and validation, the validity of the algorithm needs to be evaluated using long-term satellite data.In contrast, Le et al. (2013) used remote sensing reflectance (Rrs, sr −1 ) data derived from Moderate Resolution Imaging Spectroradiometer (MODIS) measurements to establish a long-term Chla data record for a medium-sized estuary (Tampa Bay, ~1000 km 2 ) between 2003 and 2012 [25].This algorithm, however, was specifically tuned to the optical relationships for Tampa Bay, and its general applicability to other estuaries or lakes is unknown.In short, developing accurate, satellite-based Chla algorithms is still an active research area, especially for coastal and inland waters that are typically complex in their optical properties.
Second, even after an algorithm is developed and validated for a study region, timely delivery of the data products such as Chla is still problematic due to (1) lack of near real-time satellite data receiving and processing capacity and (2) frequent cloud cover, sun glint, and other non-optimal observing conditions.For the case of MODIS, global data are openly available (through the U.S. National Aeronautics and Space Administration, NASA) and thus can be used to develop near real-time data products without a ground antenna [28].Unfortunately, the spectral bands designed for water studies and used in the above algorithms are often saturated for some coastal and inland waters [29], leaving no data to use.Alternative Chla retrieval approaches must be developed to use the non-saturating bands to avoid such problems.
Finally, predicting Chla or other water quality parameters requires models based on either statistics (empirical approach, e.g., Le et al. (2013) for Chesapeake Bay [30] and Millie et al. (2013) for Sarasota Bay [31]) or coupled hydrodynamics and biology (physics-based approach, e.g., Popova et al. (2002) for the northeast Atlantic [32]; Hu et al. (2006) for Taihu Lake [33]).The latter approach has physical-and biological-governing equations explicitly built into the models, making it easier to test the connections between physical/biological forces and the biological responses.The parameterization of such an approach, especially for biological variables in small water bodies, is difficult due to the optical and biological complexity.One way to overcome such difficulties is to assimilate either in situ or satellite-derived data to guide and tune the coupled model.Indeed, data assimilation has been increasingly used to improve the model performance [32,[34][35][36][37][38][39][40][41].
When synoptic satellite measurement is used, one limiting factor of data assimilation is the large uncertainty in the assimilated satellite data.For data assimilation for Taihu Lake, Qi et al. (2014) [42] used Chla data products derived from the red and NIR bands (645 and 859 nm) of MODIS [43] , which suffer from interferences of sediment resuspension.Indeed, despite the effort in the past decade on algorithm development for inland water bodies [20,23,24,[44][45][46][47][48][49][50], for a number of reasons there is no reliable Chla algorithm that can be applied to MODIS data in near real-time for the purposes of both timely information delivery and data assimilation for Taihu Lake.
Given such urgent needs for near real-time Chla and lack of a practical algorithm to deliver such data, the objective of this study was thus to develop and validate a local algorithm to estimate Chla from MODIS measurements.The study is focused on Taihu Lake, yet the approach might be applicable to other similar water bodies and environments where MODIS ocean bands also saturate.

Study Area
With a drainage basin of 36,895 km 2 , Taihu Lake is the third largest freshwater lake in China, having an average water depth of ~1.9 m and surface area of ~2338 km 2 .The drainage basin covers some of the most developed regions in China including Jiangsu, Zhejiang, and Anhui Provinces and the Shanghai Municipality, which contributed ~10% of China's GDP with only 0.4% of China's territory [51,52].In recent years, increasing eutrophication and cyanobacterial blooms have been reported, posing a significant threat the environment and humans who rely on the lake for drinking water, tourism, aquaculture, recreation, and other activities [9,53].
Several studies have shown locally optimized Chla algorithms for Taihu Lake [24,42,46], yet they are all based on in situ Rrs.Other studies used satellite data to show the multi-year cyanobacterial bloom patterns [8,9], but the index used in the studies is sensitive to only intense blooms when cyanobacteria form surface mats.To date, a practice Chla retrieval algorithm targeted for near real-time applications still needs to be developed.

Field Data
A series of cruise surveys were conducted by the Nanjing Institute of Geography and Limnology, Chinese Academy of Science (NIGLAS) between October 2004 and May 2011 (Figure 1).Six of these surveys measured surface Chla.Of these samples, after quality control and searching for the matching MODIS data records 28 were used in this study to develop the Chla retrieval algorithm.These surveys were conducted in October 2004, May 2008, October 2008, May 2010, March 2011, and May 2011.
In addition to the surveys, measurements were also collected at six fixed locations by NIGLAS (Figure 1).The data were collected weekly from April to September every year from 2008 to 2010.Total nitrogen, total phosphorus and Chla for the collected water samples were analyzed in the laboratory.Similar to the samples collected and measured during the field surveys, after data quality control and searching for the matching MODIS data 30 of these samples were used as the independent dataset used to validate the developed Chla algorithm.
Water samples were collected near the surface.Chla was then determined spectrophotometrically following community-accepted protocols [54,55].Specifically, a 47-mm Whatman GF/F glass fiber filter was used to filter the water and was subsequently soaked in 90% ethanol for 4-6 h in the dark.The sample in ethanol was then heated to 80-90 °C for 3-5 min to extract the pigments.Absorbance at 665 and 750 nm of the extract was measured with a UV2401 spectrophotometer (Shimadzu Corp., Kyoto, Japan), and Chla was calculated against a blank reference (i.e., filtered water).There have been reports that Chla determined spectrophotometrically tended to be overestimated as compared with high-performance liquid chromatography (HPLC) method (e.g., Pinckney et al. [56]).However, the overestimation was small systematic bias instead of random error.Thus, the Chla dataset used in this study is self-consistent and will not impact algorithm development or studies on detecting anomalies.

MODIS Satellite Data
MODIS Level-0 data collected by both Terra (2000-present) and Aqua (2002-present) covering the study region were obtained from the NASA Goddard Space Flight Center through its Ocean Biology Processing Group [57].With a swath width of 2330 km, the two MODIS instruments cover the Earth's surface (including Taihu Lake) at least once per day, making this dataset suitable for near real-time applications.Both MODIS instruments have nine spectral bands from 412 to 869 nm at approximately 1-km ground resolution.These bands were designed for ocean use and are highly sensitive but with a narrow dynamic range [29].They are saturated most of the time over Taihu Lake due to the turbid atmosphere and water, and therefore are not used in this study.
The MODIS instruments also have seven spectral bands from 469 nm to 2130 nm designed for land and atmosphere use.They are less sensitive but cover a higher dynamic range than the ocean bands, and therefore rarely saturate in this region.Although not designed for the ocean, the bands have shown wide applications for coastal and inland waters [8,58] because of their wide coverage and medium resolutions.The ground resolution of the 645-and 859-nm bands is 250 m, and 500 m for the bands at 469-, 555-, 1240-, 1640-, and 2130-nm.Because the 1240-and 1640-nm bands often contain substantial noise due to detector artifacts, in this study, only four bands at 469-, 555-, 645-, and 859-nm were used.
MODIS data were processed using the software SeaDAS (version 7.0).First, the data were converted to calibrated radiance (Level-1B).Then, because a full atmospheric correction through SeaDAS often resulted in data loss due to incorrect data masking and high uncertainties in the retrieved remote sensing reflectance (Rrs, sr −1 ), a partial atmospheric correction to correct for the gaseous absorption (mainly by ozone) and Rayleigh (molecular) scattering effects was applied to the Level-1B data, resulting in Rayleigh corrected reflectance (Rrc, dimensionless): where ρt is the top of atmosphere (TOA) reflectance after adjustment of the gaseous absorption, ρr is the reflectance due to Rayleigh scattering, ρa is that due to aerosol scattering and aerosol-Rayleigh interactions, t and t0 are diffuse transmittance from the image pixel to the satellite and from the sun to the image pixel, respectively.Note that ρa, t, and t0 are functions of aerosol type, aerosol optical thickness, and solar/viewing geometry.The above formulation assumes negligible contributions from whitecaps and sun glint.
The Rrc data were mapped to a cylindrical equidistant projection for further analysis.First, the Rrc data at 645-nm, 555-nm, and 469-nm were used to compose the Red-Green-Blue true color images in order to screen for clouds and sun glint.After visual inspection, a total of 853 data granules between 2000 and 2013 were found to contain minimal cloud cover and sun glint, therefore suitable for algorithm development and time-series studies.The granules are evenly distributed in all four seasons, with approximately once per week frequency (Table 1).Potential aliasing due to infrequent measurements may be thus minimized at monthly to seasonal scales.The Rrc data were then queried to find the data concurrent (same day) and collocated with the field measurements (termed "matching pairs").For both algorithm development and validation, the following criteria were applied.To minimize the potential impact of straylight, a 3 × 3 pixel box centered at the field measurement location was selected.Only when at least five pixels of the 3 × 3 box contained valid Rrc data was the box used for further analysis.To assure spatial homogeneity, the variance of Rrc(555) in the box (standard deviation/mean) must be <10%, otherwise the box was discarded.The median value of the 3 × 3 box was used to compare with the field data.After quality control, only 28 field stations showed valid MODIS matching pairs (Figure 1), and these were used for algorithm development.Thirty other data points from the six fixed stations also showed MODIS matching pairs, and these were used to evaluate the algorithm.
To understand the impact of partial atmospheric correction (as compared with the full atmospheric correction) on the algorithm performance, a sensitivity test was conducted where various atmospheric parameters were extracted from the SeaDAS LUTs and passed to the algorithm as perturbations.These parameters include aerosol optical thickness at 869 nm (τ869), ρa, t0, t, and extraterrestrial solar constant (F0(λ)) corresponding to the solar/viewing geometry specified by the solar zenith angle (θo), satellite zenith angle (θ), and their relative azimuth angle (ϕ).

Algorithm Development
Figure 2a shows the MODIS Rrc spectra of the four land bands (469, 555, 645, 859 nm) corresponding to the 28 Chla matching pairs.Note that the high reflectance led to saturation of most of the MODIS 1-km ocean bands, leaving only four land bands to be used for the algorithm development.Two existing algorithms were first tested to evaluate their performance.These are the blue-green band ratio algorithm following the traditional approach [59] and the red-green band ratio algorithm following the approach of [25].The algorithms used the Rrc(469)/Rrc(555) and Rrc(645)/Rrc(555) ratios, respectively, to correlate with concurrent Chla.The Rrc(469)/Rrc(555) band ratio from MODIS had virtually no correlation with the field-measured Chla (Figure 2b).Although the Rrc(645)/Rrc(555) band ratio showed some correlation with the field-measured Chla, there was substantial data scattering.The poor performance of the band-ratio algorithms is mainly because that the band ratios were designed for Rrs instead of Rrc data, and the blue band contained information from not just particulate matter but also colored dissolved organic matter.Clearly, alternative approaches other than band ratios must be developed.
The EOF-based approach developed by [27] showed great potentials in deriving the water column IOPs, and it was adapted to estimate UV light attenuation in optically shallow waters in the Florida Keys from MODIS Rrs data in the visible bands [60] .The EOF approach is similar to principal component regression (PCR) or partial least squares (PLS) model, which have been used extensively in retrieving water quality parameters and land properties [61].The EOF is used to reduce multi-band reflectance data to several independent (i.e., uncorrelated) variables (i.e., EOF modes or scores, aka eigenvectors) that retain most of the variance in the original data.Changes in the concentration of a water constituent (e.g., Chla) will affect the multi-band reflectance, thus will be correlated to one or more of the EOF modes.These modes can then be used in a stepwise fashion to reconstruct the corresponding Chla record, from which an empirical regression model is established.Further details of the original EOF approach and the modified approach can be found in [27,60], respectively.The approach was modified in this study to use the four-band MODIS Rrc data as the input.Specifically, the EOF analysis was implemented using the MATLAB™ function princomp.The input for this function consisted of an (N × m) array of Rrc(λ) spectra, where N was the number of samples, and m is the number of MODIS land band (m = 4).Following [27] and to focus on the variability in the spectral shape (as opposed to amplitude), MODIS Rrc spectra were normalized first: where <Rrc(λ)> is the normalized spectra [27,60].EOFs were then constructed using the MODIS <Rrc(λ)>, with each EOF expressed as a vector of four loadings corresponding to the four spectral bands.
Because only 4 MODIS bands were used in this analysis, the four EOF modes explained 100% of the variance in the <Rrc(λ)>.The resulting scores for each of these EOF modes were used as the independent variables in a stepwise multiple regression (i.e., stepwise principal component regression), with the dependent variable being the field-measured Chla.

Algorithm Evaluation
Several commonly accepted measures were used to assess algorithm performance.These include R 2 and Root-Mean-Square-Error (RMSE) in log space and unbiased RMSE (URMSE) in relative percentage (100%): where xi and yi refer to the measured and modeled values for the ith sample.The use of the log form was to follow the tradition for log-normal distributions [62].URMSE (2(y − x)/(y + x)) was used instead of the typical RMSE ((y − x)/x) to avoid outliers that cause skewed error distributions [63].This is particularly important when both x and y may contain large errors [64].

Algorithm Development
Figure 3a shows the algorithm performance using the 28 data pairs from MODIS and in situ measurements.Despite the data scattering around the 1:1 line, there is a statistically significant correlation between the EOF-modeled Chla and measured Chla, with coefficient of determination (R 2 ) of 0.47.Unbiased RMSE is 63.3% and RMSE(log) is 0.29 for Chla ranging between ~3 and ~50 μg/L, slightly higher than that for the global open ocean (~0.27, [65]).Some of the data scatter might be caused by the possible patchiness in this optically complex lake and also potential errors in the measured Chla due to low sampling volume.Thus, the performance of the algorithm may be acceptable, especially when considering that only four land bands were used in the EOF-based algorithm and only a partial atmospheric correction was performed.

Validation Using MODIS Data and Field Data
The algorithm was further validated using independent data collected from the 6 fixed stations.The 30 matching pairs led to performance statistics slightly worse than those from the algorithm development (Figure 3b), with URMSE = 77.6%,RMSE(log) = 0.36, and R 2 = 0.37.However, for the same arguments as above, the performance might be acceptable as long as the retrieved Chla patterns are spatially coherent and temporally consistent, as shown below.

Validation Using MODIS Data Alone
Regardless of the amount of effort expended in collecting field data, these data are always limited in space and time.An additional method to validate algorithms is through consistency measures where satellite data products of adjacent days are examined for their spatial and temporal patterns.This method has been used in validating an open-ocean Chla algorithm developed recently for all ocean color sensors [64], and is adopted here.
First, the EOF-based algorithm was applied to all MODIS images.After a static landmask was applied, the images were screened to exclude pixels contaminated with clouds and sun glint [8].Then, Rrc(λ) spectra from all valid pixels were normalized using Equation (4), which were then used as the input of the MATLAB™ princomp function to obtain the EOF scores for each Rrc spectrum (pixel).Finally, Chla for that pixel was derived as where β0-4 are the regression coefficients derived from the 28 data pairs in the algorithm development, and S1-4 are the EOF scores [27].Note that of the four EOF scores only the first three would explain at least 5% of the variance from the input Rrc data, yet for completeness the fourth score is kept here.In contrast, for hyperspectral or multi-spectral data with many bands (e.g., 10 or more), for simplicity only the first several scores are used in the regression (e.g., [60]).
Such derived MODIS Chla images were examined one by one to determine whether Chla spatial patterns were coherent (i.e., there was no significant edging effect or sharp boundaries between different water masses) and whether Chla temporal patterns in the same locations were consistent under different observing conditions.c,d) while the bottom panels show the retrieved Chla (e,f,g,h).The RGB images in winter (a,b) show significant variations in the aerosol content (haziness); yet the retrieved Chla images (e,f) are consistent in the large-scale spatial patterns and Chla magnitudes (e.g., higher Chla in the southern lake than in the northern lake); Likewise, for the bloom cases in summer, the retrieved Chla patterns in (g,h) appear to be insensitive to different aerosol perturbations in (c,d).Note that the optically shallow waters in the eastern lake are masked to prevent algorithm artifacts.
Figure 4 shows several examples of the algorithm performance under different aerosol perturbations for both non-bloom Figure 4a,b and bloom Figure 4c,d cases, respectively.For the non-bloom case in winter, the two MODIS images on consecutive days of 29 (Aqua) and 30 (Terra) January 2007 were collected under thin and thick aerosols, respectively, as revealed by the RGB images.The mean Rrc(859) differences between the two images was around 0.055, corresponding to aerosol optical thickness (τa) of 0.6 for maritime aerosols.Note that τa = 0.3 is the threshold for "thick aerosol" in the NASA processing software SeaDAS.Thus, even when the increased τa doubled the "thick aerosol" threshold (Figure 4a,b), the Chla spatial patterns were still temporally consistent for non-bloom conditions (Figure 4e,f).Likewise, for the two consecutive images on 8 August 2004 (Terra) and 9 August 2004 (Aqua) where mean Rrc(859) decreased by 0.036 (corresponding to τa decrease of 0.4, Figure 4c,d), most of the Chla spatial patterns for bloom conditions were also temporally consistent (Figure 4g,h).Note that the MODIS measurements were from both Terra (30 January 2007 and 8 August 2004) and Aqua (29 January 2007 and 9 August 2004), yet the EOF-retrieved Chla patterns are all consistent.

Sensitivity Test Using Radiative Transfer Simulations
Similar image comparison as shown in Figure 4 was performed for the entire MODIS time series, with similar findings, i.e., the EOF-based Chla algorithm appeared to be tolerant to aerosol perturbations.To further understand the algorithm sensitivity to aerosol perturbations under different solar/viewing geometry, radiative transfer simulations were used.In these simulations, aerosol perturbations (ρa(λ), t(λ), t0(λ) in Equation ( 1)) to Rrc(λ) were estimated using SeaDAS aerosol look up tables (LUTs) for several scenarios.Specifically, variations in satellite zenith angle (scene edge or scene center), aerosol type (coastal or maritime), relative humidity and aerosol optical thickness were considered.Figure 5 shows that the EOF algorithm performance, in terms of URMSE and R 2 , is roughly similar under these different circumstances, suggesting that the Rrc spectral shapes can be retrained under variable aerosol perturbations.This result is consistent to the image-based analysis (Figure 4).

Application to Long-Term MODIS Data
Figure 6 shows the Chla spatial distributions in the four seasons derived by use of the EOF algorithm from MODIS Rrc data.The top panels in Figure 6a-d show the representative RGB images in the four seasons where bloom slicks and patches can be clearly visualized, and the middle panels show the corresponding Chla distributions.The summer image on 21 July 2011 reveals high Chla in the NW Lake, SW Lake, Meiliang Bay, Gonghu Bay, and north of the Central Lake due to typical cyanobacterial blooms in summer.The images in the other three seasons showed variable but generally lower Chla in the various lake segments, with generally higher Chla in the bays than in the Central Lake.The bottom panels in Figure 6 show the seasonal mean Chla distributions.Of the 853 MODIS images between 2000 and 2013, >200 images were used to calculate each seasonal mean.As such, the Chla patchiness in the individual images has been removed in the seasonal mean.Summer showed the highest Chla in the western lake and Meiliang Bay, followed by spring, fall, and winter.Such spatial distributions and their temporal changes are consistent with the findings of [8] where statistics of floating algae mats were used to estimate seasonal bloom variability.Unlike the algae mats assessment, the current study provides additional quantitative estimates of Chla distributions for assimilation in forecast models.The MODIS Chla data were binned to calculate annual means and climatological monthly means for each of the six fixed sampling sites, with results shown in Figure 7.Despite the increasingly reported blooms in recent years, the 14-year trends in the annual means are not apparent in this dataset, although annual fluctuations are found in Figure 7a.In contrast, there is a clearly seasonality for most of the stations, with peak Chla between May and September (Figure 7b).Even though Chla in Taihu Lake could occasionally reach several hundreds of μg/L, such short-term variability has been smoothed in the annual and climatological monthly means, resulting in a relatively narrow range of Chla between 15 and 35 μg/L.1).Station 4 is in Meiliang Bay, Station 10 is in the NW Lake, and Station 8 is in the Central Lake.

Data Assimilation Results Using Default Chla and New Chla: A Comparison
The EcoTaihu model [33] is a vertically-compressed three-dimensional ecological model.The model includes three modules: a relatively independent hydrodynamics module, a food chain network module, and a material transform and transport module.The model has been used to study water transfers from the Yangtze River to Taihu Lake [66] and to predict algal blooms [52].The model parameterization, initialization, and forcing functions as well as other details can be found in [33], while the data assimilation method can be found in [43].
The model was initialized for 1 August 2010 and run through 31 August 2010, with a time step of 1 s and output every 24 h.The model initiation did not use any MODIS data, but used water quality data obtained from fixed monitoring stations during the previous month (July 2010) [33].Three separate runs were conducted, with results shown in Figure 8.The first was a free run without assimilating MODIS data, used as the control (Figure 8b,f).The second run assimilated MODIS Chla distribution on 13 August 2010 derived from an existing back propagation (BP) algorithm that used only 2 MODIS bands at 645 and 859 nm [42] (Figure 8c,g).The third run assimilated MODIS Chla distribution on 13 August 2010 derived from the EOF algorithm in this study (Figure 8d,h).Despite the model artifacts, there is some improvement in the model-predicted large-scale features.For example, the high-Chla patches outlined in the black circles appear to be unrealistic, and the use of the EOF-based MODIS Chla reduced such an artifact as compared with the use of the existing MODIS Chla (BP algorithm).This is especially apparent for the Chla patterns outlined in the blue circles.In general, there is a large difference between the model predicted Chla and MODIS-derived Chla (Figure 8e,h), suggesting future work is needed in model parameterization and tuning.Nevertheless, for the same model, the advantage of using the new EOF Chla over the old Chla in data assimilation is demonstrated here.

Discussion
Previous algorithm development for Taihu Lake and many other inland water bodies rarely used satellite data, but focused on field-measured Rrs (e.g., [24,67]).This may lead to two potential problems.One, the field data may not cover the full dynamic range of all environmental variables, limiting the application of the field-based algorithms to only those conditions under which the field data were collected.This is especially true when the field data were first classified before type-specific algorithms were developed.Two, the algorithms assumed error-free Rrs, which is not true for either field-measured or satellite-derived Rrs.Thus, in addition to the field-based algorithm development (which relies on both field and satellite measurements) and traditional point-based validation, this study also used image-based validation through inspection of the spatial and temporal patterns.
One aspect of this work is the use of Rrc instead of Rrs as a compromise between data quantity and data quality for near real-time use.Ideally, MODIS Rrs, after proper atmospheric correction, should be used for inversion to geophysical parameters such as Chla.However, although several case studies have shown significant improvement in turbid water atmospheric correction through using the MODIS shortwave infrared bands for Taihu Lake [68,69], the most significant problem for Taihu Lake real-time applications is lack of satellite data, as MODIS ocean bands often saturate over turbid atmosphere.Even without saturation, the requirements of the atmospheric correction on aerosol optical thickness (<0.3 at 869 nm) make valid MODIS Rrs retrievals extremely sparse for Taihu Lake.The MEdium Resolution Imaging Spectrometer (MERIS) has more spectral bands with more dynamic range, yet MERIS stopped functioning since April 2012.Several satellite missions are currently being planned by NASA and the European Space Agency (ESA).These include ESA's Sentinel-III, which will have a MERIS-like sensor called Ocean and Land Color Instrument (OLCI, 300-m resolution) and is expected to launch around June 2015 (Bryan Franz, personal comm.).However, as with all previous ocean color missions, these future missions may also be delayed.The most recently launched Visible Infrared Imaging Radiometer Suite (VIIRS, 2012-present) may potentially be used for real-time applications, yet the calibration problems, especially after December 2012 (personal comm., Menghua Wang, NOAA NESDIS), make its performance unclear.Although not optimal, before VIIRS performance is evaluated, the use of MODIS Rrc from the land bands is perhaps the best available option.Yet future algorithm development efforts should be dedicated to VIIRS multi-spectral data once its calibration is improved, as VIIRS has more spectral bands than MODIS that do not saturate.
The limited number of MODIS bands, together with the large uncertainties in the full atmospheric correction, led to the EOF approach for near real-time Chla retrievals.Similar to other neural-network based approaches, the EOF is based the variance of the spectral reflectance and thus purely statistical without explaining the physical meaning of the EOF modes.However, the Chla spatial distribution patterns derived from the MODIS images appear reasonable.One reason may be due to the spectral normalization (Equation ( 4)), which may partially remove the aerosol effects while retaining most of the spectral shape information.Likewise, although the spectral magnitudes and shapes in the extracted MODIS Rrc data suggest sediment-rich waters, the influence of sediments on the Chla retrievals is implicitly suppressed by the EOF model tuning, as all the four spectral bands (especially the three visible bands) do contain information from phytoplankton.For example, the ratio of 645/555 reflectance has been shown highly correlated with Chla in Tampa Bay [70], and the 469-nm band is also influenced by phytoplankton pigment absorption.Nevertheless, given the large uncertainties and purely statistical nature of the algorithm, the EOF approach developed for the four MODIS bands is a temporary solution to derive large-scale Chla patterns in near real-time.This is especially true when considering that (1) Chla in the training dataset (about 3-100 μg/L) may not cover the full data range of all possible blooms, especially those during the summer.Zhang et al. (2009) showed Chla during all four seasons, and they reported median ± std of 31.6 ± 58.1 μg/L for summer, meaning that about 25% of their data are above 100 μg/L (i.e., exceeding the data range used in this study) [49].Although some of the MODIS images showed that the algorithm could retrieve Chla > 100 μg/L (especially over surface scums), the validity of these high-Chla values needs to be verified; (2) perturbations from sun glint, thin clouds, whitecaps, and other aerosol types that were not used in the sensitivity simulations (Figure 5) could induce additional uncertainties in the algorithm.Figure 5 shows that compared with the original URMSE of 63.3% in the Rrs-based algorithm development (Figure 3a), most aerosol perturbations could induce another 5% in the URMSE.Other perturbations that were not considered in the simulations could possibly induce extra 5% or 10% URMSE, leading to a total URMS approaching or even higher than 80%.Clearly, future efforts should be dedicated to collect more field-MODIS data pairs to fine tune and improve the algorithm under most, if not all, satellite measurement conditions.
The limited model runs results also suggest the need for improvement of forecasting models.Over the last two decades, two general approaches have been developed to forecast Chla.The first was based on statistics through either regression [71], artificial neural networks [72,73], or fuzzy logic [74].The second was based on physical and biological processes, which could reveal controlling factors and causal relationships [75][76][77][78][79][80].For Taihu Lake, such process-oriented models have been developed and used to study hydrodynamics and biological response to physical forcing [33,52,81,82].However, application of such a model together with MODIS data assimilation revealed large-scale artifacts, where the reasons need to be further diagnosed.For example, it might be possible that the model has inherent limitations due to the shallow water bottom (mean water depth of Taihu Lake is ~2 m), which presents a dragging force but is neglected in nearly all hydrodynamic models.Nevertheless, even with such apparent model artifacts, the improvement in MODIS Chla still led to improved performance in the model prediction.
Clearly, an inter-disciplinary effort is required to improve both the forecast model and the MODIS Chla retrievals in order to implement a near real-time operational forecasting model for Taihu Lake.
Can the EOF approach be applied to other similar water bodies in different geographical and biogeochemical regimes where MODIS ocean bands also saturate?A preliminary test of the algorithm using data collected from a nearby lake, Chaohu Lake, showed reasonable Chla distribution patterns although their validity still requires rigorous validation.It is believed that similar to previous EOF approaches to retrieve water quality parameters [27,60], the principle of the EOF approach should be applicable to other similar water bodies for most water quality parameters including Chla, yet the number of spectral bands, number of modes, and parameterization of the algorithm may need to be localized to account for the different bio-optical properties (e.g., dominated by suspended sediments, CDOM, or phytoplankton).We anticipate carrying out these experiments for other lakes in the near future.

Conclusions
This work represents one step towards the ultimate goal of establishing an operational, near real-time forecast model to predict Chla distribution in a turbid lake.Using in situ data and only four MODIS land bands that do not saturate over turbid atmosphere or turbid water, the empirical approach based on Empirical Orthogonal Function (EOF) analysis provides a preliminary solution to derive Chla under cloudfree conditions in near real-time for Taihu Lake of China, which can be assimilated in forecast models to improve Chla prediction.For Chla ranging between 3 and 100 μg/L, the unbiased RMS uncertainties were estimated to be <80%.Together with 853 cloud-free MODIS images between 2000 and 2013, the algorithm led to the development of spatial and temporal Chla distribution patterns in Taihu Lake.The preliminary success is encouraging as MODIS is currently the only well calibrated operational sensor for near real-time applications, yet the large uncertainties and the limited number of spectral bands suggest further improvements in both the Chla retrieval algorithms and data availability from future satellite sensors.The discrepancy between model predictions and satellite observations also points to the need of improving the forecasting models.

Figure 1 .
Figure 1.Taihu Lake in the Yangtze River Delta, China.Annotated on the image are locations of the 28 sampling stations visited by the six NIGLAS cruise surveys (October 2004, May 2008, October 2008, May 2010, March 2011 and May 2011) and 6 fixed sampling sites.

Figure 2 .
Figure 2. (a) MODIS Rrc spectra (469, 555, 645, and 859 nm) corresponding to the 28 in situ measurements shown in Figure 1.These spectra were used with field-measured Chla to develop the EOF model; (b) MODIS Rrc ratios versus field-measured Chla.

Figure 3 .
Figure 3. (a) Comparison between field measured Chla at the 28 stations (Figure 1) and Chla derived from the MODIS Rrc using the EOF model; (b) Validation of the EOF model using independent field measurements from the six fixed stations (Figure 1).The dashed lines are 1:1 lines.

Figure 4 .
Figure 4. Performance of the Chla retrieval algorithm under different aerosol conditions in both winter (29 and 30 January 2007) and summer (8 and 9 August 2007).The top panels show MODIS RGB images (a,b,c,d) while the bottom panels show the retrieved Chla (e,f,g,h).The RGB images in winter (a,b) show significant variations in the aerosol content (haziness); yet the retrieved Chla images (e,f) are consistent in the large-scale spatial patterns and Chla magnitudes (e.g., higher Chla in the southern lake than in the northern lake); Likewise, for the bloom cases in summer, the retrieved Chla patterns in (g,h) appear to be insensitive to different aerosol perturbations in (c,d).Note that the optically shallow waters in the eastern lake are masked to prevent algorithm artifacts.

Figure 6 .
Figure 6.Examples of MODIS RGB and Chla in different seasons.The top panels (a-d) show MODIS RGB images selected in the four seasons, the middle panels (e-h) show the retrieved Chla corresponding to the RGB images, and the bottom panels (i-l) show the seasonal mean Chla between 2000 and 2013.

Figure 7 .
Figure 7. (a) Annual mean MODIS Chla and (b) Climatological monthly mean MODIS Chla for the six fixed sampling sites (Figure 1).Station 4 is in Meiliang Bay, Station 10 is in the NW Lake, and Station 8 is in the Central Lake.

Figure 8 .
Figure 8. Top panels in (a-d): Chla distributions on 13 August 2010.(a) MODIS (EOF algorithm); (b) Forecast model (EcoTaihu, started from 1 August) without assimilating MODIS data; (c) Forecast model after assimilating MODIS Chla from the existing BP algorithm [42]; (d) Forecast model after assimilating MODIS Chla from the new EOF algorithm (a).The forecast model (EcoTaihu) generated model output every 24 h on 12:00 AM local time.Bottom panels in (e-h): Chla distributions on 15 August 2010.Note that although the MODIS Chla image on 13 August was assimilated in the model, the MODIS Chla image on 15 August 2010 was not assimilated in the model and therefore served as a reference to validate the model's performance in predicting Chla in two days.

Table 1 .
Number of MODIS images used in each season, 2000-2013.