This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (

As a major indicator of lake eutrophication, which is harmful to human health, the chlorophyll-a concentration (Chl-a) is often estimated using remote sensing, and one commonly used method is the spectral derivative algorithm. Direct derivative processing may magnify noise, making spectral smoothing necessary. This study uses spectral smoothing as a pretreatment and tests the applicability of the spectral derivative algorithm for Chl-a estimation in Taihu Lake, China, based on three datasets collected in 2004, 2005 and 2011, with Chl-a ranges of 5.0–156.0 mg/m^{3}, 4.0–98.0 mg/m^{3} and 11.4–35.8 mg/m^{3}, respectively. The derivative model was built first and then compared with the band ratio, three-band and four-band models. The results show that the first-order derivative model at 699 nm had satisfactory accuracy (R^{2} = 0.75) after kernel regression smoothing and had smaller validation root mean square errors than the other models, 15.21 mg/m^{3} in 2005 and 5.85 mg/m^{3} in 2011. The distribution map of Chl-a in Taihu Lake based on the HJ1/HSI image showed the actual distribution trend, indicating that the first-order derivative model after spectral smoothing can be used for Chl-a estimation in turbid lakes.

Freshwater lakes are the main source of drinking and agricultural water in many areas, and their water quality can greatly affect human health. Due to the increasing economic development in China, lake eutrophication has become a serious water quality problem that has recently attracted much attention. The main danger caused by eutrophication is the toxins produced by some algae, which are harmful for drinking and can poison or even kill humans and animals that consume contaminated water and food [

The chlorophyll-a concentration (hereafter referred to as Chl-a) is an important parameter in evaluating water quality, nutrition status and organic pollution extent, providing useful information for managing water quality [

The derivative analysis of spectra is effective for information detection [

In addition to laboratory spectroscopic analysis, the derivative method can also be used for tackling analogous problems, such as interference from water and other background in the remote sensing retrieval of Chl-a in water. Previous studies showed that the first-order derivative is able to remove pure water effects and the second-order derivative can remove suspended sediment effects [

However, derivatives are notoriously sensitive to noise, and direct spectral derivative processing will magnify the noise. Therefore, smoothing or otherwise minimizing random noise is necessary. Tsai [

The objectives of this study are the following: (1) to compare the influence of three typical smoothing methods on the spectrum and select the most suitable smoothing method; (2) to build a derivative model based on the

Taihu Lake, the second largest freshwater lake in China, has typically turbid water and is located at the junction of Jiangsu and Zhejiang provinces. The lake covers an area of 2,427.8 km^{2} and has an average depth of 2.12 m, with eutrophication status ranging from moderate to heavy [

In July–August of 2004 and 2005, water samples covering the lake were collected, and the spectrum above the water surface was measured at Taihu monitoring sites. In March 2011, the sampling positions mainly covered the Meiliang Bay heavy eutrophication area. The


Sample distribution of Taihu Lake in (

Chl-a was analyzed in the laboratory according to the Chinese national standard SL88-1994 (three-color spectrophotometry). First, the water samples collected in the field were filtered through a GF/C membrane. The membrane was dried in a low-temperature refrigerator for more than 12 h, and the chlorophyll was then extracted from the filter with 90% acetone. The extract was centrifuged and left to stand for 12 h, and the absorbance of the supernatant was then measured with a UV-2550 spectrophotometer. Chl-a was calculated from the absorbance at 750 nm, 663 nm, 645 nm and 630 nm.
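The calculation from the four absorbance readings can be illustrated with a short sketch. It assumes the widely quoted SCOR-UNESCO trichromatic coefficients (11.64, 2.16, 0.10); the text does not state the coefficients, so the standard itself should be consulted before relying on them. The function name and parameters are hypothetical.

```python
def chl_a_trichromatic(a750, a663, a645, a630, extract_ml, sample_l, path_cm=1.0):
    """Chl-a in mg/m^3 from absorbance readings of a 90% acetone extract.

    Assumption: SCOR-UNESCO trichromatic coefficients, not confirmed by the text.
    """
    # The 750 nm reading is subtracted to correct for turbidity of the extract.
    e663 = a663 - a750
    e645 = a645 - a750
    e630 = a630 - a750
    # Concentration in the extract (ug/mL) for a 1 cm path length.
    c_extract = 11.64 * e663 - 2.16 * e645 + 0.10 * e630
    # Scale by extract volume (mL) and filtered water volume (L): ug/L == mg/m^3.
    return c_extract * extract_ml / (sample_l * path_cm)


# Hypothetical readings: 10 mL of extract from 0.5 L of filtered lake water.
print(chl_a_trichromatic(0.005, 0.120, 0.040, 0.030, extract_ml=10, sample_l=0.5))
```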

The statistical characteristics of Chl-a in the three datasets are shown in ^{3}) in 2004 was not used because its spectrum is similar to that of algae bloom.

Statistical characteristics of Chl-a in July–August of 2004 and 2005 and in March of 2011.

Dataset | Number of samples | Minimum (mg/m^{3}) | Maximum (mg/m^{3}) | Median (mg/m^{3}) |
---|---|---|---|---|
2004 | 23 | 5.0 | 156.0 | 33.0 |
2005 | 21 | 4.0 | 98.0 | 29.0 |
2011 | 12 | 11.4 | 35.8 | 23.3 |

The total suspended sediment concentration (TSS) was measured according to the gravimetric method (China national standard GB11901-89, 1990). Based on the measurements from 1998 to 2003 in Taihu Lake, the average TSS value in July–August was 49.2 mg/L, ranging from 12.0 mg/L to 261.0 mg/L. The TSS value in Meiliang Bay ranged from 7.3 mg/L to 21.1 mg/L in March of 2011.

Three typical smoothing methods,

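The following formulas were used to calculate the first- and second-order derivatives of remote sensing reflectance:

$$R'(\lambda_i) = \frac{R(\lambda_{i+1}) - R(\lambda_{i-1})}{\lambda_{i+1} - \lambda_{i-1}}$$

$$R''(\lambda_i) = \frac{R'(\lambda_{i+1}) - R'(\lambda_{i-1})}{\lambda_{i+1} - \lambda_{i-1}}$$

where λ_{i+1}, λ_{i}, and λ_{i-1} are the adjacent wavelengths and R(λ_{i}), R'(λ_{i}) and R''(λ_{i}) are the original spectrum, first-order spectral derivative and second-order spectral derivative at band λ_{i}, respectively.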
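A minimal sketch of derivative spectra computed from adjacent bands is given here, assuming a central-difference scheme over the neighbouring wavelengths. The function names are illustrative and not taken from the study.

```python
def first_derivative(wavelengths, values):
    """Central-difference derivative at interior bands.

    Returns (wavelengths_out, derivative); the two end bands are dropped
    because they lack a neighbour on one side.
    """
    wl_out, d_out = [], []
    for i in range(1, len(values) - 1):
        slope = (values[i + 1] - values[i - 1]) / (wavelengths[i + 1] - wavelengths[i - 1])
        wl_out.append(wavelengths[i])
        d_out.append(slope)
    return wl_out, d_out


def second_derivative(wavelengths, values):
    """Second-order derivative obtained by differencing the first derivative."""
    wl1, d1 = first_derivative(wavelengths, values)
    return first_derivative(wl1, d1)


# Hypothetical reflectance values around the red edge.
wl = [690.0, 695.0, 700.0, 705.0, 710.0]
refl = [0.030, 0.034, 0.040, 0.043, 0.044]
print(first_derivative(wl, refl))
print(second_derivative(wl, refl))
```

Each differencing step shortens the spectrum by two bands, which is one reason the spectrum is smoothed before, rather than after, the derivative is taken.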

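The first- and second-order derivative models were built and then compared with other typical models. The band ratio model is commonly used for Chl-a estimation, using the ratio of the reflectance peak near 710 nm to the reflectance valley near 670 nm. The band ratio, three-band and four-band models use the following spectral indices as predictors of Chl-a:

$$\frac{R_2}{R_1}, \qquad \left(\frac{1}{R_1} - \frac{1}{R_2}\right) \times R_3, \qquad \frac{1/R_1 - 1/R_2}{1/R_4 - 1/R_3}$$

where R_{1}, R_{2}, R_{3}, and R_{4} are the reflectance at the wavelengths λ_{1}, λ_{2}, λ_{3} and λ_{4}, respectively.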

With respect to the band ratio model, λ_{2} and λ_{1} were determined in the range from 700 to 720 nm and 650 to 690 nm, respectively, to minimize the RMSE of the R_{2}/R_{1} model. With respect to the three-band model, the optimal bands were determined in the wavelength range from 450 to 750 nm, with the initial iteration positions of λ_{1} and λ_{3} at 675 nm and 750 nm, respectively; the model tuning method is described in detail by Zimba and Gitelson. First, λ_{1} and λ_{3} were fixed to find the optimal λ_{2} position through iteration until the RMSE of the model, (1/R675 − 1/R[λ_{2}]) × R750, became minimal. Then, λ_{1} and λ_{2} were fixed to find the position of λ_{3} at which the RMSE was minimal. This iterative calculation continued until all three positions no longer changed. The four-band model was built according to the method of Le [
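The iterative band tuning for the three-band model can be sketched as a coordinate search. The sketch makes several assumptions that the text does not confirm: each candidate index is scored by the RMSE of a simple linear regression against Chl-a, the third step re-tunes the first band, and bands are addressed by index on a common grid. All names are hypothetical.

```python
import math


def fit_rmse(x, y):
    """RMSE of the least-squares line y = a*x + b (inf if x is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        return math.inf
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b = my - a * mx
    return math.sqrt(sum((a * xi + b - yi) ** 2 for xi, yi in zip(x, y)) / n)


def three_band_index(spectrum, i1, i2, i3):
    """(1/R1 - 1/R2) * R3 for band indices i1, i2, i3; reflectance must be > 0."""
    return (1.0 / spectrum[i1] - 1.0 / spectrum[i2]) * spectrum[i3]


def tune_three_band(spectra, chl, start, max_iter=20):
    """Tune one band at a time until no band position changes.

    spectra: list of reflectance lists on a common band grid
    start:   initial (i1, i2, i3) band indices
    """
    bands = list(start)
    n_bands = len(spectra[0])
    for _ in range(max_iter):
        previous = tuple(bands)
        for k in (1, 2, 0):  # second band, then third, then first (assumed order)
            def score(j):
                trial = bands[:k] + [j] + bands[k + 1:]
                return fit_rmse([three_band_index(s, *trial) for s in spectra], chl)
            bands[k] = min(range(n_bands), key=score)
        if tuple(bands) == previous:
            break
    return tuple(bands)
```

In practice the candidate range for each band would be restricted to the wavelength window named in the text rather than the full grid used in this sketch.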

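The RMSE, average relative error (ARE) and normalized RMSE (NRMSE) were used to evaluate the model accuracy, and their formulas are as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

$$\mathrm{ARE} = \frac{1}{n}\sum_{i=1}^{n}\frac{\left|y_i - \hat{y}_i\right|}{y_i} \times 100\%$$

$$\mathrm{NRMSE} = \frac{\mathrm{RMSE}}{y_{max} - y_{min}}$$

where y_{i} is the measured Chl-a (mg/m^{3}), ŷ_{i} is the estimated Chl-a (mg/m^{3}), n is the number of samples, and y_{max} and y_{min} are the maximum and minimum of measured Chl-a, respectively.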
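The three accuracy measures can be written as small helper functions. This is a sketch under two assumptions: the relative error is taken against the measured value, and the NRMSE is the RMSE divided by the range of the measured Chl-a. The function names are illustrative.

```python
import math


def rmse(measured, estimated):
    """Root mean square error, in the units of the inputs."""
    n = len(measured)
    return math.sqrt(sum((m - e) ** 2 for m, e in zip(measured, estimated)) / n)


def are(measured, estimated):
    """Average relative error in percent, relative to the measured values."""
    n = len(measured)
    return 100.0 * sum(abs(m - e) / m for m, e in zip(measured, estimated)) / n


def nrmse(measured, estimated):
    """RMSE normalized by the range of the measured values (dimensionless)."""
    return rmse(measured, estimated) / (max(measured) - min(measured))


# Hypothetical measured and estimated Chl-a values (mg/m^3).
measured = [10.0, 20.0, 30.0]
estimated = [12.0, 18.0, 33.0]
print(rmse(measured, estimated), are(measured, estimated), nrmse(measured, estimated))
```

Under this definition, and using the dataset ranges in the table of Chl-a statistics, the reported validation RMSEs of 15.21 and 5.85 mg/m^{3} would correspond to NRMSE values of roughly 0.16 for 2005 and 0.24 for 2011.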

The HJ1 environmental satellite is the first satellite used for environmental monitoring in China, and its hyperspectral imager (HSI) has contiguous spectral bands and a short revisit cycle. The spatial resolution of the HSI data is 100 m, the temporal resolution is 96 h, and the average spectral resolution is 5 nm, with a total of 115 bands from 450 to 950 nm.

Because the weather in the region around Taihu Lake is perennially cloudy and rainy, obtaining synchronous images with the field data is difficult. The HSI image on 9 May 2009, covering a large part of Taihu Lake, was collected to generate the Chl-a distribution map. The geometric correction was based on a standard TM image with precise geometric information, and the error was within one pixel.

The atmospheric correction of the HJ1/HSI image was conducted using the 6S algorithm, which considered the non-Lambertian surface situation and solved the coupled problems of the BRDF surface and atmosphere [

Spectral derivatives can distinguish the detailed information in the spectrum, but the direct derivative processing of the remote sensing reflectance can magnify the noise caused by the environment or measurement influence. Thus, smoothing the spectrum before derivative calculation is necessary. Three methods were used to smooth two spectra randomly selected from the dataset of 2004, and the spectra before and after smoothing are displayed in

Spectrum reflectance (

There is always a trade-off between noise removal and the ability to resolve fine spectral details. As the filter size increases, spectral details may also be suppressed. Ideal smoothing can remove noise without altering the real spectral features, and the optimal filter size for approaching this goal depends on both the noise type and the smoothing algorithm. For the spectra in
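The trade-off between noise removal and spectral detail can be illustrated with a simple kernel regression smoother. The sketch assumes a Nadaraya-Watson estimator with a Gaussian kernel, where the bandwidth plays the role of the filter size; the exact kernel and bandwidth used in the study are not stated here, so both are assumptions.

```python
import math


def kernel_smooth(wavelengths, reflectance, bandwidth_nm):
    """Nadaraya-Watson kernel regression with a Gaussian kernel.

    Each smoothed value is a weighted mean of all reflectance values, with
    weights that decay with wavelength distance. A larger bandwidth removes
    more noise but also flattens narrow spectral features.
    """
    smoothed = []
    for w0 in wavelengths:
        weights = [math.exp(-0.5 * ((w - w0) / bandwidth_nm) ** 2) for w in wavelengths]
        total = sum(weights)
        smoothed.append(sum(wt * r for wt, r in zip(weights, reflectance)) / total)
    return smoothed


# Hypothetical noisy spectrum: a flat level with alternating +/- noise.
wl = [float(w) for w in range(400, 430)]
noisy = [0.05 + 0.01 * (-1) ** i for i in range(len(wl))]
for bandwidth in (0.5, 3.0):
    s = kernel_smooth(wl, noisy, bandwidth)
    print(bandwidth, max(s) - min(s))
```

Comparing the printed ranges for the two bandwidths shows how quickly band-to-band noise is suppressed as the kernel widens; a real feature of similar width would be suppressed in the same way.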

The spectrum data from July-August of 2004 and 2005 and March of 2011 were used for the derivative analysis. The magnitude and shape of the reflectance in

The original spectrum in (

There are some differences in the spectra data of the three datasets: the reflectance peak at 700 nm and valley near 670 nm of the spectra in 2004 and 2005 are obvious, indicating high Chl-a in the phytoplankton-dominated lake water in summer. The fluorescence peak at 700 nm is lower in the spectrum of March 2011 because the suspended solids dominate the water in the winter [

All spectra data were smoothed using the kernel regression, and the first- and second-order derivatives were then calculated. The correlation coefficients between Chl-a and the original spectra, first-order spectral derivative, and second-order spectral derivative were calculated and shown in
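The band-by-band correlation analysis can be sketched as follows, assuming the Pearson correlation coefficient and spectra stored as lists on a common band grid. The same function applies to the original spectra or to either derivative spectrum; the names are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_by_band(spectra, chl):
    """Correlation between Chl-a and the spectral value at each band.

    spectra: one list of band values per sample
    chl:     measured Chl-a per sample, in the same sample order
    """
    n_bands = len(spectra[0])
    return [pearson([s[j] for s in spectra], chl) for j in range(n_bands)]


def best_band(spectra, chl):
    """Index of the band whose correlation with Chl-a is largest in magnitude."""
    r = correlation_by_band(spectra, chl)
    return max(range(len(r)), key=lambda j: abs(r[j]))
```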

The spectral derivatives focused mainly on the shape characteristics of the spectral curve while ignoring the magnitude difference and suppressing the background effects from the water and other substances. Thus, the derivatives can accurately extract the spectral characteristics of the water component.

Han [

With respect to the spectra data from 2004, the first-order derivative at 699 nm and the second-order derivative at 685 nm were found to be the most highly correlated with Chl-a. Regression models between Chl-a and the first-order derivative at 699 nm and between the second-order derivative at 685 nm were built, and the results are shown in
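A regression model of this kind can be sketched as an ordinary least-squares line between Chl-a and the derivative value at the selected band. The linear form is an assumption made for illustration, and the data in the example are invented.

```python
def fit_linear_model(x, y):
    """Least-squares line y = slope * x + intercept, with its R^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical first-order derivative values at one band and measured Chl-a.
derivative_699 = [-0.00012, -0.00005, 0.00002, 0.00010, 0.00021]
chl = [14.0, 30.0, 41.0, 52.0, 78.0]
slope, intercept, r2 = fit_linear_model(derivative_699, chl)
print(slope, intercept, r2)
```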

The Chl-a estimation model built using (

^{3}, with the R^{2} of both the first- and second-order derivative models being greater than 0.6 ( ^{3}. These results also showed that the estimation of lower Chl-a is relatively better than that of high Chl-a because the reflectance peak of high Chl-a redshifts to a longer wavelength, thus decreasing the sensitivity of the derivative spectra to Chl-a.


Validation of (

The RMSEs validated using the first-order derivative model in 2005 and 2011 were 15.21 mg/m^{3} and 5.85 mg/m^{3}, respectively. The ^{3} and overestimates higher Chl-a for both datasets. Notably, the 2011 dataset did not contain Chl-a greater than 36 mg/m^{3}.

With respect to the second-order derivative model, the validation results for the 2005 dataset deviated significantly from the 1:1 line, and the model overestimated the values for almost all samples. In contrast, the validation results for the 2011 dataset were satisfactory, as is also reflected by the NRMSEs. In conclusion, the validation results of the first-order derivative model were satisfactory and consistent across datasets, and thus the first-order derivative model at 699 nm was preferred for the Chl-a estimation.

To compare the performance of the first-order derivative model with that of other models, the band ratio, three-band and four-band models were all built based on the 2004 dataset, and the results are shown in

Three types of model built using the data from 2004 and their validation results using the data from 2005 and 2011. The band ratio model built using data from (

Comparing the model results of ^{2} = 0.75, RMSE = 17.84 mg/m^{3}) than the other models because the latter have a higher goodness of fit (R^{2} > 0.85) and lower RMSE (RMSE < 15.00 mg/m^{3}), although the ARE differs little. The first-order derivative model performed worse than the two-band, three-band and four-band models, indicating that more bands used in the model building can increase the goodness of fit.

However, the validation results (^{3} were much lower, whereas RMSEs of other models were greater than 17.08 and 8.16 mg/m^{3}, respectively. The ARE had similar tendencies. The NRMSEs show that all of the three models for 2004 had better validation in 2005 than in 2011. The first-order derivative model performed better than the other models in the validation, showing that fewer bands in a model can help increase the validation accuracy.

To further test the validation performance of the first-order derivative model, the calibration and validation datasets were switched in turn. First, the data from 2005 were used to build the model, and the data from 2004 and 2011 were used to validate it; second, the data from 2011 were used to build the model, and the data from 2004 and 2005 were used to validate it. The results showed that the first-order derivative model had consistently better validation results than the other models.
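The dataset-switching procedure can be sketched as a loop in which each dataset calibrates a model in turn and the remaining datasets validate it. The sketch assumes a single-predictor linear model and RMSE as the validation score; the dictionary layout and function names are hypothetical.

```python
import math


def fit_line(x, y):
    """Least-squares slope and intercept of y against x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


def rmse(measured, estimated):
    n = len(measured)
    return math.sqrt(sum((m - e) ** 2 for m, e in zip(measured, estimated)) / n)


def swap_validation(datasets):
    """Calibrate on each dataset in turn and validate on every other one.

    datasets: {label: (predictor_values, measured_chl)}
    returns:  {(calibration_label, validation_label): validation RMSE}
    """
    results = {}
    for cal, (x_cal, y_cal) in datasets.items():
        slope, intercept = fit_line(x_cal, y_cal)
        for val, (x_val, y_val) in datasets.items():
            if val == cal:
                continue  # a dataset never validates its own model
            estimated = [slope * x + intercept for x in x_val]
            results[(cal, val)] = rmse(y_val, estimated)
    return results
```

With three datasets this yields six calibration and validation pairs, which makes it easy to see whether a model's validation error stays stable regardless of which dataset was used to build it.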

This difference can be explained in two ways: (1) The optical characteristics of the water may vary greatly between dates, and this variation is reflected in multiple bands of the spectral curve. A combination of more bands introduces more uncertainty, whereas fewer bands reduce it. Therefore, although the first-order derivative model did not fit a single dataset very well, the model calculated from the smoothed spectral data suppressed the uncertainty and thus increased the applicability of the model to new datasets. (2) A model with more parameters fits the calibration data better but has higher variance; a model with fewer parameters fits the calibration data less well, but its goodness of fit is stable across datasets [

The HJ1/HSI data were geometrically corrected, atmospherically corrected using 6S, and then smoothed using the low-pass convolution method with a kernel size of 7 pixels. The region outside Taihu Lake was masked using the lake boundary. The HSI bands with central wavelengths at 701.66 nm and 696.845 nm were used to calculate the first-order derivative image.

The Chl-a in the lake was calculated according to the first-order derivative model, Chl-a = 178991 × R699' + 37.766 , built in
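Applying the model to the image can be sketched per pixel. The sketch assumes that the derivative near 699 nm is approximated by the difference between the two HSI bands divided by their wavelength spacing, that masked pixels are stored as `None`, and that the image reflectance is the same quantity used to calibrate the model. The function names are hypothetical; the coefficients and band centres are those quoted in the text.

```python
def chl_from_bands(r_upper, r_lower, wl_upper=701.66, wl_lower=696.845,
                   slope=178991.0, intercept=37.766):
    """Chl-a (mg/m^3) from the reflectance of two adjacent HSI bands."""
    # Finite-difference approximation of the first-order derivative near 699 nm.
    derivative = (r_upper - r_lower) / (wl_upper - wl_lower)
    return slope * derivative + intercept


def chl_map(band_upper, band_lower):
    """Apply the model to two co-registered band images (lists of rows).

    Pixels that are None in either band (for example, outside the lake mask)
    stay None in the output.
    """
    return [
        [None if (u is None or l is None) else chl_from_bands(u, l)
         for u, l in zip(row_u, row_l)]
        for row_u, row_l in zip(band_upper, band_lower)
    ]


# Hypothetical 2 x 2 reflectance images with one masked pixel.
upper = [[0.0310, 0.0305], [None, 0.0320]]
lower = [[0.0300, 0.0305], [0.0290, 0.0325]]
print(chl_map(upper, lower))
```

Because the slope is large, small reflectance errors between the two bands translate into large Chl-a differences, which is consistent with smoothing the image before the derivative is computed.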

The HJ1/HSI image of Taihu Lake and the estimation result of Chl-a. (

The growth of algae in Taihu Lake generally involves four periods: sinking sleep (December of the prior year to February), floating recovery (March–April), massive growth (April–September) and floating accumulation (April–November) [^{3} calculated indicated the bloom information around these two bays very well (

This study applied derivative analysis to estimate the Chl-a in Taihu Lake, China, and demonstrated the necessity of spectral smoothing before building the derivative model. Three smoothing methods,

With respect to the derivative spectrum of the 2004 dataset, the first-order derivative model at 699 nm was robust and had consistently high validation accuracy in multiple datasets. The RMSE values were 15.21 mg/m^{3} and 5.85 mg/m^{3}, respectively, when the model was validated using the

This work was supported by NSFC (No. 40771152), the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (09KJA420001) and the Research and Innovation Project for Graduates of Jiangsu Higher Education Institutions (CXLX12_0394). We would like to express our gratitude to Jiao Hongpo, Liu Ke, Gong Shaoqi, Xu Ning, Wang Lei, and Zhang Xiaowei for their work in the field and to Zhang Jing for her work in the laboratory.

The authors declare no conflict of interest.