ISPRS Int. J. Geo-Inf. 2017, 6(11), 360; https://doi.org/10.3390/ijgi6110360

Article
Evaluation of Empirical and Machine Learning Algorithms for Estimation of Coastal Water Quality Parameters
1 Earth & Atmospheric Remote Sensing Lab (EARL), Department of Meteorology, COMSATS Institute of Information Technology, Islamabad 45550, Pakistan
2 School of Marine Sciences, Nanjing University of Information Science and Technology, Nanjing 210044, China
3 Department of Geography, College of Social Sciences, Kuwait University, P.O. Box 5969, Safat 13060, Kuwait
* Author to whom correspondence should be addressed.
Received: 13 October 2017 / Accepted: 14 November 2017 / Published: 15 November 2017

Abstract

Coastal waters are among the most vulnerable resources and require effective monitoring programs. One of the key factors for effective coastal monitoring is the use of remote sensing technologies that capture the spatiotemporal variability of coastal waters. Optical properties of coastal waters are strongly linked to components such as colored dissolved organic matter (CDOM), chlorophyll-a (Chl-a), and suspended solids (SS) concentrations, which are essential for the survival of a coastal ecosystem and are usually independent of each other. Thus, developing effective remote sensing models to estimate these important water components from the optical properties of coastal waters is essential for a successful coastal monitoring program. This study evaluated the performance of empirical predictive models (EPM) and neural network (NN)-based algorithms in estimating Chl-a and SS concentrations in the coastal area of Hong Kong. Remotely-sensed data covering a 13-year period were used to develop regional models, which estimate Chl-a and SS over the entire Hong Kong waters, and local models, which estimate them for each water class within the study area. The regional EPM and NN models estimated Chl-a with accuracies of 83% and 93%, and SS with accuracies of 78% and 97%, respectively, whereas the accuracy of the local models ranged from 60–94% for Chl-a and 81–94% for SS. Both the regional and local NN models performed better than the models derived from empirical analysis. Thus, this study suggests using machine learning methods (i.e., NN) for more accurate and efficient routine monitoring of coastal water quality parameters (i.e., Chl-a and SS concentrations) over the complex coastal area of Hong Kong and other similar coastal environments.
Keywords:
coastal waters; water quality modeling; Landsat; HJ-1 A/B CCD

1. Introduction

Over half of the world’s population lives near the coast, adding pressure on coastal environments [1]. Extensive anthropogenic activities along the coast, such as land reclamation and the disposal of household and industrial waste into coastal waters, significantly degrade coastal water quality [2,3]. Additionally, untreated storm water runoff into coastal areas is a major source of marine water pollution [4]. Effective coastal monitoring programs are therefore needed to protect and maintain coastal ecosystems. Monitoring these dynamic ecosystems requires the integration of in situ data with data of high spatial and temporal resolution, such as remotely-sensed data [5]. Remote sensing datasets provide a synoptic view of coastal areas with the ability to measure the upwelling radiance in different spectral regions [6,7,8,9,10].
Coastal waters, specifically estuarine coastal waters, are dynamic in nature and are characterized by fluctuating water turbidity. For instance, three major water types exist in the coastal area of Hong Kong: clear water in the eastern region, moderately turbid water in the central region, and highly turbid water in the western region. The western region of Hong Kong receives a large amount of suspended particles from the Pearl River, while the eastern region shows a markedly different clear water type due to its connection with the South China Sea. A single model developed for such a complex coastal environment might be associated with high uncertainties. Thus, developing a separate water quality parameter (WQP) estimation model for each water type found in the region could improve the estimation of coastal water parameters. A previous study [11] has suggested the existence of five optically distinct water classes in the coastal area of Hong Kong (Figure 1). Different studies [4,12,13] have reported the role of chlorophyll-a (Chl-a) and suspended solids (SS) in aquatic ecosystem health, tourism, shipping, and fisheries, all of which benefit when Chl-a and SS are estimated with high accuracy. Accuracy is even more important for complex water bodies, such as the South China Sea, which receives a high volume of pollutants and storm water runoff from the Pearl River Delta.
Coastal WQPs have been estimated over a variety of geographical locations and environmental conditions using different remotely-sensed data, including imagery from Landsat and the Chinese HJ-1 A/B charge-coupled device sensors (hereafter referred to as HJ-1 CCD) [14,15,16,17,18,19]. Remote sensing studies over the Pearl River Delta and the Hong Kong coastal area have also been conducted to estimate coastal water constituents. Zhang et al. [20] used Terra MODIS (Moderate Resolution Imaging Spectroradiometer, 1000 m) and MERIS (Medium Resolution Imaging Spectrometer, 300 m) sensors to estimate Chl-a concentrations. Xi and Zhang [21] developed an empirical model using MERIS imagery to retrieve SS concentrations. Using a two-band model (MERIS bands 6 and 7), they retrieved and mapped the SS concentration with an R² of 0.75 and a root mean square error (RMSE) of 1.69 mg/L. Tian et al. [22] used medium (30 m) spatial resolution HJ-1 CCD sensor data to model SS concentrations for the Deep Bay area of Hong Kong.
Chen et al. [23,24] classified the waters of Hong Kong and the Pearl River Delta region into five classes based on their optical properties, using a Landsat Thematic Mapper (TM) image. They used three classification techniques, namely maximum likelihood (MLH), neural network (NN), and support vector machine (SVM), to recognize spatial patterns in water color, and found similar spatial patterns of spectral reflectance across all techniques, although the classification accuracies varied. In classifying the optically different water types, however, they made imprecise assumptions in (i) obtaining actual reflectance values of Landsat TM, i.e., using an image-based atmospheric correction method without subsequent validation of the atmospheric correction results; (ii) defining classes based on only one TM image, which cannot capture the variation in coastal water dynamics; (iii) training and validating the classification techniques using satellite-retrieved WQPs, i.e., Chl-a and SS concentrations retrieved from SeaWiFS (Sea-Viewing Wide Field-of-View Sensor, 1.1 km) and AVHRR (Advanced Very High Resolution Radiometer, 1.1 km) images, respectively; and (iv) assuming that the water reflectance was constant over different dates and that flushing in the Pearl River Delta was not strong, i.e., the time difference between the image date and the in situ WQP data collection was 3–12 days (for 66% of the data) and 14–23 days (for 34% of the data). However, it has been reported that water properties in a turbid estuary can vary within ±24 h [25].
In this study, we aimed to use in situ and remotely-sensed data to model two of the most commonly estimated WQPs (i.e., Chl-a and SS concentrations) over the entire coastal area of Hong Kong and within each of its coastal water classes, using empirical predictive and machine learning methods. Empirical methods are straightforward, whereas machine learning methods require a certain level of user expertise but are computationally fast and can handle large datasets [26]. Hence, the results of this study will not only benefit the scientific community, but will also help policy-makers to protect Hong Kong’s coastal environment based on reliable and efficient routine estimates of Chl-a and SS concentrations.

2. Datasets

2.1. Satellite Data

This study combines the archived datasets of the Landsat-5 (L5) TM, Landsat-7 (L7) Enhanced Thematic Mapper Plus (ETM+), and HJ-1 CCD sensors. L5 TM and L7 ETM+ were launched in March 1984 and April 1999, respectively. Both sensors have a spatial resolution of 30 m for the multispectral bands. All ETM+ scenes acquired after 31 May 2003 suffer from failure of the scan line corrector, causing a loss of 22% of the scene [27]. The HJ-1 A and HJ-1 B satellites were launched in September 2008 and together carry four CCD cameras (i.e., CCD1 and CCD2 on HJ-1 A and likewise on HJ-1 B). Apart from the first band of the HJ-1 CCDs (B1 = 475 nm), which is 10 nm wider than that of Landsat TM/ETM+ (B1 = 485 nm), the other three bands are identical (B2 = 560 nm, B3 = 660 nm and B4 = 830 nm). The spatial resolution of the four CCD sensors matches that of the first four bands of Landsat TM/ETM+. The side rotation (±30°) feature of the HJ-1 CCDs confers an advantage over Landsat by shortening the revisit time from 16 days to 48 h or less [28].
Overall, 57 images were used in this study, including four images from L5 TM, twenty-three images from L7 ETM+ (January 2000 to December 2012) and thirty images from HJ-1 CCDs (September 2008 to December 2012). Images from different years and months enabled the investigation of temporally-variable water quality over the study area, with dates selected according to availability of in situ water quality data and cloud-free conditions.

2.2. In Situ Chl-a and SS Concentrations Data

Monitoring of Hong Kong’s coastal waters is carried out by the Hong Kong Environmental Protection Department (EPD) from a scientific vessel equipped with a differential global positioning system. Water samples are collected at 76 fixed monitoring stations (Figure 1) from three depths, namely near the surface (1 m below the surface), the middle layer, and near the seabed (1 m above the sea floor). Samples are collected in 500 mL Nalgene bottles (separately for each parameter, i.e., 500 mL for Chl-a and 500 mL for SS), refrigerated for transport, and then analyzed to determine Chl-a and SS concentrations. The Government Laboratory of Hong Kong determines the Chl-a concentration using the ‘in-house’ GL-OR-34 method, based on the American Public Health Association (APHA) 10200H 2 spectrophotometric method, and the SS concentration using the ‘in-house’ GL-PH-23 method, based on the APHA 2540D weighing method [29]. The in situ “surface” data of Chl-a and SS concentrations for dates coincident with Landsat TM, ETM+, and HJ-1 CCD images were retrieved from the EPD water quality parameters database. The ocean color data was combined with the satellite data to make a single dataset for the modeling of water quality parameters, i.e., Chl-a and SS concentrations. The descriptive statistics of the in situ measurements are provided in Table 1.

3. Methodology

3.1. Satellite Imagery Pre-Processing

Landsat TM/ETM+ scenes were processed to Standard Terrain Correction (Level 1T) by the provider, i.e., the United States Geological Survey (USGS). The Level 1T product provides systematic geometric accuracy by incorporating ground control points (GCPs) while employing a digital elevation model (DEM) for topographic accuracy. The HJ-1 CCD images, obtained from the China Centre for Resources Satellite Data and Application (CRESDA), were not geometrically corrected. Therefore, all the HJ-1 CCD images were geometrically corrected using concurrent or most recent Landsat TM/ETM+ image(s) as a reference. As the HJ-1 CCD images have a wide swath (360 km) in comparison with a Landsat TM/ETM+ scene (185 km), all the HJ-1 CCD images were subset over the study area (i.e., Hong Kong) before geometric rectification to reduce the processing time and to save disk space.
After a sufficient number of GCPs (25 to 50) had been collected from the reference Landsat TM/ETM+ image(s), the HJ-1 CCD images were geometrically corrected with an average RMSE of 1 pixel. To minimize the loss of spectral information resulting from image resampling during geometric correction, the nearest neighbor resampling method was used. After the geometric correction, the images were converted to a standard radiometric scale. For the Landsat TM and ETM+ sensors, Equation (1) was used to convert the digital numbers (DN) to the top-of-atmosphere (TOA) radiance (Lsatλ) [27]:
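$$L_{\mathrm{sat}\lambda} = \left(G_{\mathrm{rescale}\lambda} \times Q_{\mathrm{cal}\lambda}\right) + B_{\mathrm{rescale}\lambda} \tag{1}$$
where:
$$G_{\mathrm{rescale}\lambda} = \frac{L_{\max\lambda} - L_{\min\lambda}}{Q_{\mathrm{calmax}} - Q_{\mathrm{calmin}}} \tag{2}$$
$$B_{\mathrm{rescale}\lambda} = L_{\min\lambda} - \left(\frac{L_{\max\lambda} - L_{\min\lambda}}{Q_{\mathrm{calmax}} - Q_{\mathrm{calmin}}}\right) \times Q_{\mathrm{calmin}} \tag{3}$$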
where:
  • Lsatλ = At-satellite spectral radiance/TOA radiance for band λ (W/(m² sr μm));
  • Qcalλ = Quantized calibrated pixel value for band λ (DN);
  • Qcalmin = Minimum quantized calibrated pixel value corresponding to Lminλ (DN);
  • Qcalmax = Maximum quantized calibrated pixel value corresponding to Lmaxλ (DN);
  • Lminλ = Spectral at-sensor radiance, scaled to Qcalmin for band λ (W/(m² sr μm)); and
  • Lmaxλ = Spectral at-sensor radiance, scaled to Qcalmax for band λ (W/(m² sr μm)).
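The rescaling in Equations (1)–(3) can be sketched in a few lines of Python. This is a minimal illustration rather than the processing chain used in the study: the function name and the calibration constants below are hypothetical, and real values of Lminλ, Lmaxλ, Qcalmin, and Qcalmax come from the image metadata.

```python
def landsat_dn_to_radiance(q_cal, l_min, l_max, q_cal_min=1.0, q_cal_max=255.0):
    """Convert a quantized pixel value (DN) to TOA radiance, Equations (1)-(3)."""
    g_rescale = (l_max - l_min) / (q_cal_max - q_cal_min)  # Equation (2)
    b_rescale = l_min - g_rescale * q_cal_min              # Equation (3)
    return g_rescale * q_cal + b_rescale                   # Equation (1)


# Hypothetical calibration constants for one band, for illustration only.
L_MIN, L_MAX = -1.5, 190.0

radiance_low = landsat_dn_to_radiance(1, L_MIN, L_MAX)     # DN = Qcalmin
radiance_high = landsat_dn_to_radiance(255, L_MIN, L_MAX)  # DN = Qcalmax
print(radiance_low, radiance_high)
```

By construction, the lowest and highest DN map to Lminλ and Lmaxλ (up to floating-point rounding), which is a quick sanity check on the metadata constants.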
For HJ-1 CCD sensors, Equation (4) (given in the metadata file of the imagery) was used to convert the DN values to TOA radiance (Lsatλ):
$$L_{\mathrm{sat}\lambda} = \frac{\mathrm{DN}_{\lambda}}{G_{\lambda}} + L_{0\lambda} \tag{4}$$
where:
  • DNλ = Quantized calibrated pixel value for band λ;
  • Gλ = Band-specific gain factor [DN/(W/(m² sr μm))] for band λ; and
  • L0λ = Band-specific bias factor [W/(m² sr μm)] for band λ.
The values of all the above-mentioned parameters, required for the radiometric correction, were obtained from the respective image metadata files.
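As a small sketch of the HJ-1 CCD conversion, the function below assumes that the gain divides the DN, as its unit (DN per unit radiance) suggests. The function name and the gain and bias values are hypothetical; real values are read from the image metadata file.

```python
def hj1_dn_to_radiance(dn, gain, bias):
    """TOA radiance from an HJ-1 CCD digital number: L = DN / G + L0."""
    if gain == 0:
        raise ValueError("gain must be non-zero")
    return dn / gain + bias


# Hypothetical gain and bias for one band, for illustration only.
radiance = hj1_dn_to_radiance(dn=120, gain=0.8, bias=2.5)
print(radiance)
```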

3.2. Cross-Comparison of Sensors

Landsat TM/ETM+ and HJ-1 CCD sensors have similar band designations, data bit levels (8-bit) and spatial resolutions (30 m) [30]. Further, Nazeer and Nichol (2015) [31] compared the image statistics for common homogeneous water areas observed near-simultaneously by the two satellite sensors. They found a high degree of consistency between the two sensors, with a correlation of ≥0.80 for the four common bands (B1, B2, B3, and B4). This analysis therefore provides a sound basis for combining the data from the two sensors.

3.3. Atmospheric Correction

Atmospheric correction is a key pre-processing step that needs to be performed when combining data from different sensors, as well as from different dates [32]. Atmospheric correction was, therefore, performed using the Second Simulation of the Satellite Signal in the Solar Spectrum (6S) method [28]. 6S is a radiative transfer model [33] that determines the three coefficients (xa, xb, and xc) required for estimation of surface reflectance ( ρ ) for each band (λ) (Equation (5)). The coefficients (xa, xb, and xc) were calculated by the 6S model based on input parameters including sensor type, image acquisition date and time, solar zenith and azimuth angles, sensor's zenith and azimuth angles, atmospheric model, and aerosol optical depth/visibility:
$$\rho = \frac{Y}{1.0 + (x_c \times Y)} \tag{5}$$
$$Y = (x_a \times L_{\mathrm{sat}\lambda}) - x_b \tag{6}$$
where xa is the inverse of transmittance, xb is the scattering term of the atmosphere, and xc is the reflectance of the atmosphere for isotropic light.
Information on solar and sensor zenith and azimuth angles was extracted from the image metadata files. The datasets for aerosol optical depth and water vapor were retrieved from MODIS Terra Daily Level–3 (1° × 1°) global atmospheric product (MOD08_D3.051) and the Ozone Monitoring Instrument’s (OMI) Daily Level–3 (0.25° × 0.25°) global gridded product, respectively [34].
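Once 6S has returned the three coefficients for a band, applying them is a two-step calculation. The sketch below assumes the standard 6S form, in which xb is subtracted from the scaled radiance before the division; the function name and the coefficient values are hypothetical and serve only to show the arithmetic.

```python
def surface_reflectance(radiance, xa, xb, xc):
    """Apply 6S correction coefficients to a TOA radiance value."""
    y = xa * radiance - xb       # intermediate term Y
    return y / (1.0 + xc * y)    # surface reflectance


# Hypothetical coefficients and radiance for a single band.
rho = surface_reflectance(radiance=60.0, xa=0.003, xb=0.12, xc=0.15)
print(round(rho, 4))
```

In practice the same coefficients are applied to every pixel of a band, and a new set is computed for each band and acquisition date.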

3.4. Satellite and In Situ Data Matching

Water properties can vary rapidly in a turbid coastal environment. Therefore, a narrow time window (±3 h) should be considered when matching satellite and in situ water samples [35]. To capture short-term changes in the coastal waters of Hong Kong, a time window of ±2 h of the image acquisition time (9:00 am to 1:00 pm local time) was used to identify collocated samples. Water sampling locations affected by adjacency effects, clouds, scan line errors on ETM+ images, and ship wake effects were excluded. The mean surface reflectance at each sampling station was extracted from a window of 3 × 3 pixels. These criteria resulted in 240 observations (N = 240), with a Chl-a range of 0.30 to 13.0 µg/L and an SS concentration range of 0.5 to 56.0 mg/L. Table 2 shows the satellite and in situ data match-ups for each of the water classes (defined by [11]), as well as the range of Chl-a and SS concentrations for the match-ups.
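The two match-up rules, a ±2 h time window and a 3 × 3 pixel mean, can be sketched as follows. The data structures here (a nested list as the image, `None` for masked pixels) and all names and values are assumptions made for illustration, not the authors' implementation.

```python
from datetime import datetime, timedelta


def within_window(sample_time, overpass_time, hours=2.0):
    """True if the in situ sample lies within +/- `hours` of the overpass."""
    return abs(sample_time - overpass_time) <= timedelta(hours=hours)


def window_mean(image, row, col, half=1):
    """Mean of valid pixels in a (2*half+1) x (2*half+1) window at (row, col).

    Pixels set to None (e.g. cloud or scan-line gaps) are skipped.
    """
    values = []
    for r in range(row - half, row + half + 1):
        for c in range(col - half, col + half + 1):
            if 0 <= r < len(image) and 0 <= c < len(image[0]):
                if image[r][c] is not None:
                    values.append(image[r][c])
    return sum(values) / len(values) if values else None


# Hypothetical overpass time, sampling times, and reflectance grid.
overpass = datetime(2010, 1, 1, 11, 0)
accepted = within_window(datetime(2010, 1, 1, 12, 30), overpass)
rejected = within_window(datetime(2010, 1, 1, 14, 0), overpass)

grid = [[0.02, 0.03, 0.04],
        [0.03, 0.04, 0.05],
        [0.04, None, 0.06]]
station_mean = window_mean(grid, 1, 1)
print(accepted, rejected, station_mean)
```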

3.5. Modeling of Chl-a and SS Concentrations

EPM and NN techniques were used to develop models for the estimation of Chl-a and SS concentrations from Landsat TM/ETM+ and HJ-1 CCD images. Satellite data from different years and months were included to capture the variability of WQPs over the study area. Two types of models were developed:
  • Regional Models: a single model for the entire coastal area of Hong Kong; and
  • Local Models: a separate model for each water class defined by [11].
For modeling the entire coastal area of Hong Kong (regional models), 200 observations (from 2000–2010) were used in the model development and 40 observations (from 2011–2012) were used for validation. The mean Chl-a concentration for the model development dataset was 3.23 µg/L with a standard deviation (StDev) of 2.94 µg/L, while the mean for the validation dataset was 1.73 µg/L with a StDev of 1.84 µg/L. The mean SS concentration of the model development dataset was 5.86 mg/L with a StDev of 6.46 mg/L, while the mean of the validation dataset was 5.09 mg/L with a StDev value of 3.52 mg/L. The high standard deviation values are due to the pronounced spatial variability of Chl-a and SS concentrations over the entire area of Hong Kong’s coastal region, from the South China Sea in the east to the Pearl River Delta in the west, reflecting the complexity of Hong Kong waters. This high variability is similar in both the development and validation datasets.
For the local class-specific models, the dataset for each class was divided into two segments for both EPM and NN: a training segment containing 70% of the data, used in the training phase, and a validation segment containing the remaining 30%, used to examine the model efficiency. Both subsets were randomly selected from all 57 images and covered the entire observed range of Chl-a and SS concentrations within each class.
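The two partitioning schemes described above, a split by year for the regional models and a random 70/30 split for the class-specific models, can be sketched as below. The record layout, function names, and seed are hypothetical; the source does not state how the random selection was implemented.

```python
import random


def temporal_split(records, last_dev_year=2010):
    """Regional scheme: develop on earlier years, validate on later years."""
    dev = [r for r in records if r["year"] <= last_dev_year]
    val = [r for r in records if r["year"] > last_dev_year]
    return dev, val


def random_split(records, train_fraction=0.7, seed=0):
    """Local scheme: random split of one water class into training/validation."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical match-up records: one per year, 2003 to 2012.
records = [{"id": i, "year": 2003 + i} for i in range(10)]

dev, val = temporal_split(records)
train, test = random_split(records)
print(len(dev), len(val), len(train), len(test))
```

A purely random split does not by itself guarantee that both subsets span the full concentration range of a class, so that property would need to be checked (or enforced by stratifying) after the split.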

3.5.1. Empirical Predictive Modeling (EPM)

For the EPM of Chl-a and SS concentrations, the correlations of these two parameters with the surface reflectance (after atmospheric correction) of the first four bands of TM/ETM+ and HJ-1 CCDs were determined first. No particular band or band ratio consistently shows a good correlation between in situ-measured WQPs (Chl-a and SS concentrations) and remotely-sensed data for every geographical location. Therefore, the relationship between in situ WQPs and water surface reflectance was explored before developing a model. Band transformations, including addition, subtraction, multiplication, division, averaging, and logarithm, were also considered in the analysis (e.g., [36]).
To select the best independent variables, only variables with a p-value of less than or equal to 0.05 and a correlation coefficient (R) of ≥0.50 were considered in the EPM development. An independent variable was selected based on R, standard error, t-test, p-value, and the residual. If the value of R was close to 1, and the values of standard error (S), t-test, p-value, and residual were close to zero, or negative, then that band, or combination of bands, was marked as potentially meaningful.
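The screening step can be illustrated with a short sketch that builds a few band transformations and ranks them by their correlation with an in situ parameter. Only the R ≥ 0.50 criterion is applied here; the p-value test is omitted to keep the example free of external dependencies. The particular transformations, function names, and all data values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient (mean-centered form)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def candidate_variables(b1, b2, b3):
    """An illustrative subset of band transformations."""
    return {
        "B3/B1": [r / b for r, b in zip(b3, b1)],
        "B2*B3": [g * r for g, r in zip(b2, b3)],
        "B2-B1": [g - b for g, b in zip(b2, b1)],
        "(B2+B3)/2": [(g + r) / 2 for g, r in zip(b2, b3)],
        "ln(B3)": [math.log(r) for r in b3],
    }


def screen(candidates, target, r_min=0.5):
    """Keep variables with |R| >= r_min, strongest first."""
    kept = [(name, pearson_r(values, target))
            for name, values in candidates.items()]
    kept = [(name, r) for name, r in kept if abs(r) >= r_min]
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical surface reflectances and in situ values at five stations.
b1 = [0.05, 0.06, 0.04, 0.07, 0.05]
b2 = [0.06, 0.05, 0.07, 0.04, 0.06]
b3 = [0.02, 0.05, 0.01, 0.03, 0.04]
in_situ = [2.0 * r / b for r, b in zip(b3, b1)]  # constructed to follow B3/B1

ranked = screen(candidate_variables(b1, b2, b3), in_situ)
print(ranked[0])
```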

3.5.2. Neural Network Modeling (NN)

A multilayer perceptron NN was adopted for the retrieval of Chl-a and SS concentrations from Landsat TM/ETM+ and HJ-1 CCD images over the study area. To achieve the best possible output, the network was trained using a supervised learning technique, i.e., training with prior information on the desired output (Chl-a and SS concentrations) corresponding to a set of input data. The training set consisted of in situ-measured Chl-a and SS concentrations and the Landsat TM/ETM+ and HJ-1 CCD reflectance values from bands B1 to B4. In the training phase, a relationship between the input and the desired output was established through the weight values of each connection. Inputs were fed into the network, which calculated an output. This output was compared with the actual output, and the difference between the two, called the network error, was computed as the mean square difference. The weights were then adjusted iteratively to reduce the network error through back-propagation, i.e., the network adjusted its weights starting with the output layer and working back through the network.
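The training loop described above can be sketched with a very small multilayer perceptron written in plain Python. The source does not report the network architecture, activation function, learning rate, or software, so everything below (one hidden layer of four tanh units, per-sample weight updates, the learning rate, and the synthetic data) is an assumption chosen only to make the back-propagation step concrete.

```python
import math
import random


def init_network(n_in, n_hidden, seed=0):
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(net, x):
    """Return hidden activations and the (linear) network output."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(net["w1"], net["b1"])]
    output = sum(w * h for w, h in zip(net["w2"], hidden)) + net["b2"]
    return hidden, output


def mse(net, inputs, targets):
    return sum((forward(net, x)[1] - t) ** 2
               for x, t in zip(inputs, targets)) / len(inputs)


def train(net, inputs, targets, lr=0.05, epochs=300):
    for _ in range(epochs):
        for x, t in zip(inputs, targets):
            hidden, out = forward(net, x)
            err = out - t  # network error for this sample
            for j, h in enumerate(hidden):
                # Error signal passed back from the output to hidden unit j.
                grad_hidden = err * net["w2"][j] * (1.0 - h * h)
                net["w2"][j] -= lr * err * h           # output-layer weight
                net["b1"][j] -= lr * grad_hidden       # hidden-layer bias
                for i, xi in enumerate(x):
                    net["w1"][j][i] -= lr * grad_hidden * xi
            net["b2"] -= lr * err
    return net


# Synthetic data: four scaled band values per sample and one target value.
rng = random.Random(1)
inputs = [[rng.random() for _ in range(4)] for _ in range(20)]
targets = [0.5 + 2.0 * x[2] - 1.0 * x[0] for x in inputs]

net = init_network(n_in=4, n_hidden=4)
initial_loss = mse(net, inputs, targets)
train(net, inputs, targets)
final_loss = mse(net, inputs, targets)
print(f"MSE before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

Because the example evaluates the error only on its own training data, it shows the mechanics of weight adjustment, not predictive skill; that requires a held-out validation set, as used in the study.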

3.6. Performance Evaluation

The models developed to estimate Chl-a and SS concentrations using EPM and NN were validated. The validation was based on five statistical parameters, namely, Pearson correlation coefficient (R, Equation (7)), RMSE (Equation (8)) [37], mean absolute error (MAE, Equation (9)), Bias (ψ, Equation (10)) and scattering of data points (|ψ|, Equation (12)) [38], where R is a measurement of the correlation between the observed and the predicted datasets. RMSE measures the difference between observed and predicted values. MAE is the measure of magnitude of the mean error, the quantity ψ determines the bias, while |ψ| indicates the scattering of data points. The following parameters were calculated using Microsoft Excel (Microsoft Corporation, Redmond, WA, USA).
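$$R = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n \sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{n \sum y_i^2 - \left(\sum y_i\right)^2}} \tag{7}$$
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - y_i)^2}{n}} \tag{8}$$
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |x_i - y_i| \tag{9}$$
$$\psi = \frac{1}{n} \sum_{i=1}^{n} \psi_i \tag{10}$$
$$\psi_i = \frac{x_i - y_i}{y_i} \times 100 \tag{11}$$
$$|\psi| = \frac{1}{n} \sum_{i=1}^{n} |\psi_i| \tag{12}$$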
where n is the number of observations, and xi and yi are the estimated and observed concentrations of Chl-a and SS, respectively.
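The study computed these statistics in Microsoft Excel; the same five measures can be written as a short Python sketch, which may be convenient for checking a result. Function names and the sample values are hypothetical, with `est` and `obs` standing for the estimated (xi) and observed (yi) concentrations.

```python
import math


def pearson_r(est, obs):
    """Pearson correlation coefficient, raw-sum form of Equation (7)."""
    n = len(est)
    sx, sy = sum(est), sum(obs)
    sxy = sum(x * y for x, y in zip(est, obs))
    sxx = sum(x * x for x in est)
    syy = sum(y * y for y in obs)
    return (n * sxy - sx * sy) / (
        math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2))


def rmse(est, obs):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(est, obs)) / len(est))


def mae(est, obs):
    return sum(abs(x - y) for x, y in zip(est, obs)) / len(est)


def percent_errors(est, obs):
    """Relative error of each estimate in percent, Equation (11)."""
    return [(x - y) / y * 100.0 for x, y in zip(est, obs)]


def bias(est, obs):
    errors = percent_errors(est, obs)
    return sum(errors) / len(errors)


def scatter(est, obs):
    errors = percent_errors(est, obs)
    return sum(abs(e) for e in errors) / len(errors)


# Hypothetical estimated and observed concentrations.
est = [1.0, 2.0, 3.0, 4.0]
obs = [1.0, 2.5, 2.5, 4.0]

print(pearson_r(est, obs), rmse(est, obs), mae(est, obs),
      bias(est, obs), scatter(est, obs))
```

In this example one estimate is 20% too low and another 20% too high, so the bias is close to zero while the scatter is not, which is why the two measures are reported together.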

4. Results

4.1. Regional Modeling of Chl-a and SS Concentrations

4.1.1. Empirical Predictive Modeling

The empirical band ratio algorithms used to estimate water quality parameters from remote sensing data in Case 2 waters differ between geographical locations, because the relationship between a specific water quality parameter and the image wavebands varies from one water type to another. For the coastal area of Hong Kong, a previous study [36] proposed the ratio of band 3 and band 1 for the estimation of Chl-a concentration from Landsat TM/ETM+ data. Therefore, this study adopted the same band ratio (Equation (13)) for the estimation of Chl-a concentration using the Landsat TM/ETM+ and HJ-1 CCD datasets. Similarly, [31] proposed an SS concentration estimation model for the Landsat TM/ETM+ and HJ-1 CCD sensors, and this model was adopted unchanged in this study (Equation (14)):
$$\text{Chl-a}_{\mathrm{EPM}} = 0.46 \, B3/(B1)^{2} - 1.87 \tag{13}$$
$$\mathrm{SS}_{\mathrm{EPM}} = e^{\,0.76 \ln(B2 \times B3) - 0.58} \tag{14}$$

4.1.2. Neural Network Modeling

To achieve the best possible output, the network was trained using the coincident satellite and in situ data corresponding to 200 observations from the years 2000 to 2010. In the training phase, a relationship between the surface reflectance of the first four bands and the desired output (Chl-a and SS concentrations) was established through the weight values of each connection. The modeled output was compared with the actual output, and the difference between the two was calculated. The weights were then adjusted iteratively to reduce this network error through back-propagation, i.e., the network adjusted its weights starting with the output layer and working back through the network.

4.1.3. Validation of Regional EPM and NN Models

To test the robustness of the models developed using coincident data from 2000–2010, the predicted Chl-a and SS concentrations were validated against additional in situ datasets at 40 stations for the years 2011 and 2012. For the regional EPMs, the Chl-a prediction model (Equation (13)) yielded a correlation of 0.89, an RMSE of 0.93 µg/L, and an MAE of 0.94 µg/L (Figure 2a), while the SS concentration model (Equation (14)) resulted in a correlation of 0.85, an RMSE of 2.60 mg/L, and an MAE of 2.04 mg/L (Figure 2b).
For the regional NN models, the validation results for the Chl-a model showed a correlation of 0.88 between the in situ-measured and the NN-estimated Chl-a concentrations, with a fitted slope of 0.85. The Chl-aNN model can also estimate Chl-a concentrations over a wide range, suggesting its suitability for routine monitoring of Chl-a concentrations over the study area. The NN-based SS concentration model (SSNN) showed a correlation of 0.77 between the in situ-measured and NN-estimated SS concentrations (Figure 3).

4.2. Class-Specific (Local) Modeling of Chlorophyll-a (Chl-a) and Suspended Solid (SS) Concentrations

To model the Chl-a and SS concentrations more precisely, a separate model was developed for each water quality class defined by [11].

4.2.1. Empirical Predictive Modeling

As for the regional empirical predictive models of Chl-a and SS concentrations (Section 4.1), different band combinations and transformations (as discussed in Section 3.5.1) were defined to select the best representation of Chl-a and SS concentrations over the coastal region of Hong Kong. For each class, the variable (band combination or transformation) selected was the one that showed the highest correlation between the surface reflectance and the in situ-measured Chl-a and SS concentrations and was statistically significant (p-value ≤ 0.05). The EPM equations developed and used for the estimation of Chl-a and SS concentrations for each class are given in Table 3.

4.2.2. Neural Network Modeling

A multilayer perceptron neural network was implemented for the retrieval of class-specific Chl-a and SS concentrations from Landsat TM/ETM+ and HJ-1 CCD images. The network was trained using the coincident satellite and in situ data corresponding to 70% of the observations (Table 4) reported in Table 2 for each class. In the training phase a relationship between the surface reflectance of the first four bands and the in situ Chl-a and SS concentrations was established based on the weight values of each connection. The weights were adjusted iteratively to minimize the error. Finally, the modeled Chl-a and SS concentrations were compared with actual concentrations and the difference was calculated. By adjusting the internal weights, the network error was reduced through back-propagation.

4.2.3. Validation of the Class-Specific (Local) Models

To test the robustness of the class-specific empirical predictive and neural network models, the remaining 30% of the data was used for validation. Overall, the neural networks performed better than the empirical predictive models for each class. Figure 4 shows the relationship between the Chl-a and SS concentrations predicted using the class-specific empirical predictive and neural network models and the in situ validation dataset. A linear relationship was observed for all the class-specific models derived using neural networks, with correlation coefficients ranging from 0.60 to 0.94. For the empirical predictive models, the correlation coefficients were much lower (ranging from 0.10 to 0.87), and the models mostly underestimated the Chl-a and SS concentrations.

4.3. Statistical Performance Measures

Table 5 and Table 6 show the statistical performance of both the regional and class-specific models. The validation dataset was an additional dataset not used in the development of the models, and it was kept constant across the regional and local models as well as across the empirical predictive and neural network models. For the estimation of Chl-a concentrations with the regional models, both the neural network (Chl-aNN) and the empirical predictive model (Chl-aEPM) performed well, although the performance of Chl-aEPM was slightly better than that of Chl-aNN (Table 5). For SS concentrations, the neural network model SSNN performed poorly, with a lower correlation and higher RMSE and MAE than the empirical predictive model SSEPM.
For the local class-specific models, it was observed that empirical predictive models showed poor statistical performance (Table 6). On the other hand, the class-specific neural networks performed much better than the corresponding empirical predictive models and showed satisfactory results against the validation data for all the classes. Therefore, for the regional estimation of Chl-a and local estimation of Chl-a and SS concentrations, the neural networks are recommended over the complex coastal region of Hong Kong.

5. Discussion and Conclusions

The primary objective of this study was to evaluate empirical and machine learning methods for the estimation of coastal water quality parameters. The study used a 13-year dataset from the Landsat TM/ETM+ and HJ-1 CCD sensors, coincident with in situ Chl-a and SS concentration data collected within 2 h of the satellite overpass. Two types of models, based on empirical predictive and neural network approaches, were developed for the estimation and mapping of Chl-a and SS concentrations at both regional (representing the entire coastal area of Hong Kong) and local (specific to each water class) scales. For regional estimation using neural network and empirical predictive models, retrieval accuracies of 93% (97%) and 83% (78%) were observed for Chl-a (SS) concentrations, respectively. The neural networks also outperformed the empirical predictive models in local class-specific retrievals, for both Chl-a (60–94% versus 3–63%) and SS (81–94% versus 52–87%) concentrations. Although the neural networks gave better regional estimates, the Chl-a and SS concentrations estimated for each class were also well correlated with in situ measurements, suggesting that a class-specific (local) neural network is suitable for the remote sensing-based routine monitoring of Chl-a and SS concentrations over the complex coastal environment of Hong Kong.
For all class-specific models (Chl-a or SS) the empirical predictive modeling performed poorly compared to the neural networks. Conversely, the neural networks performed well in estimating the Chl-a and SS concentrations for each water class. The adjustments of the weight values between the input, hidden, and output layers reduced the network error and led to a satisfactory performance of the neural networks. A comparison of the correlation coefficients of class-specific neural networks and empirical predictive models is presented in Figure 5. Overall, a higher correlation for both the training and validation datasets was observed for neural network models. It is evident that a lower correlation exists for the empirical predictive models, but there is a notable decline in Class 4 even for neural network models when estimating Chl-a concentrations (Figure 5a). This decline may result from higher variance (11.63) in Class 4 compared to other water classes (for classes 1, 2, 3, and 5 the variance is 3.54, 7.69, 6.45, and 4.92, respectively), which indicates that the data is spatially variable over this class.
Overall, this study found that machine learning methods (i.e., NN) achieved regional retrieval accuracies 19 and 10 percentage points higher than those of the empirical predictive models for the estimation of SS and Chl-a, respectively, in complex water bodies. Therefore, this study suggests the use of machine learning methods for the more accurate estimation and routine monitoring of coastal water quality parameters. The proposed method will not only benefit the scientific community but will also help policy-makers to protect oceanic and coastal environments based on reliable estimates of Chl-a and SS concentrations.
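The 19 and 10 figures quoted above follow directly from the regional accuracies reported in these conclusions (NN: 93% for Chl-a and 97% for SS; EPM: 83% and 78%). The short sketch below reproduces that arithmetic; the dictionary layout and the function name `accuracy_gain` are illustrative assumptions.

```python
# Regional retrieval accuracies (%) as quoted in the conclusions.
accuracy = {
    "Chl-a": {"NN": 93, "EPM": 83},
    "SS": {"NN": 97, "EPM": 78},
}


def accuracy_gain(parameter):
    """Difference in accuracy (percentage points) between NN and EPM."""
    return accuracy[parameter]["NN"] - accuracy[parameter]["EPM"]


gains = {parameter: accuracy_gain(parameter) for parameter in accuracy}
print(gains)
```

The differences are absolute differences between two percentages, i.e., percentage points, not relative improvements.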

Acknowledgments

The authors would like to acknowledge the Hong Kong EPD for providing water quality data, the U.S. Geological Survey for providing Landsat TM/ETM+ images, the CRESDA for providing the HJ-1 A/B images, and the mission scientists and Principal Investigators who provided the MODIS and OMI data products used in this study. We would also like to thank Ms. Jinxin Yang (LSGI, Hong Kong Polytechnic University) for her help in downloading the HJ-1 A/B CCD images. We also thank the editors and anonymous reviewers whose constructive comments greatly improved the quality of the manuscript.

Author Contributions

Majid Nazeer and Muhammad Bilal conceived and designed the experiments; Majid Nazeer performed the experiments; Imran Shahzad and Muhammad Alsahli analyzed the data; and Ahmad Waqas contributed to analyzing the data and drafting the manuscript. All the authors read and revised the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Neumann, B.; Vafeidis, A.T.; Zimmermann, J.; Nicholls, R.J. Future coastal population growth and exposure to sea-level rise and coastal flooding--A global assessment. PLoS ONE 2015, 10, e0118571.
  2. Zhou, F.; Guo, H.; Liu, L. Quantitative identification and source apportionment of anthropogenic heavy metals in marine sediment of Hong Kong. Environ. Geol. 2007, 53, 295–305.
  3. Derraik, J.G. The pollution of the marine environment by plastic debris: A review. Mar. Pollut. Bull. 2002, 44, 842–852.
  4. Gorgoglione, A.; Gioia, A.; Iacobellis, V.; Piccinni, A.F.; Ranieri, E. A Rationale for Pollutograph Evaluation in Ungauged Areas, Using Daily Rainfall Patterns: Case Studies of the Apulian Region in Southern Italy. Appl. Environ. Soil Sci. 2016, 2016, 1–16.
  5. IOCCG. Remote Sensing of Ocean Colour in Coastal, and Other Optically-Complex, Waters; Reports of the International Ocean-Colour Coordinating Group: Dartmouth, NS, Canada, 2000.
  6. Nazeer, M.; Wong, M.S.; Nichol, J.E. A new approach for the estimation of phytoplankton cell counts associated with algal blooms. Sci. Total Environ. 2017, 590–591, 125–138.
  7. Butt, M.J.; Nazeer, M. Landsat ETM+ Secchi Disc Transparency (SDT) retrievals for Rawal Lake, Pakistan. Adv. Space Res. 2015, 56.
  8. Mohammad, A.; Price, K.P.; Buddemeier, R.; Fautin, D.G.; Egbert, S. Mapping Spatial and Temporal Distributions of Kuwait SST Using MODIS Remotely Sensed Data. Appl. Remote Sens. J. 2012, 2, 1–16.
  9. Chen, J.; Cui, T.; Qiu, Z.; Lin, C. A three-band semi-analytical model for deriving total suspended sediment concentration from HJ-1A/CCD data in turbid coastal waters. ISPRS J. Photogramm. Remote Sens. 2014, 93, 1–13.
  10. Ha, N.T.T.; Thao, N.T.P.; Koike, K.; Nhuan, M.T. Selecting the Best Band Ratio to Estimate Chlorophyll-a Concentration in a Tropical Freshwater Lake Using Sentinel 2A Images from a Case Study of Lake Ba Be (Northern Vietnam). ISPRS Int. J. Geo-Inf. 2017, 6, 290.
  11. Nazeer, M.; Nichol, J.E. Improved water quality retrieval by identifying optically unique water classes. J. Hydrol. 2016, 541, 1119–1132.
  12. Bian, C.; Jiang, W.; Song, D. Terrigenous transportation to the Okinawa Trough and the influence of typhoons on suspended sediment concentration. Cont. Shelf Res. 2010, 30, 1189–1199.
  13. Grashorn, S.; Lettmann, K.A.; Wolff, J.-O.; Badewien, T.H.; Stanev, E.V. East Frisian Wadden Sea hydrodynamics and wave effects in an unstructured-grid model. Ocean Dyn. 2015, 65, 419–434.
  14. Brivio, P.A.; Giardino, C.; Zilioli, E. Determination of chlorophyll concentration changes in Lake Garda using an image-based radiative transfer code for Landsat TM images. Int. J. Remote Sens. 2001, 22, 487–502.
  15. Duan, H.; Zhang, Y.; Zhang, B.; Song, K.; Wang, Z. Assessment of chlorophyll-a concentration and trophic state for Lake Chagan using Landsat TM and field spectral data. Environ. Monit. Assess. 2007, 129, 295–308.
  16. Han, L.; Jordan, K.J. Estimating and mapping chlorophyll-a concentration in Pensacola Bay, Florida using Landsat ETM+ data. Int. J. Remote Sens. 2005, 26, 5245–5254.
  17. Kabbara, N.; Benkhelil, J.; Awad, M.; Barale, V. Monitoring water quality in the coastal area of Tripoli (Lebanon) using high-resolution satellite data. ISPRS J. Photogramm. Remote Sens. 2008, 63, 488–495.
  18. Mahasandana, S.; Tripathi, N.K.; Honda, K. Sea surface multispectral index model for estimating chlorophyll a concentration of productive coastal waters in Thailand. Can. J. Remote Sens. 2009, 35, 287–296.
  19. Wang, F.; Han, L.; Kung, H.-T.; Van Arsdale, R.B. Applications of Landsat-5 TM imagery in assessing and mapping water quality in Reelfoot Lake, Tennessee. Int. J. Remote Sens. 2006, 27, 5269–5283.
  20. Zhang, Y.; Lin, H.; Chen, C.; Chen, L.; Zhang, B.; Gitelson, A.A. Estimation of chlorophyll-a concentration in estuarine waters: Case study of the Pearl River estuary, South China Sea. Environ. Res. Lett. 2011, 6, 24016.
  21. Xi, H.; Zhang, Y. Total suspended matter observation in the Pearl River estuary from in situ and MERIS data. Environ. Monit. Assess. 2011, 177, 563–574.
  22. Tian, L.; Wai, O.; Chen, X.; Liu, Y.; Feng, L.; Li, J.; Huang, J. Assessment of Total Suspended Sediment Distribution under Varying Tidal Conditions in Deep Bay: Initial Results from HJ-1A/1B Satellite CCD Images. Remote Sens. 2014, 6, 9911–9929.
  23. Chen, X.; Li, Y.S.; Liu, Z.; Yin, K.; Li, Z.; Wai, O.W.; King, B. Integration of multi-source data for water quality classification in the Pearl River estuary and its adjacent coastal waters of Hong Kong. Cont. Shelf Res. 2004, 24, 1827–1843.
  24. Chen, X.; Li, Y.-S.; Liu, Z.; Li, Z.; Wai, O.W.; King, B. Water quality management in the estuary of Pearl River and Hong Kong’s coastal waters based on SeaWiFS, Landsat TM sensor data and in situ water quality sampling data. In Third International Asia-Pacific Environmental Remote Sensing Remote Sensing of the Atmosphere, Ocean, Environment, and Space; Frouin, R.J., Yuan, Y., Kawamura, H., Eds.; International Society for Optics and Photonics: Washington, DC, USA, 2003; pp. 589–599.
  25. Le, C.; Hu, C.; Cannizzaro, J.; Duan, H. Long-term distribution patterns of remotely sensed water quality parameters in Chesapeake Bay. Estuar. Coast. Shelf Sci. 2013, 128, 93–103.
  26. Mountrakis, G.; Im, J.; Ogole, C. Support vector machines in remote sensing: A review. ISPRS J. Photogramm. Remote Sens. 2011, 66, 247–259.
  27. Chander, G.; Markham, B.L.; Helder, D.L. Summary of current radiometric calibration coefficients for Landsat MSS, TM, ETM+, and EO-1 ALI sensors. Remote Sens. Environ. 2009, 113, 893–903.
  28. Nazeer, M.; Nichol, J.E.; Yung, Y.-K. Evaluation of atmospheric correction models and Landsat surface reflectance product in an urban coastal environment. Int. J. Remote Sens. 2014.
  29. HKEPD. Marine Water Quality in Hong Kong in 2015. Hong Kong, 2016. Available online: http://wqrc.epd.gov.hk/pdf/water-quality/annual-report/MarineReport2015eng.pdf (accessed on 12 October 2017).
  30. Li, G.; Li, X.; Li, G.; Wen, W.; Wang, H.; Chen, L.; Yu, J.; Deng, F. Comparison of Spectral Characteristics Between China HJ1-CCD and Landsat 5 TM Imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2013, 6, 139–148.
  31. Nazeer, M.; Nichol, J.E. Combining Landsat TM/ETM+ and HJ-1 A/B CCD Sensors for Monitoring Coastal Water Quality in Hong Kong. IEEE Geosci. Remote Sens. Lett. 2015, 12, 1898–1902.
  32. Song, C.; Woodcock, C.E.; Seto, K.C.; Lenney, M.P.; Macomber, S.A. Classification and Change Detection Using Landsat TM Data. Remote Sens. Environ. 2001, 75, 230–244.
  33. Vermote, E.F.; Tanré, D.; Deuzé, J.L.; Herman, M.; Morcrette, J.J.; Kotchenova, S.Y. Second Simulation of a Satellite Signal in the Solar Spectrum - Vector (6SV), 6S User Guide, Version 3. 2006. Available online: https://pdfs.semanticscholar.org/4cff/1aa6101a41a3d6fca21805f8e4d756846f40.pdf (accessed on 10 August 2017).
  34. Acker, J.G.; Leptoukh, G. Online Analysis Enhances Use of NASA Earth Science Data. Eos Trans. Am. Geophys. Union 2007, 88, 14.
  35. Bailey, S.; Werdell, P. A multi-sensor approach for the on-orbit validation of ocean color satellite data products. Remote Sens. Environ. 2006, 102, 12–23.
  36. Nazeer, M.; Nichol, J.E. Development and application of a remote sensing-based Chlorophyll-a concentration prediction model for complex coastal waters of Hong Kong. J. Hydrol. 2016, 532.
  37. Dorji, P.; Fearns, P.; Broomhall, M. A Semi-Analytic Model for Estimating Total Suspended Sediment Concentration in Turbid Coastal Waters of Northern Western Australia Using MODIS-Aqua 250 m Data. Remote Sens. 2016, 8, 556.
  38. Zibordi, G.; Mélin, F.; Berthon, J.-F.; Canuti, E. Assessment of MERIS ocean color data products for European seas. Ocean Sci. 2013, 9, 521–533.
Figure 1. Hong Kong Environmental Protection Department’s (EPD) monitoring stations and water classes delineated by [11].
Figure 2. Validation results of regional empirical predictive models for (a) Chl-a and (b) SS concentrations.
Figure 3. Validation of the regional neural network models for (a) Chl-a and (b) SS concentrations estimation using 40 coincident data points (for 2011 and 2012).
Figure 4. Validation results for the class-specific empirical predictive (EPM) and neural network (NN) models for Chl-a (a–e) and SS concentrations (f–j).
Figure 5. Comparison of correlation coefficients (in absolute terms) for training and validation datasets for class-specific empirical predictive models (EPM) and neural networks (NN) (a) for Chl-a models and (b) for SS models.
Table 1. Descriptive statistics of the in situ data.

| Variable | Max | Min | Median | Average | StDev |
|---|---|---|---|---|---|
| Chl-a (μg/L) | 13.00 | 0.30 | 1.90 | 2.98 | 2.84 |
| SS (mg/L) | 56.00 | 0.50 | 4.40 | 5.73 | 6.07 |
Table 2. Distribution of satellite and in situ data match-ups for cloud-free scenes over study area from January 2000 to December 2012.

| Water Class | No. of Match-Ups | Chl-a Range (µg/L) | SS Range (mg/L) |
|---|---|---|---|
| Class 1 | 24 | 0.4–8.1 | 1.1–5.0 |
| Class 2 | 40 | 0.3–12.0 | 1.0–7.6 |
| Class 3 | 33 | 0.7–11.0 | 0.5–11.0 |
| Class 4 | 74 | 0.5–13.0 | 0.7–22.0 |
| Class 5 | 69 | 0.6–12.0 | 2.0–56.0 |
| Total | 240 | 0.3–13.0 | 0.5–56.0 |
Table 3. Class-specific empirical predictive equations and correlation coefficients for Chl-a and SS concentrations.

| Water Class | N | Chl-a = | R (Chl-a) | SS = | R (SS) |
|---|---|---|---|---|---|
| Class 1 | 17 | −0.15 × √(B4/B3) + 1.26 | 0.68 | 0.11 × (B2/B1) + 0.73 | 0.56 |
| Class 2 | 28 | 0.04 × (B3/B2) + 0.39 | 0.69 | 0.58 × (B2) + 3.64 | 0.81 |
| Class 3 | 23 | 3.48 × (B2 × B4) + 0.13 | 0.89 | 0.56 × Av (B1, B3) + 1.45 | 0.73 |
| Class 4 | 52 | 0.06 × (B2/B1) + 0.87 | 0.37 | 3.43 × B3/(B4)² − 15.93 | 0.59 |
| Class 5 | 48 | 0.12 × (B2/B1) + 0.90 | 0.61 | 0.02 × (B3/B1) + 0.49 | 0.72 |
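As a rough illustration of how the class-specific Chl-a equations in Table 3 could be applied, the sketch below evaluates them for one set of band values. The coefficients are transcribed from Table 3; everything else is an assumption: the band values are invented, B1–B4 are taken to be per-band inputs of the kind used to fit the equations, and the names `CHLA_EQUATIONS` and `estimate_chla` are hypothetical. This section does not state the input units or whether the left-hand side is a transformed quantity, so the outputs should not be read as concentrations without checking the methods.

```python
import math

# Class-specific Chl-a equations, coefficients transcribed from Table 3.
# Each entry maps a dict of band values (keys "B1".."B4") to the model output.
CHLA_EQUATIONS = {
    1: lambda b: -0.15 * math.sqrt(b["B4"] / b["B3"]) + 1.26,
    2: lambda b: 0.04 * (b["B3"] / b["B2"]) + 0.39,
    3: lambda b: 3.48 * (b["B2"] * b["B4"]) + 0.13,
    4: lambda b: 0.06 * (b["B2"] / b["B1"]) + 0.87,
    5: lambda b: 0.12 * (b["B2"] / b["B1"]) + 0.90,
}


def estimate_chla(water_class, bands):
    """Evaluate the Table 3 Chl-a equation for the given water class."""
    if water_class not in CHLA_EQUATIONS:
        raise ValueError(f"unknown water class: {water_class}")
    return CHLA_EQUATIONS[water_class](bands)


# Hypothetical band values, chosen only to make the arithmetic easy to follow.
example_bands = {"B1": 0.04, "B2": 0.05, "B3": 0.04, "B4": 0.01}

for water_class in sorted(CHLA_EQUATIONS):
    value = estimate_chla(water_class, example_bands)
    print(f"Class {water_class}: {value:.5f}")
```

Keeping one equation per water class in a lookup table mirrors the local (class-specific) modeling approach: the water class of a pixel selects which equation is used.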
Table 4. Class-specific correlation coefficient values for neural network based Chl-a and SS concentrations.

| Water Class | N | R (for Chl-a) | R (for SS) |
|---|---|---|---|
| Class 1 | 17 | 0.99 | 0.88 |
| Class 2 | 28 | 0.99 | 0.99 |
| Class 3 | 23 | 0.99 | 0.94 |
| Class 4 | 52 | 0.84 | 0.72 |
| Class 5 | 48 | 0.99 | 0.99 |
Table 5. Statistical performance measure of the Chl-a and SS concentration estimation models.

| Parameter | Model | N | R | RMSE | MAE | ψ (%) | \|ψ\| (%) |
|---|---|---|---|---|---|---|---|
| Chl-a | EPM | 40 | 0.89 | 0.93 (µg/L) | 0.94 (µg/L) | −57 | 109 |
| Chl-a | NN | 40 | 0.88 | 1.31 (µg/L) | 0.99 (µg/L) | −15 | 63 |
| SS | EPM | 40 | 0.85 | 2.60 (mg/L) | 2.04 (mg/L) | 119 | 135 |
| SS | NN | 40 | 0.77 | 4.59 (mg/L) | 2.72 (mg/L) | 75 | 47 |
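For readers who want to reproduce statistics of the kind listed in Table 5, the sketch below computes R, RMSE, and MAE using their standard textbook definitions. It is an illustration, not the authors' code: the paired values are invented, the function names are hypothetical, and ψ and |ψ| are omitted because their definitions are not given in this section.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def rmse(observed, predicted):
    """Root mean square error, in the units of the inputs."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error, in the units of the inputs."""
    n = len(observed)
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / n


# Invented in situ and model-estimated values, for illustration only.
in_situ = [1.0, 2.0, 3.0, 4.0]
estimated = [1.5, 2.0, 2.5, 5.0]

r_value = pearson_r(in_situ, estimated)
rmse_value = rmse(in_situ, estimated)
mae_value = mae(in_situ, estimated)
print(f"R = {r_value:.3f}, RMSE = {rmse_value:.3f}, MAE = {mae_value:.3f}")
```

Under these definitions the MAE cannot exceed the RMSE for the same set of errors, which is a useful sanity check when comparing the two columns.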
Table 6. Summary of the validation parameters for Chl-a and SS concentrations computed using the validation dataset for each class. Values are given as Chl-a (SS).

| Water Class | N | Model | R | RMSE | MAE | ψ (%) | \|ψ\| (%) |
|---|---|---|---|---|---|---|---|
| Class 1 | 7 | EPM | −0.10 (0.52) | 3.52 (2.42) | 2.87 (2.06) | −65 (−66) | 65 (66) |
| Class 1 | 7 | NN | 0.91 (0.82) | 1.53 (1.19) | 1.39 (1.00) | 11 (33) | 41 (37) |
| Class 2 | 12 | EPM | 0.10 (0.77) | 3.33 (4.26) | 2.10 (4.06) | −65 (223) | 65 (223) |
| Class 2 | 12 | NN | 0.94 (0.86) | 1.18 (0.99) | 0.92 (0.75) | −40 (14) | 68 (34) |
| Class 3 | 10 | EPM | 0.63 (0.61) | 49.43 (2.53) | 35.68 (1.57) | 1171 (43) | 1171 (62) |
| Class 3 | 10 | NN | 0.92 (0.80) | 1.17 (1.75) | 0.97 (1.10) | 5 (13) | 36 (31) |
| Class 4 | 22 | EPM | 0.03 (0.65) | 5.08 (10.51) | 3.46 (9.81) | −54 (−234) | 59 (234) |
| Class 4 | 22 | NN | 0.82 (0.81) | 2.41 (2.17) | 1.71 (1.66) | −13 (4) | 71 (36) |
| Class 5 | 21 | EPM | 0.39 (0.87) | 2.89 (8.97) | 1.59 (7.46) | −35 (−91) | 44 (91) |
| Class 5 | 21 | NN | 0.60 (0.94) | 2.11 (1.97) | 1.26 (1.48) | −14 (3) | 48 (21) |

Notes: 1. For each water class, the first row gives the empirical predictive modeling (EPM) results and the second row gives the neural network (NN) modeling results (set in italic and bold text, respectively, in the original layout); 2. The units of RMSE and MAE for Chl-a and SS are µg/L and mg/L, respectively.

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).