Comparison of machine learning algorithms for retrieval of water quality indicators in case­II waters: a case study of Hong Kong

: Anthropogenic activities Coastal case-II of different spatiotemporal variability The present study evaluates the potential of remote sensing using machine learning techniques, improving water quality the coastal Kong. Concentrations of suspended solids (SS), chlorophyll-a (Chl-a), and turbidity were estimated with several machine learning techniques including Artiﬁcial Neural Network (ANN), Random Forest (RF), Cubist regression (CB), and Support Vector Regression (SVR). Landsat (5,7,8) reﬂectance data were compared with in situ reﬂectance data to evaluate the performance of machine learning models. The highest accuracies of the water quality indicators were achieved by ANN for both, in situ reﬂectance data (89%-Chl-a, 93%-SS, and 82%-turbidity) and satellite data (91%-Chl-a, 92%-SS, and 85%-turbidity. The water quality parameters retrieved by the ANN model was further compared to those retrieved by “standard Case-2 Regional/Coast Colour” (C2RCC) processing chain model C2RCC-Nets. The root mean square errors (RMSEs) for estimating SS and Chl-a were 3.3 mg/L and 2.7 µ g/L, respectively, using ANN, whereas RMSEs were 12.7 mg/L and 12.9 µ g/L for suspended particulate matter (SPM) and Chl-a concentrations, respectively, when C2RCC was applied on Landsat-8 data. Relative variable importance was also conducted to investigate the consistency between in situ reﬂectance data and satellite data, and results show that both datasets are similar. The red band (wavelength ≈ 0.665 µ m) and the product of red and green band (wavelength ≈ 0.560 µ m) were inﬂuential inputs in both reﬂectance data sets for estimating SS and turbidity, and the ratio between red and blue band (wavelength ≈ 0.490 µ m) as well as the ratio between infrared (wavelength ≈ 0.865 µ m) and blue band and green band proved to be more useful for the estimation of Chl-a concentration, due to their sensitivity to high turbidity in the coastal waters. The results indicate that the NN based machine learning approaches perform better and, thus, can be used for improved water quality monitoring with satellite data in optically complex coastal waters. Artiﬁcial Neural Network (ANN), Random Forest (RF), Cubist regression (CB), and Support Vector Regression (SVR), using independent, in situ reﬂectance data and satellite-derived reﬂectance data (Landsat L5, L7, and for water quality over (2) a further comparison of machine learning models with EPMs; and (3) sensitivity analysis of spectral bands for modeling water quality based on variable importance analysis.


Introduction
The coastal marine ecosystem is both complex and vulnerable [1] as it is generally close to areas with high population density. As documented in previous studies [2,3], population density within 100 km of coastlines is approximately three times higher than the average density of the global population and a further increase is expected. Increased anthropogenic activities along the coasts have resulted in the degradation of water quality [4], including runoff of agricultural fertilizers into rivers, resulting in high suspended solids with large nutrient inflows that can cause eutrophication. Eutrophication can further contribute to an increase in algal bloom events [5]. These blooms can block sunlight, resulting in an anoxic condition, in which dissolved oxygen is depleted within the coastal environment. In addition, some of these blooms are toxic, with adverse effects on aquatic life and humans [6,7]. Due to the adverse consequences of water pollution, there is a need to monitor potential changes in water quality in any environmental impact assessment [8,9]. Water quality indicators (WQIs), such as chlorophyll-a (Chl-a), suspended solids (SS), and turbidity, have been used as indicators for monitoring coastal and inland water quality [10][11][12][13].
The concentration of Chl-a, a measure of phytoplankton biomass, is a key water quality indicator as it is the base of the marine food chain, and it can also be used to indicate algal blooms [7]. Chl-a is an optically-active parameter as clean water absorbs most of the visible, and all infrared (IR) radiation, while nutrient-rich water with the presence of Chl-a, generally reflects green and IR radiation back to the atmosphere [14]. Similarly, SS is also optically-active parameter as a high concentration of SS increases water leaving radiance across the whole visible spectrum. Routine monitoring of SS is critical as the high concentrations of SS have adverse effects on benthic invertebrates [15].
Traditional methods using field measurements for measuring WQI offer high accuracy. However, these methods are labor-intensive and time-consuming, and hence are not able to provide efficient and concurrent water quality measurements at a regional scale [16]. On the other hand, satellite remote sensing-based methods have potential to measure optically-active WQI, such as Chl-a, SS, colored dissolved organic matter (CDOM), and turbidity, at regular interval and over large areas. In situ sensing technologies, such as multispectral or hyperspectral radiometers, are of great importance as these are required for vicarious calibration and in situ spectral data along with WQI data are extensively used to develop or validated analytical, semianalytical, or empirical models for inland/marine water quality monitoring using corresponding satellite bands [17][18][19]. Previous studies have evaluated satellite data for monitoring Chl-a and SS. For example, the Sea-viewing Wide Field-of-view Sensor (SeaWiFS) with 1.1-km resolution has been widely used to estimate the spatial distribution of Chl-a [20,21]. Miller and McKee [12] used a 250 m Moderate Resolution Imaging Spectroradiometer (MODIS) to map total suspended solids concentration and Moses, Gitelson, Berdnikov and Povazhnyy [10] combined MODIS and the Medium Resolution Imaging Spectrometer (MERIS) to map Chl-a in case-II waters. Other studies have used data of finer-resolution sensors, such as the Landsat 5 (L5) Thematic Mapper (TM), Landsat 7 (L7) Enhanced Thematic Mapper Plus, (ETM+) the Landsat 8 (L8) Operational Land Imager (OLI), the Earth Observation Advance Land Imager (EO-1 ALI), Huanjing-1 (HJ-1) A/B, and Sentinel-2 A/B to design a more comprehensive framework for water quality monitoring over inland, estuarine, and coastal environments [11,[22][23][24].
However, these studies have several limitations, as they commonly apply linear or multivariate regression to estimate water quality, this may be suitable for a specific environment (e.g., inland lake and case-I waters) but may not work well in coastal environments. This is because multiple factors, such as tides and ocean currents, can influence the flow of water pollutants and the interaction among these factors cannot be explained by a simple linear relationship. Additionally, linear or multivariate regression is highly dependent on station-based water quality data for model development.
Furthermore, the empirical predictive models (EPMs) developed from these linear/multivariate regressions cannot be simply applied to other areas of interest, because of the complexity of association among colored dissolved organic matter, Chl-a concentration, and type and size of SS, as the spectral response of water depends on these factors [25][26][27]. Thus, several studies [4,28,29] have examined alternatives such as machine learning techniques and radiative transfer functions to improve water quality modeling.
Compared with machine learning techniques, radiative transfer models such as Hydrolight and successive order of scattering (SOS) are physical-based, and therefore depend on Inherent Optical Properties (IOP) of water and extensive field data [30]. In addition, deriving IOP from satellite data is challenging. In contrast, the decision-making process of machine learning may require fewer data and assumptions for training purposes (Kim et al., 2014). This implies that such techniques are more flexible for application to different types of coastal environments. For example, some studies have developed machine learning based models for predicting Chl-a and CDOM in temperate regions [4,31,32] and to map phytoplankton cell counts on a subtropical coast [28]. Although machine learning techniques may be useful to improve the estimation of Chl-a and related WQIs in a coastal environment, there are two issues that need to be further addressed: (i) previous studies have used machine learning techniques for estimating WQIs in temperate regions. Since these machine learning techniques were based on different assumptions, it is necessary to evaluate such techniques for predicting water quality to determine which method may be better for mapping WQIs in a coastal environment and (ii) subtropical areas are influenced by monsoons, typhoons, and high marine biological productivity, resulting in a more complex coastal system compared to temperate regions. Further investigation into remote sensing of water quality across the subtropical region is needed, to provide improved and routine monitoring. Based on these two core issues, the present study aims to develop a systematic approach to evaluate different machine learning algorithms for estimating water quality of subtropical case-II waters. The main objectives of this study include (1) an evaluation of four machine learning methods, including Artificial Neural Network (ANN), Random Forest (RF), Cubist regression (CB), and Support Vector Regression (SVR), using independent, in situ reflectance data and satellite-derived reflectance data (Landsat L5, L7, and L8) for water quality prediction over subtropical coastal waters; (2) a further comparison of machine learning models with EPMs; and (3) sensitivity analysis of spectral bands for modeling water quality based on variable importance analysis.

Study Area
Hong Kong is a subtropical city located on the Pearl River Estuary's eastern coast with 1106 km 2 of land and 1649 km 2 of marine areas ( Figure 1). Development of industries, accompanying the urbanization of the adjacent Guangdong province of China, has resulted in adverse environmental degradation in water quality along the coastlines [8]. Due to the combination of anthropogenic activities and variable geophysical environment, the coastal waters of Hong Kong are physically and chemically complex. For example, water quality in the western areas of Hong Kong (e.g., Deep Bay and Northwestern zone) are influenced by the turbidity of the Pearl River discharges and the high loads of microalgae, usually associated with high nutrients (Figure 2). Conversely, the eastern areas (e.g., Tolo Harbor and Mirs Bay) are mostly influenced by the Pacific currents, and the central areas are influenced by both the Pacific currents, and industrial/residential effluents from Hong Kong and the Pearl River [13]. In addition, SS and turbidity along the coasts of Hong Kong vary spatiotemporally with the river discharge, winds, and tidal fluctuations. In late summer to early autumn the eastern and southern waters are affected by frequent algal blooms, due to both natural and anthropogenic causes [33]. Due to the high variability in this complex coastal environment, using an empirical model such as linear or multivariate regression may not be promising for predicting WQIs across these coastal waters.  Chl-a near NM1, (c) turbid water near NM6, and (d) highly turbid water near the NM5 station of EPD (shown in Figure 1).

Station-Based Water Quality Observations
Station-based data of water quality from 1999 to 2015 were retrieved from the marine WQIs online database of the Hong Kong Environmental Protection Department (EPD). These WQIs were collected from 10 monitoring zones defined by EPD ( Figure 1). The EPD has a dedicated marine monitoring vessel which uses Differential Global Positioning Systems (DGPS) and an advanced conductivity-temperature-depth (CTD) profiler to take measurements and collects samples simultaneously once a month from 76 fixed monitoring stations. EPD measures 24 WQIs from three depths: surface water (1 m below the sea surface), middle water (half the sea depth), and bottom water (1 m above the seabed). Samples are collected in a 500 mL Nalgene bottle and analyzed for the an empirical model such as linear or multivariate regression may not be promising for predicting WQIs across these coastal waters.   Figure 1).

Station-Based Water Quality Observations
Station-based data of water quality from 1999 to 2015 were retrieved from the marine WQIs online database of the Hong Kong Environmental Protection Department (EPD). These WQIs were collected from 10 monitoring zones defined by EPD ( Figure 1). The EPD has a dedicated marine monitoring vessel which uses Differential Global Positioning Systems (DGPS) and an advanced conductivity-temperature-depth (CTD) profiler to take measurements and collects samples simultaneously once a month from 76 fixed monitoring stations. EPD measures 24 WQIs from three depths: surface water (1 m below the sea surface), middle water (half the sea depth), and bottom water (1 m above the seabed). Samples are collected in a 500 mL Nalgene bottle and analyzed for the  Figure 1).

Station-Based Water Quality Observations
Station-based data of water quality from 1999 to 2015 were retrieved from the marine WQIs online database of the Hong Kong Environmental Protection Department (EPD). These WQIs were collected from 10 monitoring zones defined by EPD ( Figure 1). The EPD has a dedicated marine monitoring vessel which uses Differential Global Positioning Systems (DGPS) and an advanced conductivity-temperature-depth (CTD) profiler to take measurements and collects samples simultaneously once a month from 76 fixed monitoring stations. EPD measures 24 WQIs from three depths: surface water (1 m below the sea surface), middle water (half the sea depth), and bottom water (1 m above the seabed). Samples are collected in a 500 mL Nalgene bottle and analyzed for the measurements of Chl-a and SS concentrations. The American Public Health Association (APHA) 20 ed 10200H 2 spectrophotometric method based in-house GL-OR-34 method is used to measure Chl-a concentration, the APHA 22ed 2540D weighing method based in-house GL-PH-23 method is used to estimate the SS concentration, and turbidity is measured on-site by the OBS-3 turbidity sensor linked to a SEACAT 19+ CTD and Water Quality Profiler.

Satellite Data
For this study, a total of 38 cloud-free scenes of L5, L7, and L8 from 1999 to 2015, covering all seasons of the year, were retrieved from the Earth Explorer. Details of satellite data are available in supplementary material (Table S1). Only visible and near-infrared (VNIR) wavelength (bands 1-4 of L5/L7 and 2-5 of L8) of same date image with EPD station data were considered as water absorbs most energy in the shortwave near-infrared (SWIR) region [34] and only "surface water" data of the station-based WQI data were used for matching with the satellite data.

In Situ Water Surface Reflectance
In situ surface reflectance of water (hereafter in situ SR) data were collected with a CROPSCAN Multispectral Radiometer (MSR). The CROPSCAN MSR is a handheld multispectral radiometer able to retrieve spectral information in 16 channels. Such in situ reflectance data can be widely used for studying vegetation health and for estimating optical properties of water [35]. These in situ SR data have wavelengths matching the multispectral bands of Landsat (bands 1-4 of L5/L7 and 2-5 for L8) ( Table 1). The above water in situ SR data were collected between 10:00 and 14:00 (local time) from 42 sites under clear weather conditions during seven field visits (06, 08, 09, 13, 15, 16, and 17 Oct 2014). Multiple spectra (at least 10 values from each sampling site) were collected and averaged for data modeling ( Figure 1). The corresponding WQIs were measured using previous stated standard methods. Table 2 shows the corresponding WQI data of the same sites.

Data Preprocessing
First, the radiometric correction was applied to all satellite images for waveband standardization. For satellite images of L5, L7, and L8, uncalibrated digital numbers (DN) were converted to the Top of Atmosphere (TOA) spectral radiance, using radiometric rescaling coefficients provided in the image metadata file [23,36]. Atmospheric correction was also conducted to reduce atmospheric interference. An approach for atmospheric correction, the "Second Simulation of the Satellite Signal in the Solar Spectrum (6S)" [37,38] with the maritime aerosol model, was applied since this setting of 6S has been demonstrated effectively over complex coastal waters of Hong Kong by [38]. In addition, the Normalized Difference Water Index (NDWI) for open waters [39] was applied to all images for separating waterbodies from land or shadow casted by terrain, and to map the coastlines, based on a contrast threshold of NIR and visible green radiation.

Estimation of Water Quality with Machine Learning Techniques
Two pairs of datasets-(i) in situ SR along with WQI data and (ii) satellite-derived SR along with WQI data-were used independently to evaluate four machine learning techniques: support vector regression (SVR), random forest (RF), artificial neural network (ANN), and Cubist regression trees (CB), based on their accuracies in estimating WQI concentration.

• Support Vector Regression (SVR)
A regression version of support vector machines, SVR, is a vector-based statistical learning technique widely used for vegetation mapping [40], land use studies [41], landscape modeling, water quality research [4], and quantification of soil properties [42]. SVR has demonstrated strong predictability even when training samples are limited [4]. Mathematically, SVR can be represented on the network output (S i ): where w i and b are coefficients that are determined by minimizing the errors between target variable and network output and X i is nonlinear mapping function. SVR is implemented with a nonlinear mapping function, a kernel function: K(X i , X) The kernel function and a hyperplane separate the input data points linearly and transform the input data to a high dimensional space. Thus model accuracy highly depends on the selection of kernel function [43]. The kernel functions include polynomial, linear, and Gaussian radius bias kernel functions. The application domain knowledge and variance of input (X) values of the training data are important for selecting a kernel type and kernel function parameters. Iteratively, adjusting the hyperplanes and reducing the errors associated with them is an optimal solution to select the kernel function [4]. The kernel function in this study was selected based on its performances for all WQIs. Weka 3.8 [44] was used for applying SVR in this study.

• Artificial Neural Network (ANN)
ANN is a decision-based machine learning technique which uses highly interconnected nodes to solve a particular problem [45]. It has been confirmed as a means for quantitative predictive modeling as ANN can handle dynamic, nonlinear, and noisy data. During the training phase, the network was trained using a supervised learning technique. The information from input layer was distributed to the hidden layers where summation and activation functions were performed. The results from hidden layer transfer to output layer which also apply summation and activation function for the estimation of output WQI value is given as follows.
where w i and z i are the weights and inputs between the hidden layer and the output layer, b is the bias associated with the output layer, i is the number of nodes in hidden layer, S is the Sigmoid activation function (Equation (3)) to handle the nonlinearity, and a is the slope parameter of the sigmoid function. This activation function transfers the weighted summed inputs to output layer. In order to achieve the appropriate results, the selected size of network and number of hidden layers and neurons/nodes are of critical importance, a large network may result in overfitting the target variable [46]. In the present study, ANN model was implemented in Python using the Levenberg-Marquardt algorithm and tangent sigmoid as the activation function. The input layer consists of reflectance bands and band combinations; the output layer has the values of target variable WQI in the present case.

• Cubist Regression Trees (CB)
Cubist regression (CB), developed based on a combination of the ideas of Quinlan [47,48], is a rule-based regression technique. CB uses a rule-based regression tree system based on instance criteria which gives multivariate regression output (i.e., each multivariate regression is based on a specific rule). Then, an explicit set of predictor variables choose a definite prediction model based on the rule/rules that best fits the predictors [49]. Since it produces rule-based multivariate regression, it is more interpretable than RF. Cubist is a commercial, proprietary product developed by RlueQuest Research, Inc. It became popular and widely used in regression and classification methods after it was ported into R by Kuhn et al. [50] in 2013. The Cubist model was implemented in R software for the present study.

• Random Forest (RF)
In brief, RF is an ensemble learning method based on multiple decision trees. It uses a random selection of the training dataset with a Gini Index to create binary splits and multiple classification and regression trees, in order to predict values of the dependent class/variable [51]. RF for regression and classification task was used by multiple studies using earth observation data [4,41,52]. The RF model of Weka 3.8 [53] was used to retrieve WQIs in this study.

Input Selection for Machine Learning Algorithm
For accurately matching satellite images with EPD station-based WQI data, a spatial window with 3 × 3 pixels was used to extract data located at each sampling station, rather than considering a single pixel for data extraction. Furthermore, EPD stations located at pixels potentially affected by scan line error on ETM+ images, wake effects were not considered. This process resulted in 120 observations of satellite-derived SR corresponding to three WQIs (Table 3). In this study, concentrations of Chl-a, SS, and turbidity, retrieved from this database, were used for training and validation of all machine learning models. Landsat bands are calibrated for land-based applications however, researchers have evaluated the potential of TM, ETM+, and OLI data to monitor coastal and inland WQIs [11,34,54,55]. Most studies used the reflectance ratio approach for developing linear, exponential, or logistic regressions as a proxy for WQI [11,56,57]. We have assessed the correlations among three optically-active WQIs and eighteen different band combinations of water reflectance, for both reflectance datasets. Only band combinations with a high R > 0.50 and p-values < 0.01 were considered as input variable in machine learning model. Both in situ SR data and satellite-derived SR data were used to evaluate the appropriate machine learning model for WQIs, as in situ SR data represent water leaving surface reflectance data from field measurements, while satellite-derived SR represents water leaving surface reflectance data on satellite image. The images cover a larger spatial extent, but reflectance values depend on the accuracy of the atmospheric correction method.

Empirical Predictive Modeling (EPM)
For consistency, same inputs were used to build multivariate regressions. Stepwise regression (backward, forward, and bidirectional elimination) was applied to select the best model by considering Akaike Information Criterion (AIC), coefficient of determination (R 2 ), mean absolute error (MAE), and root mean square error (RMSE). The backward selection is the best approach when number of samples n is larger than number of predictors p. Whereas a forward selection approach is usually applied on high dimensional datasets, but this approach is also a good choice to handle the multicollinearity issue [58]. Interassociations exist in our input data, as when SS loads are high, water reflectance increases in all bands linearly, and when Chl-a concentration is high (especially during algal bloom/red tide event), water reflectance in green/red and IR bands increases. In addition, the accuracy of regression models was further compared with all machine learning models.

Validation of Water Quality Predictions
Leave-one-out cross-validation (LOOCV) was chosen to evaluate the all predictive models for in situ SR data due to limited number of training samples (only 42 match-ups). LOOCV provides a near unbiased prediction with less manual involvement [59] and therefore is often used for model selection and evaluation. LOOCV is a specific type of k-fold cross-validation. Considering k = N (number of observations) means that the model was implemented N times, leaving one observation each time for validation. However, 10-fold cross-validation (where k = 10) was used for satellite-derived SR dataset (190 samples). In each validation in 10-fold, the coefficient of correlation (R), R 2 , RMSE, and MAE were calculated, and the final values were averaged over all folds. The R 2 , R, MAE, and RMSE were used to determine the accuracy of each model. The appropriate model suggested by both SR data sets was further used to derive water quality maps using atmospherically corrected Landsat image and the results were compared with station based WQI values recorded by EPD. The model outputs were further compared with WQI retrieved by ESA's multilayer neural network (MLNN) method "standard Case-2 Regional/Coast Colour" (C2RCC) processing chain model C2RCC-Nets. C2CRR-Nets is case-II processor originally developed by Doerffer and Schiller [60]. It uses a large database of radiative transfer simulations of water leaving radiances, as well as top-of-atmosphere radiances and neural networks for inversion modeling. It is now available through SNAP software [61].

Evaluation of Model Parameters
The RF model has an inherent procedure of calculating variable importance. In this study, we have evaluated the relative importance of input variables (bands and band combinations) based on a relative importance analysis with the RF model. This analysis evaluated all input variables using a mean decrease in impurity (MDI) or impurity reduction contribution by input variables while data splitting procedure. Mean square error reduction (MSR reduction) has been adopted to evaluate the relative importance of each variable. A high value of MSR reduction shows the importance of the variable. More details of MDI and impurity reduction can be referred to Breiman [51].
The Cubist model also provides variable usage information based on rules and conditions used in model building. This information can also be used to evaluate the input variable importance.

Selection of Band Combinations for Data Input
Based on a comparison of Pearson's R coefficients of 18 band combinations for both SR data sets, the combinations of (B3) 2 , B3/B1, B1*B3, and B2*B3 are highly correlated with SS and turbidity. Therefore, these band combinations along with the first four bands were used as input variables in four machine learning models, as well as for developing a multivariate regression model for predicting SS and turbidity for both SR data sets.
In addition, Chl-a has lower correlation with all individual bands of satellite-derived SR data, but the ratios of B3 and (B1) 2 and B4 and (B1) 2 are highly correlated. The high correlation between Chl-a and B4/(B1) 2 is consistent with previous studies as clear water has low reflectance in blue and green regions and absorbs red and IR light, although water with Chl-a can reflect a relatively high amount of green and IR radiation [62]. However, because B3 and B1 are equally absorptive for Chl-a, but SS scatters more in the red region than blue, this ratio is helpful to separate Chl-a from SS in turbid coastal waters. For both SR data sets, the variables B1-B4, B3/(B1) 2 , and B4/(B1) 2 ratios, six inputs variables (Table 4) were used to train the machine learning models and to develop multivariate regression for estimating Chl-a across the coastal areas of Hong Kong. Table 4. Bands and band combinations used in machine learning and stepwise regression models for in situ and satellite-derived SR data.

Model Selection
Model parameters, such as number of hidden layers and number of nodes (neurons) in hidden layer greatly, affect the model accuracy in the case of ANN. In order to select the best fit model, a cross-validation (k-fold) method was used to select the best fit model. Among all tested models, for in situ SR data one hidden layer with eight nodes model has resulted in the highest R 2 and lowest RMSE for all WQIs (Figure 3a-c). Whereas, for satellite SR data, in the network design, one hidden layer with fifteen nodes was selected for Chl-a and turbidity model (Figure 3d, f), and one hidden layer with 20 nodes was selected for the SS-model (Figure 3e) based on the highest cross-validation R 2 and the lowest RMSE. In the present study, for selecting SVR model, multiple kernel functions were tested using cross-validation, and finally, a polynomial kernel function was adopted with its outperformance. In addition, Chl-a has lower correlation with all individual bands of satellite-derived SR data, but the ratios of B3 and (B1) 2 and B4 and (B1) 2 are highly correlated. The high correlation between Chl-a and B4/(B1) 2 is consistent with previous studies as clear water has low reflectance in blue and green regions and absorbs red and IR light, although water with Chl-a can reflect a relatively high amount of green and IR radiation [62]. However, because B3 and B1 are equally absorptive for Chla, but SS scatters more in the red region than blue, this ratio is helpful to separate Chl-a from SS in turbid coastal waters. For both SR data sets, the variables B1-B4, B3/(B1) 2 , and B4/(B1) 2 ratios, six inputs variables (Table 4) were used to train the machine learning models and to develop multivariate regression for estimating Chl-a across the coastal areas of Hong Kong. Table 4. Bands and band combinations used in machine learning and stepwise regression models for in situ and satellite-derived SR data.

Model Selection
Model parameters, such as number of hidden layers and number of nodes (neurons) in hidden layer greatly, affect the model accuracy in the case of ANN. In order to select the best fit model, a cross-validation (k-fold) method was used to select the best fit model. Among all tested models, for in situ SR data one hidden layer with eight nodes model has resulted in the highest R 2 and lowest RMSE for all WQIs (Figure 3a-c). Whereas, for satellite SR data, in the network design, one hidden layer with fifteen nodes was selected for Chl-a and turbidity model (Figure 3d, f), and one hidden layer with 20 nodes was selected for the SS-model (Figure 3e) based on the highest cross-validation R 2 and the lowest RMSE. In the present study, for selecting SVR model, multiple kernel functions were tested using cross-validation, and finally, a polynomial kernel function was adopted with its outperformance.  First row shows different models for Chl-a (a), SS (b), and turbidity (c) when in-situ SR data was considered; and second row shows different models for Chl-a (d), SS (e), and turbidity (f) when satellite SR data was considered. The symbol ":" on the x-axis indicates the model with two hidden layers, while the number shows the number of neurons in each layer.

Evaluation of Machine Learning Regression by Using In-Situ SR Data
Based on using in situ water reflectance data collected by the MSR, four machine learning techniques (RF, SVR, ANN, and CB) were applied to estimate three optically-active WQIs. The correlation between the observed and predicted WQI derived by these machine learning approaches is shown in Figure 4; the comparison of all machine learning model against the data range is shown in Table 5.
For Chl-a concentration, there are high correlations for all models, in which ANN has the best performance with an R of 0.89, MAE of 0.20 µg/L, and RMSE of 0.27 µg/L, followed by SVR with an R of 0.76, MAE of 0.54 µg/L, and RMSE of 0.66 µg/L. The lowest performance was observed for the RF with an R of 0.71, MAE and RMSE of 0.57 µg/L, and 0.72 µg/L, respectively. CB showed slightly better results than RF with a R of 0.76, MAE with 0.54 µg/L, and RMSE with 0.66 µg/L (Figure 4). Similar findings were obtained when four machine learning regressions were applied to estimate SS concentration and turbidity. The ANN showed the best results with an R of 0.93 for SS prediction and an R of 0.82 for turbidity prediction; with the lowest values of MAE and RMSE of 0.68 mg/L and 0.70 mg/L for SS prediction, respectively, and 0.82 NTU and 0.94 NTU for turbidity prediction. The Cubist and RF model underperformed compared to the multivariate regression model (Table 6). Forward selection performed the best in case of Chl-a, and backward and bidirectional selection model perform superior in case of SS concentration estimation

Evaluation of Machine Learning Regression Using Satellite-Derived SR Data
A further test with satellite-derived SR data was conducted to evaluate the machine learning techniques for selecting a robust model. Figure 5 compares the predicted and measured WQIs when satellite-derived SR data were used as input variables in the machine learning models. With satellite-derived SR as input, ANN outperformed the others while predicting all three WQIs, obtaining R of 0.91, 0.92, and 0.85 for the estimation of Chl-a, SS, and turbidity, respectively. The prediction errors of the ANN models are also lower than those of SVR, CB, and RF. For the ANN model, the MAE and RMSE for Chl-a prediction are 1.13 µg/L and 1.40 µg/L, respectively; for SS the predictions are 1.83 mg/L and 2.06 mg/L, respectively; and for turbidity, the predictions are 2.61 NTU and 3.10 NTU, respectively. Compared to ANN, the performance of SVR is the second highest with R = 0.89 of 0.77 for Chl-a, SS predictions, respectively. The SVR model also showed the second-best results in case of turbidity with R = 0.75 with RMSE = 3.97. Cubist model also showed reliable results with R ≥ 0.75 for all Chl-a, SS, and turbidity. The RF model exhibits the poorest performance in comparison to other machine learning methods, like the comparison based on in situ SR data. The reason behind comparatively below average performance of RF could be due to the small training dataset [4,63,64]. On the other hand, SVR showed good performance and good agreement with the previous studies, in which SVR worked well with a small sample size of input variables [43]. Table 7 shows the comparison of different machine learning models along with data used for training, testing, and validation.
Machine learning methods outperformed the multivariate regression models (Table 8). Overall, all machine learning models underestimated WQIs with high concentrations, due to the fewer training samples available with high WQI concentration ( Figure 6). This underestimation is small in ANN, compared to that of SVR, CB, and RF. Remote Sens. 2019, 11, x FOR PEER REVIEW 12 of 24

Evaluation of Machine Learning Regression Using Satellite-Derived SR Data
A further test with satellite-derived SR data was conducted to evaluate the machine learning techniques for selecting a robust model. Figure 5 compares the predicted and measured WQIs when satellite-derived SR data were used as input variables in the machine learning models. With satellitederived SR as input, ANN outperformed the others while predicting all three WQIs, obtaining R of    Figure 6. Data distribution of station-based and predicted water quality indicators by all machine learning models using satellite-derived reflectance data.

The Relative Importance of Model Parameters
The RF model uses an out-of-bag data approach to calculate the relative importance of input variables. Variable usage in rule and condition building in the Cubist regression model also suggests variable importance. Therefore, the relative importance of input bands and band combinations was calculated for both SR datasets based on the MSE reduction method of the RF model and percentage usage in Cubist model (Tables 9 and 10). Based on the data retrieved from the MSR to estimate Chla concentrations, MSE reduction is the highest for B3 (Red, wavelength ≈ 0.665 µm), as water with Chl-a shows a prominent absorption in band 3 [62]. The second most influential band is B2 (green, wavelength ≈ 0.560 µm) as MSE reduction is second highest for B2. MSE reduction is also high for B3/(B1) 2 . B3, B2, and B3/(B1) 2 are also used in rule setting of the Cubist model when in situ reflectance data were used. This implies that the B3, B2, and B3/(B1) 2 are the influential inputs for predicting Chla using data from a hand-held spectroradiometer. Interestingly, almost similar results are observed for the RF and Cubist models when satellite-derived SR is used to predict Chl-a (Tables 9 and 10). The relative importance analysis by RF indicated that the ratio of B3 and (B1) is the most influential inputs and Cubist model also shows the same results as this band combination is used in rules and condition building of model. Dinoflagellates and diatoms-two main algal classes present in coastal waters of Hong Kong [33]-show high specific absorption (absorption coefficient at a given wavelength, normalized to chlorophyll concentration in the sample) in the red region (0.63-0.69 µm) but low absorption in blue (wavelength ≈ 0.490 µm) [62]. Therefore, the ratio of B3 to (B1) is well able to discriminate Chl-a concentration due to dinoflagellates and diatoms in complex waters.
For predicting SS concentration, MSE reduction is the highest for B3 as suggested by RF and B3 is also used in both rule and condition building of a Cubist model for SS when in situ SR data was used. MSE reduction is the highest for B3*B2 and this band combination also shows 98% importance for rule building in Cubist model when satellite SR data is used. These results are consistent with a local study for estimating SS in Deep Bay, Hong Kong [57], which found that B3 and B2 were highly correlated with SS concentration. B3*B1, (B3) 2 , and B4 are other important variables suggested by RF and Cubist for both reflectance datasets. For turbidity prediction, MSE reduction is the highest for band 3 and its square when using in situ and satellite-derived SR data, respectively. Similar results have been found for variable importance in Cubist regression (Table 10). This implies that band 3 and its combinations with other visible bands are sensitive for predicting SS and turbidity.

The Relative Importance of Model Parameters
The RF model uses an out-of-bag data approach to calculate the relative importance of input variables. Variable usage in rule and condition building in the Cubist regression model also suggests variable importance. Therefore, the relative importance of input bands and band combinations was calculated for both SR datasets based on the MSE reduction method of the RF model and percentage usage in Cubist model (Tables 9 and 10). Based on the data retrieved from the MSR to estimate Chl-a concentrations, MSE reduction is the highest for B3 (Red, wavelength ≈ 0.665 µm), as water with Chl-a shows a prominent absorption in band 3 [62]. The second most influential band is B2 (green, wavelength ≈ 0.560 µm) as MSE reduction is second highest for B2. MSE reduction is also high for B3/(B1) 2 . B3, B2, and B3/(B1) 2 are also used in rule setting of the Cubist model when in situ reflectance data were used. This implies that the B3, B2, and B3/(B1) 2 are the influential inputs for predicting Chl-a using data from a hand-held spectroradiometer. Interestingly, almost similar results are observed for the RF and Cubist models when satellite-derived SR is used to predict Chl-a (Tables 9  and 10). The relative importance analysis by RF indicated that the ratio of B3 and (B1) 2 is the most influential inputs and Cubist model also shows the same results as this band combination is used in rules and condition building of model. Dinoflagellates and diatoms-two main algal classes present in coastal waters of Hong Kong [33]-show high specific absorption (absorption coefficient at a given wavelength, normalized to chlorophyll concentration in the sample) in the red region (0.63-0.69 µm) but low absorption in blue (wavelength ≈ 0.490 µm) [62]. Therefore, the ratio of B3 to (B1) 2 is well able to discriminate Chl-a concentration due to dinoflagellates and diatoms in complex waters.
For predicting SS concentration, MSE reduction is the highest for B3 as suggested by RF and B3 is also used in both rule and condition building of a Cubist model for SS when in situ SR data was used. MSE reduction is the highest for B3*B2 and this band combination also shows 98% importance for rule building in Cubist model when satellite SR data is used. These results are consistent with a local study for estimating SS in Deep Bay, Hong Kong [57], which found that B3 and B2 were highly correlated with SS concentration. B3*B1, (B3) 2 , and B4 are other important variables suggested by RF and Cubist for both reflectance datasets. For turbidity prediction, MSE reduction is the highest for band 3 and its square when using in situ and satellite-derived SR data, respectively. Similar results have been found for variable importance in Cubist regression (Table 10). This implies that band 3 and its combinations with other visible bands are sensitive for predicting SS and turbidity.
Results show that the relative variable importance method is able to identify the most influential bands for Chl-a, SS, and turbidity. This can further help to select the spectral range to identify phytoplankton groups or to study coastal sediments in detail using in situ, airborne, or spaceborne hyperspectral data consisting of a large number of bands.

Comparison of ANN with In Situ and C2RCC-Nets Derived Data
Based on the accuracy of the model comparison, ANN was demonstrated to be the best model for predicting WQIs in this subtropical area. Therefore, ANN was further applied to the satellite images for mapping the spatial concentration of WQIs across the coastal waters of Hong Kong. Station-based values of WQIs measured within a one day gap were considered to validate the predicted WQI concentration, as the EPD station data within a one day gap were available considering the cloud-free and glint-free conditions of satellite data. The Chl-a and SS data estimated by the ANN model were also compared with Chl-a and suspended particulate matter (SPM) concentration retrieved by the C2RCC-Nets model (Figure 7). The ANN model developed in this study has underestimated the high values of WQI with RMSE 3.5 mg/L for SS (22-31 mg/L) and RMSE 5.0 µg/L for Chl-a concentration (22-30 µg/L), the reason for this could be due to less data records against the high values of WQI used during training of model. C2RCC-Nets overestimated the WQIs and the RMSE is high, especially for SS and Chl-a > 20. The RMSE for SS is 31.1 mg/L and for Chl-a 11.1 µg/L. This trend is also visible during algal bloom event occurred in the Deep Bay on 2nd Jan 2017 and southwestern waters on 4th Jan 2017 due to phytoplankton specie Phaeocystis globose [65]. Both events prolonged to nearly three weeks and captured in OLI image dated 8th Jan 2017 (Figure 8).
Based on the accuracy of the model comparison, ANN was demonstrated to be the best model for predicting WQIs in this subtropical area. Therefore, ANN was further applied to the satellite images for mapping the spatial concentration of WQIs across the coastal waters of Hong Kong. Station-based values of WQIs measured within a one day gap were considered to validate the predicted WQI concentration, as the EPD station data within a one day gap were available considering the cloud-free and glint-free conditions of satellite data. The Chl-a and SS data estimated by the ANN model were also compared with Chl-a and suspended particulate matter (SPM) concentration retrieved by the C2RCC-Nets model (Figure 7). The ANN model developed in this study has underestimated the high values of WQI with RMSE 3.5 mg/L for SS (22-31 mg/L) and RMSE 5.0 µg/L for Chl-a concentration (22-30 µg/L), the reason for this could be due to less data records against the high values of WQI used during training of model. C2RCC-Nets overestimated the WQIs and the RMSE is high, especially for SS and Chl-a > 20. The RMSE for SS is 31.1 mg/L and for Chl-a 11.1 µg/L. This trend is also visible during algal bloom event occurred in the Deep Bay on 2nd Jan 2017 and southwestern waters on 4th Jan 2017 due to phytoplankton specie Phaeocystis globose [65]. Both events prolonged to nearly three weeks and captured in OLI image dated 8th Jan 2017 ( Figure  8).

Limitations and Future Directions
The water quality data provided by the Hong Kong EPD were used in the present study. The environmental agency has routine measurements of biophysical and chemical properties of water, b c a Figure 9. Spatial distribution of chlorophyll-a (µg/L) (a), suspended solids (mg/L) (b), and turbidity (NTU) (c) using ANN.
The overall map shows a high concentration of SS (7.5-22.6 mg/L) in the Deep Bay and the northwestern zones, while turbidity is also high (12.0-26.3 NTU) across these zones. The sediment-laden shallow water of Deep Bay, with an approximately 2-8 m depth, is affected by discharges from Shenzhen River, discharges from Hong Kong rivers, and discharges from some local unsewered villages and land runoff [66]. The Victoria Harbour also showed high concentrations of SS and turbidity, as ship traffic is high over this area and it also receives residential and industrial effluents from adjacent urban areas [33].
Chl-a concentrations are high near the shorelines of Hong Kong, where concentrations of SS and turbidity are also high, due to nutrient-rich waters discharging from the land [33]. The high concentrations of Chl-a (8.0-15.6 µg/L) are observed in the shallow waters of Deep Bay, owing to nutrient-rich agricultural discharges. The maps also indicate higher concentrations of Chl-a in the Tolo Harbour and some areas of Mirs Bay, and these WCZs are often affected by algal blooms specifically red tides [66]. In the Victoria Harbour, moderate values of Chl-a, ranging from 7.0 to 10.0 µg/L, are observed. It is noticeable that the model predictions of this study are consistent with station-based measurements of Chl-a, SS, and turbidity, even considering the one-day gap with the satellite overpass during normal coastal conditions.

Limitations and Future Directions
The water quality data provided by the Hong Kong EPD were used in the present study. The environmental agency has routine measurements of biophysical and chemical properties of water, but not of bio-optical properties, which would be a useful indicator for various marine applications such as biodiversity, marine ecology, sediment monitoring, and recreation. This study has acquired and collected limited bio-optical data from numerous field visits; limited satellite reflectance data was available corresponding to high concentrations of WQIs. Further studies may be conducted when such data are available, for better evaluation of remote sensing methods to estimate water quality across this subtropical coastal area.

Conclusions
The study examined four machine learning approaches for retrieval of water quality indicators (Chl-a, SS, and turbidity) over the coastal waters of Hong Kong using water reflectance from both a hand-held spectroradiometer and satellite data and mapped the spatial extent of these parameters. Such maps can be used to identify hotspots for algal blooms and point pollution sources relating to high nutrient concentrations. Based on the results of cross-validation, ANN was outperformed for water quality estimation as ANN exhibits the best performance than other three machine learning approaches, irrespective of the input data used (i.e., in situ reflectance or Landsat reflectance data), resulting in R ≈ 0.9 and RMSE ≈ 0.2-1.4 for Chl-a, R ≈ 0.9 and RMSE 0.7-2.6 for SS, and R ≈ 0.85 and RMSE ≈ 0.9-3.1 for turbidity. Spatially synoptic mapping of three WQIs-Chl-a, SS, and turbidity concentrations-were derived using the ANN approach. Outputs of ANN model and standard Case-2 Regional/Coast Colour (C2RCC) processing chain model C2RCC-Nets, using a separate set of satellite data, was further compared with station-based water quality data. The coefficient of determinations are 0.70 and 0.71 for estimating SS and Chl-a, respectively, using locally calibrated ANN and R 2 of 0.51 and 0.22, respectively, were found using C2RCC-Nets for estimating SS and Chl-a, respectively.
In addition, the relative importance of each predictor variable was also examined for both reflectance data sets, in order to evaluate the contribution of each variable (wave band) for water quality prediction. In summary, both in situ and satellite-derived reflectance datasets showed similar patterns in identifying sensitive variables to predict water quality parameters. The green band and red bands are more sensitive for predicting Chl-a, and the red band and its combination with blue and green bands are sensitive for predicting SS and turbidity. The effectiveness of sensitive bands also depends on to the absorption and scattering properties of phytoplankton classes (dinoflagellates and diatoms) present in Hong Kong waters. This approach can help to select a suitable spectral range for a detailed study in other regions where the phytoplankton species may be different.
The derived spatial distributions indicated that the observed high concentrations of SS and turbidity result from residential and industrial effluents and nutrient-rich discharge from agricultural land in shallow waters. This study suggests that machine learning approaches with satellite data have promising potential for regular water quality monitoring over large complex coastal areas. The focus of future research is to investigate the seasonal and annual patterns of Chl-a and SS using hyperspectral data.