Monitoring Coastal Chlorophyll-a Concentrations in Coastal Areas Using Machine Learning Models

Harmful algal blooms have negatively affected the aquaculture industry and aquatic ecosystems globally. Remote sensing using satellite sensor systems has been applied on large spatial scales with high temporal resolutions for effective monitoring of harmful algal blooms in coastal waters. However, oceanic color satellites have limitations, such as low spatial resolution of sensor systems and the optical complexity of coastal waters. In this study, bands 1 to 4, obtained from Landsat-8 Operational Land Imager satellite images, were used to evaluate the performance of empirical ocean chlorophyll algorithms using machine learning techniques. Artificial neural network and support vector machine techniques were used to develop an optimal chlorophyll-a model. Four-band, four-band-ratio, and mixed reflectance datasets were tested to select the appropriate input dataset for estimating chlorophyll-a concentration using the two machine learning models. While the ocean chlorophyll algorithm application on Landsat-8 Operational Land Imager showed relatively low performance, the machine learning methods showed improved performance during both the training and validation steps. The artificial neural network and support vector machine demonstrated a similar level of prediction accuracy. Overall, the support vector machine showed slightly superior performance to that of the artificial neural network during the validation step. This study provides practical information about effective monitoring systems for coastal algal blooms.


Introduction
Currently, harmful algal blooms (HABs) are among the most problematic environmental issues worldwide [1,2]. The frequency and intensity of HAB events have increased dramatically worldwide since the 1970s [3,4]. Since that time, fish-killing red tide events have been observed in Korea; a red tide of C. polykrikoides was first recorded during the 1980s, and the frequency of red tide events gradually increased up to the 2000s [5]. HABs severely damage the aquaculture industry, resulting in shellfish and fish kills, and may even threaten human health [6][7][8][9]. HABs cause huge economic losses, amounting to approximately $1 billion per year in Europe and $100 million per year in the USA [10]. Economic losses suffered by Korea as a result of HABs since the 1980s amount to $121 million [11].
Ecological and biological studies have been conducted to monitor HABs and to address their critical effects [12][13][14]. HAB monitoring is essential for effective decision making and for developing management strategies [15]. In particular, remote-sensing techniques using multispectral optical satellite sensors have been widely used in oceanic research because of their extensive spatial coverage and frequent revisits [16,17]. For example, ocean chlorophyll (OC) algorithms have been applied to monitor the chlorophyll-a (chl-a) concentration [18]. For the Korean coastal area, two-stage filtering using sea-surface temperature and the 667-nm band of Moderate Resolution Imaging Spectroradiometer (MODIS) images has been used to detect a C. polykrikoides bloom, as presented in Kim et al. [19]. Son et al. [20] applied the normalized water-leaving radiance to classify non-bloom and bloom water.
The optical complexity of coastal waters, however, can make satellite images difficult to use because coastal and estuarine waters contain high concentrations of terrigenous material and inorganic particles [21,22]. In addition, the resolution of satellite sensors limits their ability to cover complex coastlines because one pixel may include both coastal water and the land surface [23]. Satellite systems such as MODIS and the Sea-viewing Wide Field-of-view Sensor (SeaWiFS) have relatively coarse spatial resolution; thus, they may not be applicable to monitoring coastal water quality. The only possible means is then monitoring via shipboard field sampling [24,25].
For this study, an intensive two-year field survey was conducted in the South Sea of Korea from 2016 to 2017. We collected chl-a samples from 62 sampling stations near coastal areas. The main objective of this study was to identify a suitable method for retrieving the chl-a concentration in coastal waters. First, OC algorithms were applied to Landsat-8 Operational Land Imager (OLI) data to evaluate the efficiency of a land imager in retrieving the chl-a concentration. Then, an artificial neural network (ANN) and a support vector regression (SVR) model were developed using intensive field observations and remote sensing reflectance (Rrs) data to estimate the chl-a concentration. Based on this study, we propose the most desirable method for monitoring chl-a in coastal areas.

Site Description
The research area in this study is the middle of the South Sea of Korea (Figure 1). The area extends from 34°00′ N to 35°30′ N and from 127°00′ E to 129°30′ E. This area is characterized by a complex coastline with many aquaculture farms and is influenced by an oceanic current, i.e., the Tsushima Current. The Tsushima Current is a branch of the Kuroshio Current and transports warm, nutrient-rich water to the South Sea of Korea [26,27]. The first bloom event of C. polykrikoides was recorded during the early 1980s in a semi-enclosed bay, i.e., Jinhae Bay. Since the 1980s, many coastal areas in the South Sea of Korea have been affected by C. polykrikoides blooms [28][29][30].

Satellite and Field Sampling Data
We used Landsat-8 OLI, which has 11 bands in total. Bands 1 to 4 are in the visible range, band 5 is near infrared, bands 6 and 7 are in the shortwave infrared range, bands 8 and 9 are the panchromatic and cirrus bands, respectively, and bands 10 and 11 are in the thermal infrared range [31]. Their bandwidths and spatial resolutions are presented in Table 1. The research area covers two orbit paths of Landsat-8; the path/row numbers of the research area are 115/36 for the west side and 114/36 for the east side. The Landsat-8 level 1B image data were acquired from the EarthExplorer website (https://earthexplorer.usgs.gov/) of the U.S. Geological Survey (USGS). To collect in situ chl-a data, biweekly field sampling was conducted from June 2016 to September 2017. Surface water samples were collected using 15-L buckets and then filtered using Whatman GF/F glass fiber filters (47 mm in diameter with a pore size of 0.45 µm) for chl-a analysis. The filter papers were stored frozen at −20 °C for 24 h in a dark environment, and chl-a was extracted from the filter paper with 90% acetone. The chl-a concentration was analyzed using a Turner BioSystems fluorescence analyzer (Sunnyvale, CA, USA). The satellite images with the acquisition dates closest to those of the field surveys were selected and processed. Reflectance values were extracted from the satellite images at each sampling station; a total of 147 data points were extracted for model construction. Table 2 summarizes the general statistics of the chl-a data, the field sampling periods, and the corresponding Landsat-8 images.
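The image-to-survey matching described above can be sketched in a few lines of Python. This is only an illustration: the function name `closest_image` and the survey date are hypothetical, and the three acquisition dates are simply the image dates mentioned later in the text for the distribution maps, not the full list in Table 2.

```python
from datetime import date


def closest_image(survey_day, acquisition_days):
    """Return the acquisition date with the smallest absolute gap to survey_day."""
    return min(acquisition_days, key=lambda d: abs((d - survey_day).days))


# Acquisition dates reused from the map dates in the text; the survey date is hypothetical.
acquisitions = [date(2017, 8, 3), date(2017, 8, 12), date(2017, 9, 13)]
survey = date(2017, 8, 10)

match = closest_image(survey, acquisitions)
gap_days = abs((match - survey).days)
print(match.isoformat(), gap_days)
```

In practice a maximum allowed gap would probably also be enforced, since the text does not state how large a time difference was tolerated.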

Atmospheric Correction of Satellite Data
The satellite images were first radiometrically corrected to convert digital numbers (DNs) to top-of-atmosphere (TOA) radiance, after which Rrs was calculated. The atmospheric correction for the Landsat-8 OLI images was performed using the case-2 regional coast color (C2RCC) algorithm with the Sentinel Application Platform (SNAP) software, which is accessible via the European Space Agency (ESA) website (http://step.esa.int/main/toolboxes/snap/) [32]. C2RCC is an atmospheric correction algorithm, based on a neural network, for coastal and turbid inland water bodies with complex optical properties. C2RCC uses a database of simulated water-leaving reflectance and TOA radiance. Its networks are trained to invert the spectra for the atmospheric correction [33].
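As a minimal sketch of the radiometric step, Landsat-8 Level-1 digital numbers are rescaled to TOA radiance with a linear gain and offset, L = M × DN + A, where the band-specific factors are read from the scene metadata. The numeric factors and pixel values below are placeholders chosen for illustration, not values from this study, and the C2RCC correction itself is not reproduced here.

```python
def dn_to_toa_radiance(dn, mult, add):
    """Rescale a digital number to TOA spectral radiance using a linear gain/offset."""
    return mult * dn + add


# Placeholder rescaling factors; real values come from each scene's metadata file.
RADIANCE_MULT = 0.012
RADIANCE_ADD = -60.0

pixels = [8000, 9000, 10000]  # hypothetical digital numbers for one band
radiance = [dn_to_toa_radiance(dn, RADIANCE_MULT, RADIANCE_ADD) for dn in pixels]
print(radiance)
```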

Model for Predicting Chl-a Concentration
In this study, we analyzed the performance of two types of models (i.e., OC algorithms and machine learning algorithms) for quantifying chl-a concentration. The OC and machine learning algorithms were constructed using Landsat-8 images to estimate the chl-a concentration. The ANN and SVR were constructed using MATLAB version R2017a (The MathWorks Inc., Natick, MA, USA). We used bands 1 to 4 and their ratios as input data. Three types of input datasets were used for the ANN and SVR models: surface Rrs values from bands 1 to 4 (four-band), a four-band-ratio dataset (band 1/band 3 or 4 and band 2/band 3 or 4), and a mixed dataset containing both the four bands and the four band ratios.
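The three input datasets can be illustrated with a short Python sketch. It assumes that "band 1/band 3 or 4 and band 2/band 3 or 4" means the four ratios B1/B3, B1/B4, B2/B3, and B2/B4; the function names and the sample Rrs values are hypothetical.

```python
def four_band_ratio(rrs):
    """Build the four band ratios from Rrs values of bands 1-4 (assumed reading)."""
    b1, b2, b3, b4 = rrs
    return [b1 / b3, b1 / b4, b2 / b3, b2 / b4]


def mixed(rrs):
    """Concatenate the four bands and the four band ratios (eight inputs)."""
    return list(rrs) + four_band_ratio(rrs)


sample = [0.004, 0.005, 0.008, 0.002]  # hypothetical Rrs values for bands 1-4

print(four_band_ratio(sample))
print(mixed(sample))
```

Under this reading, the four-band and four-band-ratio models each have four inputs and the mixed model has eight.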

ANN Model
The ANN model is among the most widely applied tools for modeling complex environmental processes and water quality [35][36][37][38][39][40][41]. The ANN model is constructed from three layers (i.e., the input, hidden, and output layers) that are composed of nodes. All layers in the ANN are connected by weights and biases. The numbers of hidden nodes and hidden layers are very important model parameters because too many layers and nodes may result in over-fitting of the ANN model [42]. The input layer consists of the input band datasets, and the output layer holds the target variable, the chl-a concentration. A hidden layer forms the internal structure of the neural network through which the input data pass [43]. The data, starting from the input layer, are transferred to the hidden layer and finally reach the output layer. Before the nodal data are transferred to the next layer, they are multiplied by the weights and added to the bias. The structure of the ANN can be represented by a generalized mathematical expression as in Equations (6) and (7) [44]:

$$H_q^i = f_1\left(\sum_{p} w_{pq}^{h}\, x_p^i + b\right) \qquad (6)$$

$$g_1^i = f_2\left(\sum_{q} w_{q1}^{o}\, H_q^i + b\right) \qquad (7)$$

where $H_q^i$ is the hidden layer output, $g_1^i$ is the neural network output, $x_p^i$ is the pth element of the ith input variable, $w_{pq}^{h}$ is the weight of the connection between the pth node of the input layer and the qth node of the hidden layer, and $w_{q1}^{o}$ is the weight between the qth node of the hidden layer and the output layer node. The term $b$ is the bias term, and p and q represent the node numbers in the input and hidden layers. $f_1$ is the activation function for the input vector and $f_2$ is the output function calculating the scalar output. During training, the initial weights are assigned randomly. The mean squared error (MSE) between the final output and the observed target variable is calculated after the signal passes through the structure. Backpropagation is then applied to update the weights by transferring the error signal backward; the Levenberg-Marquardt algorithm was used for this step [45]. This algorithm converges quickly and stably when minimizing a non-linear function [46]. Three activation functions were considered for developing the ANN in MATLAB: the tangent-sigmoid, log-sigmoid, and linear functions. A trial-and-error method was applied to select the proper activation function; the tangent-sigmoid function was finally chosen for this study. Other parameters in the ANN include the learning rate and momentum constant. Parameters such as the learning rate, momentum constant, number of layers, and number of nodes in a layer affect ANN performance and should be optimized [47]. The learning rate, momentum constant, and number of nodes in the hidden layer were optimized using a pattern-search algorithm in the MATLAB toolbox.
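The forward pass through a three-layer network of this kind, together with the MSE used as the training criterion, can be sketched in plain Python. This is a hedged illustration rather than the MATLAB implementation used in the study: it assumes a tangent-sigmoid (tanh) hidden layer with one bias per hidden node and a linear output node, and all weights are hypothetical values, not fitted ones.

```python
import math


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: tanh hidden layer followed by a linear scalar output."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(weights, x)) + b)
        for weights, b in zip(w_hidden, b_hidden)
    ]
    output = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return hidden, output


def mse(predicted, observed):
    """Mean squared error between predictions and observations."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


# Hypothetical weights for 4 inputs and 2 hidden nodes (w_hidden[q][p]).
w_hidden = [[0.5, -0.2, 0.1, 0.3], [-0.4, 0.6, 0.2, -0.1]]
b_hidden = [0.1, -0.1]
w_out = [1.2, -0.7]
b_out = 0.05

x = [0.004, 0.005, 0.008, 0.002]  # hypothetical four-band Rrs input
hidden, chl_estimate = forward(x, w_hidden, b_hidden, w_out, b_out)
print(hidden, chl_estimate)
```

Training would adjust `w_hidden`, `b_hidden`, `w_out`, and `b_out` to reduce `mse` over the training samples; that update step is omitted here.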

SVR Model
The support vector machine (SVM) is a useful tool for pattern recognition and non-linear regression; major applications include classification, regression, and time series prediction [39,48,49]. SVR is the regression version of an SVM and was used for this study [50]. SVR is generally used for regression of continuous variables. The basic theory of SVR can be represented by a mathematical equation for the network output ($s_i$) as follows:

$$s_i = w_i\,\phi(X_i) + b$$

where $w_i$ and $b$ are the coefficients that are determined by minimizing the error between the network output and the target variable, and $\phi(X_i)$ is a nonlinear mapping function. To simplify calculation of the nonlinear mapping function, a kernel function, $\kappa(X_i, X)$, is applied. Before developing the SVR model, three kernel functions were compared to maximize the performance of the SVR models: the linear, polynomial, and Gaussian radial basis function (RBF) kernels. All three kernel functions were pre-applied to choose an optimal kernel for the SVR model. Finally, the Gaussian RBF kernel was selected for the study. Additionally, three parameters (box constraint, epsilon, and sigma) were used to construct the SVR model in MATLAB. The box constraint leads to a strict separation of the data by applying a cost to the error. Epsilon is a complexity factor that adjusts the number of support vectors. Sigma is a scale parameter that is relevant to model stability. For the SVR model, these three parameters were determined via pattern-search optimization.
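To make the roles of the kernel and of epsilon more concrete, the following Python sketch shows a Gaussian kernel, the epsilon-insensitive loss commonly used in SVR, and a kernel-expansion prediction. It is a sketch under stated assumptions: the kernel is written as exp(−‖x − z‖² / (2σ²)), which is one common parameterization and may differ from the scaling used by the MATLAB toolbox, and the support vectors and coefficients are hypothetical.

```python
import math


def gaussian_kernel(x, z, sigma):
    """Gaussian kernel, assuming the exp(-||x - z||^2 / (2 sigma^2)) form."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def epsilon_insensitive_loss(residual, epsilon):
    """Residuals smaller than epsilon cost nothing; larger ones cost linearly."""
    return max(0.0, abs(residual) - epsilon)


def svr_predict(x, support_vectors, dual_coefs, bias, sigma):
    """Prediction as a weighted sum of kernel values plus a bias."""
    return sum(
        a * gaussian_kernel(sv, x, sigma)
        for a, sv in zip(dual_coefs, support_vectors)
    ) + bias


# Hypothetical support vectors and coefficients, for illustration only.
support_vectors = [[0.004, 0.005], [0.006, 0.003]]
dual_coefs = [1.5, -0.5]
estimate = svr_predict([0.005, 0.004], support_vectors, dual_coefs, 2.0, 2.0)
print(estimate)
```

Because residuals inside the epsilon tube incur no loss, a larger epsilon generally leaves fewer samples as support vectors, which matches the description of epsilon above.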

Evaluation of Model Performance
Cross-validation was used to evaluate the overall performance of the models. Of the data, 20% were used for validation and the remaining 80% were used for training. The accuracy of each model was mainly evaluated via the coefficient of determination (R²) and the root mean squared error (RMSE). The RMSE value can be calculated by the following equation:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} e_i^{2}}$$

where n is the number of data points, i indexes the ith chl-a observation, and $e_i$ is the residual between the observed and estimated chl-a [51].
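The evaluation procedure can be sketched as follows. The helper names are hypothetical, the random split is only one way to obtain an 80/20 partition (the text does not say how samples were assigned), and R² is computed here as 1 − SS_res/SS_tot, which is an assumption because the text does not state which definition was used.

```python
import math
import random


def rmse(observed, estimated):
    """Root mean squared error of the residuals."""
    n = len(observed)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n)


def r_squared(observed, estimated):
    """Coefficient of determination, assuming the 1 - SS_res / SS_tot definition."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def split_80_20(n_samples, seed=0):
    """Shuffle sample indices and return (training, validation) index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_val = round(0.2 * n_samples)
    return indices[n_val:], indices[:n_val]


train_idx, val_idx = split_80_20(147)  # 147 samples, as in this study
print(len(train_idx), len(val_idx))
```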

Atmospherically Corrected Rrs Spectra
Figure 2 shows the atmospherically corrected Rrs spectra. For high chl-a concentrations (>2.5 mg m⁻³), low Rrs was dominant in bands 1 and 2. For medium concentrations (1 to 2.5 mg m⁻³), mid-range Rrs values with large variations were observed. The spectral shapes were similar to those of previous studies that used ocean color satellites [21,[52][53][54][55]. The spectral features of the oceanic chl-a concentration have also been reported in other studies. Ahn and Shanmugam [21] reported low water-leaving radiance in the 412-510-nm wavelength range for red tide waters. A study of the bio-optical properties of the Antarctic Peninsula waters showed that a high concentration of chlorophyll yields low blue-band Rrs and lowers the blue-green ratio [55]. Bricaud, Morel, Babin, Allali and Claustre [54] also showed that a higher chl-a concentration resulted in a lower reflectance in the blue wavelengths.

Retrieval Results Using the OC Algorithms
The OC algorithms were applied using Landsat-8 reflectance data to quantify chl-a concentrations. The performance parameters of the OC algorithms are listed in Table 3. OC1a and OC1b showed similar R² values of 0.2972 and 0.2957, respectively. OC1c and OC1d also showed similar performance, with R² values of 0.2992 and 0.2930, respectively. Among the OC algorithms, OC1c showed the best performance. The performances of OC2b and OC2 were poor, as their respective R² values were 0.0194 and 0.0620. OC3d yielded an R² of 0.2960, similar to that of the OC1 group. Compared to the other OC algorithms, the OC2 group showed poor estimation performance. The scatter plots of in situ and estimated chl-a concentrations for the OC algorithms are shown in Figure 3. Over all ranges, the OC algorithms tended to underestimate chl-a concentrations, except for the OC2 group. As shown in Figure 3e,f, the OC2 group showed overestimation at low chl-a concentrations (<3 mg m⁻³). The OC1 group and OC3d showed comparable estimations of chl-a, whereas no significant correlation between the estimated and observed chl-a could be ascertained for OC2 and OC2b; all of their estimated chl-a concentrations were less than 2.5 mg m⁻³.
OC algorithms have shown high performance in ocean chl-a studies using oceanic color sensors such as SeaWiFS and MODIS-Aqua [56][57][58]. In this study, the OC algorithms applied to Landsat-8 images did not perform as well as in those previous studies [56][57][58]. This result can be attributed to atmospheric correction using ACOLITE. Overestimation of reflectance at the green wavelength (563 nm) can occur when a low aerosol contribution is calculated [59]. Because the OC algorithms use the 563-nm band as the denominator of the band ratio, a relatively high magnitude of the green band can cause underestimation of the chl-a concentration. In addition, the spectral resolution of Landsat-8 is lower than that of other satellite sensors typically used in oceanic studies. While bands 2, 3, and 4 of the Landsat-8 images have bandwidths of 65 nm, 75 nm, and 50 nm, respectively, MODIS has bandwidths of 10 to 20 nm for the 443-nm to 555-nm bands. The Visible Infrared Imaging Radiometer Suite (VIIRS) also has bandwidths of approximately 20 nm [60]. SeaWiFS, commonly used for developing OC algorithms, has a 20-nm bandwidth [61]. A broader bandwidth might decrease the sensitivity of surface reflectance to the chl-a concentration [62].
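For readers unfamiliar with the band-ratio form, a generic sketch may help. Empirical OC-type algorithms typically evaluate a polynomial in the base-10 logarithm of a blue-to-green Rrs ratio; the modified cubic form below is one such variant. The coefficients are placeholders chosen for illustration only, and the exact forms, band assignments, and coefficients of the algorithms in Table 3 are not reproduced here.

```python
import math


def oc_polynomial_chl(rrs_blue, rrs_green, coeffs):
    """chl = 10 ** (a0 + a1*R + a2*R**2 + a3*R**3) + a4, with R = log10(blue / green)."""
    a0, a1, a2, a3, a4 = coeffs
    r = math.log10(rrs_blue / rrs_green)
    return 10.0 ** (a0 + a1 * r + a2 * r ** 2 + a3 * r ** 3) + a4


# Placeholder coefficients (hypothetical, not those evaluated in this study).
PLACEHOLDER_COEFFS = (0.3, -3.0, 2.8, -2.0, 0.0)

chl = oc_polynomial_chl(0.005, 0.008, PLACEHOLDER_COEFFS)
print(chl)
```

Because the green band enters only through the ratio, any bias in the green-band Rrs propagates directly into the estimate, which is why the accuracy of the atmospheric correction matters for these algorithms.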

Determination of Optimized Model Parameters
The parameters optimized for the ANN and SVR models are shown in Table 4. For the four-band dataset, the learning rate and momentum constant were 0.5000 and 0.5625, respectively, and the number of hidden layer nodes was 6. For the four-band-ratio dataset, the optimized learning rate was 0.1250, the momentum constant was 0, and the number of hidden layer nodes was 7. The ANN model constructed using the mixed dataset had a learning rate of 0.4980, a momentum constant of 0.9990, and 4 hidden layer nodes; its learning rate was similar to that of the four-band dataset. The epsilon, kernel scale, and box constraint of the SVR models were also optimized, as described in Table 4. The epsilon values were 0.0583, 0.0505, and 0.0999 for the four-band, four-band-ratio, and mixed datasets, respectively.
The kernel scale was 2.0005 for the four-band dataset, 500.0005 for the four-band-ratio dataset, and 2.9848 for the mixed dataset. The four-band-ratio dataset showed the highest kernel scale value. The box constraint for the four-band dataset was 206.5474, which was the lowest. The four-band-ratio and mixed datasets showed similar values, 533.9840 and 511.2826, respectively.
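The idea behind pattern-search optimization can be illustrated with a basic coordinate (compass) search in Python. This is a generic, simplified sketch and not the MATLAB toolbox routine used in the study; the objective below is a hypothetical stand-in for the validation error as a function of two hyperparameters.

```python
def pattern_search(objective, start, step=1.0, tol=1e-6, max_iter=10000):
    """Poll +/- step along each coordinate; halve the step when nothing improves."""
    point = list(start)
    best = objective(point)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(point)):
            for direction in (step, -step):
                trial = list(point)
                trial[i] += direction
                value = objective(trial)
                if value < best:
                    point, best, improved = trial, value, True
        if not improved:
            step /= 2.0
    return point, best


def objective(params):
    """Hypothetical error surface with its minimum at (0.5, 6.0)."""
    rate, nodes = params
    return (rate - 0.5) ** 2 + (nodes - 6.0) ** 2


best_point, best_value = pattern_search(objective, [0.0, 0.0])
print(best_point, best_value)
```

The method needs no gradients, which suits hyperparameter tuning where each objective evaluation is a full model training run. A discrete parameter such as the number of hidden nodes would need rounding or a dedicated integer step.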

Retrieval Results Using the Machine Learning Algorithms
The performance of the optimized ANN and SVR models is shown in Table 5. The R² and RMSE values for both the training and validation sets were used to compare the performances of the ANN and SVR models. For each of the ANN and SVR approaches, three models were built, one for each input: the four-band, four-band-ratio, and mixed datasets. The overall performances of the SVR models were better than those of the ANN models. In the case of the ANN model, the R² values for the training and validation steps of the four-band dataset were 0.4368 and 0.6322, and the RMSE values were 1.0444 mg m⁻³ and 1.2187 mg m⁻³, respectively. The R² values of the four-band-ratio dataset were 0.6663 and 0.3886, while the RMSE values were 0.8626 mg m⁻³ and 1.3619 mg m⁻³ for the training and validation steps, respectively. For the mixed dataset, the R² values for the training and validation steps were 0.6621 and 0.2199, and the RMSE values were 0.8713 mg m⁻³ and 1.5943 mg m⁻³, respectively. The four-band-ratio dataset showed the highest R² value for the training step, and the four-band dataset showed the highest for the validation step. However, the training R² value of the four-band dataset was lower than its validation value. In terms of RMSE, the four-band and four-band-ratio datasets showed similar overall performance, but the difference between training and validation was smaller for the four-band dataset. The scatter plots for the ANN models are shown in Figure 4. The four-band model overestimated the chl-a concentration in the low chl-a region (<2 mg m⁻³) and underestimated mid-to-high concentrations (>3 mg m⁻³) during both the training and validation steps. In the case of the four-band-ratio model, the under- and over-estimation trend was weak during the training step, and the model showed a relatively high R² value. However, during the validation step a strong overestimation trend was apparent at low concentrations (<2 mg m⁻³). In the mixed-dataset model, a high concentration of 6 mg m⁻³ was well estimated compared to the other datasets.
Overall, the optimized SVR models developed using the four-band and mixed datasets showed better performance than the optimized ANN models. The estimation trends were similar to those of the ANN models: underestimation at high concentrations (>3 mg m⁻³) and overestimation at low concentrations (<2 mg m⁻³). For the four-band model, the R² values were 0.7119 and 0.7648 for training and validation, respectively. However, the R² values during the validation step were higher than those during the training step for both models. The RMSE values were also the best for training and validation, at 0.7442 mg m⁻³ and 0.9633 mg m⁻³, respectively. The four-band-ratio dataset showed the worst performance in terms of both R² and RMSE: 0.0082 and 0.0056 for R² and 1.5337 mg m⁻³ and 1.5849 mg m⁻³ for RMSE for the training and validation steps, respectively. Repeated training was conducted; however, the SVR model using the four-band-ratio dataset was not well trained. The R² and RMSE values of the mixed dataset were 0.6948 and 0.8294 mg m⁻³ for the training step and 0.6948 and 0.9933 mg m⁻³ for the validation step, respectively. Except for the four-band-ratio dataset, the performances of the SVR model in both training and validation were superior to those of the ANN model. Figure 5 shows the scatter plots of the SVR results using each dataset. Overestimation was dominant at low concentrations (<2 mg m⁻³) and underestimation was prevalent at medium and high concentrations (>3 mg m⁻³). Table 5 shows that the SVR model is superior to the ANN model in terms of the R² and RMSE values. In addition, the SVR model shows a smaller performance difference between the training and validation steps.

Dzwonkowski and Yan [63] used five reflectance bands between 443 nm and 670 nm obtained from SeaWiFS and a neural network to estimate chl-a in coastal waters. Vilas et al. [64] developed three different ANN models using Medium Resolution Imaging Spectrometer (MERIS) images. For SVR, few studies of oceanic chl-a exist. Zhan et al. [65] used a SeaBAM dataset, originally used for developing OC algorithms, to estimate chl-a using an SVR model. The performance range was approximately from 0.5 to 0.9. As described in Section 3.1, Landsat-8 OLI, which has broad spectral bands, has poor ability to discriminate pigments. This implies that machine learning techniques can overcome the weak points of the empirical equations. In particular, the underestimation that results from the broad bandwidth was efficiently corrected. In terms of simple chl-a estimation performance, the model developed in this study can be considered to have low accuracy. For coastal and estuarine waters, however, high spatial resolution is a powerful advantage. Recent studies noted that land imagers such as Sentinel-2 and Landsat-8 can retrieve reliable results from these bodies of water [52,66,67]. Recently, an Automatic Model Selection Algorithm (AMSA) for determining the best model was developed for several datasets from MERIS and MODIS-Aqua by Blix and Eltoft [68]. SVR and Gaussian Process Regression (GPR) were selected as the best models, and the R² performance was between 0.75 and 0.96. The determination of the best model structure and dataset is similar to our study. However, the characteristics of the datasets differ. In this study, the four-band Rrs dataset was used to create the four-band-ratio and mixed datasets. This approach can be used to determine the best dataset from a raw Rrs dataset.

In addition, distribution maps of chl-a, created using the machine learning models, are shown in Figure 6. The ANN and SVR models were applied to the raw Landsat-8 images to develop chl-a maps. A total of three images, from 3 August, 12 August, and 13 September 2017, were used. To avoid overfitting, the models that had the lowest difference between training and validation performance were selected. The four-band-ratio and four-band datasets were chosen for the ANN and SVR models, respectively. In the ANN model case, panels (a), (b), and (c) generally show relatively high chl-a concentrations in the coastal areas and concentrations under 2 mg m⁻³ in the open ocean. However, the four-band SVR model showed poor distribution performance. There was no remarkable difference between the coastal and open ocean areas. In the case of 13 September 2017, a high-concentration distribution appeared, but concentrations of more than 4 mg m⁻³ also appeared in the open ocean area.

Conclusions
The purpose of this research was to evaluate the performance of OC algorithms and to develop optimal ANN and SVR models for chl-a estimation in coastal waters. The major findings are as follows:

1. Seven OC algorithms were evaluated after applying various calibration gains. All OC algorithms showed poor performance using Landsat-8 satellite data. OC1c showed the best performance among the OC algorithms, with an R² of 0.2992. The R² values of OC2 and OC2b were less than 0.1.
2. The ANN and SVR models showed better estimation performance than the OC algorithms. Compared to previous studies using oceanic color sensors, the machine-learning techniques using Landsat-8 images showed satisfactory performance.
3. The SVR model showed slightly better results than the ANN model during the training and validation steps. The four-band-ratio dataset SVR is not appropriate for chl-a estimation. However, the ANN model generated a more reasonable and reliable distribution of chl-a as compared to the raw image.

This study demonstrates that Landsat-8 OLI satellite data can potentially be used in coastal and oceanic research for remote sensing of HABs and that machine-learning techniques are efficient and useful tools for estimating chl-a concentration from reflectance data obtained by Landsat-8 OLI over complex coastal waters.

Figure 1. Map of research site including field sampling stations (green circles).

Figure 2. Rrs spectra. Each line represents the Rrs spectrum from one station.

Figure 4. Scatter plots of in situ and estimated chl-a for the ANN model: (a) training results for the four-band dataset; (d) validation results for the four-band dataset; (b) training results for the four-band-ratio dataset; (e) validation results for the four-band-ratio dataset; (c) training results for the mixed dataset; (f) validation results for the mixed dataset.

Figure 5. Scatter plots of in situ and estimated chl-a for the SVR model using the Gaussian radial basis function kernel: (a) training results for the four-band dataset; (d) validation results for the four-band dataset; (b) training results for the four-band-ratio dataset; (e) validation results for the four-band-ratio dataset; (c) training results for the mixed dataset; (f) validation results for the mixed dataset.

Figure 6. Chl-a distribution maps developed using the machine learning models and the Landsat-8 OLI images of 3 August, 12 August, and 13 September 2017. (a-c) show the four-band-ratio ANN results and (d-f) show the four-band SVR results.

Table 2. Information regarding the satellite images, field sampling, and extracted band data.

Table 3. Ocean chlorophyll (OC) algorithms and their performance on the Landsat-8 images.

Table 4. Optimized parameters for the artificial neural network (ANN) and support vector regression (SVR) models.

Table 5. Model performance of the ANN and SVR models.