An Integrative Remote Sensing Application of Stacked Autoencoder for Atmospheric Correction and Cyanobacteria Estimation Using Hyperspectral Imagery

Hyperspectral image sensing can be used to effectively detect the distribution of harmful cyanobacteria. To accomplish this, physical and/or model-based simulations have been conducted to perform an atmospheric correction (AC) and an estimation of pigments, including phycocyanin (PC) and chlorophyll-a (Chl-a), in cyanobacteria. However, such simulations were undesirable in certain cases, due to the difficulty of representing dynamically changing aerosol and water vapor in the atmosphere and the optical complexity of inland water. Thus, this study focused on the development of a deep neural network model for AC and cyanobacteria estimation, without considering the physical formulation. The stacked autoencoder (SAE) network was adopted for the feature extraction and dimensionality reduction of hyperspectral imagery. The artificial neural network (ANN) and support vector regression (SVR) were sequentially applied to achieve AC and estimate cyanobacteria concentrations (i.e., SAE-ANN and SAE-SVR). Further, the ANN and SVR models without SAE were compared with the SAE-ANN and SAE-SVR models for performance evaluation. In terms of AC performance, both SAE-ANN and SAE-SVR displayed reasonable accuracy, with a Nash–Sutcliffe efficiency (NSE) > 0.7. For PC and Chl-a estimation, the SAE-ANN model showed the best performance, yielding NSE values > 0.79 and > 0.77, respectively. SAE, with fine-tuning operators, improved the accuracy of the original ANN and SVR estimations, in terms of both AC and cyanobacteria estimation. This is primarily attributed to the high-level feature extraction of SAE, which can represent the spatial features of cyanobacteria. Therefore, this study demonstrated that the deep neural network has a strong potential to realize an integrative remote sensing application.

generate the concentration map, the optical feature of the reflectance spectra is taken into account, wherein the optical feature bands can be reduced in the SAE. In other words, SAE is a promising tool that can be implemented for AC and cyanobacteria estimation. However, only a few studies have applied autoencoders to hyperspectral images. Moreover, an integrated remote sensing application for AC and cyanobacteria estimation using a deep neural network has not been realized yet. To address these challenges, this study aimed at achieving the following goals: (1) development of an SAE network for AC and cyanobacteria estimation; (2) generation of quantitative cyanobacteria bloom maps; and (3) comparison of the SAE models with conventional machine learning models for model performance evaluation.

Remote Sens. 2020, 12, x FOR PEER REVIEW

Study Site
The Baekje reservoir is located on the Geum River in the mid-western region of South Korea (36° 31´ 87.75´´N, 126° 93´ 90.52´´E) (Figure 1). Baekje Weir has a length of 23 km, a basin area of 7,976 km², and a water storage capacity of 24.2 million m³. Most of the water is consumed for domestic, industrial, and agricultural purposes. Recently, cyanobacterial blooms have been occurring at Baekje Weir during summer, mainly due to the excessive nutrient supply from non-point sources including soil erosion and runoff from livestock farms, as well as rural and domestic wastes [36].

Field and Experimental Data
This study conducted field monitoring and experimental analysis four times in 2016 and five times in 2017. Table 1 shows the overall sampling information, including algal pigments, water temperature, and the number of sampling points. The total number of sampling points determined the amount of data that was utilized in the training and validation of the deep learning model. Each monitoring point contained observed data, including PC, Chl-a, and surface reflectance spectra. During the field sampling, airborne monitoring was conducted for hyperspectral imagery sensing along the Baekje Weir region. Monitoring was conducted under clear sky conditions. A field spectroradiometer (ASD FieldSpec 4 Hi-Res; ASD Inc., USA) was used to measure the optical parameters, such as downwelling irradiance, downwelling radiance, and water-leaving radiance. The spectroradiometer had a spectral range from 350 nm to 2500 nm, with optical data being recorded at 1 nm intervals. The spectral resolution of the device was 3 nm at approximately 700 nm and 8 nm at 1400 nm and 2100 nm, with the spectral bandwidth being 1.4 nm from 350 nm to 1000 nm and 1.1 nm from 1001 nm to 2500 nm. The measured optical parameters were used to calculate the remote sensing reflectance of the water surface, using the following equation:

$$R_{rs} = \frac{L_w - \rho L_{sky}}{E_d} \quad (1)$$

where $R_{rs}$ is the remote sensing reflectance (sr⁻¹), $L_w$ is the water-leaving radiance (W·sr⁻¹·m⁻²), $L_{sky}$ is the radiance from the sky (W·sr⁻¹·m⁻²), $E_d$ is the irradiance from the sky (W·m⁻²), and $\rho$ is the skylight correction factor. The downwelling irradiance was measured with a cosine detector fore-optic, and the radiance data were measured with a bare fiber fore-optic. The measurement positions of the field spectroradiometer were strictly maintained at a zenith angle of less than 42° and an azimuth angle of less than 135° [37]. This study adopted a constant skylight correction value of 0.025, considering the clear sky and gentle breeze conditions (i.e., wind speed < 5 m s⁻¹) [38].
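The reflectance calculation described above can be sketched in a few lines. This is a minimal illustration, not the authors' processing code: it assumes the common above-water form $R_{rs} = (L_w - \rho L_{sky}) / E_d$ with the constant skylight correction of 0.025 quoted in the text, and the function name, argument names, and three-band spectra are hypothetical.

```python
def remote_sensing_reflectance(l_w, l_sky, e_d, rho=0.025):
    """Return R_rs (sr^-1) per band from radiance and irradiance spectra.

    l_w, l_sky : radiance per band (W sr^-1 m^-2)
    e_d        : downwelling irradiance per band (W m^-2)
    rho        : skylight correction factor (0.025 is the constant from the text;
                 applying it as a multiplier on l_sky is an assumption)
    """
    if not (len(l_w) == len(l_sky) == len(e_d)):
        raise ValueError("all spectra must share the same band grid")
    return [(lw - rho * ls) / ed for lw, ls, ed in zip(l_w, l_sky, e_d)]


# Hypothetical three-band spectra (illustrative values, not measurements).
l_w = [0.010, 0.012, 0.008]
l_sky = [0.040, 0.040, 0.040]
e_d = [1.0, 1.2, 0.9]
rrs = remote_sensing_reflectance(l_w, l_sky, e_d)
```

Keeping the correction factor as a parameter makes it easy to test other values if the sky or wind conditions differ from those reported here.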
The remote sensing reflectance data were then used to evaluate the AC performance of the deep learning approaches. Water samples were collected at the same locations as the optical measurements to determine the algal pigment concentrations at Baekje Weir. Water bottles of 2 L capacity were used to collect the samples for Chl-a analysis. In addition, a plankton net (DAIHAN CHEMLAB Inc., South Korea) with a 20 µm mesh size was used to concentrate 10 L of water, and the concentrated samples for PC analysis were stored in 100 mL water bottles. All water samples were preserved in an ice box and transported to the laboratory immediately after field sampling for pigment extraction. Chl-a concentration was analyzed as the biomass indicator of algae [39]. The solvent extraction method was used to extract the Chl-a pigment [40].
A freezing and thawing method was implemented to extract the PC pigment, which is an indicator of cyanobacteria biomass [41]. The water samples were homogenized using a sonicator (Sonictopia Inc., South Korea), and were then centrifuged at 4000 rpm at 4 °C for 15 minutes. Then, 5 mL of phosphate buffer (pH 7.2) was added to the remaining pellets. The resulting samples were stored in a dark room for 24 h at −20 °C. After the freezing step, the samples were thawed at room temperature. The samples were then agitated using a shaking incubator (N-BIOTECK Inc., South Korea) at a speed of 150 rpm. The combination of freezing, thawing, and shaking processes facilitated the release of the PC pigment, without releasing the Chl-a pigment. Afterwards, the samples were centrifuged at 4000 rpm at 4 °C for 15 minutes. The absorbance of the supernatant of the samples was measured using a Cary-5000 UV-VIS-NIR spectrophotometer. The following equation was used to determine the PC concentration:

$$PC = \frac{OD(620) - q \cdot OD(652)}{p} \quad (2)$$

where OD(620) is the optical density at 620 nm; OD(652) is the optical density at 652 nm; q is 0.474; and p is 5.34, following [41].
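As a worked illustration of the PC calculation, the sketch below assumes the two-wavelength form $PC = (OD(620) - q \cdot OD(652)) / p$ with q = 0.474 and p = 5.34 as given in the text. The function name and the absorbance readings are hypothetical, and any scaling for the concentration step (10 L concentrated by the plankton net) is deliberately left out because the text does not specify it.

```python
def phycocyanin_from_absorbance(od620, od652, q=0.474, p=5.34):
    """PC estimate of the extract from optical densities at 620 nm and 652 nm.

    q and p are the coefficients quoted in the text; the functional form is
    an assumption based on the variables the text defines.
    """
    return (od620 - q * od652) / p


# Hypothetical absorbance readings of one supernatant (illustrative only).
pc_extract = phycocyanin_from_absorbance(od620=0.30, od652=0.10)
```

The subtraction of the scaled 652 nm reading removes the contribution of pigments that also absorb near 620 nm, so a sample with equal readings at both wavelengths still yields a positive but reduced estimate.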

Hyperspectral Image Sensing
The AISA Eagle sensor (SPECIM Inc., Finland) attached to an aircraft captured the hyperspectral images of the Baekje Weir. The airborne monitoring was conducted when the zenith angle of the sun was between 35° and 65°, in order to minimize the sun glint and shading effects. The flying time was less than 3 hours and the flying altitude was 3 km above the ground. The hyperspectral sensor has a full width at half maximum (FWHM) ranging from 4.36 nm to 4.82 nm and a signal-to-noise ratio (SNR) of 1,250:1. The field of view (FOV) and instantaneous field of view (IFOV) were 39.7 degrees and 0.039 degrees, respectively. In addition, the swath of the AISA Eagle sensor was 1024 pixels wide, with a spatial resolution of 2 m. The sensor had a spectral range from 400 nm to 970 nm, with a spectral resolution of 4-5 nm. Image processing was implemented using the MODTRAN 6 software. MODTRAN is a scalar radiative transfer code that calculates the AC parameters (i.e., path radiance, solar flux, direct transmittance, diffuse transmittance, and spherical albedo) [42]. The default atmospheric conditions were assigned when running the software. The statistical band model was selected as the radiative transfer algorithm for generating the AC parameters from MODTRAN 6. Specifically, the discrete ordinate radiative transfer algorithm was selected for multiple scattering. A mid-latitude summer atmospheric profile was selected, and a CO₂ concentration of 400 ppmv was set for the atmospheric profile. The aerosol specification was set to rural boundary aerosol. Furthermore, the sampling time and geographic coordinates were used for the solar geometry, including the solar zenith angle and azimuth angle. More detailed information on the MODTRAN 6 implementation is described in [43].
The AC parameters and digital numbers from 400 nm to 800 nm were then used as the input dataset of the data-driven models, to directly estimate water surface reflectance, thereby sequentially estimating cyanobacteria concentrations.

Autoencoder
An autoencoder is a neural network for unsupervised feature learning [44]. The typical structure of the autoencoder is presented in Figure 2a. The representative layers of the autoencoder are an encoder and a decoder, which are described by the following nonlinear functions:

$$f(x) = s(W_e x + b_e) \quad (3)$$

$$g(x) = s(W_d x + b_d) \quad (4)$$

where f(x) and g(x) are the encoder and decoder functions, respectively; $W_e$ and $W_d$ represent the weight matrices; $b_e$ and $b_d$ are the bias vectors; and s is the activation function. The sigmoid function, $s(z) = 1/(1 + e^{-z})$, was used as the activation function in both the encoding and decoding layers (Equations (5) and (6)). In the encoding layer, the image pixels are fed as the input feature. The spectral and spatial information of the input pixels is then compressed and encoded to the middle layer, thereby reducing the number of hidden nodes. In the decoding layer, the terminal nodes are reconstructed to be identical to the original input image.
The encoding layer transforms high-dimensional data into low-dimensional data, while the decoding layer recovers the low-dimensional data into high-dimensional data with the same structure as the original input [19]. Herein, the hidden nodes of the autoencoder layers deal with manifold features from the hypercubes of the input image. In particular, the autoencoder has advantages in feature extraction and dimensionality reduction of nonlinear data [45]. However, it is limited to a small number of spectral bands. Handling hundreds of hyperspectral bands would be inadvisable for a single autoencoder, since the data complexity makes it difficult to extract proper abstractions of the input feature. Thus, this study introduced a variant of the autoencoder network in the form of the SAE. Detailed information and the mathematical formulation of the SAE are explained in the following section.
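The encode-then-decode idea can be illustrated with a single forward pass through one sigmoid encoder and one sigmoid decoder. This is a minimal sketch in plain Python, not the study's TensorFlow implementation: the layer sizes, the random initialization, and the helper names `dense` and `init_layer` are assumptions chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(x, weights, biases):
    """Sigmoid layer: one output node per row of `weights`."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (an illustrative choice)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
n_input, n_hidden = 6, 3                  # hypothetical sizes, not the paper's
W_e, b_e = init_layer(n_input, n_hidden, rng)
W_d, b_d = init_layer(n_hidden, n_input, rng)

x = [0.1, 0.4, 0.3, 0.8, 0.5, 0.2]        # one hypothetical input pixel
code = dense(x, W_e, b_e)                 # encoder: 6 values -> 3 values
recon = dense(code, W_d, b_d)             # decoder: 3 values -> 6 values
```

Before training, `recon` does not resemble `x`; the point of the sketch is only the change of dimensionality between the input, the compressed code, and the reconstruction.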

Stacked Autoencoder (SAE)
The fundamental principle of the SAE is similar to that of the original autoencoder network. SAE is an alternative to the basic autoencoder network when dealing with the complex feature information of the hyperspectral image cube [46]. In contrast to the autoencoder, which has a single hidden layer, SAE consists of multiple encoding and decoding layers (Figure 2b), as represented by the following equations:

$$f_k(x) = s(W_{k,e} x + b_{k,e}) \quad (7)$$

$$g_t(x) = s(W_{t,d} x + b_{t,d}) \quad (8)$$

where $f_k(x)$ and $g_t(x)$ are the encoder and decoder functions in the k-th and t-th layer, respectively; $W_{k,e}$ and $W_{t,d}$ represent the weight matrices in the k-th and t-th layer; $b_{k,e}$ and $b_{t,d}$ are the bias vectors; and s is the sigmoid activation function.
To optimize the SAE network, the error between the input and output data should be minimized. The mean squared error (MSE) was determined at each iteration, and the lowest MSE value was identified using the cost function below:

$$Y = \frac{1}{N} \sum_{i=1}^{N} \left( g(x)_i - I_{o,i} \right)^2 \quad (9)$$

where Y is the cost function, N denotes the number of nodes, g(x) represents the reconstructed input, and $I_o$ is the original input. The input data for AC included the AC parameters and digital number, while remote sensing reflectance with 86 bands between 400 nm and 800 nm was the input for the cyanobacteria estimation. To train the SAE network, the backpropagated error derivatives update the network parameters in the autoencoder layers using the function in Equation (10), where $A_f$ represents the autoencoder functions.
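The training idea described above, minimizing the reconstruction MSE by backpropagation through several encoding and decoding layers, can be sketched as follows. This is an illustration under stated assumptions rather than the study's procedure: it trains all layers jointly on a single hypothetical sample with plain gradient descent, whereas the text does not say whether layers were pretrained one at a time, and the node list, learning rate, and helper names are made up for the example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def build_layers(nodes, rng):
    """Create (weights, biases) for each consecutive pair in `nodes`."""
    layers = []
    for n_in, n_out in zip(nodes[:-1], nodes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def forward(layers, x):
    """Return the activations of every layer, the input included."""
    acts = [list(x)]
    for weights, biases in layers:
        prev = acts[-1]
        acts.append([sigmoid(sum(w * p for w, p in zip(row, prev)) + b)
                     for row, b in zip(weights, biases)])
    return acts


def mse(recon, original):
    return sum((r - o) ** 2 for r, o in zip(recon, original)) / len(original)


def train_step(layers, x, lr):
    """One gradient-descent update; returns the cost before the update."""
    acts = forward(layers, x)
    out = acts[-1]
    # Derivative of the MSE through the sigmoid of the output layer.
    delta = [(2.0 / len(x)) * (o - t) * o * (1.0 - o) for o, t in zip(out, x)]
    for idx in range(len(layers) - 1, -1, -1):
        weights, biases = layers[idx]
        a_in = acts[idx]
        prev_delta = None
        if idx > 0:
            # Propagate the error with the old weights before updating them.
            prev_delta = [
                sum(weights[j][i] * delta[j] for j in range(len(delta)))
                * a_in[i] * (1.0 - a_in[i])
                for i in range(len(a_in))
            ]
        for j in range(len(delta)):
            for i in range(len(a_in)):
                weights[j][i] -= lr * delta[j] * a_in[i]
            biases[j] -= lr * delta[j]
        if prev_delta is not None:
            delta = prev_delta
    return mse(out, x)


rng = random.Random(0)
nodes = [8, 5, 3, 5, 8]                    # hypothetical symmetric layout
layers = build_layers(nodes, rng)
x = [0.2, 0.7, 0.4, 0.9, 0.1, 0.6, 0.3, 0.5]

initial_cost = train_step(layers, x, lr=0.5)
for _ in range(500):
    train_step(layers, x, lr=0.5)
final_cost = mse(forward(layers, x)[-1], x)
bottleneck = forward(layers, x)[len(nodes) // 2]   # compressed features
```

After training, the middle activations (`bottleneck`) are the reduced features that a downstream regressor would receive in place of the raw input.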

Stacked Autoencoder with ANN and SVR
This study utilized the feature extraction and dimensionality reduction of the SAE network to implement AC and cyanobacterial estimation, with the artificial neural network (ANN) and support vector regression (SVR) serving as fine-tuning operators of the SAE network. The ANN model is a feedforward neural network capable of regression with nonlinear environmental data [47]. The hidden layer of the ANN model is composed of trainable weights and biases in the hidden nodes. These nodes capture the input features and then deliver them to the next layer through a nonlinear activation function. The training of the ANN model optimized the weights and biases, in order to minimize the error between the measured and estimated results. The SVR model has been utilized for regression problems with multivariate datasets. The SVR model projects the training data to a higher-dimensional feature space using a nonlinear kernel function [48]. The kernel function makes the nonlinear data linear in the feature space, so that a linear regression can be solved. After the kernel function is assigned, the SVR model is trained to minimize the error between the observed and estimated data.
The SAE network with ANN and SVR was able to provide the water surface reflectance from AC, and the PC and Chl-a pigment concentrations of cyanobacteria. The path radiance, solar flux, direct transmittance, diffuse transmittance, and spherical albedo were assigned as the atmospheric influence inputs, and the digital numbers served as the optical information input for AC. These data were fed into the SAE network for atmospheric and optical feature extraction, after which the consecutive ANN and SVR models estimated the surface reflectance spectra. The estimated reflectance data were then fed into a sequential SAE model for extracting the features of the water surface reflectance, which were used to estimate the algal pigment concentrations in the consecutive models. These comprehensive processes and data compositions followed the conventional remote sensing application for water quality estimation.
To run the SAE, ANN, and SVR, the TensorFlow library was adopted. Figure 3 shows the deep neural network structure composed of two SAE networks, which were followed by the fine-tuning layers. The parameters of the data-driven model were adjusted using several empirical experiments [49,50]. The learning rate, number of hidden nodes and layers, activation function, and kernel functions were significant variables for the data-driven model performance. For the convenience of the reader, SAE with ANN and SVR are denoted as SAE-ANN and SAE-SVR, respectively.
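The role of the kernel function in SVR, mentioned above as one of the significant variables, can be illustrated with the radial basis function (RBF) kernel, which is the kernel this paper reports for its SVR models. The sketch below only computes kernel similarities and does not fit an SVR; the `gamma` value and the three-dimensional feature vectors are hypothetical.

```python
import math


def rbf_kernel(u, v, gamma):
    """RBF similarity exp(-gamma * ||u - v||^2) between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def gram_matrix(samples, gamma):
    """Pairwise kernel values; an SVR solver works on this matrix."""
    return [[rbf_kernel(u, v, gamma) for v in samples] for u in samples]


# Hypothetical three-node feature vectors, e.g., an SAE middle-layer output.
features = [
    [0.20, 0.55, 0.10],
    [0.25, 0.50, 0.12],
    [0.90, 0.05, 0.70],
]
K = gram_matrix(features, gamma=1.0)
```

Similar feature vectors give kernel values close to 1 and dissimilar ones values close to 0, which is how the kernel lets a linear regression in feature space follow a nonlinear relationship in the original inputs.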

Model Comparison
This study evaluated and compared the performances of the conventional machine learning models and the deep neural network models. ANN and SVR models without SAE were implemented to estimate the water surface reflectance and cyanobacterial concentration. The learning rate and the number of layers and nodes were adjusted iteratively. In addition, different activation functions for ANN and different kernel functions for SVR were tested and adopted based on their performances. This study compared the performances of SAE-ANN, SAE-SVR, ANN, and SVR in estimating the water surface reflectance and cyanobacterial concentration; 70% and 30% of the input data were used as the training and validation datasets, respectively.

Figure 3. Deep neural network structure for atmospheric correction and cyanobacterial estimation: stacked autoencoder #1 and fine tuning #1 for the water surface reflectance estimation, using hyperspectral image data inputs including total flux, diffuse transmittance, direct transmittance, spherical albedo, path radiance, digital number, and point sample number; stacked autoencoder #2 and fine tuning #2 for the PC and Chl-a estimations, using atmospherically corrected reflectance spectra.
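The 70%/30% partition into training and validation data can be sketched as below. The text does not state how the samples were assigned to each subset, so the random shuffle, the fixed seed, and the function name are assumptions made for illustration.

```python
import random


def split_train_validation(samples, train_fraction=0.7, seed=0):
    """Shuffle sample indices and split them into training and validation sets."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_train = int(round(train_fraction * len(samples)))
    train = [samples[i] for i in indices[:n_train]]
    validation = [samples[i] for i in indices[n_train:]]
    return train, validation


# Hypothetical sample identifiers standing in for the monitoring points.
samples = list(range(10))
train, validation = split_train_validation(samples)
```

Fixing the seed keeps the partition reproducible, so that all four models (SAE-ANN, SAE-SVR, ANN, and SVR) can be compared on the same validation samples.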

Accuracy
The performance of the data-driven model was evaluated using the root mean squared error (RMSE), mean absolute error (MAE), and Nash-Sutcliffe efficiency (NSE). The RMSE, MAE, and NSE functions are represented by Equations (11)-(13), respectively:

$$RMSE = \sqrt{\frac{1}{n} \sum_{t=1}^{n} (P_t - O_t)^2} \quad (11)$$

$$MAE = \frac{1}{n} \sum_{t=1}^{n} \left| P_t - O_t \right| \quad (12)$$

$$NSE = 1 - \frac{\sum_{t=1}^{n} (O_t - P_t)^2}{\sum_{t=1}^{n} (O_t - O_a)^2} \quad (13)$$

where $P_t$ is the estimated surface reflectance (sr⁻¹), PC (mg m⁻³), or Chl-a (mg m⁻³); $O_t$ is the observed surface reflectance, PC, or Chl-a; $O_a$ is the average observed surface reflectance, PC, or Chl-a; and n is the number of samples.

Table 1 shows the concentrations of PC and Chl-a. During the monitoring periods, the PC concentration ranged from 0.19 mg m⁻³ to 146.99 mg m⁻³ and the Chl-a concentration from 8.45 mg m⁻³ to 111.40 mg m⁻³. Water temperature varied from 12.93 °C to 31.06 °C during the sampling periods. In particular, the pigment data collected in August 2016 showed considerable variations, with PC ranging between 6.04 and 146.99 mg m⁻³ and Chl-a between 14.19 and 111.40 mg m⁻³. The high PC concentration indicated the outbreak of cyanobacterial blooms. The dominant cyanobacterial genera were Microcystis and Oscillatoria (Table 1).
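The three accuracy metrics named in this section (RMSE, MAE, and NSE) follow their standard definitions and can be computed as in the sketch below. The observed and estimated values are hypothetical numbers chosen for illustration, not data from Table 1.

```python
import math


def rmse(pred, obs):
    """Root mean squared error between estimates and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def mae(pred, obs):
    """Mean absolute error between estimates and observations."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def nse(pred, obs):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - p) ** 2 for p, o in zip(pred, obs))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


# Hypothetical observed and estimated values (illustrative only).
observed = [1.0, 2.0, 3.0, 4.0]
estimated = [1.1, 1.9, 3.2, 3.8]

scores = {
    "RMSE": rmse(estimated, observed),
    "MAE": mae(estimated, observed),
    "NSE": nse(estimated, observed),
}
```

For these numbers the squared errors sum to 0.10 and the observed variance term to 5.0, so NSE is 0.98, MAE is 0.15, and RMSE is the square root of 0.025. RMSE and MAE carry the units of the estimated quantity, while NSE is dimensionless.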

AC Performance of SAE
This study configured SAE #1 with 7-6-5-3-5-6-7 nodes across its encoding and decoding layers (Figure 3). The first layer with seven nodes represents the input layer, consisting of five AC parameters, a digital number, and a sampling event number for each wavelength band. After training SAE #1, the manifold feature layer (middle layer with three nodes) was used as the input of the fine-tuning operators (ANN and SVR) for atmospheric correction. The ANN model had 3-10-5-1 nodes for each layer, wherein the input layer with three nodes corresponded to the output of the manifold feature layer. The ANN output layer estimated the surface reflectance for each wavelength band, resulting in a total of 86 water surface reflectance values. For the SVR models, the radial basis function (RBF) was implemented and optimized as the kernel function. Without SAE, the ANN model had a 7-6-1 node configuration to estimate the water surface reflectance. Furthermore, the SVR model without SAE was run for AC with the RBF kernel. [...] (Figure 4c) for training and validation, respectively. Figure 5 compares the observed reflectance spectra with the spectra estimated by SAE-ANN. The training and validation results were in good agreement with the in-situ reflectance spectra. In particular, the estimated spectra from 600 nm to 700 nm were able to describe the PC peaks (i.e., 615-622 nm) and the Chl-a peaks (i.e., 660-670 nm).
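To make the node configurations above concrete, the sketch below parses the two layouts quoted in the text and counts their trainable parameters. It assumes fully connected layers with one bias per node, which the text does not state explicitly, and the helper names are hypothetical.

```python
def parse_config(config):
    """Turn a layout string such as '7-6-5-3-5-6-7' into a list of node counts."""
    return [int(n) for n in config.split("-")]


def count_parameters(nodes):
    """Weights plus biases, assuming fully connected consecutive layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(nodes[:-1], nodes[1:]))


sae1 = parse_config("7-6-5-3-5-6-7")   # SAE #1 layout from the text
ann1 = parse_config("3-10-5-1")        # fine-tuning ANN layout from the text

bottleneck_size = min(sae1)            # width of the manifold feature layer
sae1_params = count_parameters(sae1)   # 206 under the stated assumption
ann1_params = count_parameters(ann1)   # 101 under the stated assumption
```

The bottleneck width of SAE #1 matches the input width of the fine-tuning ANN (three nodes), which is the interface through which the compressed features are passed to the regressor.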

Cyanobacteria Estimation of SAE
The estimated water reflectance was then used as the input of SAE #2, which had seven layers with an 86-60-40-20-40-60-86 node configuration. The 86 nodes in the input layer represent the estimated reflectance of the 86 bands. The concentrated feature layer of SAE #2 (middle layer with 20 nodes) was then used as the input for the second ANN and SVR, which estimated the cyanobacterial concentration. The consecutive ANN model for the cyanobacterial estimation had a 20-10-5-2 node configuration and yielded the PC and Chl-a concentrations. Among the activation functions tested for the ANN model, the sigmoid function was adopted because it gave relatively accurate model performance, and the learning rate was set to 0.0001. For the SVR models, RBF was utilized as the kernel function. The reconstruction of the SAE input showed an RMSE value of 5.4 × 10⁻⁷ sr⁻¹. Moreover, the ANN model without SAE had an 86-2 node configuration for estimating the PC and Chl-a concentrations; it adopted the sigmoid activation function with a learning rate of 0.0001. The SVR model without SAE estimated the cyanobacteria concentrations with the RBF kernel. Figure 6 shows the results of PC estimation, for which SAE-ANN showed a satisfactory performance.

The trained SAE-ANN and SAE-SVR models were applied to generate the PC and Chl-a maps shown in Figure 8. In the figure, the PC and Chl-a concentration levels of SAE-SVR were lower than those of SAE-ANN, due to the tendency of SVR to underestimate PC and Chl-a. Regardless, both models were still able to generate spatial distribution maps, and a comparison with RGB images indicated that SAE has the capacity to represent the nonlinear spatial features of the cyanobacteria (Figure S1).
The SAE-ANN model was able to capture the temporal variation of the cyanobacteria, in terms of the PC and Chl-a concentrations. A relatively low concentration was observed in autumn compared to summer. However, Chl-a maintained a concentration level > 10 mg m⁻³ in autumn (Figure 8f-h). Meanwhile, the spatial dynamics of the cyanobacteria peaked in summer and lessened in autumn (Figure 8).

Model Comparison
The ANN and SVR models estimated the water surface reflectance and cyanobacteria concentration without the feature extraction. The evaluation results of both models for surface reflectance are presented in Figure 9a,b. Compared to SAE-ANN and SAE-SVR, the ANN and SVR models showed higher MAE values, > 0.75 for training and > 0.58 for validation (Table 2). This study also ran the conventional model MODTRAN 6 from [51] for AC, to compare its results with those of the data-driven models. The MODTRAN 6-based AC showed an R² of 0.69 and an RMSE of 0.0021 sr⁻¹ (Figure 9a,b). Among the models, SAE-ANN showed the best AC performance in terms of both training and validation.

In Figure 9c,d, the ANN model showed a PC estimation with R² > 0.78 for both training and validation, while the SVR model showed a lower validation performance. Although the coupling of SAE and SVR improved the accuracy of the SVR model, SAE-SVR still needs further development. Likewise, the ANN model showed an MAE value > 2.54, which was higher than that of SAE-ANN. The SVR model estimated Chl-a with R² values > 0.74 for training and validation (Figure 9e,f); however, SAE-SVR showed a better performance than SVR, with higher R² and NSE values. SAE-ANN also significantly improved the accuracy of the ANN results and showed the best performance, as well as the lowest MAE (< 0.22) for Chl-a estimation among the four models (Table 2). In addition, SAE-ANN showed a relatively better performance for cyanobacterial estimation compared to the conventional bio-optical algorithms, namely the two-band ratio and the inherent optical property (IOP) algorithms [14]. For the PC estimation, the two-band ratio algorithm showed an R² of 0.76 and an RMSE of 10.56 mg m⁻³, while the IOP algorithm yielded an R² of 0.82 and an RMSE of 25.83 mg m⁻³ (Figure 9c,d).
For the Chl-a estimation, the conventional algorithms showed relatively low R² and high RMSE values: 0.29 and 13.62 mg m⁻³ for the two-band ratio algorithm, and 0.34 and 13.45 mg m⁻³ for the IOP algorithm, respectively (Figure 9e,f).

AC and Cyanobacteria Estimation
The NSE values of SAE-ANN and SAE-SVR were over 0.70 for both training and validation for AC (Table 2), implying that the feature extraction and dimensionality reduction of SAE resulted in accurate performance. Moreover, [52] and [53] mentioned that precise AC was necessary to achieve reliable cyanobacteria estimation. Additionally, [43] suggested that a highly accurate AC influences the accuracy of the bio-optical algorithm for PC and the reliability of the PC map. However, a few outliers were observed, which resulted from an abnormal reflectance peak beyond 700 nm (Figures 4 and 5). The outlier peaks were caused by high phytoplankton scattering from the high algae presence on August 12, 2016. The SAE-ANN and SAE-SVR models underestimated the peaks, because the models may have had difficulty learning the specific abnormal features of high phytoplankton scattering.
SAE-ANN has proven to be acceptable for estimating cyanobacteria, compared to previous studies that applied the conventional bio-optical algorithms. The R² values of the conventional bio-optical algorithms for cyanobacteria estimation are as follows: 0.76 [54]; 0.71 [55]; 0.77 [14]; 0.55 [56]; and 0.65 [57]. When the PC concentration was greater than 10 mg m⁻³, most model estimates showed good agreement with the observed PC, while inaccurate PC estimations were observed for low PC concentrations of less than 10 mg m⁻³. In particular, the A-D regions enclosed in broken circles in Figure 6a-d indicate a discrepancy between the estimated pigments and the observed ones. This could be caused by the relatively weak relationship between the corrected reflectance and low PC concentrations. In addition, a comparison of the disagreement levels showed that the SAE-SVR and SVR models had higher uncertainties than the SAE-ANN and ANN models (Figure 6c,d). The corrected reflectance error at each band may result in the incorrect feature extraction of low PC concentrations in the models (Figure 5), since the reflectance spectra are affected by the pigment concentration [52,58]. On the other hand, SAE-SVR underestimated high PC concentrations greater than 40 mg m⁻³. This can be attributed to the occurrence of scum during an intense cyanobacterial bloom, which reduces the accuracy of cyanobacterial estimation. Overall, the feature extraction with dimensionality reduction of SAE was able to estimate both PC and Chl-a. The encoding layer showed a well-defined temporal variation within the observed range of PC and Chl-a.
A high cyanobacteria concentration was mainly observed near the Baekje Weir region, due to the high flow velocity caused by the hydraulic gate operation (i.e., hydraulic power plant), which gathered the cyanobacteria from the upstream to the back of the Weir [33] (Figure 8a,e). After the gate operation, the cyanobacteria temporarily disappeared in front of the Weir, owing to the flushing and dilution effect [5]. The gate operation released a substantial amount of water, which generated water turbulence and thereby increased the turbidity. This resulted in a decrease in light availability, which eventually hindered cyanobacterial growth [59]. The turbulent flow also physically inhibited cyanobacterial growth by damaging the phytoplankton cells [60]. On the other hand, a high concentration of cyanobacteria was observed along the riverside, which was mainly caused by a longer residence time. Cyanobacteria favor low flow velocity for blooming, since the temperature stratification zone and colonial formation develop easily without flow suppression. Moreover, [61] suggested that a critical flow velocity < 0.06 m s⁻¹ would be a suitable condition for cyanobacterial growth. Likewise, other previous studies found that flow velocity and cyanobacteria concentration have a negative relationship [60]. The decrease in cyanobacteria was driven by the unsuitable growth conditions, primarily due to the decreasing temperature and low light intensity [62,63]. Furthermore, [64] also showed, using 15-year MODIS imagery and a temperature dataset, that the main control factors of cyanobacteria growth were temperature and light availability.

Data-Driven Model Comparison
For AC, SAE-ANN and -SVR models were not comparable. In addition, SAE-ANN showed more accurate AC than the conventional commercial software, even when the training dataset was limited. In this regard, the data-driven model could be used as an alternative to the physically based model for accurate AC results, when the atmosphere library of the commercial software could not reflect real atmospheric conditions. SAE-ANN showed higher pigment estimation accuracy than the SAE-SVR model. Similar results were also found in a previous study. The lower accuracy of the ANN and SVR models without SAE was due to the limitation of the conventional models in reflecting the temporal variability of the optically complex inland water [28]. Notably, [19] demonstrated that the stacked denoising autoencoder coupled with ANN fine-tuning showed the highest accuracy, compared to conventional contrast models, in predicting water quality parameters of the biofilm system. In their study, the encoding layers were able to produce a high-level feature representation of the input imagery, which made the coupling of the models more efficient [65]. In the training process, the SAE compressed the original input features into a smaller representation and reconstructed the original data from the reduced data [45]. The internal parameters of the SAE were updated to minimize the error between the reconstructed output and the original input. After training, the similarity between the original input and the reconstructed input implied that the internal features in each layer, as defined by the trained parameters, could represent the original input features. Accordingly, the input data used for AC and estimation of cyanobacteria were taken from the middle of the SAE network. Thus, this compressed layer resulted in reduced data complexity and improved data abstraction, thereby contributing to higher regression accuracy than conventional machine learning regression without SAE.
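The compress-and-reconstruct training described above can be sketched with a single-hidden-layer autoencoder written in NumPy. This is a minimal illustration under stated assumptions, not the network used in this study: the data are synthetic, the layer sizes, learning rate, and iteration count are arbitrary, and only one encoding layer is trained, whereas a stacked autoencoder chains several.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for spectra (assumption): 200 samples with 20 "bands"
# generated from 3 hidden factors, then standardized per band.
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 20))
X = latent @ mixing
X = (X - X.mean(axis=0)) / X.std(axis=0)

n_in, n_code = X.shape[1], 3            # 20 inputs compressed to 3 features
W_enc = rng.normal(scale=0.1, size=(n_in, n_code))
b_enc = np.zeros(n_code)
W_dec = rng.normal(scale=0.1, size=(n_code, n_in))
b_dec = np.zeros(n_in)

def encode(x):
    """Compress the input into the low-dimensional code (tanh activation)."""
    return np.tanh(x @ W_enc + b_enc)

def decode(code):
    """Reconstruct the input from the code (linear output layer)."""
    return code @ W_dec + b_dec

def reconstruction_mse(x):
    return float(np.mean((decode(encode(x)) - x) ** 2))

initial_mse = reconstruction_mse(X)
learning_rate = 0.01
for _ in range(3000):
    code = encode(X)
    recon = decode(code)
    # Gradient of 0.5 * (sum of squared errors) / n_samples w.r.t. recon.
    err = (recon - X) / X.shape[0]
    grad_W_dec = code.T @ err
    grad_b_dec = err.sum(axis=0)
    # Backpropagate through the tanh encoder: d tanh(z)/dz = 1 - tanh(z)^2.
    d_code = (err @ W_dec.T) * (1.0 - code ** 2)
    grad_W_enc = X.T @ d_code
    grad_b_enc = d_code.sum(axis=0)
    W_dec -= learning_rate * grad_W_dec
    b_dec -= learning_rate * grad_b_dec
    W_enc -= learning_rate * grad_W_enc
    b_enc -= learning_rate * grad_b_enc
final_mse = reconstruction_mse(X)

print(f"reconstruction MSE: {initial_mse:.3f} -> {final_mse:.3f}")
features = encode(X)  # compressed features that a regressor would consume
```

After training, `encode(X)` plays the role of the middle-layer output: a lower-dimensional representation that can be passed to a downstream regressor such as an ANN or SVR.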
Previous studies showed that ANN has a better regression performance than SVR [66,67]. However, the performance difference between ANN and SVR models depends on the data, and the model performance cannot be generalized, due to inconsistencies in the data behavior [68]. When coupled with the SAE network, the high dimensionality of the input data is compressed to a relatively low dimension with abstracted feature representation. With its multiple nodes and layers, the ANN model might reflect the PC and Chl-a features at low concentrations better than the SVR model. Moreover, [69] discussed the local underestimation of SVR, in which the kernel location was supposed to be the center of the epsilon-tube, but the SVR only allowed a small number of estimated values to fall below the observed values.
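For readers unfamiliar with the epsilon-tube mentioned above, the sketch below shows the standard epsilon-insensitive loss used in SVR, max(0, |y - f(x)| - epsilon): residuals inside the tube cost nothing, and residuals outside it are penalized linearly. The function name, the epsilon value, and the sample numbers are illustrative assumptions and do not reproduce the settings of this study.

```python
import numpy as np

def epsilon_insensitive_loss(observed, estimated, epsilon=0.1):
    """Per-sample SVR loss: zero inside the epsilon-tube, linear outside it."""
    residual = np.abs(np.asarray(observed, dtype=float)
                      - np.asarray(estimated, dtype=float))
    return np.maximum(0.0, residual - epsilon)

# Hypothetical observed and estimated values (arbitrary units).
observed = [1.0, 1.0, 1.0]
estimated = [1.05, 1.5, 0.0]
print(epsilon_insensitive_loss(observed, estimated, epsilon=0.1))
```

The first estimate lies inside the tube and incurs no loss, while the other two are penalized only for the part of the residual that exceeds epsilon.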

Deep Neural Network for Remote Sensing Application
In several previous studies, AC [21,31,70] and cyanobacterial estimations [33,71,72] have been performed using conventional machine learning models. However, a deep neural network yields a relatively high accuracy compared to the conventional models, owing to the utilization of high-level feature learning from the data [73,74]. Although deep neural networks with large datasets require high-end infrastructure, such as a graphics processing unit (GPU), and a long model training time, the testing time for the trained model can be much shorter. This was confirmed by the measured training and testing times, which were 1045.96 s and 2.78 s, respectively, for AC, and 508.54 s and 3.28 s, respectively, for pigment estimation. In addition, the SAE-ANN model improved the accuracy of surface reflectance estimations by 23% and that of pigment estimations by 26%, compared to conventional ANN models. This is because SAE provides higher-level features for the robust representation of the temporal surface reflectance and pigment variations. However, it is difficult to accurately identify the function of neurons and their layers in the network architecture to be modeled [75].
As a deep neural network is suitable for complex image processing, it was implemented for a comprehensive remote sensing application (i.e., AC and cyanobacteria estimation) in this study. When SAE is coupled with ANN, a high estimation accuracy of water surface reflectance and cyanobacteria concentrations is possible. During the training process, the encoding layers learned the abstract features of the input data by reducing their dimension. For AC training, the SAE extracted the optical features (i.e., digital numbers) and atmospheric features (i.e., total flux, diffuse transmittance, direct transmittance, spherical albedo, path radiance), by reconstructing the original input data. The optical and atmospheric features were utilized to estimate water surface reflectance in the subsequent ANN model. During this process, the digital numbers containing atmospheric effects were directly transformed into surface reflectance that was largely free of those effects. For pigment estimation, the estimated reflectance features were condensed by the SAE, to estimate Chl-a and PC concentrations. This process also provided an efficient representation of the spatial distribution of the pigments during different periods. In short, the data-driven models provided implicit methods that only considered the relationship between the remote sensing input and target, without any complex formulations and parameterization of AC and bio-optical algorithms.
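The two-stage data flow described above (digital numbers plus atmospheric features to surface reflectance, then reflectance to pigments) can be outlined as follows. This sketch is only meant to show how the stages are chained: it uses ordinary least-squares regressors as stand-ins for the SAE-ANN models, and all array sizes and values are synthetic assumptions rather than data from this study.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pixels, n_bands, n_atm = 50, 10, 5  # hypothetical sizes

def fit_linear(inputs, targets):
    """Least-squares stand-in for a trained regressor (not an SAE-ANN)."""
    design = np.hstack([inputs, np.ones((inputs.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return coef

def predict_linear(coef, inputs):
    design = np.hstack([inputs, np.ones((inputs.shape[0], 1))])
    return design @ coef

# Synthetic training data (assumption): per-pixel digital numbers, five
# atmospheric features, measured reflectance, and two pigment concentrations.
digital_numbers = rng.uniform(0.0, 1.0, size=(n_pixels, n_bands))
atmospheric = rng.uniform(0.0, 1.0, size=(n_pixels, n_atm))
reflectance = rng.uniform(0.0, 0.1, size=(n_pixels, n_bands))
pigments = rng.uniform(0.0, 50.0, size=(n_pixels, 2))  # columns: PC, Chl-a

# Stage 1: atmospheric correction, mapping inputs to surface reflectance.
stage1_inputs = np.hstack([digital_numbers, atmospheric])
ac_model = fit_linear(stage1_inputs, reflectance)
estimated_reflectance = predict_linear(ac_model, stage1_inputs)

# Stage 2: pigment estimation, fed with the stage-1 output.
pigment_model = fit_linear(estimated_reflectance, pigments)
estimated_pigments = predict_linear(pigment_model, estimated_reflectance)

print(estimated_reflectance.shape, estimated_pigments.shape)
```

Applying both fitted stages to every pixel of an image would yield the per-pixel pigment values from which a concentration map is drawn.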
For study areas that have input data ranges similar to this study, the trained model can provide robust performance, whereas, for study areas that have different data ranges, the model can be used as a pre-trained one that requires additional model tuning without initiating end-to-end model configurations. As many studies have utilized pre-trained models [76,77,78], the ability to apply such a model is the primary benefit of a data-driven approach for rapidly achieving reasonable outcomes. In addition, future research using deep learning for regression tasks with remote sensing data can refer to the structures and internal parameters of this study. Thus, we demonstrated the potential of a deep learning network in providing reliable and comprehensive remote sensing applications.

Conclusions
This study utilized a deep neural network to implement AC and cyanobacteria estimation using hyperspectral images. To accomplish this, field and airborne monitoring, water sample collection, and optical measurement of the water were carried out, after which the phytoplankton pigments (i.e., PC and Chl-a) were analyzed. To perform AC and estimate cyanobacteria, we developed the SAE-ANN and SAE-SVR models. The input data for AC consisted of AC parameters derived from MODTRAN 6, digital numbers from hyperspectral imagery, and the number of sampling events. The input parameters were fed into the first SAE-ANN and -SVR models to produce the estimated surface reflectance, which was subsequently used as input for the second SAE-ANN and -SVR models, to estimate PC and Chl-a concentrations. The ANN, SVR, SAE-ANN, and SAE-SVR models were evaluated by R², RMSE, NSE, and MAE. The major findings of this study are the following:

1. SAE-ANN and -SVR models for AC showed good agreement with the observed reflectance spectra (i.e., NSE > 0.7); the SAE-ANN model estimated the cyanobacteria concentrations with the highest accuracy.
2. The encoding layers of the SAE-ANN and -SVR models were able to contribute to the generation of cyanobacterial distribution maps that represented actual cyanobacterial distribution, by reflecting the varied spatial and spectral features of the input data.
3. The SAE-ANN and -SVR models showed an improved accuracy of 23% and 6% for surface reflectance, and 26% and 9% for cyanobacteria estimation, respectively, due to the high-level feature extraction of SAE, compared to the single model performances of ANN and SVR.
This study demonstrated an integrative implementation of AC and cyanobacteria estimation with high accuracy, by developing deep neural networks. Thus, we hope that this study will provide a foundation for future research on comprehensive remote sensing applications for cyanobacteria management.