Inland Lakes Mapping for Monitoring Water Quality Using a Detail/Smoothing-Balanced Conditional Random Field Based on Landsat-8/Levels Data

The sustainable development of water resources has long been emphasized in China, and a comprehensive set of standards for grading inland water environment quality has been established to monitor water quality. However, most of the 24 indicators that determine the water quality level in these standards are non-optically active parameters. Their weak optical characteristics make it difficult to find significant correlations between individual parameters and remote sensing imagery. In addition, traditional on-site testing methods can no longer meet the increasingly extensive requirements of water-quality monitoring. To address these problems, this study proposes a supervised classification process that uses a detail-preserving smoothing classifier based on a conditional random field (CRF) and Landsat-8 data, applied to two study areas around Wuhan and Huangshi in Hubei Province. A random forest classifier was selected to model the association potential of the CRF. The results (first study area: OA = 89.50%, Kappa = 0.841; second study area: OA = 90.35%, Kappa = 0.868) show that CRF-based water-quality monitoring is feasible and that the approach can provide a reference for water-quality mapping of inland lakes. In the future, a small amount of on-site sampling may be sufficient to identify the water quality levels of inland lakes across large areas of China.


Introduction
Water is at the heart of sustainable development, and water resources play a vital role in meeting human productivity needs, economic development, environmental protection, and ecosystem services [1]. Globally, because of the geographical complexity of water supply and use, it is difficult to assess whether sufficient freshwater resources will be available for future needs [2]. What is certain, however, is that a large proportion of the world's population is currently suffering from "water stress" [3]. This stress is not only due to the shrinkage of freshwater resources, but it is also closely

Satellite Data and Vector Data
NASA launched the Landsat 8 satellite on 11 February 2013. Its Operational Land Imager (OLI) collects data in nine spectral bands; apart from band 8, the 15 m panchromatic band, all bands have a spatial resolution of 30 m. The first seven bands (coastal, blue, green, red, NIR, SWIR1, and SWIR2) were used in the experiments [36,37]. The Landsat 8 OLI images were obtained from the Geospatial Data Cloud site (http://www.gscloud.cn/search, 22 February 2020).
The data identifier of the image selected for the first study area (Wuhan) is LC81230392018098LGN00. This image was acquired on 8 April 2018 with 8.94% cloud cover, and none of the lakes were covered by cloud. The data identifier of the image selected for the second study area (Huangshi) is LC81220392017120LGN00; this image was acquired on 30 April 2017 with 0.06% cloud cover. The lake vector maps of the two study areas, used for water extraction from the Landsat 8 images, were also obtained from the Geospatial Data Cloud site.
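The scene identifiers above follow the pre-collection Landsat naming convention LXSPPPRRRYYYYDDDGSIVV (sensor, satellite, WRS-2 path and row, year, day of year, ground station, archive version). The following Python sketch decodes them; the function name is hypothetical. Decoding gives WRS-2 path/row 123/039 and 122/039, and days 098 of 2018 and 120 of 2017, which correspond to the acquisition dates stated above.

```python
from datetime import date, timedelta


def parse_landsat_scene_id(scene_id: str) -> dict:
    """Decode a pre-collection Landsat scene ID of the form LXSPPPRRRYYYYDDDGSIVV.

    L = Landsat, X = sensor (C = OLI/TIRS combined), S = satellite number,
    PPP/RRR = WRS-2 path/row, YYYY = year, DDD = day of year,
    GSI = ground station identifier, VV = archive version.
    """
    if len(scene_id) != 21:
        raise ValueError(f"expected 21 characters, got {len(scene_id)}")
    year = int(scene_id[9:13])
    day_of_year = int(scene_id[13:16])
    return {
        "sensor": scene_id[1],
        "satellite": int(scene_id[2]),
        "path": int(scene_id[3:6]),
        "row": int(scene_id[6:9]),
        # Day 1 is 1 January, so add (day_of_year - 1) days.
        "acquired": date(year, 1, 1) + timedelta(days=day_of_year - 1),
        "ground_station": scene_id[16:19],
        "version": scene_id[19:21],
    }


for sid in ("LC81230392018098LGN00", "LC81220392017120LGN00"):
    info = parse_landsat_scene_id(sid)
    print(sid, info["path"], info["row"], info["acquired"].isoformat())
```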

Surface Water Environment Quality Levels
Sensors 2020, 20, 1345 5 of 18
The water quality data of the first study area (Wuhan) from May 2018 were obtained from the Wuhan Ecology and Environment Bureau (http://hbj.wuhan.gov.cn/hbHjjc/index.jhtml, 3 January 2020). The water quality data of the second study area (Huangshi) were released in 2017 by the provincial environmental monitoring center station of the Department of Ecology and Environment of Hubei Province (http://sthjt.hubei.gov.cn, 22 February 2020). These data are the results of on-site sampling during a given period. To ensure the validity of the experimental data, the Landsat 8 OLI images were selected so that their acquisition dates were close to the collection dates of the water quality data. Several representative lakes in the first and second study areas are listed in Tables 1 and 2, respectively. The tables indicate the parameters that exceeded the standard in the water quality assessment, including TP, COD, biochemical oxygen demand (BOD), the permanganate index (COD_Mn), and NH3-N.
Surface water is divided into five classes according to its function (Table 3), and the standard values of the corresponding functional class apply; in this paper, water worse than Class V is labeled Class VI. Class I water is mainly source water in national nature reserves. In both tables, the water quality column gives the currently assessed level of each lake, each value gives the multiple by which a parameter exceeds the standard, and the reference line corresponds to the functional class of the lake.
The pie charts in Figure 2 show the percentage of lakes at each water quality level in both study areas. In Wuhan and its surrounding areas, the water quality levels of 64 lakes were recorded, as shown in Figure 2a. There are no Class I lakes (the highest water quality) in this area, and only one Class II lake, namely Niushan Lake. Class III lakes account for 17%, and Class IV, Class V, and Class VI lakes account for 34%, 28%, and 19%, respectively.
Therefore, only about 19% of the lakes (Class II and Class III together) meet or exceed the Class III water quality level, which is the standard for centralized drinking water. In the second study area, Huangshi, located downstream of Wuhan along the Yangtze River, 49 lakes on both sides of the river were selected for statistical analysis, as shown in Figure 2b. There are no Class I or Class II lakes in this area. Class III lakes account for about 16%, Class IV and Class V lakes each account for about 22%, and Class VI lakes account for about 39%. Therefore, only about 16% of the lakes reach or exceed the Class III water quality level.
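The "meets or exceeds Class III" figures are cumulative shares of the classes at or above Class III. A quick Python check using the percentages reported above, with the single Class II lake in Wuhan taken as 1 of 64 lakes (about 1.6%); the dictionary layout and function name are illustrative only.

```python
# Reported class shares (%) per study area; "VI" denotes water worse than Class V.
LEVELS = ["I", "II", "III", "IV", "V", "VI"]

wuhan = {"I": 0.0, "II": 100 * 1 / 64, "III": 17.0, "IV": 34.0, "V": 28.0, "VI": 19.0}
huangshi = {"I": 0.0, "II": 0.0, "III": 16.0, "IV": 22.0, "V": 22.0, "VI": 39.0}


def share_at_or_better(shares: dict, level: str) -> float:
    """Percentage of lakes whose class is `level` or better (earlier in LEVELS = better)."""
    cutoff = LEVELS.index(level)
    return sum(shares[name] for name in LEVELS[: cutoff + 1])


print(round(share_at_or_better(wuhan, "III")))     # 19
print(round(share_at_or_better(huangshi, "III")))  # 16
```

Because the published percentages are rounded, each area's shares sum only approximately to 100%.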

The Improved Conditional Random Field (CRF) Model and Other Models
The CRF is a probabilistic model that has been widely used in image segmentation, stereo vision, and activity analysis because it can incorporate spatial information [29]. This paper proposes a water-quality classification method based on the detail-preserving smoothing CRF. The class probabilities produced by the RF classifier define the unary potential of the CRF, and a linear combination of a spatial smoothing term and a local class label cost term defines the pairwise potential, so that the classification exploits spatial contextual information while retaining detail.
The CRF model provides a unified probabilistic framework for modeling local neighborhood interactions between random variables, in which the posterior probability is expressed directly as a Gibbs distribution [38]:

$$P(x \mid y) = \frac{1}{Z(y)} \exp\Big\{-\sum_{c \in C} \psi_c(x_c, y)\Big\} \tag{1}$$

where y is the observation data of the input image, that is, the pixel-by-pixel spectral vectors; x represents the class labels; Z(y) is the partition function; ψ_c(x_c, y) is the potential function, which locally models the spatial interaction of random variables based on the neighborhood system and the clique c in the image; and C is the set of all cliques (fully connected subgraphs). In this paper, an 8-neighborhood system was used in a pairwise CRF framework.
Let y = {y_1, y_2, ..., y_N} be the observation field, where N is the total number of pixels, and x = {x_1, x_2, ..., x_N} the labeling field. Given the observation y, the Gibbs energy corresponding to the posterior distribution of the labels x is

$$E(x \mid y) = \sum_{c \in C} \psi_c(x_c, y) \tag{2}$$

To find the label image x that maximizes the posterior probability P(x|y), the Bayesian maximum a posteriori (MAP) rule gives the MAP labeling x^MAP of the random field:

$$x^{\mathrm{MAP}} = \arg\max_{x} P(x \mid y) = \arg\min_{x} E(x \mid y) = \arg\min_{x} \Big\{ \sum_{i \in V} \varphi_1(x_i, y) + \lambda \sum_{i \in V} \sum_{j \in N_i} \varphi_2(x_i, x_j, y) \Big\} \tag{3}$$

Because P(x|y) decreases monotonically as E(x|y) increases, the posterior probability is maximal exactly when the energy is minimal. In Equation (3), V is the set of all pixels and N_i is the 8-neighborhood of pixel i; φ_1 is the unary potential function, which gives the segmentation result when each pixel is considered independently; φ_2 is the pairwise potential function, which represents the influence of the relationships between pixels on the segmentation; and the nonnegative tuning parameter λ sets the weight of the pairwise potential. The larger λ is, the stronger the smoothing effect.
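For a handful of pixels the posterior can be enumerated exhaustively, which makes the equivalence between maximizing P(x|y) and minimizing E(x|y) concrete. The following Python/NumPy sketch is illustrative only: the Potts-style pairwise term, the probability values, and the function names are assumptions rather than the paper's potentials, and exhaustive enumeration is infeasible for real images, where an approximate optimizer is needed.

```python
import itertools

import numpy as np


def potts(a, b):
    """Illustrative pairwise term (an assumption): 0 if labels agree, 1 otherwise."""
    return 0.0 if a == b else 1.0


def gibbs_energy(labels, unary, lam, pairwise=potts):
    """Unary potentials plus lam times pairwise potentials over 8-neighbour pairs.

    labels: (H, W) integer labelling; unary: (H, W, K) array of unary values.
    Each unordered neighbour pair is counted once; a double sum over i and
    j in N_i counts each pair twice, which is equivalent to doubling lam.
    """
    h, w = labels.shape
    energy = float(sum(unary[r, c, labels[r, c]] for r in range(h) for c in range(w)))
    # These four forward offsets visit every unordered 8-neighbour pair exactly once.
    for dr, dc in [(0, 1), (1, 0), (1, 1), (1, -1)]:
        for r in range(h):
            for c in range(w):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w:
                    energy += lam * pairwise(labels[r, c], labels[rr, cc])
    return energy


def brute_force_map(unary, lam):
    """Enumerate every labelling, build the Gibbs posterior, and return its mode."""
    h, w, k = unary.shape
    labellings = [np.array(flat).reshape(h, w)
                  for flat in itertools.product(range(k), repeat=h * w)]
    energies = np.array([gibbs_energy(x, unary, lam) for x in labellings])
    # P(x|y) = exp(-E) / Z(y); subtracting min(E) avoids overflow and cancels in Z.
    weights = np.exp(-(energies - energies.min()))
    posterior = weights / weights.sum()
    return labellings[int(np.argmax(posterior))], posterior, energies


# Three pixels favour class 0; the bottom-right pixel weakly favours class 1.
prob = np.array([[[0.8, 0.2], [0.8, 0.2]],
                 [[0.8, 0.2], [0.4, 0.6]]])
unary = -np.log(prob)
print(brute_force_map(unary, lam=0.0)[0])  # independent decisions keep the odd pixel
print(brute_force_map(unary, lam=1.0)[0])  # smoothing relabels it as class 0
```

With λ = 0 each pixel follows its own classifier probability; with λ = 1 the disagreement penalty outweighs the weak evidence at the bottom-right pixel.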
The unary potential function models the relationship between the class labels and the pixel spectral data. Given the feature vectors, a discriminative classifier estimates the class probabilities of each pixel. The unary potential plays the leading role in the classification process and is generally derived from the posterior probability of a supervised classifier. It is defined as

$$\varphi_1(x_i = l_k, y) = -\ln P\big[x_i = l_k \mid f_i(y)\big] \tag{4}$$

where f_i(y) is the feature vector at position i, obtained from the spectral values of that pixel, and P[x_i = l_k | f_i(y)] is the probability that pixel i takes class label l_k given its feature vector. Because the RF algorithm is stable and performs well without extensive parameter tuning, the RF classifier was selected to provide the unary potential.

Building on the probability distribution given by the unary potential, the pairwise potential function models the relationship between the labels of pixels in a neighborhood. The similarity between a pair of pixels is measured by local image features; it influences the labels of neighboring pixels and reflects their interaction. To minimize the Gibbs energy of the model, when two neighboring pixels differ greatly in their features, the penalty for assigning them different labels should be small, so that different labels are accepted; when their features are similar, the penalty should be large, so that the model tends to modify the labels to make them agree. The pairwise potential function is

$$\varphi_2(x_i, x_j, y) = \begin{cases} 0, & x_i = x_j \\ g_{ij}(y) + \theta\, \Theta_L(x_i, x_j \mid y), & x_i \neq x_j \end{cases} \tag{5}$$

where g_ij(y) is a smoothing term that depends on the data y,

$$g_{ij}(y) = \mathrm{dist}(i, j)^{-1} \exp\big(-\beta \lVert y_i - y_j \rVert^2\big)$$

in which dist(i, j) is the Euclidean distance between pixels i and j, y_i is the spectral vector of pixel i, and β controls how quickly the term decays as the spectral difference grows. Θ_L(x_i, x_j | y) is the cost between labels x_i and x_j in the neighborhood, and the parameter θ controls the weight of this label cost term in the pairwise potential.
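A minimal NumPy sketch of these ingredients, under stated assumptions: the RF class probabilities are already available as an (H, W, K) array (for example, a classifier's per-pixel probabilities reshaped to the image grid), the unary potential is taken as the negative log of those probabilities (the usual choice for energy minimization), β is a user-chosen constant, and the pairwise term is zero for equal labels. The label cost Θ_L is passed in as a number because its definition is not reproduced here. All function names are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against log(0) for classes that receive zero probability

# The eight (row, column) offsets of the 8-neighbourhood system.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def unary_potential(prob):
    """-ln P[x_i = l_k | f_i(y)] for an (H, W, K) array of class probabilities."""
    return -np.log(np.clip(prob, EPS, 1.0))


def smoothing_term(image, beta):
    """g_ij(y) = dist(i, j)^-1 * exp(-beta * ||y_i - y_j||^2) for every 8-neighbour offset.

    image: (H, W, B) spectral vectors. Returns an (8, H, W) array; entries whose
    neighbour falls outside the image are left at 0.
    """
    h, w, _ = image.shape
    out = np.zeros((len(OFFSETS), h, w))
    for n, (dr, dc) in enumerate(OFFSETS):
        dist = np.hypot(dr, dc)  # 1 for edge neighbours, sqrt(2) for diagonal ones
        # Overlapping windows pair pixel (r, c) with its neighbour (r + dr, c + dc).
        r0, r1 = max(0, -dr), h - max(0, dr)
        c0, c1 = max(0, -dc), w - max(0, dc)
        centre = image[r0:r1, c0:c1]
        neighbour = image[r0 + dr:r1 + dr, c0 + dc:c1 + dc]
        sq = np.sum((centre - neighbour) ** 2, axis=-1)
        out[n, r0:r1, c0:c1] = np.exp(-beta * sq) / dist
    return out


def pairwise_potential(label_i, label_j, g_ij, theta, label_cost):
    """Zero for equal labels, otherwise g_ij + theta * label_cost (label_cost stands in for Theta_L)."""
    return 0.0 if label_i == label_j else g_ij + theta * label_cost
```

Spectrally similar neighbours give g_ij close to 1/dist(i, j), so disagreeing labels there are penalized heavily, while a large spectral difference drives g_ij toward 0 and preserves edges.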
The parameter θ usually lies in the range [0, 4]. The local class label cost term Θ_L(x_i, x_j | y) is computed from the label probabilities P[x_i | f_i(y)] given by the RF classifier, where f_i(y) is the spectral feature vector at position i and x_i is the class label. Θ_L(x_i, x_j | y) adjusts the label estimate of the current pixel according to the probability distribution of its neighbors, so the model can smooth the classification result while taking spatial contextual information into account.
As mentioned earlier, the local class label cost term is expressed as a probability estimate of the spatial distribution of class labels. The final classification accuracy therefore depends on the accuracy of this probability estimate, which is obtained by majority voting on the original RF classification map. To effectively remove salt-and-pepper classification noise, the labels of adjacent pixels should be taken into account. Therefore, for each pixel, the maximum over all class-label probabilities is used as the probability estimate for the segmentation result.
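The text does not spell out the voting rule. One plausible reading, used here only as an assumption, is that each pixel's local class probability is the share of each label within a small window of the initial RF map, and the label with the largest share is kept. The following sketch implements that reading; it is not the paper's Θ_L, and the window radius and function names are hypothetical.

```python
import numpy as np


def local_label_distribution(label_map, n_classes, radius=1):
    """Share of each class label in a (2r+1) x (2r+1) window of a classification map.

    Windows are truncated at the image border. Returns an (H, W, K) array whose
    entries sum to 1 along the last axis.
    """
    h, w = label_map.shape
    counts = np.zeros((h, w, n_classes))
    for r in range(h):
        for c in range(w):
            window = label_map[max(0, r - radius):r + radius + 1,
                               max(0, c - radius):c + radius + 1]
            counts[r, c] = np.bincount(window.ravel(), minlength=n_classes)
    return counts / counts.sum(axis=-1, keepdims=True)


def majority_vote(label_map, n_classes, radius=1):
    """Label with the largest local share at each pixel (ties go to the lower class index)."""
    return np.argmax(local_label_distribution(label_map, n_classes, radius), axis=-1)


rf_map = np.zeros((3, 3), dtype=int)
rf_map[1, 1] = 2  # an isolated "salt-and-pepper" pixel
print(majority_vote(rf_map, n_classes=3))  # the isolated label is voted away
```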
In summary, for water-quality classification under China's water quality assessment system, this paper proposes a supervised CRF classification method that combines spectral information with spatial contextual information. It takes the probability distribution of the RF classifier as the unary potential and defines a linear combination of the spatial smoothing term and the local label cost term as the pairwise potential. The model can therefore predict the label of the current pixel with reference to the water quality levels of adjacent pixels. In addition, three pixel-based classifiers were included in the experiments for comparison.
The three other models are the pixel-based RF, DT, and DNN. The RF is not the best-performing algorithm for image classification, but because of its simplicity, ease of implementation, strong generalization ability, and good performance on many datasets, it has been widely used in academic research and industrial applications [39,40]. RF combines multiple trees through ensemble learning. Its basic unit is the decision tree, so N trees yield N classification results; RF aggregates these N votes, and the class with the most votes is the final output. Although the DT model is no longer commonly used alone as a mainstream classifier, it is widely used as a base learner in more complex algorithms because of its speed, accuracy, and interpretability [41,42]. DT classification assigns instances to classes based on their features using if-then rules; it is fast and commonly used. Because the classes have unequal numbers of samples, the DT model adjusts the class weights according to the number of samples. LeCun et al. [43] published an article on deep learning in Nature in 2015, highlighting its significance. The DNN is a pixel-based supervised learning model and the basis of other deep learning models. The representational ability of a neural network depends on the optimization algorithm used to train it; the choice of optimizer is described later. Training a DNN consists of two parts: forward propagation of the signal and backward propagation of the error. The back-propagation algorithm updates the weights and biases of the network according to the defined loss function so as to reduce the loss. In this study, the algorithms were implemented in Python and TensorFlow.
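The two phases of DNN training described above can be illustrated with a one-hidden-layer network. The paper used TensorFlow; this sketch uses plain NumPy so that every step of forward propagation and back-propagation is explicit. The layer size, learning rate, plain gradient descent, and synthetic data are assumptions for illustration, not the authors' network or optimizer.

```python
import numpy as np


def train_mlp(x, y, n_hidden=16, lr=0.5, epochs=200, seed=0):
    """One-hidden-layer softmax classifier trained by full-batch gradient descent.

    x: (N, D) per-pixel features (e.g. seven OLI band values); y: (N,) integer labels.
    Returns the parameters and the per-epoch mean cross-entropy losses.
    """
    rng = np.random.default_rng(seed)
    n, d = x.shape
    k = int(y.max()) + 1
    w1 = rng.normal(0.0, 0.1, (d, n_hidden)); b1 = np.zeros(n_hidden)
    w2 = rng.normal(0.0, 0.1, (n_hidden, k)); b2 = np.zeros(k)
    onehot = np.eye(k)[y]
    losses = []
    for _ in range(epochs):
        # Forward propagation of the signal.
        h = np.maximum(0.0, x @ w1 + b1)             # ReLU hidden layer
        logits = h @ w2 + b2
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)            # softmax class probabilities
        losses.append(-np.mean(np.sum(onehot * np.log(p + 1e-12), axis=1)))
        # Backward propagation of the error (gradients of the mean cross-entropy).
        g_logits = (p - onehot) / n
        g_w2 = h.T @ g_logits; g_b2 = g_logits.sum(axis=0)
        g_h = g_logits @ w2.T
        g_h[h <= 0] = 0.0                            # ReLU derivative
        g_w1 = x.T @ g_h; g_b1 = g_h.sum(axis=0)
        # Gradient-descent update of weights and biases.
        w1 -= lr * g_w1; b1 -= lr * g_b1
        w2 -= lr * g_w2; b2 -= lr * g_b2
    return (w1, b1, w2, b2), losses


rng = np.random.default_rng(1)
x_demo = rng.normal(size=(200, 7))                   # synthetic 7-band "pixels"
y_demo = (x_demo[:, 0] + x_demo[:, 1] > 0).astype(int)
_, demo_losses = train_mlp(x_demo, y_demo)
print(f"loss: {demo_losses[0]:.3f} -> {demo_losses[-1]:.3f}")
```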

Evaluation Indicators
The overall accuracy (OA) and the Kappa coefficient (Kappa), both calculated from the confusion matrix, were used to evaluate the model predictions. OA is the number of correctly predicted samples divided by the total number of samples in the classification, that is, the sum of the main diagonal of the confusion matrix divided by the total number of samples. Because no water quality level had priority over the others in these experiments, and the classification result for any single level has no decisive influence on the evaluation of the model, OA expresses the classification accuracy most intuitively and was therefore used. Kappa measures agreement and is often used for multi-class problems [44,45]. It is calculated as

$$\mathrm{Kappa} = \frac{p_{oa} - \sum_{i} a_i b_i / n^2}{1 - \sum_{i} a_i b_i / n^2}$$

where p_oa is the overall accuracy OA, a_i is the number of ground-truth samples of the i-th class, b_i is the number of samples predicted as the i-th class, and n is the total number of samples. Kappa can in principle be negative, but in practical applications it usually falls in [0, 1]; the higher the value, the better the agreement between the predictions and the ground truth.
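Both indicators can be computed directly from a confusion matrix. A short NumPy sketch, assuming rows hold ground-truth classes and columns hold predicted classes (the function names are illustrative):

```python
import numpy as np


def overall_accuracy(cm):
    """OA: correctly classified samples (main diagonal) divided by all samples."""
    cm = np.asarray(cm, dtype=float)
    return np.trace(cm) / cm.sum()


def kappa(cm):
    """Kappa from a confusion matrix with rows = ground truth, columns = prediction."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    p_oa = np.trace(cm) / n
    a = cm.sum(axis=1)  # a_i: ground-truth samples of class i
    b = cm.sum(axis=0)  # b_i: samples predicted as class i
    chance = np.sum(a * b) / n ** 2  # agreement expected by chance
    return (p_oa - chance) / (1 - chance)


cm = [[45, 5],
      [10, 40]]
print(overall_accuracy(cm), kappa(cm))  # 0.85 and 0.7 (up to floating-point rounding)
```

Here the chance agreement is (50 × 55 + 50 × 45) / 100² = 0.5, so Kappa = (0.85 − 0.5) / (1 − 0.5) = 0.7.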
The ultimate purpose of this study is to predict the water quality levels of lakes from only a few samples. Therefore, the number of pixels of each class label within the vector boundary of each lake is counted, and each lake is assigned the water quality level with the largest share of its pixels. When the lake is used as the classification unit, the results are still evaluated by OA.
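The lake-level decision rule and its OA can be sketched as follows. The sketch assumes the lake vectors have been rasterized into an integer ID map aligned with the prediction, with 0 marking pixels outside any lake; both the encoding and the function names are hypothetical.

```python
import numpy as np


def lake_levels(pred_map, lake_ids, n_classes):
    """Assign each lake the class with the largest share of its predicted pixels.

    pred_map: (H, W) predicted class per pixel; lake_ids: (H, W) integer lake ID,
    where 0 means "outside every lake vector" (an assumption of this sketch).
    """
    result = {}
    for lake in np.unique(lake_ids):
        if lake == 0:
            continue
        counts = np.bincount(pred_map[lake_ids == lake], minlength=n_classes)
        result[int(lake)] = int(np.argmax(counts))
    return result


def lake_accuracy(pred_levels, true_levels):
    """OA with the lake, rather than the pixel, as the unit of evaluation."""
    lakes = sorted(true_levels)
    return sum(pred_levels[k] == true_levels[k] for k in lakes) / len(lakes)


pred = np.array([[0, 0, 1], [2, 2, 2], [1, 1, 0]])
ids = np.array([[1, 1, 1], [2, 2, 0], [3, 3, 3]])
levels = lake_levels(pred, ids, n_classes=3)
print(levels)  # {1: 0, 2: 2, 3: 1}
```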

Data Description
According to the vector maps, most of the lakes were assigned water quality levels; lakes for which the extent of the water quality levels was uncertain were excluded. The experimental datasets were produced after water vector masking and water extraction, and their statistics are shown in Table 4. The colors in Table 4 correspond to the water quality level distributions in Figure 3, and the last column gives the total number of samples in the datasets of the two study areas. It can be observed that the numbers of samples in the label classes are unevenly distributed. In the second study area, the Class VI lakes are mainly concentrated in densely populated cities on both sides of the Yangtze River; these lakes are small and poorly circulated, so they are easily polluted. In the second study area, the numbers of samples for each level were relatively uniform (between 18% and 34%).

In the experiment, the datasets were organized in the form [features, samples], where the features are the seven image bands and each sample is a single pixel. Because there are only seven bands, all of them were used for model training without introducing much redundant information. In addition, because RF, DT, DNN, and the unary potential of the CRF all operate on individual pixels, the datasets were also built per pixel. The samples were split into training and test sets at a ratio of 1:9; the training set was used to learn the model parameters, and the test set was used to evaluate the classification performance.

The image sizes for the first and second study areas were 2873 × 3037 pixels and 4121 × 4784 pixels, respectively. Figure 3a,c show the false-color images and the water vector data (blue) from the Geospatial Data Cloud. Figure 3b,d show the distribution maps of the lakes after water extraction, that is, the corresponding ground-truth maps. To ensure that the extracted lake extents were valid, the images were masked with the water body vectors (blue) before water extraction. The ground-truth maps were then used as mask files to extract the study areas from the Landsat images. Finally, the masked images were used to generate the datasets described above in MATLAB.
Because the number of image bands is small, we used all bands for model training without too much information redundancy. In addition, because RF, DT, and DNN, including CRF's unary potential, are all based on pixels, the data sets produced are also based on pixels. The training set: test set = 1:9, where the training set was applied to learn the parameters of the model, and the test set was used to test the classification effect of the model. The image sizes for the first and second study areas were 2873 × 3037 pixels and 4121 × 4784 pixels, respectively. Figures 3a,c show the false-color images and the water vector data (blue) from the Geospatial Data Cloud. Figures 3b,d are the distribution maps of the lakes after water extraction, that is, corresponding ground-truth maps. In order to ensure the effectiveness of the extracted lake range, a masking process was performed using the water body vector (blue) before water extraction. Then the ground-truth maps were used as a mask file to extract the range of the study area on the Landsat images. Finally, we applied the masked images to generate the above mentioned datasets through MATLAB. that the number of each label class is unevenly distributed. In the second study area, the Class VI lakes are mainly concentrated in densely populated cities on both sides of the Yangtze River. These lakes are small and their circulation is poor, so they are easily polluted. In the second study area, the numbers of samples for each level were relatively uniform (between 18% and 34%). In the experiment, the datasets were produced as the form [features, samples], where the features refer to the seven bands of the images, and the sample represents a single pixel. Because the number of image bands is small, we used all bands for model training without too much information redundancy. In addition, because RF, DT, and DNN, including CRF's unary potential, are all based on pixels, the data sets produced are also based on pixels. 
The training set: test set = 1:9, where the training set was applied to learn the parameters of the model, and the test set was used to test the classification effect of the model. The image sizes for the first and second study areas were 2873 × 3037 pixels and 4121 × 4784 pixels, respectively. Figures 3a,c show the false-color images and the water vector data (blue) from the Geospatial Data Cloud. Figures 3b,d are the distribution maps of the lakes after water extraction, that is, corresponding ground-truth maps. In order to ensure the effectiveness of the extracted lake range, a masking process was performed using the water body vector (blue) before water extraction. Then the ground-truth maps were used as a mask file to extract the range of the study area on the Landsat images. Finally, we applied the masked images to generate the above mentioned datasets through MATLAB. that the number of each label class is unevenly distributed. In the second study area, the Class VI lakes are mainly concentrated in densely populated cities on both sides of the Yangtze River. These lakes are small and their circulation is poor, so they are easily polluted. In the second study area, the numbers of samples for each level were relatively uniform (between 18% and 34%). In the experiment, the datasets were produced as the form [features, samples], where the features refer to the seven bands of the images, and the sample represents a single pixel. Because the number of image bands is small, we used all bands for model training without too much information redundancy. In addition, because RF, DT, and DNN, including CRF's unary potential, are all based on pixels, the data sets produced are also based on pixels. The training set: test set = 1:9, where the training set was applied to learn the parameters of the model, and the test set was used to test the classification effect of the model. 
The image sizes for the first and second study areas were 2873 × 3037 pixels and 4121 × 4784 pixels, respectively. Figures 3a,c show the false-color images and the water vector data (blue) from the Geospatial Data Cloud. Figures 3b,d are the distribution maps of the lakes after water extraction, that is, corresponding ground-truth maps. In order to ensure the effectiveness of the extracted lake range, a masking process was performed using the water body vector (blue) before water extraction. Then the ground-truth maps were used as a mask file to extract the range of the study area on the Landsat images. Finally, we applied the masked images to generate the above mentioned datasets through MATLAB. that the number of each label class is unevenly distributed. In the second study area, the Class VI lakes are mainly concentrated in densely populated cities on both sides of the Yangtze River. These lakes are small and their circulation is poor, so they are easily polluted. In the second study area, the numbers of samples for each level were relatively uniform (between 18% and 34%). In the experiment, the datasets were produced as the form [features, samples], where the features refer to the seven bands of the images, and the sample represents a single pixel. Because the number of image bands is small, we used all bands for model training without too much information redundancy. In addition, because RF, DT, and DNN, including CRF's unary potential, are all based on pixels, the data sets produced are also based on pixels. The training set: test set = 1:9, where the training set was applied to learn the parameters of the model, and the test set was used to test the classification effect of the model. The image sizes for the first and second study areas were 2873 × 3037 pixels and 4121 × 4784 pixels, respectively. Figures 3a,c show the false-color images and the water vector data (blue) from the Geospatial Data Cloud. 
Figures 3b,d are the distribution maps of the lakes after water extraction, that is, corresponding ground-truth maps. In order to ensure the effectiveness of the extracted lake range, a masking process was performed using the water body vector (blue) before water extraction. Then the ground-truth maps were used as a mask file to extract the range of the study area on the Landsat images. Finally, we applied the masked images to generate the above mentioned datasets through MATLAB. that the number of each label class is unevenly distributed. In the second study area, the Class VI lakes are mainly concentrated in densely populated cities on both sides of the Yangtze River. These lakes are small and their circulation is poor, so they are easily polluted. In the second study area, the numbers of samples for each level were relatively uniform (between 18% and 34%). In the experiment, the datasets were produced as the form [features, samples], where the features refer to the seven bands of the images, and the sample represents a single pixel. Because the number of image bands is small, we used all bands for model training without too much information redundancy. In addition, because RF, DT, and DNN, including CRF's unary potential, are all based on pixels, the data sets produced are also based on pixels. The training set: test set = 1:9, where the training set was applied to learn the parameters of the model, and the test set was used to test the classification effect of the model. The image sizes for the first and second study areas were 2873 × 3037 pixels and 4121 × 4784 pixels, respectively. Figures 3a,c show the false-color images and the water vector data (blue) from the Geospatial Data Cloud. Figures 3b,d are the distribution maps of the lakes after water extraction, that is, corresponding ground-truth maps. In order to ensure the effectiveness of the extracted lake range, a masking process was performed using the water body vector (blue) before water extraction. 
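The masking and dataset-generation steps above were carried out in MATLAB. As an illustration only, the following Python sketch shows the same idea: water pixels selected by a ground-truth mask are stacked into a [features, samples] matrix, and the sample indices are split train:test = 1:9. It assumes the image is held as a band-major nested list (bands × rows × cols) and that the ground-truth map uses 0 for non-water pixels; the function names, the 0-for-background convention, and the fixed random seed are hypothetical choices, not the authors' code.

```python
import random
from typing import List, Sequence, Tuple

Image = Sequence[Sequence[Sequence[float]]]  # bands x rows x cols
LabelMap = Sequence[Sequence[int]]           # rows x cols; 0 marks non-water (assumed)


def build_pixel_dataset(image: Image, ground_truth: LabelMap) -> Tuple[List[List[float]], List[int]]:
    """Collect every water pixel into a [features, samples] matrix plus its label.

    Row b of the returned matrix holds band b; column j holds the band values of
    the j-th water pixel, in row-major scan order.
    """
    n_bands = len(image)
    features: List[List[float]] = [[] for _ in range(n_bands)]
    labels: List[int] = []
    for r, row in enumerate(ground_truth):
        for c, label in enumerate(row):
            if label == 0:  # pixel lies outside the lake mask
                continue
            for b in range(n_bands):
                features[b].append(image[b][r][c])
            labels.append(label)
    return features, labels


def split_train_test(n_samples: int, train_ratio: float = 0.1,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle sample indices and split them; train_ratio=0.1 gives train:test = 1:9."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_ratio)
    return indices[:n_train], indices[n_train:]


# Toy scene: 7 bands over a 2 x 5 grid; 8 of the 10 pixels are labelled lake pixels
# (water-quality classes II-V encoded as 2-5).
bands = [[[float(b * 100 + r * 10 + c) for c in range(5)] for r in range(2)] for b in range(7)]
gt = [[0, 2, 2, 3, 3],
      [4, 4, 5, 5, 0]]
X, y = build_pixel_dataset(bands, gt)
train_idx, test_idx = split_train_test(len(y))
```

Keeping the matrix band-major mirrors the [features, samples] layout described in the text; most Python libraries expect [samples, features], so a transpose would be needed before training with them.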

Experiment 1: The Wuhan Dataset
The first experiment was conducted for the Wuhan study area and its surrounding water systems. To evaluate the performance of the CRF, its classification performance was compared with that of three common machine-learning models: DT, DNN, and RF. For RF, 100 trees were used, and the minimum number of samples required to split an internal node was 10. Since only the seven bands of the Landsat 8 OLI image were used, the maximum number of features did not need to be set for DT and RF. For the DNN optimizer (Table 5), stochastic gradient descent (SGD), momentum, and the adaptive root mean square propagation (RMSProp) optimizer were considered, but they performed slightly worse than the adaptive moment estimation (Adam) optimizer; the models trained with Adam were stable and highly accurate. The network structure was tuned manually, resulting in four hidden layers with 2^8 (256) neurons each, the structure that gave the highest predictive accuracy. The learning rate was set to 0.01, since slightly larger values caused the loss to fluctuate significantly, and the number of iterations was 2000.
Table 5. Characteristics of the DNN optimizers.

Optimizer | Characteristic
SGD | Parameter updates are fast, but the oscillation amplitude is large
Momentum | Suppresses oscillation, but adapts poorly
RMSProp | Solves the problem of a sharp drop in the learning rate and reduces manual tuning of the learning rate
Adam | Computes an adaptive learning rate for each parameter and adapts well
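Table 5 describes the optimizers only qualitatively. The sketch below writes out the standard textbook update rule of each one for a single scalar parameter, minimizing the toy loss f(w) = (w - 3)^2. Only the learning rate (0.01) and the iteration count (2000) come from the text; the momentum and decay coefficients and ε are common default values assumed here, not settings reported in this study.

```python
import math
from typing import Callable, Dict


def grad(w: float) -> float:
    """Gradient of the toy loss f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)


def sgd(w: float, lr: float, steps: int) -> float:
    for _ in range(steps):
        w -= lr * grad(w)  # plain gradient step
    return w


def momentum(w: float, lr: float, steps: int, beta: float = 0.9) -> float:
    v = 0.0
    for _ in range(steps):
        v = beta * v + grad(w)  # velocity accumulates past gradients, damping oscillation
        w -= lr * v
    return w


def rmsprop(w: float, lr: float, steps: int, rho: float = 0.9, eps: float = 1e-8) -> float:
    s = 0.0
    for _ in range(steps):
        g = grad(w)
        s = rho * s + (1.0 - rho) * g * g   # running mean of squared gradients
        w -= lr * g / (math.sqrt(s) + eps)  # step size adapted to the gradient scale
    return w


def adam(w: float, lr: float, steps: int, beta1: float = 0.9,
         beta2: float = 0.999, eps: float = 1e-8) -> float:
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1.0 - beta1) * g      # first-moment estimate
        v = beta2 * v + (1.0 - beta2) * g * g  # second-moment estimate
        m_hat = m / (1.0 - beta1 ** t)         # bias correction
        v_hat = v / (1.0 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


OPTIMIZERS: Dict[str, Callable[[float, float, int], float]] = {
    "SGD": sgd, "Momentum": momentum, "RMSProp": rmsprop, "Adam": adam,
}

# Start every optimizer at w = 0 with the learning rate and iteration count from the text.
results = {name: opt(0.0, 0.01, 2000) for name, opt in OPTIMIZERS.items()}
```

On this well-conditioned toy problem all four rules end close to the minimum at w = 3, so the sketch illustrates the update formulas rather than reproducing the comparison in Table 5, whose differences show up on noisy, high-dimensional losses.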

The adjustable parameters of the CRF model include the weight λ of the pairwise potential. Experiments showed that λ has a strong effect on the results, and it was set to 0.8 in both study areas. The pairwise potential is defined over the eight adjacent pixels of each pixel. Because a segmentation approach similar to object-oriented processing is used, as described in the previous section, the salt-and-pepper effect is greatly reduced. In Figure 4, panels (c1)-(c5) show the probability estimates of the pixel-based RF, which form the unary potential, and panels (d1)-(d5) show the segmentation results after the pairwise potential is applied. The classification accuracy of the water-quality level within a single lake is greatly improved.
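To make the roles of the unary potential, the pairwise weight λ and the 8-pixel neighborhood concrete, the following is a minimal sketch of CRF labeling on a pixel grid. It assumes a unary cost of -log p taken from RF class probabilities and a simple Potts pairwise term (a cost of λ for every 8-connected neighbor with a different label), and it uses iterated conditional modes (ICM) for inference. The paper's detail/smoothing-balanced pairwise potential and its inference procedure may differ; the function name `icm_crf` and the Potts form are illustrative assumptions.

```python
import math
from typing import List, Sequence

# 8-connected neighborhood offsets (the pairwise scope described in the text)
NEIGHBORS_8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def icm_crf(probs: Sequence[Sequence[Sequence[float]]], lam: float = 0.8,
            max_iter: int = 10, eps: float = 1e-12) -> List[List[int]]:
    """Label a grid by greedily minimizing unary + lam * Potts pairwise energy.

    probs[r][c][k] is the (assumed RF) probability of class k at pixel (r, c).
    """
    rows, cols = len(probs), len(probs[0])
    n_classes = len(probs[0][0])
    unary = [[[-math.log(max(p, eps)) for p in probs[r][c]] for c in range(cols)]
             for r in range(rows)]
    # Start from the pixel-wise most probable labels (what RF alone would output).
    labels = [[min(range(n_classes), key=lambda k: unary[r][c][k]) for c in range(cols)]
              for r in range(rows)]
    for _ in range(max_iter):
        changed = False
        for r in range(rows):
            for c in range(cols):
                neighbor_labels = [labels[r + dr][c + dc] for dr, dc in NEIGHBORS_8
                                   if 0 <= r + dr < rows and 0 <= c + dc < cols]

                def local_energy(k: int) -> float:
                    # Unary cost plus lam for every neighbor that disagrees with label k.
                    return unary[r][c][k] + lam * sum(1 for n in neighbor_labels if n != k)

                best = min(range(n_classes), key=local_energy)
                if best != labels[r][c]:
                    labels[r][c] = best
                    changed = True
        if not changed:  # ICM has reached a local minimum of the energy
            break
    return labels


# A 3 x 3 lake patch: RF favors class 0 everywhere except a noisy center pixel.
patch = [[[0.9, 0.1]] * 3 for _ in range(3)]
patch[1][1] = [0.4, 0.6]
smoothed = icm_crf(patch, lam=0.8)    # the center pixel is relabelled to class 0
unsmoothed = icm_crf(patch, lam=0.0)  # lam = 0 reduces to the pixel-wise argmax
```

With λ = 0 the model falls back to the pixel-based RF decision, and the isolated center label survives as salt-and-pepper noise; with λ = 0.8 the eight agreeing neighbors outweigh the weak unary preference, which is the smoothing behavior described above.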
Sensors 2020, 20, x FOR PEER REVIEW 11 of 18
Figure 4 (rows): (1) Class II, Niushan Lake; (2) Class III, East Lake; (3) Class IV, Wu Lake; (4) Class V, Tangxun Lake; (5) Class VI, South Lake and Yezhi Lake.
Table 6 lists the classification accuracies (OAs) of the different classifiers for each water quality level. For Class VI water, the RF-CRF classifier performs much better than the other classifiers, and it also achieves the highest overall accuracy (OA = 89.5%) and kappa (0.841). A representative lake was selected for each label in Figure 4: Niushan Lake (Class II), East Lake (Class III), Wu Lake (Class IV), Tangxun Lake (Class V), and South Lake and Yezhi Lake (Class VI). Because the DT, DNN, and RF models classify individual pixels, considerable salt-and-pepper noise is visible in the figure. For South Lake in particular, although the RF result is only slightly worse than the CRF result, the misclassified pixels still stand out.
However, because the CRF model uses spatial contextual information, so that the labels of adjacent pixels also influence the classification of the current pixel, the classification maps of the lakes are clearly smoother. This greatly helps in subsequently judging the water quality level of each lake from the counts of class labels. The spatial distribution of water quality levels predicted by the RF-CRF in the first study area is shown in Figure 5. According to Table 6, the OA for Class VI water is 95.34%. As the red areas in Figure 3 show, the Class VI lakes are mainly distributed on both sides of the Yangtze River and in densely populated urban areas. Several Class VI lakes are predicted accurately, and the water quality over the full extent of these lakes is clearly presented. However, small lakes are still seriously misclassified. Because of the influence of adjacent pixels, misclassifications are more frequent at lake edges, which strongly affects the predictions for small, finely divided patches.
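The accuracies quoted above are overall accuracy (OA) and Cohen's kappa. As a reminder of how both are derived from a confusion matrix, here is a short sketch; the 3-class matrix in the example is made up for illustration and is not data from this study.

```python
from typing import Sequence, Tuple


def oa_and_kappa(confusion: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(k)) / total  # OA = trace / N
    row_sums = [sum(confusion[i]) for i in range(k)]
    col_sums = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance, from the row and column totals.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical 3-class confusion matrix (illustrative numbers only).
cm = [[50, 3, 2],
      [4, 40, 6],
      [1, 5, 39]]
oa, kappa = oa_and_kappa(cm)  # OA = 129 / 150 = 0.86
```

Kappa discounts the agreement a random classifier with the same class proportions would achieve, which is why it is reported alongside OA when class frequencies are uneven.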
Based on the water-quality classification maps, the number and percentage of pixels of each water quality level were counted within the vector boundary of each lake, and the class with the largest proportion was taken as the water quality level of the lake (Table 7). Excluding individual lakes whose original water vectors do not match the image, 64 lakes in the first study area were used in the experiment. The bold figures in the table are the numbers of correctly predicted lakes. There is only one Class II lake, and it was correctly predicted. Of the 11 Class III lakes, 81.8% were predicted correctly. In total, 21 Class IV lakes were correctly identified, and one more was misclassified as Class V. Of the 20 Class V lakes, two were not extracted, and the remaining 18 were correctly identified. All Class VI lakes were predicted correctly. In summary, 61 of the lakes used in the experiment were correctly predicted, an accuracy of 95.31%. Most of the lakes assigned to the wrong class were relatively small.
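The lake-level rule described above (count the pixels of each predicted class inside a lake's vector boundary and take the most frequent class) can be sketched as follows. It assumes each lake has already been rasterized into a list of predicted pixel labels; the lake names and pixel counts in the example are invented for illustration, and the last line only re-derives the reported accuracy of 61 correct lakes out of 64.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def lake_level(pixel_labels: Iterable[int]) -> Tuple[int, float]:
    """Return the majority class of a lake and the share of its pixels in that class."""
    counts = Counter(pixel_labels)
    total = sum(counts.values())
    label, n = counts.most_common(1)[0]
    return label, n / total


def lake_accuracy(predicted: Dict[str, int], truth: Dict[str, int]) -> float:
    """Fraction of lakes whose majority-vote class matches the reference class."""
    correct = sum(1 for name, level in truth.items() if predicted.get(name) == level)
    return correct / len(truth)


# Hypothetical predicted pixel labels per lake (classes II-VI encoded as 2-6).
lakes = {
    "lake_a": [4, 4, 4, 5, 4, 3],
    "lake_b": [6, 6, 5, 6],
    "lake_c": [3, 4, 4, 4],
}
reference = {"lake_a": 4, "lake_b": 6, "lake_c": 3}
predicted = {name: lake_level(px)[0] for name, px in lakes.items()}
acc = lake_accuracy(predicted, reference)  # lake_c is outvoted by noisy pixels

reported_accuracy = 61 / 64  # 0.953125, the 95.31% reported for the first study area
```

The example also shows the failure mode noted in the text: when noisy pixels dominate a small lake, the majority vote assigns the wrong level to the whole lake.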

Experiment 2: The Huangshi Dataset
The second experiment was conducted for the city of Huangshi and the surrounding water systems in the lower reaches of the Yangtze River. The classification accuracies for the four levels are listed in Table 8. For RF, the number of trees (100) and the minimum number of samples required to split an internal node (10) were tuned using the same strategy as in the first study area, and the hyperparameters of the DNN were tuned in the same way. The network structure was adjusted manually, giving four hidden layers with 2^8 (256) neurons each. The Adam optimizer was used with a learning rate of 0.01 and 2000 iterations, and 10% of the neurons were randomly dropped (dropout). The RF-CRF achieved the best classification performance (OA = 90.35%, Kappa = 0.868), outperforming the DT-, DNN-, and RF-based methods; its Kappa is 0.047 higher than that of RF. The highest per-level prediction accuracy among the four levels is 99.06%, and no class shows an obviously high misclassification ratio. The four trained models were then used to predict the water quality of the second study area, and a representative classification result for a lake of each water quality level is shown in Figure 6. In terms of mapping quality, the CRF still performs best, followed by the RF, which is consistent with the computed classifier accuracies. Haikou Lake, located in Huangshi City, is directly connected to the Yangtze River. Field tests showed that its phosphorus level exceeded the standard, placing it in Class VI. Figure 6(c4) shows the RF classification result for Haikou Lake: although the RF accuracy is not much lower than that of the RF-CRF, some salt-and-pepper noise remains.
During manual interpretation, all pixels in the same lake are assigned the same class label, but pixel-based classification methods ignore this important information. The RF-CRF, by contrast, exploits the correlation between adjacent pixels in the water images, so all pixels of Haikou Lake were predicted correctly. The results for Baoan Lake and Daye Lake were also excellent.
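Randomly dropping 10% of the neurons corresponds to dropout with a rate of 0.1. A minimal sketch of the common inverted-dropout formulation follows; the text does not say whether the authors' framework rescaled activations during training (inverted dropout) or at inference time, so the scaling convention here is an assumption.

```python
import random
from typing import List, Optional, Sequence


def dropout(activations: Sequence[float], rate: float = 0.1, training: bool = True,
            rng: Optional[random.Random] = None) -> List[float]:
    """Inverted dropout on one layer's activations.

    During training each unit is zeroed with probability `rate` and survivors are
    scaled by 1 / (1 - rate), keeping the expected activation unchanged; at
    inference time the activations pass through untouched.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training:
        return list(activations)
    if rng is None:
        rng = random.Random()
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# One hidden layer of 2^8 = 256 units with constant activations, for illustration.
layer = [1.0] * 256
train_out = dropout(layer, rate=0.1, rng=random.Random(42))
eval_out = dropout(layer, rate=0.1, training=False)
dropped = sum(1 for a in train_out if a == 0.0)  # roughly 10% of 256 on average
```

Rescaling the surviving units during training means the network can be used at inference time without any change, which is why this variant is the usual default.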
The spatial distribution of water quality levels predicted by the RF-CRF in the second study area is shown in Figure 7. The Class VI lakes (red) are concentrated on both sides of the Yangtze River, and most of them are located in Huangshi City. According to Table 8, the prediction accuracy for this class is close to 100%, much higher than that of the DT (80.62%), DNN (75.97%), and RF (84.37%) models. Moving upstream along the Yangtze River to Ezhou, the water quality improves significantly, and most lakes there are Class IV. However, lake edges can still be seriously mispredicted, and for small lakes in particular it is difficult to determine the water quality of the lake as a whole. Future research will therefore require higher-resolution satellite imagery.
Based on the water-quality classification maps for the Huangshi dataset, the class with the largest proportion was taken as the water quality level of each lake (Table 9). There are 8 Class III lakes and 19 Class VI lakes, and both classes were predicted with 100% accuracy. The classification accuracy for the Class IV and Class V lakes is 63.6%. Statistical inspection shows that most of the misclassified lakes are extremely small and exhibit obvious noise. Because the CRF model classifies using contextual information, finer patches reduce its classification performance. In addition, since the unary potential of the CRF is a pixel-based RF model, reflection from the lake bottom directly affects the probability distribution of the class labels and, in turn, the final classification results of the CRF model. Therefore, the application of the RF-CRF to remote sensing of water-quality levels needs further work, and the spatial resolution of the imagery should be improved in future experiments.

Conclusions
In this paper, with reference to China's water quality assessment system, we discussed the possibility of using Landsat 8 imagery and machine learning methods to assess the water quality of inland lakes on a large scale. Because lakes are spatially continuous, a CRF classifier operating on the probability map is well suited to this task. The RF was used as the unary potential of the CRF model. To evaluate the performance of the CRF, three other commonly used pixel-based classifiers were selected for comparison. The RF-CRF model achieved the highest classification accuracy, with Kappa values of 0.841 and 0.868 in the two study areas, respectively. We also examined the predictions of the RF-CRF model over the whole images.
The ultimate goal of this paper was to explore the possibility of using satellite imagery and machine learning algorithms to rapidly evaluate lake water quality levels. By training on a small number of samples, we can predict the water quality of lakes over a large area and largely determine the water quality level of each lake. This approach not only offers great savings in time but also represents a breakthrough in the application of remote sensing technology to water quality monitoring, and it could be used by the authorities responsible for water quality monitoring programs in China. By continuing to explore CRF-based water quality classification, further improving classification accuracy, and training more stable models with strong transferability, monitoring teams will be able to assess the water quality of inland waters regularly and rapidly using satellite imagery.
In the future, satellites with high spatial resolution and short revisit periods could be used for water quality monitoring, making it possible to explore emergency response strategies based on water quality remote sensing. Although the CRF considered in this paper classified pixels using both spectral and spatial information, some more valuable spatial features (textures, edges, etc.) were not used. Therefore, future experiments will consider extracting spatial features and applying spatial-spectral fusion techniques to further improve classification accuracy.