Monitoring of Urban Black-Odor Water Based on Nemerow Index and Gradient Boosting Decision Tree Regression Using UAV-Borne Hyperspectral Imagery

Wei, Lifei; Huang, Can; Wang, Zhengxiang; Wang, Zhou; Zhou, Xiaocheng; Cao, Liqin

doi:10.3390/rs11202402

by Lifei Wei ¹,², Can Huang ¹,*, Zhengxiang Wang ¹,², Zhou Wang ¹, Xiaocheng Zhou ³ and Liqin Cao ⁴

¹ Faculty of Resources and Environmental Science, Hubei University, Wuhan 430062, China
² Hubei Key Laboratory of Regional Development and Environmental Response, Hubei University, Wuhan 430062, China
³ Key Laboratory of Spatial Data Mining and Information Sharing, Ministry of Education, Fuzhou University, Fuzhou 350116, China
⁴ School of Printing and Packaging, Wuhan University, Wuhan 430079, China
* Author to whom correspondence should be addressed.

Remote Sens. 2019, 11(20), 2402; https://doi.org/10.3390/rs11202402

Submission received: 22 August 2019 / Revised: 13 October 2019 / Accepted: 15 October 2019 / Published: 16 October 2019

(This article belongs to the Special Issue Remote Sensing of Water Resources Monitoring, Parametrization and Modeling)

Abstract

Black-odor water has long been a problem in urban rivers. It not only seriously damages the image of a city, but also breeds germs and degrades the urban habitat. The prevention and treatment of urban black-odor water have long been important topics nationwide, and the “Action Plan for Prevention and Control of Water Pollution” issued by the State Council reflects the close attention that the Chinese government pays to this issue. Treatment, however, depends on monitoring. There are few studies on the large-scale monitoring of black-odor water, and especially few that use an unmanned aerial vehicle (UAV) to efficiently and accurately monitor the spatial distribution of urban river pollution. Therefore, to overcome the limitations of traditional ground sampling, which evaluates river pollution only at discrete points, UAV-borne hyperspectral imagery was applied in this paper, with the aim of capturing the pollution status of the entire river surface as quickly as possible. Because retrieving multiple water quality parameters separately leads to cumulative errors, the Nemerow comprehensive pollution index (NCPI) is introduced to characterize the pollution level of urban water. The retrieval results of six regression models, including gradient boosting decision tree regression (GBDTR), were compared in order to find a regression model suitable for retrieving the NCPI in the current scenario. In the first study area, the retrieval accuracy of GBDTR on the training dataset (adjusted_R² = 0.978) and the test dataset (adjusted_R² = 0.974) was higher than that of the other regression models. Although random forest performs similarly to GBDTR in both training accuracy and image inversion, it is more computationally expensive. Finally, the spatial distribution maps of the NCPI and the technical feasibility of using them to monitor pollution sources were investigated in combination with field observations.

Keywords:

unmanned aerial vehicle; hyperspectral imagery; Nemerow index; gradient boosting decision tree; urban water; black-odor water

1. Introduction

Urban rivers play a vital role in urban development, not only in water circulation and shaping the urban landscape, but also in flood control, drainage, and maintaining the regional water balance. However, with the acceleration of urban development, especially in developing countries, the pollution of urban rivers has become a problem that cannot be ignored [1]. In China, many urban rivers are polluted, and some of them have become black and odorous, forming the so-called urban “black-odor water” [2,3]. In oxygen-deficient areas of freshwater and marine ecosystems, this type of polluted water is also known as black water blooms, black spots, black water agglomerates, or dead zones [4,5,6]. The black color and odor are an extreme manifestation of organic pollution. Black-odor water not only pollutes the water and destroys river ecosystems, but also emits a stench, and the microorganisms that breed in it cause air pollution in the surrounding area and even outbreaks of infectious diseases. Since April 2, 2015, local governments across China have gradually incorporated black-odor water management into their work plans. The total number of occurrences of black-odor water in 36 key cities of China was 897, and the completion rate of successful remediation was 70% on average, with another 274 occurrences added as of October 27, 2018 (http://www.hcstzz.com/). At the time of publication, the total number of black-odor water occurrences in China is 2100 [7].

To date, the published articles on black polluted water have mainly focused on the biochemical mechanism of water blackening and odor [8], the related treatment [9,10,11], and the optical characteristics of black-odor water [12,13]. Urban black-odor water identification, water pollution assessment, and river pollution source monitoring still rely mainly on in situ campaigns that provide point measurements, with a long monitoring period but low monitoring frequency [14]. The low frequency of in situ monitoring makes it difficult to evaluate short-term temporal variation, and its limited extent makes it difficult to evaluate spatial variation. In contrast, remote sensing techniques have monitoring resolutions and frequencies that allow the detection of spatial and temporal changes [15]. The color change of polluted water is determined by the optical properties of its dissolved and particulate components. Therefore, the remote sensing reflectance spectrum of polluted water is clearly different from that of non-polluted water [16]. To date, very few remote sensing monitoring methods have been developed for black-odor water. Shuang et al. [17] used high spatial resolution imagery combined with a spectral index model for threshold segmentation, relying mainly on empirical interpretation, which can identify black-odor water effectively to a certain extent. However, the threshold in this method must be set manually, so the resulting models may not be generally applicable. The method also cannot be applied to all types of polluted water, and it does not quantify the results of the water pollution assessment. As early as 2014, a BWA search algorithm based on digital number (DN) gray values and research area variance was proposed for the phenomenon of black water aggregation (BWA) in Taihu Lake in China, and it achieved good results. However, errors in visual interpretation affect the final evaluation accuracy, and BWA can only be identified as present or absent [18].

According to the Chinese Government’s Guideline for Urban Black and Odorous Water Treatment (hereinafter referred to as “the Guide”) [19], the assessment of non-black-odor water (NBO), mild black-odor water (MBO), and severe black-odor water (SBO) is not determined by a single water quality parameter, but includes the four water quality parameters of Secchi depth (SD), dissolved oxygen (DO), oxidation-reduction potential (ORP), and ammonia nitrogen (AN). Therefore, the method of quantitatively inverting a single indicator by means of remote sensing does not apply to the current situation.

In this paper, a weighted multi-factor environmental quality index, the Nemerow comprehensive pollution index (NCPI) [20], is introduced. The NCPI is mostly used for soil heavy metal pollution assessment and the comprehensive evaluation of environmental quality [21,22]. When combined with the black-odor water determination approach described in the Guide, it makes it possible not only to identify the presence of black-odor water, but also to calculate its pollution level quantitatively. Therefore, compared with single-parameter inversion, pollution index inversion based on the Guide and UAV-borne hyperspectral imagery is more suitable for the current situation. Hyperspectral imagery has a high spectral resolution and rich feature information, and is increasingly used in ecological environment monitoring [23]. In this study, we used an unmanned aerial vehicle (UAV) remote sensing platform equipped with a miniature hyperspectral camera to obtain hyperspectral remote sensing (HRS) image data. These data were combined with ground-measured spectra to select a machine learning regression model with high stability and high accuracy, gradient boosting decision tree regression (GBDTR), allowing us to develop a water pollution monitoring and evaluation method that quantitatively retrieves the NCPI. The effect of “water-leaving reflectance” on the NCPI is different from the effect of reflectance spectra on the leaf area index, especially with near-infrared spectroscopy [24]. Therefore, the pollution assessment process for urban riverways proposed in this paper is more closely related to the empirical statistical method of linking water-leaving reflectance and the Nemerow index. The objectives of this study were: (a) to verify the accuracy of the NCPI in black-odor water identification and water pollution assessment; (b) to combine UAV-borne HRS images and ground-measured spectra to retrieve the water pollution index, with a machine learning algorithm at the core; and (c) to analyze the NCPI distribution in the study area, in order to provide technical support for polluted water monitoring and targeted treatment, as well as pollution source monitoring.

2. Methodology

2.1. Study Area

On May 31 and June 1, 2019, we selected two riverways in Wuhan, China, as the research areas: Shahu Port (114°21′22.36′′E, 30°35′11.44′′N) and the Xunsi River (114°18′0.12′′E, 30°29′58.87′′N). Shahu Port is a famous “smelly water port” in Wuhan. According to the National Urban Black-odor Water Remediation Supervision Platform (http://gz.hcstzz.com), the riverway is under treatment, and the pollution situation has improved. However, there is still a smell, and the water is still cloudy, so the riverway can provisionally be regarded as slightly polluted. The first study area was a section of Shahu Port from the intersection of Yangyuan South Road and Luojiagang, with 40 evenly distributed sampling points, as shown in Figure 1. The stretch of the Xunsi River from Wutai Gate to the Beijing–Guangzhou railway line is 450 m long and has been included in the scope of black-odor water treatment. However, there is no effective governance at present, and sight and smell alone indicate that the pollution level of this study area is much higher than that of Shahu Port. As shown in Figure 2, a U-shaped section of the riverway was chosen as the second study area, and a total of 38 sampling points were evenly distributed along this stretch of river. Due to the slow flow at the bend of the river and long-term accumulation, visible surface residue and odors are common.

Shahu Port is an important channel connecting East Lake and Sha Lake to the Yangtze River, so its black-odor water affects the water quality of the Yangtze River, from which most of the drinking water of Wuhan residents is drawn. The Xunsi River is an important drainage channel for the South Lake and Baishazhou areas. In the flood season of 2016, due to the poor water-carrying capacity of the Xunsi River and other factors, the South Lake area was seriously waterlogged. Given the Military World Games held in Wuhan in 2019 and the location of these two rivers in the center of the city, the Wuhan government has attached great importance to their treatment. Therefore, choosing these two areas as study areas not only supports the government’s planning, but also makes it possible to observe their pollution changes over the long term.

2.2. In Situ Data and Spectra Collection

The data acquisition operation in the field included the measurement of SD, DO, ORP, and water temperature, the collection of water samples and ground-measured spectra, as well as the longitude and latitude data of the sampling locations. Bottled water samples were taken back to the laboratory for testing, and the AN was measured indoors.

In the field, SD was measured using a JCT-8 Secchi disk. The disk was slowly lowered into the water until the black-and-white disk was only just visible, and the scale value was then recorded. To ensure stable readings, the measurement was repeated two or three times. A Shanghai Leici JPB-607A portable dissolved oxygen meter was used to measure DO, and the instrument simultaneously recorded the water temperature at the sampling points. ORP was measured in the field with a portable oxidation-reduction potentiometer. AN was measured by gas phase molecular absorption spectrometry (HJ/T 195-2005) using a GMA3376 gas phase molecular absorption spectrometer.

The water surface spectra were measured with the “above-water method”. The instrument used for spectral acquisition was an ASD FieldSpec 3 field-portable spectrometer (wavelength range of 350–2500 nm) manufactured by ASD Inc. (Boulder, Colorado, USA) and provided by the China University of Geosciences (Wuhan, China). The “spectral average” setting of the ASD spectrometer was set to 10, indicating 10 measurements per second. The larger this value, the higher the accuracy and the better the denoising effect, but the greater the power consumption. In addition, in order to eliminate the interference of transient changes in ambient light conditions and ensure the stability of the acquired spectra, five water surface spectra and five skylight spectra were collected at each point. The subsequent spectral preprocessing used averages computed from these 10 spectra.

Figure 3 shows photographs of the in situ spectral measurements made with the ASD device: (a) whiteboard correction, (b) measurement of spectra above the water surface, and (c) measurement of sky spectra. A dedicated team member recorded the GPS position during the measurement shown in Figure 3b. Figure 3d,e show representative water-leaving spectra in Shahu Port and Xunsi River, respectively. In the legend, each point Pi corresponds to the spatial distribution of the sampling points in Figure 8 and Figure 11, and the values that follow are the measured results of SD, DO, ORP and AN. According to the high and low values of the NCPI in Shahu Port, four points (P4, P11, P13 and P33) were selected. The values of P4 and P11 are similar, and both points are upstream of the drain in Figure 1. The values of P13 and P33 are similar, and both points are downstream of the drain. It can be seen that the spectral reflectance gradually increases from upstream to downstream. As will be discussed later, there is a sudden drop in NCPI from P11 to P13. In the Xunsi River, P5 (less polluted), P16 (at the turn), and P29 and P34 (at the end of the river) were selected. P16 is at the point where the NCPI rises, and the NCPIs at P29 and P34, which lie on the two sides of the river, are the highest. The spatial distribution of the NCPI is discussed in detail later, with the NCPI statistical results.

The reflectance was measured at least five times to calculate the ratio of upwelling radiance to downwelling irradiance, following the NASA Ocean Optics Protocols [25,26]. To measure the total radiance $L_{sw}(\lambda)$ ($\mathrm{W \cdot m^{-2} \cdot sr^{-1}}$), the radiometer was pointed toward the water surface at the angles $(\theta, \phi) = (40^{\circ}, 135^{\circ})$ (viewing angle $\theta$ relative to water surface, azimuth angle $\phi$ relative to the sun’s azimuth) to avoid direct solar reflection and the surrounding shadows. To account for the reflection of skylight on the water surface, the radiometer was then rotated upwards to view the sky at the angles $(\theta_{sky}, \phi_{sky}) = (\theta, \phi) = (40^{\circ}, 135^{\circ})$ (zenith angle $\theta_{sky}$, azimuth angle $\phi_{sky}$) to measure the sky radiance $L_{sky}(\lambda)$ ($\mathrm{W \cdot m^{-2} \cdot sr^{-1}}$). When collecting the ground spectra, it was found that if the observation geometry was not standardized, spectral overexposure would easily occur. The measurements made with this fixed observation geometry were used to calculate the remote sensing reflectance $R_{rs}$ [27,28,29]:

$$R_{rs} = \frac{L_{sw} - \rho \cdot L_{sky}}{E_{d}(0^{+})} \tag{1}$$

$$E_{d}(0^{+}) \equiv E_{s} = L_{p} \cdot \frac{\pi}{\rho_{p}} \tag{2}$$

where $E_{d}(0^{+}) \equiv E_{s}$ is the incident spectral irradiance measured above the water surface; $L_{p}$ is the radiance signal of the reference plate, which division by the plate reflectivity $\rho_{p}$ converts to that of a 100% reflectivity plate; and $\rho$ is the reflectivity of the air-water interface to skylight, which is related to the sky radiance distribution, wind speed and direction, solar position, and water state [30].
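As a rough illustration of Equations (1) and (2), the following sketch computes the remote sensing reflectance band by band. It takes ρ as the skylight reflectance of the water surface and ρ_p as the reflectance of the reference plate, as the form of the two equations implies. The function name, the argument names, and all numeric values are assumptions chosen for illustration and are not taken from the paper.

```python
import math

def remote_sensing_reflectance(l_sw, l_sky, l_plate, rho_sky, rho_plate):
    """Apply Equations (1) and (2) to per-band radiance lists.

    l_sw, l_sky, l_plate: water, sky and reference-plate radiances per band.
    rho_sky: skylight reflectance of the air-water interface (rho in Eq. 1).
    rho_plate: reflectance of the reference plate (rho_p in Eq. 2).
    """
    rrs = []
    for sw, sky, plate in zip(l_sw, l_sky, l_plate):
        e_d = plate * math.pi / rho_plate          # Equation (2)
        rrs.append((sw - rho_sky * sky) / e_d)     # Equation (1)
    return rrs

# Hypothetical single-band readings; 0.028 is only an illustrative value for rho.
example = remote_sensing_reflectance([0.02], [0.10], [0.30],
                                     rho_sky=0.028, rho_plate=0.30)
print(example)
```

In practice the five water spectra and five sky spectra collected at each point would be averaged before being passed to such a function.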

2.3. Airborne Hyperspectral Imagery and Preprocessing

An eight-rotor DJI M600 Pro UAV was selected as the remote sensing platform. The sensor mounted on the UAV was a Headwall Nano-Hyperspec ultra-micro airborne hyperspectral imaging spectrometer. The hyperspectral images acquired by the UAV cover the visible and near-infrared (NIR) region of the electromagnetic spectrum (400–1000 nm). The numbers of spectral channels and spatial channels are 270 and 640, respectively.

Based on the experimental conditions and objective constraints such as the width of the riverway in each study area, the UAV flight plan comprised a single strip over the first area (Shahu Port) and two strips over the second area (Xunsi River). A lens with a focal length of 8 mm was used in both areas, with a flight height of 200 m. The image spatial resolution was 0.185 m and the pixel spacing was 7.4 μm. The observed wind speed for UAV takeoff and landing was less than 8 m/s, which satisfied the requirements for safe flight operation.
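The reported spatial resolution is consistent with the usual pinhole relation between flight height, pixel spacing, and focal length. The short check below assumes a nadir-looking sensor over flat terrain; the function name is hypothetical, and the swath estimate simply multiplies the result by the 640 spatial channels.

```python
def ground_sample_distance(flight_height_m, pixel_pitch_m, focal_length_m):
    """Ground footprint of one detector pixel for a nadir view (pinhole model)."""
    return flight_height_m * pixel_pitch_m / focal_length_m

gsd = ground_sample_distance(200.0, 7.4e-6, 8e-3)
swath = 640 * gsd  # across-track width covered by the 640 spatial channels
print(f"{gsd:.3f} m per pixel, about {swath:.1f} m swath")
```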

Since the initial data collected from the hyperspectral sensor storage module are in the form of a DN value image, the data must undergo a series of preprocessing steps before use (Figure 4). The preprocessing generally includes sensor radiometric correction, geometric correction, mosaicking, on-site absolute radiometric calibration, masking, and water extraction. Due to the low flight height of the UAV, complex atmospheric effects could be ignored in the radiometric calibration [31]. The method is described as follows:

(1): The sensor radiometric calibration converts the signal output by the sensor unit into actual radiance. In this study, the original image data were converted from DN values to radiance values pixel by pixel, according to the model and the conversion parameters in the radiometric correction document provided by the hyperspectral sensor manufacturer;
(2): The UAV-borne Headwall Nano-Hyperspec hyperspectral sensor used in this study is a linear push-broom imaging sensor. It is therefore easily affected by shake during the flight, which can cause severe deformation of the obtained imagery. The UAV features a position and orientation system (POS) which integrates differential GPS technology and inertial measurement unit (IMU) technology. The POS provides sensor position and attitude parameters that allow the images to be geolocated directly and quickly;
(3): Differences in illumination geometry and flight time between the two strips may cause changes in the remote sensing images that are unrelated to water quality. Therefore, it was necessary to adjust the second strip with the first strip as a reference, and to then mosaic the two strips. The imagery was processed using ENVI, and the histogram was matched to the entire image;
(4): The radiometric calibration of hyperspectral imagery is commonly undertaken with the 6S atmospheric correction model or the MODTRAN model, but these models are only suitable for situations where the atmospheric environment during the flight is relatively complicated and the flight height is high (at the kilometer level). The flight areas of this study were located in urban areas, and the relative flight height was only 200 m. Therefore, in the radiometric calibration, we could ignore the complex atmospheric effects and only consider the linear relationship between the DN or radiance measured on board and the in situ reflectance [32,33,34]. Since the reflectivity of the standard board could not meet the experimental requirements, the linear calibration of the UAV-borne radiance images was completed by the use of ground-measured spectra [35]. The number of ground-measured spectra used for the absolute radiometric calibration in the two study areas was 29 and 23, respectively. The calibration program was written in IDL/ENVI, and the linear function fitted between the ground spectra and the UAV spectra was applied to the radiometric calibration of the UAV-borne images;
(5): The UAV-borne images had a wavelength range of 400–1000 nm, covering the visible to near-infrared. Since the mid-infrared band that is usually used in the modified normalized difference water index (MNDWI) was not covered, we used a green (560 nm) and a near-infrared (830 nm) band for masking in this experiment [36], and the water area was extracted by a decision tree model. The normalized difference water index (NDWI) threshold in Shahu Port was [0.592, 0.6], and the threshold in Xunsi River was [0.5, 0.77].
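Steps (4) and (5) can be sketched in a few lines. The example below fits a per-band linear function between UAV radiance and ground-measured reflectance by ordinary least squares, then computes the NDWI from the green and near-infrared bands as (green − NIR)/(green + NIR). It is only an illustration and not the authors' IDL/ENVI program: the function names and sample values are hypothetical, and reading the bracketed thresholds as a closed interval is an assumption.

```python
def fit_empirical_line(uav_radiance, ground_reflectance):
    """Least-squares gain and offset so that reflectance ~ gain * radiance + offset."""
    n = len(uav_radiance)
    mean_x = sum(uav_radiance) / n
    mean_y = sum(ground_reflectance) / n
    sxx = sum((x - mean_x) ** 2 for x in uav_radiance)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(uav_radiance, ground_reflectance))
    gain = sxy / sxx
    return gain, mean_y - gain * mean_x

def ndwi(green, nir):
    """Normalized difference water index from green and NIR reflectance."""
    return (green - nir) / (green + nir)

def is_water(green, nir, low, high):
    """Assumed rule: a pixel is water when its NDWI lies inside [low, high]."""
    return low <= ndwi(green, nir) <= high

# Hypothetical calibration pairs for one band (radiance, reflectance).
gain, offset = fit_empirical_line([10.0, 20.0, 30.0], [0.02, 0.04, 0.06])
print(gain, offset)
print(is_water(green=0.08, nir=0.02, low=0.5, high=0.77))
```

The same fit would be repeated for every band, using the 29 or 23 calibration spectra available in each study area.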

In addition, because the UAV flight height was low, the spatial resolution of the hyperspectral images was as fine as 0.185 m. A spectrum extracted from a single pixel is inevitably subject to deviation, whereas an averaging window that is too large causes the spectrum to lose some features. Through many experiments, it was determined that the spectral average of a 5 × 5 pixel window was suitable, and this was used in the subsequent experiments.
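A minimal sketch of the 5 × 5 window averaging is given below. It assumes the image is held as a nested list indexed as `cube[row][col][band]` and that the window is clipped at the image border; both choices, and the function name, are assumptions made for illustration.

```python
def window_mean_spectrum(cube, row, col, half=2):
    """Mean spectrum of the (2*half+1) x (2*half+1) window centred on (row, col).

    The window is clipped where it would extend past the image border.
    """
    n_rows, n_cols, n_bands = len(cube), len(cube[0]), len(cube[0][0])
    totals = [0.0] * n_bands
    count = 0
    for r in range(max(0, row - half), min(n_rows, row + half + 1)):
        for c in range(max(0, col - half), min(n_cols, col + half + 1)):
            for b in range(n_bands):
                totals[b] += cube[r][c][b]
            count += 1
    return [t / count for t in totals]

# Toy 7 x 7 image with two bands holding the row and column index.
toy = [[[float(r), float(c)] for c in range(7)] for r in range(7)]
print(window_mean_spectrum(toy, 3, 3))
```

A production version would probably also skip pixels excluded by the water mask, which this sketch does not do.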

2.4. Spectral Data Preprocessing

The preprocessing of the UAV-borne images was described in Section 2.3. In the experiment, 40 spectra (training/test sample ratio = 7:3) and 38 spectra (training/test sample ratio = 8:2, because of the small number of training samples) were extracted from the images of the first study area of Shahu Port and the second study area of Xunsi River, respectively. The band range of the images was 400–1000 nm, and the experiment used the 400–900 nm subset (225 bands). Correlation evaluation is an important part of quantitative retrieval. Pearson correlation coefficients were used to characterize the correlation between the spectra x_spectra and the inversion target y_nemerow_index. The band ratio method can eliminate the interference of water surface roughness and background noise, and is a commonly used contrast enhancement operation in quantitative remote sensing inversion [37]. In this study, the exhaustive method was used to calculate the ratio between every pair of bands, and the Pearson correlation coefficient between each ratio and y_nemerow_index was then obtained [38,39,40]. The correlation is clearly improved, as shown in Figure 5. In Shahu Port (Figure 5a,b), the maximum correlation coefficient between the original spectra and the Nemerow index was 0.75, which was raised to 0.89 after band ratio treatment. Similarly, for Xunsi River (Figure 5c,d), the maximum correlation coefficient increased from 0.46 to 0.80.

After the band ratio processing, 50,400 ratio features (225 × 224 ordered band pairs) were generated and arranged in descending order of their correlation coefficients. Through a feature accumulation iterative experiment, the first 21 ratio features were selected as the input variables for the first study area, while 91 features were more appropriate for the second study area. The experiment showed that continuing to increase the number of features beyond these values reduced the accuracy of the inversion. Other feature sets are worth trying in future research [35].
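The exhaustive band ratio search can be sketched as follows: every ordered pair of bands gives one ratio feature, each feature is correlated with the target index, and the features are ranked. Ranking by the absolute value of the Pearson coefficient is an assumption, as are the function names and the toy data.

```python
from itertools import permutations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def rank_band_ratios(spectra, target, top_k):
    """Return the top_k (|r|, i, j) tuples for the ratio band_i / band_j."""
    n_bands = len(spectra[0])
    scored = []
    for i, j in permutations(range(n_bands), 2):
        ratio = [s[i] / s[j] for s in spectra]
        scored.append((abs(pearson(ratio, target)), i, j))
    scored.sort(reverse=True)
    return scored[:top_k]

# Toy example: 4 samples, 3 bands; the target equals band 0 / band 1.
toy_spectra = [[1.0, 2.0, 3.0], [2.0, 2.0, 5.0], [3.0, 2.0, 4.0], [4.0, 2.0, 6.0]]
toy_target = [0.5, 1.0, 1.5, 2.0]
print(rank_band_ratios(toy_spectra, toy_target, top_k=2))
```

With 225 bands the loop visits 225 × 224 = 50,400 ordered pairs, which matches the number of ratio features reported above.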

2.5. Modeling Approaches

2.5.1. Nemerow Comprehensive Pollution Index

The Nemerow index is a weighted multi-factor environmental quality index that takes into account the extreme values. Therefore, the influence and effect of the water quality parameter with the largest numerical value on the quality of the water environment can be highlighted. The physical concept of NCPI is clear and the calculation process is simple. It is one of the most commonly used methods for comprehensive pollution index calculation. According to the Guide (Table 1), in the four physical and chemical indicators (SD, DO, ORP and AN), when more than 60% of the data of one indicator or more than 30% of the data of two indicators reach the level of SBO, the detection point should be regarded as SBO; otherwise, it can be considered as MBO or NBO. This means that if a single indicator at a monitoring point reaches the severe level in Table 1, the point reaches the level of SBO. This classification is also applicable to MBO and NBO. It can be seen that the assessment of water pollution in the Guide highlights the effect of a single indicator. Based on the characteristics of NCPI and the objective requirements of the Guide for the evaluation of black-odor water, NCPI was determined as a quantitative indicator of the water quality assessment in our experiment.

According to the classification standards for black-odor water in the Guide, the characteristic indicators are defined as follows: $0 < P_{i} \leq 1$ corresponds to NBO; $1 < P_{i} \leq 2$ corresponds to MBO; and $2 < P_{i} \leq 10$ corresponds to SBO, where $P_{i}$ represents the NCPI of the i-th sample. The dimensionless linear relationship is shown in Figure 6. The horizontal axis nodes of the piecewise function come from Table 1. According to the water temperature (about 24 °C) recorded at the sampling points under standard atmospheric pressure, the corresponding amount of saturated DO (8.41 mg/L) was determined as the maximum DO value. The range of ORP was [−413, 811] (mV).

The equation for the NCPI is as follows [41,42,43]:

$$P_{final} = \sqrt{\frac{P_{\max}^{2} + \left(\sum_{i=1}^{n} W_{i} P_{i}\right)^{2}}{2}} \tag{3}$$

$$W_{i} = \frac{C_{i} / S_{i}}{\sum_{i=1}^{n} C_{i} / S_{i}} = \frac{I_{i}}{\sum_{i=1}^{n} I_{i}} \tag{4}$$

where $P_{\max}$ is the maximum of all the indices $P_{i}$; $W_{i}$ represents the weight of the i-th polluting substance; and $I_{i}$ is the ratio of the i-th water quality parameter factor $C_{i}$ to its objective concentration $S_{i}$. The DO and AN objective concentrations $S_{DO}$ (5 mg/L) and $S_{AN}$ (1 mg/L) are obtained according to the class-3 water standard for surface water; the SD standard value $S_{TP}$ (1.2 m) is obtained from the class A or B landscape-water standard; and the ORP standard value $S_{ORP}$ (50 mV) is obtained from the critical value for black-odor water and NBO water.
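Equations (3) and (4), together with the NBO/MBO/SBO intervals given earlier, translate directly into code. In the sketch below the per-indicator indices $P_{i}$ are passed in as inputs, because the piecewise function of Figure 6 that produces them is not reproduced in the text. The function names and sample values are hypothetical, and the sketch assumes positive measured values so that the weights are well defined.

```python
from math import sqrt

def nemerow_index(sub_indices, concentrations, standards):
    """Nemerow comprehensive pollution index following Equations (3) and (4)."""
    ratios = [c / s for c, s in zip(concentrations, standards)]    # I_i = C_i / S_i
    total = sum(ratios)
    weights = [r / total for r in ratios]                          # W_i, Equation (4)
    weighted_sum = sum(w * p for w, p in zip(weights, sub_indices))
    p_max = max(sub_indices)
    return sqrt((p_max ** 2 + weighted_sum ** 2) / 2)              # Equation (3)

def classify(ncpi):
    """Map an NCPI value to the pollution level intervals stated in the text."""
    if 0 < ncpi <= 1:
        return "NBO"
    if 1 < ncpi <= 2:
        return "MBO"
    if 2 < ncpi <= 10:
        return "SBO"
    raise ValueError("NCPI outside the (0, 10] range used in the text")

# Order: SD (m), DO (mg/L), ORP (mV), AN (mg/L); standards as given in the text.
standards = [1.2, 5.0, 50.0, 1.0]
measured = [0.4, 2.5, 60.0, 9.0]        # hypothetical field values
sub_indices = [1.6, 1.2, 0.9, 2.4]      # hypothetical P_i read from Figure 6
value = nemerow_index(sub_indices, measured, standards)
print(value, classify(value))
```

Because the maximum index enters Equation (3) directly, a single badly polluted indicator pulls the result upward even when the weighted average is moderate, which is the behavior the Guide's single-indicator rule calls for.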

2.5.2. Gradient Boosting Decision Tree and Other Models

Valiant and Kearns proposed the concepts of weak learning (or a base learner) and strong learning. A learning algorithm whose recognition error rate is less than 1/2, that is, an algorithm only slightly more accurate than random guessing, is called a weak learning algorithm. A learning algorithm that achieves high recognition accuracy and runs in polynomial time (that is, its running time is upper bounded by a polynomial expression in the size of its input) is called a strong learning algorithm. Ensemble learning combines multiple weak supervised models to obtain a better and more comprehensive strong supervised model. Boosting, which upgrades multiple weak learners to a strong learner, is one family of ensemble algorithms and an example of homogeneous ensemble learning. It improves the performance of the classifier by changing the weights of the training samples (increasing the weights of misclassified samples and reducing the weights of correctly classified samples), learning multiple classifiers, and linearly combining these classifiers. Gradient boosting is a type of boosting method. Its main idea is to build each new model in the direction of the negative gradient of the loss function of the model established so far. The gradient boosting decision tree (GBDT) method used in the experiment was originally proposed by Friedman [44]. It uses decision trees as the base learners and converges by following the direction of the negative gradient [45,46]. The model F is defined as an additive model:

$$F(x, w) = \sum_{m=0}^{M} \beta_{m} h_{m}(x, \alpha_{m}) = \sum_{m=0}^{M} f_{m}(x, \alpha_{m}) \tag{5}$$

where x is the input variable, the function h is the decision tree, $\alpha$ is the parameter of the decision tree, $\beta$ is the weight of each tree, and M is the maximum number of regression trees.

Given the input samples $D = \{(x_{1}, y_{1}), (x_{2}, y_{2}), \dots, (x_{N}, y_{N})\}$, the maximum number of iterations M, the loss function L, and the base learner $h(x)$, the steps of GBDT are as follows:

Step 1: Initialize the base learner as the constant that minimizes the initial loss function:

$$f_{0}(x) = \arg\min_{\rho} \sum_{i=1}^{N} L(y_{i}, \rho) \tag{6}$$

where $\rho$ represents the constant output of the initialized base learner.

Step 2: For the iterations m = 1, …, M, calculate the negative gradient of the loss function at the m-th iteration:

$$-g_{m}(x_{i}) = -\left[\frac{\partial L(y_{i}, F(x_{i}))}{\partial F(x_{i})}\right]_{F(x) = F_{m-1}(x)}, \quad i = 1, 2, \dots, N \tag{7}$$

A regression-based base learner is constructed in each iteration. $F_{m}(x)$ is the prediction function obtained after the m-th iteration, and the corresponding loss function is $L(y, F_{m}(x))$. $-g_{m}(x_{i})$ indicates the direction in which the base learner of the m-th iteration is established. The m-th base learner is built along the gradient descent direction of the loss function of the prediction generated by the previous m − 1 iterations.

Step 3: Calculate the parameters $\alpha_{m}$ of the regression tree $h_{m}(x, \alpha_{m})$ for the m-th iteration:

$$\alpha_{m} = \arg\min_{\alpha, \beta} \sum_{i=1}^{N} \left[-g_{m}(x_{i}) - \beta h(x_{i}, \alpha)\right]^{2} \tag{8a}$$

Step 4: Calculate the optimal step size $\beta_{m}$ in the search direction of the m-th iteration:

$$\beta_{m} = \arg\min_{\beta} \sum_{i=1}^{N} L\left(y_{i}, F_{m-1}(x_{i}) + \beta h(x_{i}, \alpha_{m})\right) \tag{8b}$$

Step 5: Update the prediction function obtained after the iteration:

$$F_{m}(x) = F_{m-1}(x) + v \cdot \beta_{m} h(x, \alpha_{m}) \tag{9}$$

In order to prevent overfitting, it is necessary to multiply the step by the learning rate v, and the range of the learning rate v is (0, 1). The smaller the value of v, the larger the number of iterations M when the same accuracy is achieved. When v is too small, it may be difficult to reach the local optimal solution. Conversely, the larger the learning rate, the easier it is to overfit, which is similar to the concept of the learning rate in deep neural networks.

In this experiment, since the number of samples was small, no additional adjustment was made to parameters such as the minimum number of samples required to split an internal node, the maximum depth of the individual regression estimators, the number of features to consider when looking for the best split, and the minimum number of samples required to be at a leaf node. We only adjusted the learning rate, the number of iterations, and the fraction of samples used for fitting the individual base learners (subsample). During the experiment, it was found that lowering the subsample rate can effectively prevent overfitting.
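To make Steps 1 to 5 concrete, the following is a minimal sketch of gradient boosting for the squared-error loss, written with the standard library only. For this loss the negative gradient in Step 2 is simply the residual, and a least-squares tree fitted to the residuals already carries the optimal step, so β_m = 1. The depth-one trees (stumps), the way subsampling is drawn, and all names are assumptions made for illustration; this is not the implementation used in the study.

```python
import random

def fit_stump(rows, targets):
    """Least-squares regression stump: one split and two leaf means."""
    best = None
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [t for r, t in zip(rows, targets) if r[f] <= thr]
            right = [t for r, t in zip(rows, targets) if r[f] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, thr, lm, rm)
    if best is None:  # no feature varies: fall back to a constant leaf
        mean = sum(targets) / len(targets)
        return lambda r: mean
    _, f, thr, lm, rm = best
    return lambda r: lm if r[f] <= thr else rm

def fit_gbdt(rows, y, n_trees=100, learning_rate=0.1, subsample=1.0, seed=0):
    """Gradient boosting with squared-error loss and stump base learners."""
    rng = random.Random(seed)
    n = len(y)
    f0 = sum(y) / n                                   # Step 1: best constant
    pred = [f0] * n
    trees = []
    k = min(n, max(2, int(subsample * n)))            # samples used per tree
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]      # Step 2
        idx = rng.sample(range(n), k) if k < n else list(range(n))
        tree = fit_stump([rows[i] for i in idx],
                         [residuals[i] for i in idx])         # Steps 3 and 4
        trees.append(tree)
        pred = [p + learning_rate * tree(r)
                for p, r in zip(pred, rows)]                  # Step 5
    return lambda r: f0 + learning_rate * sum(t(r) for t in trees)

# Toy usage: one feature, target proportional to it.
toy_rows = [[float(i)] for i in range(20)]
toy_y = [2.0 * i for i in range(20)]
model = fit_gbdt(toy_rows, toy_y, n_trees=200, learning_rate=0.1, subsample=0.8)
print(model([10.0]))
```

The three arguments `learning_rate`, `n_trees`, and `subsample` play the roles of the three settings that were tuned in the experiment.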

In addition to using the gradient boosting decision tree regression (GBDTR) model to fit the target index, several other representative algorithms were introduced for comparison, including the multi-layer perceptron regression (MLPR), random forest regression (RFR), support vector regression (SVR), ordinary least squares regression (OLSR), and kernel ridge regression (KRR) models [47,48,49,50,51]. MLPR, which is also known as a feedforward neural network, is the predecessor of the currently popular deep neural networks. Compared with the perceptron, it adds multiple hidden layers and extends the activation function. Random forest (RF), like GBDT, is an ensemble learning method that uses a decision tree as the base learner. RF builds on bagging ensemble learning and introduces random attribute selection into the training process of the decision trees. SVR is an important branch of the support vector machine (SVM). The difference is that the SVR sample points belong to only one class: the optimal hyperplane it seeks is not the one with the widest margin separating two or more classes of sample points, as in SVM, but the one that minimizes the total deviation of all the sample points from the hyperplane. The more commonly used OLSR is selected from the generalized linear models. The basic principle of OLSR is that the best-fit line minimizes the sum of the squares of the distances from the points to the line (the residual sum of squares (RSS)). KRR applies the kernel trick to ridge regression to implement a non-linear transformation of a linear model. The KRR model is similar to SVR, but the loss function is different: KRR uses squared error loss and SVR uses ε-insensitive loss, and both use L2 regularization.

2.5.3. Statistical Analysis

In this study, the adjusted_R², the RMSE [52,53], and the mean absolute percentage error (MAPE) were used to determine the accuracy of each model. Since the size of R² is affected by the size of the dataset samples, the larger the sample size, the larger R² is. Therefore, there will be some errors in comparing the inversion results of different datasets. Of course, comparing the advantages and disadvantages of each model on the same dataset is not affected. To solve this problem, the adjusted_R² [54] is introduced, which penalizes the addition of more predictor variables. RMSE can be used to measure the deviation between inversion values and real values, but it is more sensitive to outliers than the mean absolute error (MAE). Among the different indices, the implementations of the goodness-of-fit measures R² and RMSE of the water pollution degree were taken from the scikit-learn library, and the adjusted_R² and MAPE were implemented in Python.

The coefficient of determination (R²) and the adjusted coefficient of determination (adjusted_R²) are defined as:

R^{2} = 1 - \frac{\sum (Y_{actual} - Y_{predict})^{2}}{\sum (Y_{actual} - Y_{mean})^{2}}

(10)

adjusted\_R^{2} = 1 - \frac{(1 - R^{2}) (n - 1)}{n - p - 1}

(11)

where Y_actual, Y_predict, and Y_mean are the real, predicted, and real mean values of the inversion indices, respectively. The denominator of Equation (10) represents the dispersion of the original data, the numerator represents the errors between the predicted and original data, and dividing the two eliminates the influence of the dispersion of the original data. The normal range of adjusted_R² and R² is [0, 1]. The closer the value is to 1, the stronger the ability of the input variables of the models to explain the inversion target. In Equation (11), n denotes the number of samples and p denotes the number of features.

The mean absolute percentage error (MAPE) is defined as:

MAPE = \frac{1}{N} \sum_{i=1}^{N} \frac{| X_{Est,i} - X_{Obs,i} |}{X_{Obs,i}} \times 100\%

(12)

where X_{Obs,i} is the observed value in the field or laboratory, and X_{Est,i} is the estimated value.
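
Equations (10)–(12) and the RMSE can be written in a few lines of plain Python. The sketch below is one possible implementation and not the authors' code; the function names and the sample values are illustrative, and the MAPE assumes strictly positive observations, as is the case for the NCPI.

```python
import math


def r_squared(y_true, y_pred):
    """Equation (10): 1 - residual sum of squares / total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - p) ** 2 for a, p in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(y_true, y_pred, n_features):
    """Equation (11): R^2 penalized by the number of features p."""
    n = len(y_true)
    r2 = r_squared(y_true, y_pred)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Equation (12), in percent; assumes all observed values are positive."""
    return 100.0 * sum(abs(p - a) / a for a, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative values only (not data from the study).
observed = [1.0, 2.0, 4.0, 8.0]
estimated = [1.5, 2.0, 3.0, 8.0]
print(r_squared(observed, estimated))
print(adjusted_r_squared(observed, estimated, n_features=1))
print(rmse(observed, estimated))
print(mape(observed, estimated))
```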

It was found that RMSE and MAPE were affected by the standardization of the datasets. When using an MLPR model, z-score standardization needs to be undertaken, so that the processed data have a mean of 0 and a standard deviation of 1. If the inversion results are not restored to the original magnitude, the RMSE and MAPE evaluation results will be affected, but the adjusted_R² will be completely unaffected. Therefore, where the data magnitudes differ, the adjusted_R² can still reliably evaluate the results of the model fitting.
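
The invariance can be checked directly from Equation (10). The short derivation below is a sketch under one assumption: the observed and predicted values are both expressed in the standardized space, Y' = (Y − μ)/σ, where μ and σ denote the mean and standard deviation used for the standardization (symbols introduced here for illustration).

```latex
Y'_{actual} - Y'_{predict} = \frac{Y_{actual} - Y_{predict}}{\sigma},
\qquad
Y'_{actual} - Y'_{mean} = \frac{Y_{actual} - Y_{mean}}{\sigma}

R'^{2} = 1 - \frac{\sigma^{-2} \sum (Y_{actual} - Y_{predict})^{2}}{\sigma^{-2} \sum (Y_{actual} - Y_{mean})^{2}} = R^{2},
\qquad
RMSE' = \frac{RMSE}{\sigma}
```

The factor σ⁻² cancels in the ratio, so R² is unchanged, and because n and p in Equation (11) are unchanged, so is the adjusted_R². The RMSE is rescaled by 1/σ, and the MAPE changes because its denominators are shifted by μ.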

3. Results and Discussion

3.1. Gradient Boosting Decision Tree

Based on the data preprocessed as described in Section 2.4, GBDTR was selected as the regression model. The learning rate in the third step was set to 0.01, 0.001, and 0.0001 in turn, the maximum number of iterations was increased progressively, and the subsample was adjusted appropriately. The resulting adjusted_R² is shown in Figure 7. For Shahu Port, the three graphs (Figure 7a–c) represent the inversion results with different parameters (a: learning rate = 0.01, subsample = 0.5; b: learning rate = 0.001, subsample = 0.5; c: learning rate = 0.0001, subsample = 1), and the iteration number of 2350 was selected based on Figure 7b. Similarly, for Xunsi River, the three inversion results with different parameters (d: learning rate = 0.01, subsample = 0.5; e: learning rate = 0.001, subsample = 1; f: learning rate = 0.0001, subsample = 1) are shown in Figure 7d–f, and the iteration number of 3400 was selected based on Figure 7e. To increase the generalization ability of the model and prevent overfitting, the learning rate was reduced and the number of iterations was increased. However, if the learning rate is too small, the number of iterations required becomes too large, which wastes computing resources. With the above settings, the statistical analysis indicators were used to evaluate the retrieval results, and the accuracies of both the training dataset (adjusted_R² = 0.978, RMSE = 0.41 mg/L, MAPE = 24.69%) and the test dataset (adjusted_R² = 0.974, RMSE = 0.48 mg/L, MAPE = 18.96%) were very satisfactory.
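
Choosing the iteration number from an accuracy-versus-iterations curve can be sketched as below. This is an illustration under stated assumptions, not the authors' procedure: it uses scikit-learn's GradientBoostingRegressor and its staged_predict method, synthetic data, and a smaller iteration budget than in the paper so that it runs quickly.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical, not the study's samples).
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(80, 4))
y = 8.0 * X[:, 0] + 2.0 * X[:, 1] ** 2 + rng.normal(0.0, 0.2, size=80)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)


def adjusted_r2(r2, n_samples, n_features):
    """Equation (11) applied to an already computed R^2."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


model = GradientBoostingRegressor(
    learning_rate=0.01, n_estimators=600, subsample=0.5, random_state=1
)
model.fit(X_train, y_train)

# staged_predict yields the prediction after 1, 2, ..., n_estimators trees,
# so one fit gives the whole accuracy-versus-iterations curve.
test_curve = [
    adjusted_r2(r2_score(y_test, pred), len(y_test), X.shape[1])
    for pred in model.staged_predict(X_test)
]
best_iterations = int(np.argmax(test_curve)) + 1
print(best_iterations, max(test_curve))
```

Repeating this for each learning rate and subsample setting gives curves of the kind summarized in Figure 7.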

The inversion target mentioned above is no longer a specific field or laboratory measurement, but a comprehensive pollution index that does not have any physical meaning. In the final step, the trained models were applied pixel by pixel to the UAV-borne HRS images. Since the band ratio method is used in the preprocessing stage, the corresponding bands of the UAV-borne hyperspectral image also need to be preprocessed in the same way. By observing the overall distribution of the NCPI of the urban riverway on the image, the gradual change of water pollution can be observed, which is of great significance for the regulation and monitoring of polluted urban riverways.
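
A pixel-wise inversion of this kind can be sketched with NumPy. The function name, the band pairs, and the toy predictor below are hypothetical; in practice the predict argument would be the predict method of a trained regressor, and the band pairs would be those selected during preprocessing.

```python
import numpy as np


def predict_image(cube, predict, band_pairs):
    """Apply a trained regressor to every pixel of a hyperspectral cube.

    cube: array of shape (rows, cols, bands)
    predict: callable mapping (n_pixels, n_features) -> (n_pixels,)
    band_pairs: list of (numerator_band, denominator_band) indices
    """
    rows, cols, bands = cube.shape
    pixels = cube.reshape(-1, bands).astype(float)
    num = pixels[:, [i for i, _ in band_pairs]]
    den = pixels[:, [j for _, j in band_pairs]]
    # Band ratios, with 0 written where the denominator band is 0.
    features = np.divide(num, den, out=np.zeros_like(num), where=den != 0)
    return predict(features).reshape(rows, cols)


# Toy example: a 2 x 3 image with 4 bands and a stand-in "model".
cube = np.arange(1, 25, dtype=float).reshape(2, 3, 4)
ncpi_map = predict_image(cube, lambda f: f.sum(axis=1), [(0, 1), (2, 3)])
print(ncpi_map.shape)
```

Flattening the image into a table of pixels lets the model predict all pixels in one call instead of looping over them one at a time.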

3.2. First Dataset: Shahu Port

3.2.1. Model Optimization and Accuracy Evaluation

According to the field measurements or laboratory test results for SD, DO, ORP, and AN, compared with Table 1, the pollution level of each sampling point is NBO, MBO, or SBO. The results of the manual interpretation are shown in Figure 8a, and Figure 8b,c shows the spatial distribution of the points. The scale of the two images is the same. In Figure 8a, the x-axis represents the position number of the sampling points, which corresponds to the points in Figure 8b,c. The right y-axis indicates the pollution category of the sampling point, where 1 means NBO, 2 means MBO, and 3 means SBO. According to the scatter plot of the interpretation results of the sampling points, the first 11 points belong to SBO, the 12th point belongs to MBO, and the next 28 points belong to NBO. The left y-axis is the calculation result for the NCPI. It can be seen from the line graph that the index of the first 11 sampling points is greater than 5, which is SBO water, according to the assumption of Section 2.5.1. The index of the 12th sampling point lies in (1, 2] and belongs to MBO. Figure 8b shows that there is a drain between P12 and P13, so it can be speculated that the sudden NCPI drop of sample 11 is related to the drain. The pollution index of the remaining points lies in (0, 1] and belongs to NBO. It can be seen that the NCPI judgment result is completely consistent with the manual judgment result, and the NCPI can quantitatively reflect the degree of point pollution.
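
The mapping from NCPI to pollution level can be written as a small function. This sketch assumes the intervals quoted in the text, (0, 1] for NBO and (1, 2] for MBO, with values above 2 treated as SBO; the function name and the sample values are illustrative.

```python
def pollution_level(ncpi):
    """Map an NCPI value to a black-odor category using the quoted intervals."""
    if ncpi <= 0:
        raise ValueError("NCPI is expected to be positive")
    if ncpi <= 1.0:
        return "NBO"  # (0, 1]
    if ncpi <= 2.0:
        return "MBO"  # (1, 2]
    return "SBO"      # greater than 2


# Illustrative values only.
for value in (0.6, 1.5, 6.0):
    print(value, pollution_level(value))
```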

The input dataset was standardized using the z-score method, and the model parameters were initialized, including a single hidden layer, the neuron number (2⁵), the learning rate (0.01), the maximum number of iterations (50), the activation function (ReLU), and the optimizer (Adam). The fitting accuracy on the training and test datasets was then observed. If the deviation between the predicted value and the true value is large, the neuron number of the single hidden layer or the depth of the hidden layers can be increased; that is, the network complexity is increased. It is also appropriate to increase the maximum number of iterations while keeping a suitable learning rate. A large learning rate results in weight updates that are too large, so the parameters may overshoot the minimum of the loss function, oscillate on both sides of the best value, and no longer converge. However, if the learning rate is too small, the parameter update will be too slow and will consume more resources. Finally, the hidden layers (64, 64), the learning rate (0.0001), and the maximum number of iterations (500) were determined. Under the MLPR model, the training set (adjusted_R² = 0.823, RMSE = 1.18, MAPE = 53.78%) and the test set (adjusted_R² = 0.809, RMSE = 1.31, MAPE = 44.25%) are far less accurate than for GBDTR. However, when we tried to increase the network complexity further, the deviation between the predicted result and the true value did not decrease but increased.
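
The final configuration can be expressed as follows, assuming scikit-learn's MLPRegressor; the paper does not state which MLP implementation was used, and the data below are synthetic. Only the inputs are standardized here, so the predictions stay in the original magnitude of the target.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical).
rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, size=(40, 3))
y = 5.0 * X[:, 0] + rng.normal(0.0, 0.1, size=40)

mlpr = make_pipeline(
    StandardScaler(),  # z-score standardization of the input features
    MLPRegressor(
        hidden_layer_sizes=(64, 64),  # two hidden layers of 64 neurons
        activation="relu",
        solver="adam",
        learning_rate_init=0.0001,
        max_iter=500,
        random_state=2,
    ),
)
mlpr.fit(X, y)  # may warn that 500 iterations were not enough to converge
pred = mlpr.predict(X)
print(pred[:3])
```

With such a small learning rate, the optimizer may not converge within 500 iterations on other data, which is the trade-off between learning rate and iteration count discussed in the text.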

In the process of RF hyperparameter adjustment, it was easy to avoid overfitting by adjusting the maximum number of iterations. Since the number of selected features was small, it was of little significance to adjust the maximum feature number or the maximum depth. When the maximum number of iterations was set to 400, and the minimum sample number of the leaf node was limited to 3, the overfitting phenomenon did not occur, while the accuracy was maintained. The adjusted_R² of the test and the training datasets reached 0.96 or more, and the retrieval accuracy was almost the same as that of the GBDTR model.

The radial basis function (RBF) was selected as the kernel in the SVR model. Therefore, the parameters to be adjusted were mainly gamma and the penalty coefficient C. The smaller the gamma value, the more support vectors there are. The higher the value of C, the less the errors are tolerated, and overfitting can easily occur; otherwise, there will be under-fitting. Finally, the values of gamma (580) and C (50) were determined. In the SVR model, the adjusted_R² of both the training and test datasets reached 0.95, and the accuracy was slightly inferior to that of RFR.

On the training dataset, the adjusted_R² of the OLSR model was 0.97, which was better than RFR and SVR. However, the MAPE was larger than that of the other two models. Comparing the true and estimated values showed that the linear model had a poor prediction effect when the real value was small. Although the retrieval accuracy on the training data was high, the accuracy on the test dataset (adjusted_R² = 0.506, RMSE = 2.11, MAPE = 145.11%) was particularly low and the generalization ability was poor.

In the KRR-related experiments, we attempted to use linear, RBF, sigmoid, and polynomial kernel functions. It was found that with a polynomial kernel of degree 2 and a gamma of 60, no overfitting occurred, and the adjusted_R² (training: 0.85, test: 0.85) was not low. From the MAPE (training: 76.94%, test: 98.36%), the same situation as for OLSR occurred: the MAPE was large although the RMSE was not. The predicted values were examined, and the fit was found to be very poor for small values.
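
The combination of a moderate RMSE and a large MAPE follows from the division by the observed value in Equation (12). The sketch below uses illustrative numbers, not the study's data, with the same absolute error of 0.5 at every point.

```python
import math

# Illustrative observations: three small values and two large ones.
observed = [0.6, 0.8, 1.0, 6.0, 8.0]
estimated = [o + 0.5 for o in observed]  # identical absolute error everywhere


def rmse(obs, est):
    return math.sqrt(sum((e - o) ** 2 for o, e in zip(obs, est)) / len(obs))


def mape(obs, est):
    return 100.0 * sum(abs(e - o) / o for o, e in zip(obs, est)) / len(obs)


rmse_all = rmse(observed, estimated)            # the same error weight for every point
mape_all = mape(observed, estimated)            # dominated by the small observations
mape_large = mape(observed[3:], estimated[3:])  # large observations only
print(rmse_all, mape_all, mape_large)
```

The RMSE is 0.5 in all cases, whereas the MAPE is several times higher once the small observations are included, which matches the behavior reported for OLSR and KRR.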

For the first study area of Shahu Port, the evaluation indicators of the retrieval results of the above representative regression models are shown in Table 2. The scatter plots of the six regression models for the estimated and in-situ values are shown in Figure 9. Only the predicted values of the GBDTR, RFR, and SVR models are concentrated on the diagonal, while those of the other models are more scattered. The closer the scatter is to the diagonal, the closer the estimate is to the in-situ value.

3.2.2. UAV-Borne Image Inversion Based on the Different Models

In Table 2, it is shown that the GBDTR retrieval accuracy is the highest, and the retrieval results of RFR and SVR are slightly worse, but they also perform well enough. Therefore, in this experiment, we attempted to use these three regression models for inversion on the UAV-borne HRS images. It was hoped that through the analysis of the inversion results, a suitable model for the monitoring of black-odor water would be found. In addition, since the prediction accuracy of the OLSR training dataset was higher than that of RFR and SVR, the OLSR linear model was used to invert a single pixel of the hyperspectral image. The predicted values far exceed the maximum value of 10, and some negative values also occur. The effect can be regarded as extremely poor.

The spatial distribution of the inversion results of NCPI based on the GBDTR, RFR, and SVR models is shown in Figure 10. To facilitate observation, the images were cut into two parts: (a)–(c) form the first part, and (d)–(f) form the second part. The three models generally reflect changes in the pollution status of the river. The pollution index is high in the first small segment and low in the second half. In the second half, there is a region with a significant increase in pollution level. The details are discussed in Section 3.4. Table 3 lists the statistical information of the UAV-borne image inversion map (Figure 10) for Shahu Port, the first line of which is the real value (NCPI maximum = 8.61, NCPI minimum = 0.59). According to the statistical results, the maximum and minimum estimated values of GBDTR (maximum = 7.71, minimum = 0.48) and RFR (maximum = 7.35, minimum = 0.62) are similar, and not much different from the real values. The SVR model produces some negative values for the single-pixel inversion (minimum = −0.45), and its maximum value is 12.04, which exceeds the set maximum value of 10 and is clearly not realistic. In addition, the time each model took to calculate the inversion map was recorded. On the same computer hardware (CPU: Core i7-8700; memory: 16 GB, 2666 MHz; graphics card: GTX1060 6 GB) and image size (rows: 3359, columns: 5280, feature number: 21), the operation time is 178 s for GBDTR, 4010 s for RFR, and 43 s for SVR. The calculation time of the RFR retrieval graph is significantly longer than that of GBDTR.
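
The reported times can be converted into throughput with simple arithmetic. The snippet below uses only the image size and times quoted in the text; the variable names are illustrative.

```python
# Image size and prediction times reported in the text for Shahu Port.
rows, cols = 3359, 5280
seconds = {"GBDTR": 178, "RFR": 4010, "SVR": 43}

n_pixels = rows * cols
throughput = {name: n_pixels / t for name, t in seconds.items()}  # pixels per second
rfr_vs_gbdtr = seconds["RFR"] / seconds["GBDTR"]

for name in seconds:
    print(f"{name}: {seconds[name]} s, {throughput[name]:,.0f} pixels/s")
print(f"RFR takes {rfr_vs_gbdtr:.1f} times as long as GBDTR")
```

On this image of about 17.7 million pixels, RFR takes roughly 22.5 times as long as GBDTR, so the absolute gap grows as more or larger images are processed.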

In summary, GBDTR and RFR obtain similar results for single-pixel estimation on UAV-borne images, and both obtain good results. However, the RFR operation time is much higher than that of GBDTR. Therefore, it is necessary to consider the calculation time when selecting a quantitative inversion method for UAV-borne hyperspectral imagery. When the area of the calculation graph is large, the difference will be huge. When SVR is inverted on the image, the stability is poor, and the SVR model is too sensitive to the difference in the features. As a result, SVR is not applicable to the current situation.

3.3. Second Dataset: Xunsi River

3.3.1. Model Optimization and Accuracy Evaluation

Figure 11a shows the pollution index values for Xunsi River and the results of the manual judgment, and Figure 11b,c shows the spatial distribution of the points. The scale of the two images is the same. In Figure 11a, the x-axis represents the position number of the sampling points, which corresponds to the points in Figure 11b,c. The manual interpretation results are shown in the scatter plot (red). Points 2–6, 9, 13, 14, 37, and 38 belong to MBO, and all the others are SBO. This result is completely consistent with the results shown in the NCPI line chart (blue). For the points where the pollution level is 2 (MBO), the NCPI falls in (1, 2]. The Nemerow index values of the remaining points are all greater than 2, and the judgment result coincides 100% with the scatter plot (red). From the results of the manual interpretation, it can be found that there are no NBO samples among the sampling points of Xunsi River, and the overall pollution degree of the second study area is greater than that of the first study area. For the Nemerow pollution index, from sampling point 20 to sampling point 36, the index shows that the area is more polluted than other areas, which is consistent with the sensory observations on the river. This area is at a bend in the river, where a large amount of black pollutants float on the surface of the river and the odor is obvious, but this phenomenon is not seen in the straight portion.

Before training the multi-layer perceptron regression model, min-max normalization to the range [0, 1] was applied separately to the input samples and the inversion target. The initial parameters were the same as for the first study area, including a single hidden layer, the neuron number (2⁵), the learning rate (0.01), the maximum number of iterations (50), the activation function (ReLU), and the optimizer (Adam). According to the parameter adjustment method described in Section 3.2.1, three hidden layers (256, 256, 128), a learning rate of 0.0001, and a maximum iteration number of 500 were determined. The retrieval results of the various models are shown in Table 4. The adjusted_R² of MLPR (0.924) is slightly lower than that of GBDTR (0.938). However, for both the training and test datasets, the fitting accuracy is better than that for the first study area of Shahu Port.

For the RFR model, we attempted to adjust the maximum depth and the maximum features, but the effect was not obvious. Therefore, in the second study area, the maximum number of iterations was adjusted to 1000, and the minimum sample number of the leaf nodes was limited to 1. In terms of the evaluation indicator, the test dataset (0.926) was slightly higher than the training dataset (0.9), and this difference was also reflected in the RMSE and MAPE. As with the retrieval accuracy for the first study area, the ability of the RFR model to fit the predicted target is not bad. However, its fitting accuracy is not optimal compared to GBDTR, for both Shahu Port and Xunsi River.

The RBF kernel function was also selected in SVR, and we set gamma = 65 and C = 20. Attempting to increase the gamma or the penalty parameter C would result in overfitting. The resulting SVR inversion accuracy is not as good as for the first study area. The adjusted_R² of the training data based on the SVR model is 0.781 and that of the test data is 0.783, both much lower than the 0.95 in the first study area, and the RMSE is greater than 1. However, under the SVR model, the MAPE is lower than the MAPE of GBDTR. Comparing the predicted values with the true values, it is found that the deviation is large in the prediction of individual samples, but the overall deviation is not large, so the adjusted_R² is small and the MAPE is also small.

The OLSR model is clearly overfitted. The prediction result for the training dataset is close to the true value, while the adjusted_R² is negative on the test dataset. A negative value means that, on the test data, the model fits worse than simply predicting the mean of the observed values.
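
A negative value arises whenever the residual sum of squares in Equation (10) exceeds the total sum of squares. The sketch below uses illustrative numbers, not the study's data, to compare a mean-only predictor with a prediction whose trend is reversed.

```python
def r_squared(y_true, y_pred):
    """Equation (10)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - p) ** 2 for a, p in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Equation (11)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


y_test = [2.0, 3.0, 4.0]
mean_only = [3.0, 3.0, 3.0]       # always predicts the mean of y_test
reversed_trend = [4.0, 3.0, 2.0]  # predicts the opposite trend

r2_mean = r_squared(y_test, mean_only)
r2_bad = r_squared(y_test, reversed_trend)
adj_bad = adjusted_r_squared(r2_bad, n=len(y_test), p=1)
print(r2_mean, r2_bad, adj_bad)
```

The mean-only predictor gives R² = 0, while the reversed prediction gives a negative R², and the adjustment in Equation (11) makes the value more negative still.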

For the KRR model of the second study area, linear, RBF, sigmoid, and polynomial kernel functions were also tried. As before, the polynomial kernel function gave the best fit. However, the RMSE (1.17) exceeds the lowest value of the inversion target (1.14), so the KRR regression model has a poor fitting accuracy (adjusted_R² = 0.794) and is not suitable for the target value inversion under the current dataset.

For the second study area of Xunsi River, the evaluation indicators for the retrieval results of the above representative regression models are listed in Table 4. The scatter plots of the six regression models for the estimated and in-situ values are shown in Figure 12. Observing the test set in each figure, the inversion accuracy of the GBDTR, MLPR, and RFR models is higher. For GBDTR in particular, the scatter is concentrated near the diagonal. In addition, negative values appear in the estimates of the OLSR model.

3.3.2. UAV-Borne Image Inversion Based on the Different Models

Based on the comparison of the inversion accuracy of each regression model listed in Table 4, the three models with an adjusted_R² greater than 0.9 were selected, i.e., GBDTR, MLPR, and RFR. Only these three regression models were compared in terms of the pollution index images. On the hyperspectral images (row = 3519, column = 2780, feature number = 91), a single pixel was extracted cyclically as an input variable. The statistical information of the NCPI distribution map for the above models is listed in Table 5. It was found that the fitting time of the RFR model for the whole image is about 35 times that of GBDTR, while the calculation time of MLPR is the shortest, at only 174 s. The spatial distribution of the pollution index is shown in Figure 13, where Figure 13a is the inversion graph of the GBDTR model. According to the legend, the maximum value is 8.38 and the minimum value is 1.27. Compared with the pollution index calculated from the real samples (maximum value 8.62, minimum value 1.14), the prediction results are close. Figure 13b is the inversion graph of the RFR model. The gradation distribution is similar to that of GBDTR, and the maximum value (8.12) and the minimum value (1.81) are also close, so the distribution of the pollution area is also consistent with the field observation results. However, the RFR model fitting result is worse than that of GBDTR, and the calculation time of the whole image inversion is nearly 35 times that of GBDTR. Figure 13c is the inversion graph of the MLPR model. According to the legend, the single-pixel predictions include negative values, which is clearly inconsistent with the pollution index calculated from the ground samples. Negative values occur because each pixel had to be standardized before prediction, and inverse standardization was also required for the accuracy evaluation. Therefore, the predicted value deviates greatly from the true value.

In summary, for the experiment in the second research area, the GBDTR, MLPR, and RFR models were selected to estimate the pollution index of the entire river. Both GBDTR and RFR can achieve good results, and the inversion results are close to the actual situation. However, the RFR inversion accuracy is not as high as that of GBDTR. A specific description of the GBDTR-based inversion of the UAV-borne images is provided in Section 3.4. In addition, in terms of runtime comparison, GBDTR is far superior to RFR.

3.4. UAV-Borne Image Inversion Based on the Gradient Boosting Decision Tree Regression Model

In Sections 3.2 and 3.3 above, the inversion results of each model were discussed separately for the two datasets of Shahu Port and Xunsi River. In terms of the retrieval accuracy on the training and test datasets, the operation time for the spatial distribution graph of NCPI (Figure 14 for Shahu Port, Figure 15 for Xunsi River), and the single-pixel prediction results on the inversion graph, the GBDTR model was found to be superior to the other models. The numbers in Figure 14 and Figure 15 represent the overall pollution index values calculated from SD, DO, ORP, and AN.

For the first study area of Shahu Port, it was noted in the previous discussion that the minimum estimated value in the GBDTR inversion graph is 0.48. On the color scale, values from the minimum to 1 are shown in dark blue, which characterizes NBO; NCPI values of 1–2 are shown in light blue, indicating that the water is MBO; and NCPI values of 2–8 are shown in other colors, which indicate SBO. According to the actual values (the values on the sampling points), the riverway is NBO from the 13th point to the end. Most of the points on the inversion graph from the 13th point onward are dark blue, indicating that most of these points are NBO, which is consistent with the real values. According to the Shahu Port sampling point distribution (Figure 1) and the site photos, there is a drain with a very large discharge between the 12th and 13th points. With this drain as the dividing line, the area before it is SBO and the area after it is NBO. After verification, it was confirmed that the drain was connected to the water plant. The manual judgment result may have been affected by the addition of chlorine to the water during the disinfection process. After the chlorine was dissolved in water, it would react with AN in the riverway, resulting in a significant decrease in AN concentration (AN drops from 2.75 mg/L to less than 1.5 mg/L). At the same time, it may affect the determination of ORP. In addition, because the large amount of water discharged into the riverway supplements the oxygen, the DO concentration increased from the initial 0.05 mg/L to more than 2.5 mg/L.

In Figure 1, the 18th and 26th points are drainage outlets. At the 18th point in particular, a lot of oil floats on the surface of the water. It can be seen from the inversion graph that the pollution levels at the two outlets are significantly higher than in the surrounding area. In addition, near sampling points No. 36 and No. 37, it was found that the pollution index increased significantly from below 1 to above 2. However, no obvious sewage outlets were found at the site. Therefore, it can be speculated that this is caused by suspected nearby dumping or sewage flowing into the river.

For the second study area, Xunsi River, the distribution graph of NCPI in Figure 15 shows a minimum predicted value of 1.27. In the pixel inversion, some areas show the characteristics of MBO, which are displayed in light blue; however, most areas still show SBO. From the measured results (Figure 11), the pollution index values of the points showing MBO are also high, and close to the limit of SBO. Overall, points 1–9 are less polluted. The marginal area near point 9 is blocked by floating duckweed and a bridge pier, and the pollution level in this area is thus significantly increased. According to the measured results, the NCPI values of sampling points No. 8 and No. 9 are, respectively, 6.5 and 4.9, which are completely consistent with the inversion results. Further down the riverway, the NCPI continues to rise. The pollution index values at points 15 (1.73) and 16 (2.19) are low, and the pollution index at point 17 (4.36) is significantly increased. Combined with Figure 11, from the 17th point, the pollution index continues to rise to above 7, and some red pixels show SBO, with values of 8 or higher. Careful observation of sampling points 22 to 24 on the inversion graph shows a blue area near the right side of the duckweed, indicating a decrease in the degree of pollution. It is speculated that this may be due to the improvement of the water environment by aquatic plants floating on the surface. At points 37 (1.66) and 38 (1.14), which are downstream of the river bend, there is no deposition of pollutants, so the pollution index drops significantly. The red rectangle on the figure indicates where the images are mosaicked. Although there is a clear dividing line, the trend of the spatial distribution of the NCPI is still very obvious.

4. Conclusions

In this study, we used UAV-borne HRS images to monitor urban black-odor water. The methodology part of the paper describes the whole process of applying UAV-borne hyperspectral imagery to water remote sensing, including the optical geometry of the ground-spectrum acquisition, the flight conditions, and the image preprocessing. The band ratio method was used for the spectral preprocessing, and the Pearson correlation coefficients were significantly improved. The evaluation index for the inversion accuracy was the adjusted_R², which can be compared across different datasets. The feasibility of using the NCPI instead of a single-index inversion method in the evaluation of urban black-odor water was also discussed. Six regression models, GBDTR, MLPR, RFR, SVR, OLSR, and KRR, were evaluated in the experiments. The applicable models for the current scene were then compared and analyzed from the three angles of inversion accuracy, computation time, and the single-pixel inversion results. The GBDTR model obtained the highest inversion accuracy in both study areas, the calculation time was acceptable, and the inversion results on the image were ideal. In the first study area, the training and test dataset adjusted_R² values both reached 0.97 or higher, the RMSE was 0.41 and 0.48, respectively, and the computation time for the whole image was 178 s. The adjusted_R² of the second study area reached 0.93 or more, and the computation time for the whole image was 785 s. For the two study areas, the next best model was RFR. In terms of both the inversion accuracy and the inversion result on the UAV-borne images, RFR performs slightly worse than GBDTR, but it is close. However, its operation time is too long: the inversion time in the second study area reached 27316 s, which is close to seven and a half hours. This is not acceptable for engineering applications, especially for the processing of hyperspectral imagery over a wide range. In actual projects, we may have to deal with dozens of hyperspectral images or tens of TB of data. Therefore, it is more effective to select the GBDTR model, which has a shorter running time. The other models performed less well in terms of training data accuracy and inversion on the imagery.

From the above discussion, it was found that both GBDT and RF could achieve good results in the field of quantitative remote sensing of water based on UAV-borne hyperspectral imagery. Both methods belong to the ensemble algorithm family, and both use the decision tree as the base learner. The difference is that the core of GBDT is to fit the gradient of the loss function through the base learner, while the core of RF is bootstrap sampling and random attribute selection. Therefore, in future work on the quantitative inversion of UAV-borne hyperspectral imagery, we will continue to explore in this direction, including the use of AdaBoost, XGBoost, and the latest bagging and boosting algorithms. In addition, in this paper, we also discussed the application of MLPR, whose more advanced counterpart, the deep neural network, is a popular topic in both industry and academia. In future research, we will also try more generalized deep learning methods combined with UAV-borne hyperspectral imagery to explore their application in the field of water remote sensing. Based on the performance of GBDT in this study, we suggest that this approach could be applied to other black-odor water monitoring applications.

Author Contributions

L.W. and C.H. were responsible for the overall design of the study. C.H. collected the datasets, performed all the experiments, and drafted the manuscript. Z.W. (Zhou Wang) and X.Z. made the figures. Z.W. (Zhengxiang Wang) and L.C. helped collect the datasets. All authors read and approved the final manuscript.

Funding

This research was funded by the “National Key Research and Development Program of China” (2017YFB0504202), the “National Natural Science Foundation of China” (41622107), the central government guides local science and technology development projects (Ecological Remote Sensing Monitoring and Wetland Restoration in the Yangtze River Basin), the “Special projects for technological innovation in Hubei” (2018ABA078), the “Open Fund of Key Laboratory of Ministry of Education for Spatial Data Mining and Information Sharing” (2018LSDMIS05), the “Open Fund of the State Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University” (18R02) and the “Open fund of Key Laboratory of Agricultural Remote Sensing of the Ministry of Agriculture” (20170007).

Acknowledgments

The Intelligent Data Extraction and Remote Sensing Analysis Group of Wuhan University (RSIDEA) provided the datasets. The Remote Sensing Monitoring and Evaluation of Ecological Intelligence Group (RSMEEI) helped to process the datasets.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Sampling stations in Shahu Port.

Figure 2. Sampling stations in the Xunsi River.

Figure 3. In situ measurements of spectral data using an ASD device. (a) standard board correction; (b) spectra above the water surface; (c) sky spectra; (d) water-leaving spectra in Shahu Port; (e) water-leaving spectra in Xunsi River.

Figure 4. Preprocessing of the UAV-borne HRS images.

Figure 5. Pearson correlation coefficients with the Nemerow index. (a) Original spectra for Shahu Port. (b) Band ratio for Shahu Port. (c) Original spectra for Xunsi River. (d) Band ratio for Xunsi River.

Figure 6. Dimensionless functions of Secchi depth (SD), dissolved oxygen (DO), oxidation-reduction potential (ORP), and ammonia nitrogen (AN).

Figure 7. The change in adjusted_R² based on gradient boosting decision tree (GBDT) as the number of iterations increases. (a) Shahu Port: learning rate = 0.01, subsample = 0.5. (b) Shahu Port: learning rate = 0.001, subsample = 0.5. (c) Shahu Port: learning rate = 0.0001, subsample = 1. (d) Xunsi River: learning rate = 0.01, subsample = 0.5. (e) Xunsi River: learning rate = 0.001, subsample = 0.5. (f) Xunsi River: learning rate = 0.0001, subsample = 1.

Figure 8. Statistical results of the pollution level and the Nemerow index for the Shahu Port dataset. (a) line chart; (b) spatial distribution of sampling points upstream; (c) spatial distribution of sampling points downstream.

Figure 9. The scatter plots for the estimated and in-situ values in Shahu Port. (a) GBDTR; (b) MLPR; (c) RFR; (d) SVR; (e) OLSR; (f) KRR.

Figure 10. Spatial distribution of the Nemerow index based on the different models in the Shahu Port dataset: (a,d) GBDTR, (b,e) RFR, (c,f) SVR.

Figure 11. Statistical results of the pollution level and the Nemerow index for the Xunsi River dataset. (a) line chart; (b) spatial distribution of sampling points upstream; (c) spatial distribution of sampling points downstream.

Figure 12. The scatter plots for the estimated and in-situ values in Xunsi River. (a) GBDTR; (b) MLPR; (c) RFR; (d) SVR; (e) OLSR; (f) KRR.

Figure 13. Spatial distribution of the Nemerow index based on the different models in the Xunsi River dataset: (a) GBDTR, (b) RFR, (c) MLPR.

Figure 14. Inversion results for the Nemerow index based on GBDTR in the Shahu Port dataset.

Figure 15. Inversion results for the Nemerow index based on GBDTR in the Xunsi River dataset.

Table 1. Classification standard of the pollution levels of urban black-odor water.

Characteristic Indicator	Mild	Severe
SD (cm)	25–10	<10
DO (mg/L)	0.2–2.0	<0.2
ORP (mV)	−200–50	<−200
AN (mg/L)	8.0–15	>15
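
The thresholds in Table 1 can be applied to a single indicator reading as in the sketch below. This is a minimal illustration, not part of the cited guideline: the function name, the label "none" for readings outside both ranges, and the treatment of values exactly on a boundary are assumptions, and the sketch does not cover how readings from several indicators or sampling points are combined into an overall level.

```python
# Per-indicator lookup using the thresholds in Table 1.
# Boundary handling (<= vs <) and the "none" label are assumptions.

def classify_indicator(name, value):
    """Return 'severe', 'mild', or 'none' for one indicator reading."""
    if name == "SD":   # Secchi depth, cm
        return "severe" if value < 10 else "mild" if value <= 25 else "none"
    if name == "DO":   # dissolved oxygen, mg/L
        return "severe" if value < 0.2 else "mild" if value <= 2.0 else "none"
    if name == "ORP":  # oxidation-reduction potential, mV
        return "severe" if value < -200 else "mild" if value <= 50 else "none"
    if name == "AN":   # ammonia nitrogen, mg/L (higher is worse)
        return "severe" if value > 15 else "mild" if value >= 8.0 else "none"
    raise ValueError(f"unknown indicator: {name}")


# Hypothetical readings for one sampling point.
sample = {"SD": 18.0, "DO": 0.15, "ORP": -120.0, "AN": 9.5}
for indicator, reading in sample.items():
    print(indicator, classify_indicator(indicator, reading))
```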

Table 2. Comparison of the retrieval results of the different regression models for the Shahu Port dataset.

Modeling Method	Training Adjusted_R²	Training RMSE	Training MAPE	Test Adjusted_R²	Test RMSE	Test MAPE
GBDTR	0.978	0.41	24.69%	0.974	0.48	18.96%
MLPR	0.823	1.18	53.78%	0.809	1.31	44.25%
RFR	0.968	0.50	17.11%	0.962	0.58	9.41%
SVR	0.956	0.59	22.11%	0.954	0.64	13.72%
OLSR	0.970	0.48	43.69%	0.506	2.11	145.11%
KRR	0.850	1.09	76.94%	0.850	1.16	98.36%
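
The three accuracy measures reported in the table can be computed as in the sketch below. It assumes the standard textbook definitions, with adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1) for n samples and p predictors. The function names and the sample values are hypothetical and are not taken from the study's data.

```python
import math

# Standard regression accuracy measures; sample values below are made up.

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (undefined if a true value is 0)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def adjusted_r2(y_true, y_pred, n_features):
    """R² penalized for the number of predictors used by the model."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(adjusted_r2(y_true, y_pred, n_features=1), rmse(y_true, y_pred), mape(y_true, y_pred))
```

Because MAPE divides each error by the observed value, it is sensitive to samples with small index values, so it can rank models differently from RMSE.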

Table 3. Statistical information of the UAV-borne image inversion based on the different models in the Shahu Port dataset.

Modeling Method	Computation Time (s)	Min Value	Max Value
—	—	0.59	8.61
GBDTR	178	0.48	7.71
RFR	4010	0.62	7.35
SVR	43	−0.45	12.04

Table 4. Comparison of the retrieval results of the different regression models for the Xunsi River dataset.

Modeling Method	Training Adjusted_R²	Training RMSE	Training MAPE	Test Adjusted_R²	Test RMSE	Test MAPE
GBDTR	0.938	0.64	16.59%	0.936	0.63	16.97%
MLPR	0.924	0.71	12.59%	0.921	0.70	20.92%
RFR	0.900	0.82	20.34%	0.926	0.67	19.63%
SVR	0.781	1.21	14.25%	0.783	1.16	21.56%
OLSR	1.000	6.46e−13	1.71e−11%	0	4.30	81.84%
KRR	0.794	1.17	29.61%	0.781	1.16	43.29%

Table 5. Statistical information of the UAV-borne image inversion based on the different models in the Xunsi River dataset.

Modeling Method	Computing Time (s)	Max Value	Min Value
—	—	8.62	1.14
GBDTR	785	8.38	1.27
RFR	27316	8.12	1.81
MLPR	174	6.75	−10.27

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
