Research on the Characteristic Spectral Band Determination for Water Quality Parameters Retrieval Based on Satellite Hyperspectral Data

Abstract: Hyperspectral remote sensing technology has been widely used in water quality monitoring. However, while it provides more detailed spectral information for water quality monitoring, it also gives rise to issues such as data redundancy, complex data processing


Introduction
Remote sensing offers large-scale, all-weather, accurate, and dynamic monitoring, and has been widely used in the water conservancy industry. Remote sensing of the water environment emerged from this work and has developed steadily since [1]. Changes in the composition and concentration of substances in water often change the color of the water body [2]. Spectral imaging remote sensing obtains the color parameters of a water body from its spectral characteristics and then inverts the water quality parameters, enabling water environment monitoring of rivers and lakes [3,4].
Most applications of remote sensing and spectral imaging to water environment monitoring can be summarized in three steps: remote sensing data acquisition, data processing and inversion model construction, and model analysis and application. The data sources available for remote sensing water quality retrieval are usually multispectral and hyperspectral sensors [5], categorized by spectral resolution and carried on spaceborne, airborne, portable, and ground-based platforms [6]. Multispectral data available for water quality retrieval typically have 3-10 bands. Landsat series data are the most commonly used for monitoring water quality parameters such as total suspended matter (TSM), chemical oxygen demand (COD), and total phosphorus (TP), owing to their accessibility and their spatial, temporal, and spectral resolution [7,8]. Hyperspectral satellites have many bands with a spectral resolution of about 5-10 nm. Hyperspectral data from satellites such as HuanJing-1 (HJ-1) [9], Gaofen-5 (GF-5) [10], and Ziyuan1-02D (ZY1-02D) [11] have been employed for water quality retrieval. Data with higher spectral resolution offer a large number of bands from which the optimal ones can be chosen when developing inversion models of water quality parameters; they resolve spectral differences that multispectral data cannot, greatly enhancing the accuracy of inversion algorithms [12][13][14]. Among the four sensor platforms for water quality monitoring, portable and ground-based spectrometers [4] are less flexible and more labor-intensive; airborne spectrometers [5,6] are flexible and have high spatial resolution, but the observation area is small; and satellite-based spectrometers [8] have low imagery cost and are suitable for large-scale monitoring, but suffer from low spatial resolution, poor timeliness, and long revisit cycles.
Models for water quality parameter retrieval are mainly divided into the empirical method, the analytical method (also called the bio-optical method [6]), the artificial intelligence (AI) method, and combined empirical and analytical methods. The fourth type is called the semi-empirical or semi-analytical model, and some studies list the two together [6]. The empirical method relies only on the statistical relationship between remote sensing data and measured water quality parameters to establish models [15][16][17][18]. Its principle is simple and its accuracy is high, but its results generalize poorly. The analytical model is based on the principle of radiative transfer in water, and the content of each component in the water is calculated from the remote sensing data through the transfer formula [19]. For example, Dekker et al. [20] estimated water color parameters by building a physical analytical model based on Landsat TM data and the measured inherent optical properties of the water body. Sudduth et al. [12] used airborne hyperspectral images of the major rivers in Minnesota to establish an analytical model based on the inherent optical properties of the water body and the apparent optical properties calculated from the spectral data, thereby retrieving the suspended solids concentration of the river and noting that 700 nm is the best band for measuring the suspended solids concentration in the study area. This type of model has good universality and high precision, but it is difficult to fit. The spectral index used in the semi-empirical model has a clearer physical meaning than that used in the empirical method. Taking the concentration of chlorophyll-a as an example, the semi-empirical method proposes a spectral index related to the concentration based on certain assumptions from the bio-optical theoretical model [21], and then establishes a statistical relationship to invert the water quality parameter [22,23]. The construction of the analytical model depends on a complex radiative transfer model of water, and almost all existing analytical models need empirical formulas to determine some parameters. An analytical model with empirical formulas is generally called a semi-analytical model [24,25]; it requires the measured absorption coefficient and other inherent optical parameters. The AI model is trained on a large number of paired reflectance and water quality measurements, and automatically learns the nonlinear relationship between the two through the network to predict water quality parameters [5,[26][27][28].
The core of modeling various water quality parameters is band selection and band combination [29]. In the past, multispectral data were the main data used for water environment remote sensing, and their few bands could not accurately capture the spectral information of different water quality parameters. In recent years, the number of satellites equipped with hyperspectral imagers has gradually increased, such as ZY1-02D [11], GF-5 [30], and the Hyperspectral Precursor and Application Mission (PRISMA) [31]. On-board hyperspectral cameras provide a broad source of hyperspectral data, and hyperspectral remote sensing data provide more bands for model building [32]. The spectral curves obtained for water elements also provide a basis for analyzing the physical meaning of bands, and can help researchers establish more accurate semi-empirical models with clearer physical meaning. However, hyperspectral remote sensing data usually contain hundreds of bands, and at this stage the band utilization efficiency of semi-empirical inversion models is generally low. In general, an inversion model for a target water quality parameter uses no more than four bands, which leads to new problems such as hyperspectral data redundancy and complex processing. Therefore, based on an analysis of the spectral characteristics of water elements, the band or band combination can be reasonably selected and a superior semi-empirical hyperspectral retrieval model of water elements can be constructed to guide the customization of a multispectral camera, so that water quality parameters can be inverted with the same precision using the customized multispectral camera. At the same time, spaceborne hyperspectral technology still has limitations, such as its spatial resolution, which makes it difficult to monitor small lakes, reservoirs, and other small water areas, and there are restrictions on the acquisition of some commercial satellite data. Unmanned aerial vehicles (UAVs) [33] have the advantages of low cost, simple operation, high spatial resolution, and easy scanning and imaging. They are convenient for field patrols of the water environment and can effectively overcome this disadvantage. In summary, for the retrieval of water quality parameters, hyperspectral data have the disadvantages of low spatial resolution, excessive data redundancy, and low data utilization, because only a few bands are used in the inversion model. It is therefore necessary to establish a method for identifying the characteristic bands of the satellite hyperspectral inversion model, so that multispectral data can achieve the same monitoring accuracy as hyperspectral data. Follow-up research can use this method to guide the customization of effective bands for multispectral lenses on UAVs, and to realize high-resolution, efficient, flexible, fast, and low-cost inspection of the water environment in small water areas by combining the flexibility of UAVs with the efficient data utilization and low cost of a multispectral lens.
Li et al. [32] utilized hyperspectral data from the Gaofen-5 satellite and employed machine learning methods to comprehensively characterize the features of the hyperspectral data through the combination of multispectral-scale morphological features. They investigated the relationship between the hyperspectral data and water quality parameters, and established a retrieval model. However, this study did not explore efficient utilization methods for hyperspectral data or redundant data removal techniques. Zheng et al. [11], on the other hand, used hyperspectral data from the ZY1-02D satellite and developed a water quality inversion model using machine learning techniques, with empirical parameters and the ratio between two bands as inputs. Their study focused primarily on the impact of different machine learning methods and on comparing the performance of ZY1-02D hyperspectral data with Sentinel-2 multispectral data. Although both studies employed machine learning (AI) methods to construct water quality parameter inversion models, they did not specifically investigate characteristic spectral bands for water quality parameters. These characteristic bands are often derived through empirical models and have certain limitations.
Therefore, this paper aims to combine the advantages of AI algorithms with the explicit concept of characteristic spectral bands for water quality parameters. Based on hyperspectral data from the ZY1-02D satellite, the objective is to compare the performance of the typical empirical method and the artificial neural network (ANN) method in retrieving various water quality parameters, using spectral band sets with different numbers of bands, so as to provide guidance for determining the characteristic bands of different water quality parameters. This article first introduces the study area and data sources, together with the method for matching water quality parameters to satellite hyperspectral data. Next, the preprocessing methods for water quality data and hyperspectral data are introduced. The method for determining the characteristic spectral bands of water quality parameters is then presented, derived from the correlation of band reflectance and from regression models based on empirical and ANN methods. The Results section presents the high correlation between bands and the reduced computational complexity, along with the results of the empirical method-based model and the ANN-based model. The Discussion section provides recommended values for customizing the characteristic bands and directions for improving the model. Figure 1 shows the overall technical flowchart for this work.

Study Area
As shown in Figure 2, this study selected the areas around Taihu Lake, including Suzhou, Wuxi, Shanghai, Jiaxing, and Huzhou, as the region for collecting water quality and satellite data. The water network in this region is dense, and the National Surface Water Automatic Monitoring Stations (NSWAMS) are densely concentrated, which effectively improves the utilization rate of the satellite data. The water quality parameters used in this paper were acquired from December 2020 to August 2022.

Data Collection
The water quality parameters are from the National Surface Water Automatic Monitoring Real-Time Data Release System of the China Environmental Monitoring Station. The release scope of this system includes data from the National Surface Water Automatic Monitoring Stations (NSWAMS) system, which was built and officially put into operation. From April 2014 to November 2020, NSWAMS included 134 stations. In December 2020, 1506 new stations were added, and in 2021, 365 more were added. Currently, there are 2005 stations in total, which provide sufficient data support for building regression models of satellite hyperspectral data and water quality parameters.
The released water parameters comprise 9 monitoring indicators: water temperature, pH, dissolved oxygen (DO), permanganate index (CODMn), ammonia nitrogen (NH3-N), total phosphorus (TP), total nitrogen (TN), electric conductivity (EC), and turbidity (TUB). The data are released every 4 h, so they can be effectively paired with satellite transits at different times. This paper selects 7 water quality parameters as research objects: DO, CODMn, NH3-N, TP, TN, TUB, and EC.
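The pairing of a satellite transit with the 4-hourly station releases can be illustrated with a minimal sketch. The function name `nearest_release` and the 2 h tolerance (half of the release interval) are assumptions made here for illustration; the text above does not state the tolerance that was actually applied.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional


def nearest_release(
    overpass: datetime,
    releases: Iterable[datetime],
    max_gap: timedelta = timedelta(hours=2),  # assumed tolerance, not from the paper
) -> Optional[datetime]:
    """Return the release time closest to the satellite overpass.

    Returns None when there are no releases or when the closest one is
    further away than max_gap.
    """
    best = min(releases, key=lambda t: abs(t - overpass), default=None)
    if best is None or abs(best - overpass) > max_gap:
        return None
    return best


# Hypothetical example: releases every 4 h on one day, overpass at 10:50.
day = datetime(2021, 5, 1)
release_times = [day + timedelta(hours=h) for h in range(0, 24, 4)]
matched = nearest_release(datetime(2021, 5, 1, 10, 50), release_times)
```

With a 4 h release interval, a 2 h tolerance means every overpass has at least one candidate record whenever the station reported without gaps.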
The hyperspectral satellite data come from the Natural Resources Satellite Remote Sensing Cloud Service Platform and were acquired by the hyperspectral camera on the ZY1-02D satellite, the Advanced Hyperspectral Imager (AHSI) sensor [11]. It has been shown that the average equivalent reflectance of each band computed from in situ Rrs is basically the same for the AHSI hyperspectral sensor and the Multispectral Imager (MSI) multispectral sensor [11]. The satellite carries two cameras, which acquire 9-band multispectral data with a swath width of 115 km and 166-band hyperspectral data with a swath width of 60 km. The panchromatic spatial resolution reaches 2.5 m, the multispectral spatial resolution is 10 m, and the hyperspectral spatial resolution is 30 m. The spectral resolution of the hyperspectral payload reaches 10 nm in the visible and near infrared and 20 nm in the shortwave infrared. The main parameters of the AHSI are shown in Table 1.

The hyperspectral data from the ZY1-02D satellite have been radiometrically corrected, bad pixels repaired, and spectrally calibrated. To meet the application requirements of quantitative remote sensing, radiometric calibration, atmospheric correction, and orthorectification must still be performed [11], as shown in Figure 3. Orthorectification uses the built-in RPC file, with terrain-corrected Landsat 8 data as the reference image. An example of a ZY1-02D satellite image product is shown in Figure 4, which lists detailed information including the production time, longitude, and latitude.

Method
In this section, we introduce the methodology employed in this paper, which includes data matching, data processing, determination of characteristic spectral bands, regression modeling, and model evaluation. The water quality parameters and hyperspectral data do not have a direct one-to-one correspondence, so the data matching method establishes a consistent correspondence between water quality data collected at a given geographical location and time and the corresponding satellite hyperspectral data. The water quality parameters and hyperspectral satellite data are preprocessed to even out the distribution of the water quality parameters and to restrict the hyperspectral data to the spectral range of interest for this paper. The determination of characteristic spectral bands is a key innovation of this research; it allows relevant bands to be selected efficiently for different water quality parameters through correlation analysis combined with a regression model. Various regression modeling methods are presented in this section so that their performance can be evaluated and compared and the optimal approach for regression between the water quality parameters and the calculated reflectance can be determined. Model evaluation sets out the specific metrics used to compare different regression models and the formulas used to calculate them.
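As a rough illustration of the correlation analysis mentioned above, the following sketch ranks bands by the absolute Pearson correlation between band reflectance and one water quality parameter. It is only an assumption about how such a screening step could look: the names `pearson` and `rank_bands`, the use of plain Pearson correlation, and the sample values are all hypothetical, and the procedure actually used in this paper also involves the regression models.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def rank_bands(
    reflectance: Sequence[Sequence[float]],  # one row per sample, one column per band
    target: Sequence[float],                 # measured water quality parameter per sample
    k: int,
) -> List[int]:
    """Return the indices of the k bands with the largest |r| against the target."""
    n_bands = len(reflectance[0])
    scores = []
    for band in range(n_bands):
        column = [row[band] for row in reflectance]
        scores.append((abs(pearson(column, target)), band))
    # Sort by descending |r|; ties are broken by the lower band index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [band for _, band in scores[:k]]


# Hypothetical data: 4 samples, 3 bands.
samples = [
    [0.10, 0.50, 0.40],
    [0.20, 0.10, 0.35],
    [0.30, 0.40, 0.20],
    [0.40, 0.20, 0.10],
]
parameter = [1.0, 2.0, 3.0, 4.0]
best_two = rank_bands(samples, parameter, k=2)
```

Ranking on the absolute value keeps strongly negative correlations, which carry as much information for a regression model as positive ones.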

Data Matching of Water Quality Parameters and Hyperspectral Satellite Data
The data matching method integrates geometric and temporal information to establish correspondence between water quality parameters and hyperspectral satellite data, enabling the spectral characteristics from the satellite data to represent the water quality parameters.
This paper matches the heterogeneous water quality monitoring data and ZY1-02D satellite hyperspectral data for the same NSWAMS based on location and time; the principle is shown in Figure 5. The twenty scenes of hyperspectral data with the highest utilization rate were selected as the research data of this paper.
Remote Sens. 2023, 15, x FOR PEER REVIEW

The specific methods are shown in Figure 6, as follows:
(1) Data extraction for a range of locations and times. On the Natural Resources Satellite Remote Sensing Cloud Service Platform, the time condition is "December 2020-August 2022", the geographical conditions are Suzhou, Shanghai, Jiaxing, Huzhou, and Wuxi, and the satellite sampling conditions are the AHSI sensor of ZY1-02D and 0 cloud amount. A total of 61 scenes were found, of which 8 scenes showed low-altitude cloud, so 53 scenes were available for selection. Water quality monitoring data were obtained from each NSWAMS in Suzhou, Shanghai, Jiaxing, Huzhou, and Wuxi from December 2020 to August 2022 from the National Surface Water Automatic Monitoring Real-Time Data Release System. Each datum includes the name, time, water quality classification, temperature, pH, DO, CODMn, NH3-N, TP, TN, EC, and TUB of the NSWAMS, together with its latitude and longitude;
(2) Determination of whether each NSWAMS is within the satellite data. For the $i$th scene of the hyperspectral satellite data $sh_i$, the sampling date and time is $t_i$, and the four vertices are $a_i$, $b_i$, $c_i$, and $d_i$, whose longitudes and latitudes are $(lon_{a_i}, lat_{a_i})$, $(lon_{b_i}, lat_{b_i})$, $(lon_{c_i}, lat_{c_i})$, and $(lon_{d_i}, lat_{d_i})$. Assume that the water quality parameter set $rwq_i$ was collected at the same sampling time $t_i$ and contains $n_i$ records. The NSWAMS $e_{i,j}$ corresponding to the $j$th water quality parameter record $rwq_{i,j}$ has longitude and latitude $(lon_{e_{i,j}}, lat_{e_{i,j}})$, where $j \in \{1, 2, 3, \ldots, n_i\}$. The area method is used to determine whether this NSWAMS is within the satellite hyperspectral data of this scene. The area of the quadrilateral (approximately a parallelogram) formed by $a_i$, $b_i$, $c_i$, and $d_i$ is $s_i$, and $e_{i,j}$ forms four triangles with the four sides of the quadrilateral, whose areas are $s_{i,j,1}$, $s_{i,j,2}$, $s_{i,j,3}$, and $s_{i,j,4}$, respectively. If the sum of the four triangle areas equals $s_i$, the NSWAMS is within the scene, and its water quality parameter records are collected into the selected set $swq_i$. If the sum is greater than $s_i$, the NSWAMS is outside the hyperspectral satellite data of this scene;
(3) Selection of the 20 scenes of satellite data with the most water quality parameter records. The number of water quality parameter records in the dataset $swq_i$ is counted, which gives the number of NSWAMS in this satellite scene, $num_i$. The number of NSWAMS in each scene of satellite data is calculated using the method described above, and the scenes are sorted by this number. The 20 scenes of satellite data with the highest number of NSWAMS are taken for analysis; these are the 20 scenes of satellite hyperspectral data with the highest effective information density;
(4) Extraction of the spectral value curve corresponding to each water quality parameter sample. According to the geographic information for the NSWAMS collected from each scene, ENVI 5.3 software is used to extract the entire spectral value curve for the water body at the corresponding position in the hyperspectral satellite image; the spectral mean of scale 1 is taken as the spectral value [20].
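The area method in step (2) can be sketched as follows. A point inside a convex quadrilateral splits it into four triangles whose areas add up to the area of the quadrilateral, whereas for a point outside the sum is larger. The sketch treats longitude and latitude as planar coordinates, which is an assumption that is reasonable only over the small extent of one scene; the function names and the tolerance `rel_tol` are hypothetical, and the corner coordinates are invented for illustration.

```python
from typing import Tuple

Point = Tuple[float, float]  # (longitude, latitude), treated as planar coordinates


def triangle_area(p: Point, q: Point, r: Point) -> float:
    """Unsigned area of triangle pqr from the 2D cross product."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2.0


def quad_area(a: Point, b: Point, c: Point, d: Point) -> float:
    """Area of the convex quadrilateral a-b-c-d (vertices given in order)."""
    return triangle_area(a, b, c) + triangle_area(a, c, d)


def station_in_scene(
    e: Point, a: Point, b: Point, c: Point, d: Point, rel_tol: float = 1e-9
) -> bool:
    """True when station e lies inside (or on the edge of) scene a-b-c-d.

    The four triangles formed by e and each side sum to the scene area only
    when e is inside; rel_tol absorbs floating-point rounding.
    """
    s = quad_area(a, b, c, d)
    parts = (
        triangle_area(e, a, b)
        + triangle_area(e, b, c)
        + triangle_area(e, c, d)
        + triangle_area(e, d, a)
    )
    return parts <= s * (1.0 + rel_tol)


# Hypothetical scene corners and two hypothetical stations.
a, b, c, d = (120.0, 31.0), (120.6, 31.1), (120.5, 30.5), (119.9, 30.4)
inside = station_in_scene((120.25, 30.75), a, b, c, d)
outside = station_in_scene((121.0, 31.0), a, b, c, d)
```

Comparing with a small relative tolerance rather than exact equality avoids rejecting stations that are inside the scene but whose area sum differs from the scene area by rounding error.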
All water quality parameter records and the water body spectral curves for the same position and time are then collected. In this way, 188 records of water quality parameters at different times and locations, together with their spatially and temporally matched satellite hyperspectral data, were obtained, realizing the matching of the 20 scenes of satellite hyperspectral data with the highest effective information density to their water quality parameters.
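The selection of the scenes with the highest effective information density amounts to sorting scenes by their station count and keeping the first 20. A minimal sketch is given below; the function name `top_scenes`, the scene identifiers, and the tie-breaking rule (alphabetical by identifier) are assumptions for illustration only.

```python
from typing import Dict, List


def top_scenes(stations_per_scene: Dict[str, int], n: int = 20) -> List[str]:
    """Return the ids of the n scenes containing the most stations.

    Scenes are sorted by descending station count; ties are broken by
    scene id so that the result is deterministic.
    """
    ranked = sorted(stations_per_scene.items(), key=lambda kv: (-kv[1], kv[0]))
    return [scene_id for scene_id, _ in ranked[:n]]


# Hypothetical station counts for four scenes.
counts = {"scene_a": 3, "scene_b": 12, "scene_c": 7, "scene_d": 12}
selected = top_scenes(counts, n=2)
```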

Water Quality Data Preprocessing
According to the Environmental Quality Standards for Surface Water (GB3838-2002), the classification criteria for Class I-V waters for DO, CODMn, NH3-N, TP, and TN are shown in Table 2. Based on these criteria, the water classification distribution for the 188 records of the various water quality parameters is shown in Figure 7a. Figure 7a shows that the values of each water quality parameter are excessively concentrated within a certain interval. For example, the proportion of Class I DO is 78.27%, the proportion of Class II CODMn is 63.83%, Class I and Class II NH3-N together account for more than 40%, the proportions of Class II and Class III TP are 51.60% and 37.23%, respectively, and the proportion of Class V TN is 53.19%. To ensure the accuracy and generalization of the inversion results for the various water quality parameters, each parameter should be distributed as evenly as possible across the classification intervals, reducing this concentration. Priority is given to removing data points that simultaneously fall into Class I DO, Class II CODMn, Class I or II NH3-N, Class II TP, and Class V or inferior TN. After analysis and screening, 90 records of water quality parameters were selected, and the water classification distribution for each water quality parameter is shown in Figure 7b.
As a result, the water quality monitoring data in relatively concentrated intervals were essentially deleted, resulting in a relatively uniform distribution.Table 3 shows the descriptive statistics of various water quality parameters for the 90 records.
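The priority rule described above can be sketched as a simple filter. This is a minimal illustration of that first rule only and does not reproduce the full screening from 188 to 90 records. The record layout, the names `DOMINANT_CLASSES` and `drop_concentrated`, and the encoding of "inferior to Class V" as 6 are assumptions made for the example.

```python
# Over-represented classes per parameter, as listed in the text.
# Class labels are integers 1-5; 6 stands for "inferior to Class V"
# (this encoding is an assumption of the sketch).
DOMINANT_CLASSES = {
    "DO": {1},
    "CODMn": {2},
    "NH3-N": {1, 2},
    "TP": {2},
    "TN": {5, 6},
}

def in_all_dominant_classes(record):
    """True if the record falls in the dominant class of every parameter."""
    return all(record[p] in classes for p, classes in DOMINANT_CLASSES.items())

def drop_concentrated(records):
    """Remove records that sit in all over-represented classes at once."""
    return [r for r in records if not in_all_dominant_classes(r)]

# Hypothetical pre-classified records.
records = [
    {"id": "a", "DO": 1, "CODMn": 2, "NH3-N": 1, "TP": 2, "TN": 5},  # removed
    {"id": "b", "DO": 1, "CODMn": 3, "NH3-N": 2, "TP": 2, "TN": 6},  # kept
    {"id": "c", "DO": 2, "CODMn": 2, "NH3-N": 3, "TP": 3, "TN": 4},  # kept
]
kept = drop_concentrated(records)
print([r["id"] for r in kept])
```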

Hyperspectral Satellite Data Preprocessing
The purpose of this paper is to determine the characteristic spectral bands of different water quality parameters, so as to provide the theoretical basis and application guidance for the band customization of multispectral cameras suitable for drones. The spectral range of interest is derived from the existing multispectral cameras and the customizable multispectral cameras suitable for drones, as follows.
At present, there are many institutions engaged in the development of UAV multispectral imaging equipment, and the multispectral cameras in existing UAVs are mainly used for agricultural crop growth assessment, plant classification, forestry monitoring, etc., as shown in Table 4.
Since the maximum band wavelength of the customized UAV multispectral camera is 900 nm, the 166 bands with wavelengths from 396 nm to 2501 nm are first reduced to the 60 bands with wavelengths from 396 to 903 nm. The obtained spectral curves for the different collected hyperspectral satellite data are shown in Figure 8. In Figure 8, the y-axis is the remote sensing reflectance and the x-axis is the center wavelength of the spectral channel bands. It can be seen from Figure 8 that the reflectance has a peak in the wavelength range from 550 to 580 nm. The reflectance generally increases sharply between 390 and 580 nm, although some curves decrease in the range from 390 to 490 nm before increasing from 490 to 580 nm. The reflectance then generally decreases from 580 to 756 nm, and there is a small peak at 765 nm. For the remaining wavelength range, no regular pattern is obvious; the reflectance generally fluctuates around a constant level or decreases slightly.
In this paper, the customizable 16-band combination set is referred to as CM16, and the spectral band combination sets of other existing products are referred to by their product names. For example, the 4-band combination for Parrot multispectral cameras is referred to as the Par set, and so on. The band set names and spectral information of the band sets are shown in Table 5.
In order to compare the inversion performance of the above 7 band sets on 7 different water quality parameters and to determine the characteristic spectral bands of each parameter, the following section describes the empirical and artificial neural network (ANN) methods used to fit the reflectance data for single bands, two-band combinations, and three-band combinations from the above 7 band sets with the water quality parameters. Based on the characteristic spectral bands obtained for each water quality parameter, the optimal 6-band combination is then determined for the customization of a 6-band multispectral camera. To achieve this goal, a total of 26 bands involved in the above 7 band sets were first determined. Then, the reflectance at the corresponding wavelengths of the 26 bands was calculated using the interpolation method. Finally, the remaining bands were deleted, and the preprocessing of the hyperspectral data was completed. The spectral curves for the 26 bands are shown in Figure 9.
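The resampling of satellite reflectance to the band centers of a camera band set can be sketched as below. The text does not state which interpolation method was used, so linear interpolation is an assumption here, and the band centers and reflectance values in the example are hypothetical.

```python
from bisect import bisect_right

def interp_reflectance(target_nm, centers_nm, reflectance):
    """Linearly interpolate reflectance at target_nm.

    centers_nm: ascending center wavelengths of the satellite bands.
    reflectance: reflectance values at those centers (same length).
    Linear interpolation is an assumption; the paper only says
    "the interpolation method".
    """
    if not centers_nm[0] <= target_nm <= centers_nm[-1]:
        raise ValueError("target wavelength outside the sampled range")
    hi = bisect_right(centers_nm, target_nm)
    if hi == len(centers_nm):          # target equals the last center
        return reflectance[-1]
    lo = hi - 1
    x0, x1 = centers_nm[lo], centers_nm[hi]
    y0, y1 = reflectance[lo], reflectance[hi]
    return y0 + (y1 - y0) * (target_nm - x0) / (x1 - x0)

# Hypothetical satellite band centers (nm) and reflectance values.
centers = [396.0, 404.0, 413.0]
refl = [0.010, 0.012, 0.016]
r_410 = interp_reflectance(410.0, centers, refl)
print(round(r_410, 6))
```

Applying the function to each of the 26 target wavelengths yields one resampled spectrum per record.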

Determination of Characteristic Spectral Bands for Water Quality Parameters Based on the Correlation between Reflectance of Different Bands
From the perspective of water quality parameter measurement, the effective utilization of hyperspectral data leads to selection of the characteristic spectral band combinations for different water quality parameters.By using multiple characteristic bands, accurate inversion of each water quality parameter can be achieved, which can ensure the accuracy of water quality parameter measurement and the simplicity of spectral bands, remove redundant data, improve spectral data processing speed, and achieve efficient utilization of spectral data.
This paper proposes a method to determine the optimal characteristic bands based on the reflectance correlation of different bands, which is shown in Figure 10. For the given band set, the number of bands contained in the band set is $n_{bs}$. The steps of the approach follow.
(1) Determination of a high correlation two-band set. The determination coefficient $R^2$ [26] between the reflectance data corresponding to each two-band combination in a given band set was calculated by Equation (1).
where n is the number of reflectance data samples, $R_{rs}^{A}$ represents the reflectance data corresponding to band A, and $R_{rs}^{B}$ represents the reflectance data corresponding to band B. Two bands with a determination coefficient $R^2$ greater than 0.9 were considered to have the same effect within a characteristic spectral band combination. Therefore, they cannot appear simultaneously in a characteristic spectral band combination containing two or more bands [34]. The set of two-band combinations with $R^2$ higher than 0.9 is denoted S.
The maximum number of spectral bands $n_{bmax}$ contained in a spectral band combination, and the number of different spectral band combinations for each number of bands, can then be determined under the constraint that a spectral band combination cannot contain two highly correlated bands.
(2) Calculation of the reflectance of spectral band combinations without high correlation between any two bands. For the ci-th spectral band wavelength combination $S_{n_b,ci}$, consisting of the ith, jth, ..., and zth band wavelengths, let $(\lambda_{ii}, \lambda_{jj})$ represent an arbitrary combination of two band wavelengths from $S_{n_b,ci}$. If no such pair $(\lambda_{ii}, \lambda_{jj})$ belongs to the set S, then the reflectance data corresponding to the wavelengths in the spectral band combination $S_{n_b,ci}$ can be used to calculate the combination reflectance $R_{n_b,ci}$ with Equation (2).
The definition and restriction conditions of the notations are listed in Table 6.
Table 6. Definition and restriction conditions of the notations.

| Notation | Definition | Restriction Conditions |
| --- | --- | --- |
| $S_{n_b,ci}$ | The ci-th spectral band wavelength combination | |
| | The ith, jth, ..., and zth band wavelengths | |
| $n_b$ | The number of bands included in the combination $S_{n_b,ci}$ | |
| | The maximum number of combinations with $n_b$ bands | |
| $R_{n_b,ci}$ | Calculated reflectance of a spectral band combination without two highly correlated bands | - |

(3) Characteristic spectral bands determination. The combination reflectances $R_{n_b,ci}$ corresponding to the spectral band combinations $S_{n_b,ci}$ containing one to $n_{bmax}$ bands that meet the requirements were traversed to build regression models, with the selected method, for the inversion of the different water quality parameters. The models with the best performance are used to determine the characteristic spectral band combination, and the number of bands it contains, for each water quality parameter. Using the same method, the other band sets Par, Mic, DJ3, DJ4, MS600, and AQ600 were fitted, and the characteristic spectral band combination for each water quality parameter was selected. The band set that achieves the optimal results was determined by comparing the performance of the different band combination models. The characteristic spectral bands of each water quality parameter were summarized within the optimal band set.
(4) Optimal spectral bands selection.By ensuring accurate monitoring of the required water quality parameters while satisfying the overall band quantity requirement, the optimal spectral bands were selected based on the specific monitoring requirements for water quality parameters and the total number of bands that was needed.This approach aimed to achieve precise remote sensing measurements of water quality parameters within the specified number of bands.
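Steps (1) to (3) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: Equation (1) is not reproduced in this section, so the sketch assumes it is the squared Pearson correlation between the two bands' reflectance samples, and all function names and toy reflectance values are hypothetical. For scale, exhaustive enumeration of seven-band combinations from a 16-band set gives C(16, 7) = 11,440 candidates.

```python
from itertools import combinations
from math import comb

def r_squared(a, b):
    """Squared Pearson correlation between two reflectance series
    (assumed form of Equation (1)). Undefined for a constant series."""
    n = len(a)
    sa, sb = sum(a), sum(b)
    sab = sum(x * y for x, y in zip(a, b))
    saa = sum(x * x for x in a)
    sbb = sum(y * y for y in b)
    num = (n * sab - sa * sb) ** 2
    den = (n * saa - sa * sa) * (n * sbb - sb * sb)
    return num / den

def high_corr_pairs(bands, threshold=0.9):
    """Set S: pairs of wavelengths whose reflectance R^2 exceeds threshold.
    bands maps wavelength (nm) -> list of reflectance samples."""
    return {
        frozenset((p, q))
        for p, q in combinations(sorted(bands), 2)
        if r_squared(bands[p], bands[q]) > threshold
    }

def valid_combinations(wavelengths, excluded, n_b):
    """All n_b-band combinations containing no pair from the set S."""
    return [
        c for c in combinations(sorted(wavelengths), n_b)
        if not any(frozenset(pair) in excluded for pair in combinations(c, 2))
    ]

# Toy reflectance samples: 410 and 490 nm are almost perfectly correlated.
bands = {
    410: [1.0, 2.0, 3.0, 4.0],
    490: [2.0, 4.0, 6.0, 8.1],
    840: [4.0, 1.0, 3.0, 2.0],
}
S = high_corr_pairs(bands)
two_band = valid_combinations(bands, S, 2)
three_band = valid_combinations(bands, S, 3)
print(S, two_band, three_band, comb(16, 7))
```

Each surviving combination would then be passed to the chosen regression method, and the best-scoring combination per water quality parameter kept as its characteristic bands.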

Regression Modeling with the Empirical Method
Considering that empirical models are usually one-band, two-band, and three-band models, this paper adopts one-, two-, and three-band reflectance indices to establish the inversion models for water quality parameters [35]. The two-band reference indices are the band ratio (BR) [36] and the normalized difference spectral index (NDSI) [37] of the reflectance of the two bands. The three-band reference indices are the three-band index (TBI) [38], the enhanced three-band index (ETBI) [39], and the baseline height index (BH) [40].
The single-band reflectance data value is expressed as Equation (3). The calculated reflectance of the two-band combination is expressed as Equations (4) and (5).
The calculated reflectance of the three-band combination is expressed as Equations (6)-(8).
In this study, the relationship between these different variables and water quality parameters was established using linear least squares regression fitting. In each regression analysis conducted in this section, the water quality parameter of interest (DO, CODMn, NH3-N, TP, TN, TUB, or EC) was considered as the response variable. The corresponding variables, $R_{1,ci}$, $R^{BR}_{2,ci}$, $R^{NDSI}_{2,ci}$, $R^{TBI}_{3,ci}$, $R^{ETBI}_{3,ci}$, and $R^{BH}_{3,ci}$, calculated by Equations (3)-(8), were included as covariates. There was a one-to-one correspondence between the response variable and the respective covariate in each regression analysis.
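The two-band case can be sketched as follows. Equations (4) and (5) are not reproduced in this section, so the sketch assumes the commonly used definitions BR = R1/R2 and NDSI = (R1 - R2)/(R1 + R2); the three-band indices are omitted because their exact forms cannot be confirmed from the text. The function names and sample values are hypothetical.

```python
def band_ratio(r1, r2):
    """Band ratio of two reflectances (assumed form of the BR index)."""
    return r1 / r2

def ndsi(r1, r2):
    """Normalized difference of two reflectances (assumed form of NDSI)."""
    return (r1 - r2) / (r1 + r2)

def fit_line(x, y):
    """Ordinary least squares fit y = slope * x + intercept for one covariate."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical reflectances at two bands and a measured water quality value.
r_band1 = [0.040, 0.035, 0.050, 0.045]
r_band2 = [0.020, 0.025, 0.020, 0.030]
measured = [6.1, 4.8, 7.9, 5.2]

covariate = [ndsi(a, b) for a, b in zip(r_band1, r_band2)]
slope, intercept = fit_line(covariate, measured)
predicted = [slope * c + intercept for c in covariate]
print(slope, intercept)
```

One such single-covariate fit is run per index and per band combination, matching the one-to-one correspondence between response variable and covariate described above.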

Regression Modeling with the ANN Method
The inversion model construction method based on ANN in this study is shown in Figure 11. Firstly, the calculated reflectance and water quality data are divided into a training set and a test set. The training set accounts for 70% of the data, within which the validation set accounts for 10%, and the test set accounts for 30%. The data are preprocessed by normalization, mapping the minimum and maximum values to the range [−1, 1]. Next, the ANN model is constructed with 10 hidden neurons and trained using the Levenberg-Marquardt backpropagation algorithm. The network weights, biases, and other parameters are initialized at the beginning, and the ANN is trained on the training set through iterative forward propagation and evaluation of the loss function. In each iteration, a subset of the training set data is utilized to update the network's weights and biases. The gradient of the network parameters is computed using the backpropagation algorithm, and the Levenberg-Marquardt algorithm is applied to optimize the weights and biases. Simultaneously, the network's parameters are adjusted based on the performance metrics of the validation set, which assess the network's generalization ability. To prevent overfitting, training is halted if the network shows no significant improvement on the validation set. After training, the trained ANN is independently evaluated using the test set. The network's output is computed by passing the test set samples through the network, and its performance is evaluated by comparing the network's output to the corresponding ground-truth values. The $R^2$ and root mean square error (RMSE) [26] are used to evaluate the agreement between the measured water quality parameters and those predicted by the trained model. The optimal band or band combination, i.e., the band or band combination whose model has the best correlation with each water quality parameter, is determined as the characteristic spectral band of the water quality parameter.
For the ANN method [41], the single-band reflectance data value is expressed as Equation (3). The calculated reflectance of the two-band combination is expressed as Equation (9).
The calculated reflectance of the combinations with three to seven bands is expressed as Equations (10)-(14).
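The data preparation described in this section (mapping values to [−1, 1] and holding out 30% of the records for testing) can be sketched as below. The sketch covers preprocessing only and does not train a network; the function names, the random seed, and the use of a shuffled split are assumptions made for the example.

```python
import random

def map_to_unit_range(values):
    """Min-max normalization: minimum -> -1, maximum -> +1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant series cannot be rescaled")
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]

def split_indices(n, train_frac=0.7, seed=0):
    """Shuffle record indices and split them into training and test parts.
    A validation subset would be carved out of the training indices."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(n * train_frac)
    return idx[:n_train], idx[n_train:]

scaled = map_to_unit_range([2.0, 4.0, 6.0])
train_idx, test_idx = split_indices(90)  # 90 records, as in the screened dataset
print(scaled, len(train_idx), len(test_idx))
```

In practice the scaling parameters would be taken from the training data and reused for the test data, so that the test set does not influence preprocessing.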

Model Evaluation
Since the objective of this study was to compare the performance of water quality inversion regression models with varying numbers of spectral bands, the adjusted coefficient of determination (adjusted $R^2$) [42] rather than the raw $R^2$ was employed to evaluate the models. This was primarily because the former metric provides a truer assessment of model performance by accounting for the influence of the number of bands. When a spectral band is added, the adjusted $R^2$ increases if the added band is meaningful and decreases if the added band is a redundant feature. The adjusted $R^2$ is calculated by Equation (15), where the raw $R^2$ is calculated by Equation (16), n is the number of samples in the dataset, $r_i$ represents the raw measured values, $p_i$ represents the values predicted by the regression models, and $n_p$ is the number of bands.
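The evaluation metrics can be sketched as follows. Equations (15) and (16) are not reproduced in this section, so the sketch assumes the standard definitions: raw $R^2$ as one minus the ratio of the residual sum of squares to the total sum of squares, and adjusted $R^2$ as $1 - (1 - R^2)(n - 1)/(n - n_p - 1)$. The function names and sample values are hypothetical.

```python
def r2_score(measured, predicted):
    """Raw coefficient of determination: 1 - SS_res / SS_tot."""
    mean_r = sum(measured) / len(measured)
    ss_res = sum((r - p) ** 2 for r, p in zip(measured, predicted))
    ss_tot = sum((r - mean_r) ** 2 for r in measured)
    return 1.0 - ss_res / ss_tot

def adjusted_r2(r2, n, n_p):
    """Adjusted R^2 for n samples and n_p bands (standard definition)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_p - 1)

measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
raw = r2_score(measured, predicted)
adj = adjusted_r2(raw, n=len(measured), n_p=1)
print(round(raw, 4), round(adj, 4))
```

With the raw $R^2$ held fixed, adding a band lowers the adjusted value, which is why a redundant band is penalized: for 90 samples and a raw $R^2$ of 0.5, `adjusted_r2(0.5, 90, 4)` is smaller than `adjusted_r2(0.5, 90, 3)`.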

Result of High Correlation of Two Bands and Computation Reduction Ratio of CM16 Band Set
The determination coefficients $R^2$ between the reflectance data corresponding to each two-band combination for the CM16 band set are shown in Figure 12.
The combinations of two spectral bands that cannot appear at the same time are shown in Table 7, with a total of 20 groups. The maximum number of spectral bands included in a spectral band combination and the number of different spectral band combinations with different numbers of bands were calculated, as shown in Table 8. The reduction proportion refers to the decrease in computational load enabled by the proposed approach. Without this approach, it would be necessary to exhaustively enumerate and evaluate all possible band combinations, the number of which grows exponentially with the number of bands. By pruning invalid and redundant band combinations before evaluation, the proposed method retains only 3.98% of the complete set of combinations, so the discarded 96.02% of combinations do not need to be explicitly evaluated. For instance, for combinations with seven bands, exhaustive enumeration yields 11,440 combinations, whereas only 25 combinations meet the requirements of the proposed method; that is, only 0.22% of the computational load remains and 99.78% is eliminated. As can be seen from the table, the maximum number of spectral bands included in a spectral band combination is 7.
Compared with the enumeration method, the method proposed in this paper can substantially reduce the calculation of characteristic spectral band combinations.
The empirical models with the best performance for each water quality parameter are shown in bold in Table 9. Taking $R^2$ as the evaluation index, the optimal performance model for DO, $R^{TBI}_{3}$, has the highest accuracy among the different water quality parameters, with $R^2$ reaching 0.309 and a MAPE of 19.65%. This is a three-band model, and the corresponding center wavelengths of the bands are 610, 650, and 680 nm, respectively, which are from the band set CM16. The second is the optimal performance model $R^{BH}_{3}$ for TUB, which is also a three-band model. The corresponding center wavelengths of the bands are 570, 720, and 840 nm, respectively, which are from the band set CM16. Its $R^2$ reaches 0.208, but its MAPE is as high as 45.84%, indicating a large relative error. The next were TP, EC, CODMn, and TN, with $R^2$ of 0.105, 0.101, 0.090, and 0.077, respectively. It is worth noting that although these four $R^2$ values are similar, the MAPE for CODMn and EC is between 20% and 25%, while the MAPE for TP and TN is between 47% and 52%. NH3-N had the worst performance, with $R^2$ less than 0; its optimal performance model was $R_1$, with MAPE reaching 74.29%. It is also worth noting that the center wavelengths of the bands of the EC model are 450 and 730 nm, which are from the band set DJ4.
Among the best-performing empirical models for the seven water quality parameters, two are three-band models, namely $R^{TBI}_{3}$ and $R^{BH}_{3}$; three are two-band models, of which $R^{BR}_{2}$ accounts for one and $R^{NDSI}_{2}$ accounts for two; and two are one-band models. Most of the bands corresponding to the optimal performance models are from CM16, with the exception of the EC model, whose bands are from the DJ4 band set.
In summary, among the band sets of different products, the fitting result of the CM16 bands is the best, indicating that, generally, the more bands that can be selected, the better the fitting result. However, the low $R^2$ values (at most 0.309) show that the empirical models with different band combinations cannot effectively determine the characteristic spectral bands for each water quality parameter. It is therefore necessary to use the ANN method to fit the calculated reflectance of band combinations with different numbers of bands to each water quality parameter, so as to determine and select the characteristic spectral bands for each water quality parameter.

Result of the ANN Method
Figure 13 depicts the performance results for the different ANN regression models, employing $R^2$ as the evaluation index.
Figure 13 illustrates that the ANN method shows substantially better performance than the empirical method for the regression models of the various water quality parameters. Among the optimal ANN regression models for the different water quality parameters, the NH3-N model has the poorest performance, with an $R^2$ of 0.41; nevertheless, this still exceeds the top empirical model (for DO), which has an $R^2$ of 0.309. The best model was for CODMn, with an $R^2$ of 0.68, followed by TUB, TP, TN, EC, DO, and NH3-N. The $R^2$ values for the optimal TUB and TP ANN models both exceeded 0.6; the $R^2$ for the optimal TN ANN model, which is a single-band model, was 0.58; and the $R^2$ values for the optimal EC, DO, and NH3-N ANN models were all in the range from 0.4 to 0.5. With the exception of TN, the optimal ANN models for the water quality parameters were all three-band models, whereas the TN ANN model had one band.
For the various band sets, all bands corresponding to the optimal performance ANN models belonged to the CM16 band set. The ANN models for CODMn, NH3-N, TP, and TUB in the CM16 band set showed obvious advantages, with minimum $R^2$ differences greater than 0.1 compared to the other band sets. For DO, TN, and EC, the $R^2$ differences between the CM16 band set and the other band sets were all less than 0.1. Nevertheless, CM16 still demonstrated considerable dominance over the other band sets, owing to the ample spectral band options it provides. The existing multispectral lens products are suitable for agriculture and forestry and have large errors in the inversion measurement of water quality parameters.
For ANN models with different band numbers, most models showed performance that increased with more bands, except TN for band set CM16. It is therefore necessary to study the performance of models with different numbers of bands to determine the optimal number of bands for each water quality parameter.
Figure 14 illustrates the performance of the ANN models of water quality parameters with varying numbers of spectral bands. As evidenced in Figure 14, with the exception of the TN models, all ANN models of water quality parameters exhibit a consistent pattern in which model accuracy, as measured by $R^2$, initially increases with additional bands but subsequently decreases. For the ANN regression models of DO and NH3-N, $R^2$ increased monotonically from one to four bands and then decreased monotonically from four to seven bands. For the ANN regression models of CODMn, TP, EC, and TUB, $R^2$ climbed monotonically from one to three bands but declined monotonically thereafter from three to seven bands. Counterintuitively, the model for TN performs best with one band. For TN, $R^2$ generally decreased as the number of bands increased, although the four-band and seven-band models had higher $R^2$ values than the three-band and six-band models, respectively.
In summary, despite the increasing availability of information with additional spectral bands, model performance does not improve indefinitely. For most water quality parameters, model efficacy peaks with either three or four bands, beyond which superfluous information degrades predictive accuracy. The anomalous case of TN highlights the idiosyncrasies that can emerge in complex models. It can be inferred that too few bands preclude optimal model accuracy because critical information is missing, whereas redundant bands introduce random noise into the data, thereby undermining accuracy.
Therefore, there exists an optimal number of characteristic spectral bands for each water quality parameter. The optimal number of bands is four for DO and NH3-N, three for CODMn, TP, EC, and TUB, and one for TN, because the ANN models with these numbers of bands demonstrated the best performance among the various models.
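The selection rule described above (train one model per band count and keep the count with the highest R²) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `best_band_count` is hypothetical, and the R² values in `example_r2` are placeholders invented for the example, not results from this study.

```python
def best_band_count(r2_by_count):
    """Return the band count whose model has the highest R^2.

    Ties are broken in favor of fewer bands (an assumption made here,
    on the grounds that a smaller camera band set is cheaper).
    """
    return min(r2_by_count, key=lambda k: (-r2_by_count[k], k))


# Placeholder R^2 values for illustration only; NOT results from this study.
example_r2 = {1: 0.20, 2: 0.31, 3: 0.40, 4: 0.45, 5: 0.41, 6: 0.36, 7: 0.30}

print(best_band_count(example_r2))
```

With a rise-then-fall curve such as the placeholder one, the function returns the band count at the peak.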
Figure 15 shows the performance and spectral band information for the optimal ANN models of the seven water quality parameters. The ANN model for CODMn has the best performance among the water quality parameters, with an R² of 0.68 and a MAPE of 14.02%. Although the ANN models for TUB and TP have relatively high R² values of 0.67 and 0.61, respectively, their MAPE values of 36.19% and 28.09% are not low. The R² values of the EC and DO ANN regression models are 0.49 and 0.43, respectively, which are not high relative to the other water quality parameters, but their MAPE values remain low, ranging from 16% to 18%. As with the results of the empirical method and the ANN method with other band sets, the performance of the NH3-N ANN model was the worst, with the lowest R² of 0.54 and the highest MAPE of 65.85%.
Furthermore, Figure 15 shows that the characteristic spectral band combinations for CODMn and TP were the same, namely 410, 490, and 840 nm, all of which belong to the set of DO characteristic spectral bands. The DO characteristic spectral bands differed from those of these two parameters only in having an additional 720 nm band. The characteristic spectral bands for EC were 410, 570, and 720 nm, with the 410 and 720 nm bands overlapping with DO and the 570 nm band overlapping with NH3-N. The characteristic spectral bands for NH3-N were 490, 570, 680, and 840 nm, with the 490 and 840 nm bands overlapping with DO, CODMn, and TP. The characteristic spectral band for TN was a single band with a center wavelength of 610 nm, and the characteristic spectral bands for TUB were 530, 660, and 780 nm. The biggest difference between the characteristic spectral bands of TN and TUB and those of the other water quality parameters was that they had no overlapping bands.
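For readers who want to reproduce the two evaluation indices quoted above, the sketch below computes them under the common textbook definitions: R² as the coefficient of determination and MAPE as the mean absolute percentage error. These definitions are an assumption; the exact formulas used in this study are those given in its methods section and may differ. The function names are illustrative.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (requires non-zero y_true)."""
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n


# Toy values for illustration only; not measurements from this study.
measured = [1.0, 2.0, 3.0]
predicted = [2.0, 2.0, 2.0]  # a model that always predicts the mean

print(r_squared(measured, predicted))
print(mape(measured, predicted))
```

Because MAPE divides each error by the measured value, samples with small measured concentrations can inflate it even when R² is moderate, which is one possible reason the two indices rank the parameters differently.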

Discussion
This study also explored how the characteristic spectral bands change as the number of bands in the model increases, that is, whether the highly correlated bands in the best-performing model with fewer bands also appear in the best-performing model with more bands. The results showed no such carry-over, so this topic is not discussed in detail.
If a researcher needs to customize a six-band multispectral camera to monitor the water quality parameters DO, CODMn, NH3-N, TP, and EC, this paper suggests using the spectral bands with wavelengths of 410, 490, 570, 680, 720, and 840 nm, which effectively ensures the inversion accuracy of CODMn, DO, and EC. With this band set, the inversion accuracy of TN and TUB cannot be ensured. However, because TN commonly exceeds the standard in general water bodies and TUB is not used to classify water quality, the lack of accurate inversion of TN and TUB will not affect the water quality assessment. If other water quality parameters are of interest to the investigator, the optimal combination of six spectral bands can be determined according to the actual requirements, so as to achieve the best inversion result for those parameters.
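The six-band recommendation follows from taking the union of the characteristic bands reported for the five target parameters. The sketch below reproduces that bookkeeping using the band lists given with Figure 15; the function names `camera_bands` and `uncovered` are illustrative, not part of the original study.

```python
# Characteristic bands (nm) of the optimal ANN models, as reported with Figure 15.
CHARACTERISTIC_BANDS = {
    "DO": {410, 490, 720, 840},
    "CODMn": {410, 490, 840},
    "TP": {410, 490, 840},
    "EC": {410, 570, 720},
    "NH3-N": {490, 570, 680, 840},
    "TN": {610},
    "TUB": {530, 660, 780},
}


def camera_bands(targets):
    """Union of the characteristic bands of the target parameters, sorted."""
    bands = set()
    for name in targets:
        bands |= CHARACTERISTIC_BANDS[name]
    return sorted(bands)


def uncovered(camera):
    """Parameters whose characteristic bands are not all on the camera."""
    cam = set(camera)
    return sorted(p for p, b in CHARACTERISTIC_BANDS.items() if not b <= cam)


six_band = camera_bands(["DO", "CODMn", "NH3-N", "TP", "EC"])
print(six_band)             # [410, 490, 570, 680, 720, 840]
print(uncovered(six_band))  # ['TN', 'TUB']
```

Changing the list of target parameters gives the band set for a different monitoring purpose, although the resulting union is not guaranteed to fit within six bands.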
The dataset in this paper was obtained from all of the NSWAMS of Suzhou, Shanghai, Jiaxing, Huzhou, and Wuxi in the Changjiang Delta region, covering December 2020; January, November, and December 2021; and March and August 2022. The data are diverse and represent the characteristics of different time periods, so the model is suitable for the inversion of water quality parameters in this region. However, because the water quality is good and the degree of pollution is low, the dataset used for model training contains few class V and inferior class V samples, and the models do not perform well in water quality inversion and classification for such samples. At the same time, the applicability and universality of the model depend directly on the sample dataset used during training. Therefore, as this research progresses, the universality of the model can be expanded by adding datasets from more regions and more periods to the model training. In addition, because target water quality parameters and selected bands differ, cameras with different spectral band combinations will be needed for different purposes, and the bands to be customized should be determined according to the actual needs.

Conclusions
This study proposed a novel approach for determining the optimal characteristic spectral bands for different water quality parameters, based on the correlation between the reflectance of different bands and on regression modeling with the ANN method. The approach was tested using fused ZY1-02D hyperspectral images and water quality data from December 2020 to August 2021 around Taihu Lake. The results showed that it can reduce the computation by 96.02% and quickly find the characteristic bands, improving modeling efficiency. Among the band sets of the multispectral cameras compared, the CM16 band set with 16 bands led to the best performance, suggesting that more spectral options enable better modeling results. Compared to typical empirical methods, the ANN model shows significant advantages in estimating various water quality parameters. Each parameter has an optimal number of characteristic bands, with model accuracy first increasing and then decreasing as more bands are added, except for TN.

Figure 1. Overall technical flowchart for this work.

Figure 2. Research area of water quality and satellite data.

Figure 5. Principle of heterogeneous data matching between hyperspectral satellite data and water quality parameters records.

Figure 6. Detailed steps in heterogeneous data matching between hyperspectral satellite data and water quality parameters records.

Figure 7. Classifications distribution of water quality parameters: (a) original 188 records; (b) 90 records after analysis and screening.

Figure 8. Remote sensing hyperspectral spectral curve with the wavelength from 390 to 900 nm.

Figure 9. Spectral curves for multispectral data with 26 spectral bands.

Figure 10. Approach for determining the characteristic spectral bands of water quality parameters.

Figure 11. ANN model construction and training method.

Figure 12. Cloud image of reflectance correlation of 16 spectral bands.

Figure 13 depicts the performance results for the different ANN regression models, employing R² as the evaluation index.

Figure 13. R² for the ANN regression model using different band sets with one, two, and three bands for DO (a), CODMn (b), NH3-N (c), TP (d), TN (e), EC (f), and TUB (g).

Figure 14. R² for ANN models of different water quality parameters with different numbers of spectral bands.

Table 1. Main parameters of the ZY1-02D satellite hyperspectral camera.

Table 2. Classification criteria for water quality parameters.

Table 3. Number of different classifications of various water quality parameters after screening.

Table 4. Index parameters of existing multispectral cameras for drones.

Table 5. Information for different multispectral band combination sets.

Table 7. Twenty groups of two-band combinations with high correlation.

Table 8. Twenty groups of two-band combinations with high correlation.