Retrieval of Chlorophyll-a Concentrations in the Coastal Waters of the Beibu Gulf in Guangxi Using a Gradient-Boosting Decision Tree Model

Remote sensing for the monitoring of chlorophyll-a (Chl-a) is essential to compensate for the shortcomings of traditional water quality monitoring, strengthen red tide disaster monitoring and early warnings, and reduce marine environmental risks. In this study, a machine learning approach called the Gradient-Boosting Decision Tree (GBDT) was employed to develop an algorithm for estimating the Chl-a concentrations of the coastal waters of the Beibu Gulf in Guangxi, using Landsat 8 OLI image data as the image source in combination with field measurements of Chl-a concentrations. The GBDT model with B4, B3 + B4, B3, B1 − B4, B2 + B4, B1 + B4, and B2 − B4 as input features achieved higher accuracy (MAE = 0.998 μg/L, MAPE = 19.413%, and RMSE = 1.626 μg/L) than the other models tested, providing a new method for remote sensing inversion of water quality parameters. The GBDT model was then used to study the spatial distribution and temporal variation of Chl-a concentrations in the coastal sea surface of the Beibu Gulf of Guangxi from 2013 to 2020. Chl-a concentrations were high in nearshore waters and low in offshore waters, and they varied seasonally (summer > autumn > spring ≈ winter).


Introduction
Chlorophyll-a (Chl-a) is an important indicator of phytoplankton biomass and the state of eutrophication. The concentration of Chl-a increases with phytoplankton biomass, and excessive phytoplankton growth may cause red tides, which can threaten public health [1] and wildlife [2] and harm the environment [3,4]. The Beibu Gulf in Guangxi receives more than 120 small- and medium-sized rivers carrying large amounts of organic matter and inorganic salts, which makes it prone to eutrophication and red tides. From 2014 to 2017, several abnormal water quality events, including two red tides, occurred in the Beibu Gulf of Guangxi. Water quality monitoring in the Guangxi Beibu Gulf is therefore important for protecting the water quality and environment of the gulf and safeguarding the health of its residents [5].
Traditional monitoring methods are costly, which limits their spatial coverage, so they cannot fully capture the spatial and temporal distribution of water quality. Satellite remote sensing, by contrast, enables automated monitoring of water quality parameters such as Chl-a and total suspended solids at lower cost and with greater spatial and temporal coverage [6], and such studies have been carried out both internationally and in China [7][8][9][10].
In recent years, machine learning has been used to retrieve water quality parameters. Machine learning enhances the accuracy of inversion and the generalization ability through model
The Beibu Gulf (107°57′ E–109°48′ E, 21°00′ N–22°15′ N; Figure 1) is a coastal region of Guangxi administered by the cities of Qinzhou, Beihai, and Fangchenggang. The study area can be roughly divided into Qinzhou Bay, Fangcheng Bay, Dafeng Estuary Bay, Nanliu Estuary Bay, Tieshan Port Bay, and Pearl Bay.
Located south of the Tropic of Cancer, the Beibu Gulf of Guangxi has a subtropical oceanic monsoon climate with transitional characteristics between subtropical and tropical [38]. The average sea surface temperature (SST) in the Beibu Gulf of Guangxi is approximately 22.6 °C, with the highest temperatures from June to August and the lowest from December to March. Rainfall is seasonal: the wet season lasts from May to October and the dry season from November to the following April. The annual average rainfall is approximately 1500 mm, and the rivers in Guangxi are fed mainly by rainwater, receiving most of it in the wet season and little in the dry season.
Appl. Sci. 2021, 11, x FOR PEER REVIEW 3 of 19

In Situ Data
Chl-a concentrations were measured every 30 min at the automatic monitoring stations of the Marine Environmental Monitoring Center of the Guangxi Zhuang Autonomous Region [39] (Figure 2). Coastal water quality at a depth of 0.5 m was recorded by a multi-parameter probe (6600V2-4, YSI, Yellow Springs, OH, USA; a Xylem brand), and the Chl-a concentration was determined by in vivo fluorescence.

For each image date, the instantaneous Chl-a value recorded closest to the Landsat 8 transit time (11:11 a.m.) was selected for the model input dataset.
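The nearest-in-time selection rule can be sketched as follows. This is a minimal illustration, assuming each station's record for one image date is a list of (timestamp, Chl-a) pairs in the same time zone as the 11:11 transit time; the function name, the date, and the synthetic values are hypothetical and do not come from the monitoring network.

```python
from datetime import datetime, timedelta


def nearest_to_overpass(records, overpass):
    """Return the (timestamp, chl_a) pair recorded closest to the overpass time.

    records:  list of (datetime, float) pairs for one station and one image date
              (hypothetical structure; a 30-min probe series fits it).
    overpass: datetime of the Landsat 8 transit on that date.
    """
    if not records:
        raise ValueError("no in situ records for this date")
    # abs() of a timedelta gives the absolute time gap, so min() picks the closest record
    return min(records, key=lambda rec: abs(rec[0] - overpass))


# Synthetic 30-min series for one day (illustrative values only)
day = datetime(2019, 7, 15)
series = [(day + timedelta(minutes=30 * k), 5.0 + 0.1 * k) for k in range(48)]
overpass = day.replace(hour=11, minute=11)

ts, chl = nearest_to_overpass(series, overpass)
print(ts.strftime("%H:%M"), round(chl, 3))  # prints: 11:00 7.2
```

With 30-min sampling the selected value is never more than 15 min from the transit time, provided the record has no gaps around the overpass.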

Satellite Data Acquisition and Pre-Processing
The Landsat 8 OLI data used in this study, with a nominal spatial resolution of 30 m, were obtained from the United States Geological Survey (USGS) portal (https://glovis.usgs.gov (accessed on 1 January 2021)). The images (path-row: 125-045) cover the coastal waters of the Beibu Gulf of Guangxi to the maximum extent, and all 13 automatic monitoring stations established by the Marine Environmental Monitoring Center of Guangxi fall within the scene. Thirty-four scenes with low cloud cover, acquired from 2013 to 2020, were selected (Table 1). The images were radiometrically calibrated and atmospherically corrected before further processing. Radiometric calibration converts the digital numbers (DNs) recorded by the sensor into spectral radiance and top-of-atmosphere (TOA) reflectance. Because the signal reflected by the surface is altered by absorption and scattering as it passes through the atmosphere, atmospheric correction was also required; it reduces the reflectance error so that the corrected reflectance can be used to retrieve the Chl-a concentration.
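The radiometric calibration step can be illustrated with the standard USGS rescaling for Landsat 8 Level-1 products, in which the band-specific REFLECTANCE_MULT_BAND_n and REFLECTANCE_ADD_BAND_n coefficients (or RADIANCE_MULT/ADD for radiance) and the SUN_ELEVATION angle are read from each scene's MTL metadata file. The sketch below covers only this DN conversion; it does not reproduce the FLAASH atmospheric correction used in this study, and the coefficient values are typical placeholders, not values taken from the scenes in Table 1.

```python
import math


def dn_to_toa_reflectance(dn, refl_mult, refl_add, sun_elevation_deg):
    """Convert a Landsat 8 OLI digital number (Q_cal) to TOA reflectance.

    rho' = M_rho * Q_cal + A_rho   (TOA reflectance without sun-angle correction)
    rho  = rho' / sin(theta_SE)    (corrected for the local sun elevation angle)
    """
    rho_prime = refl_mult * dn + refl_add
    return rho_prime / math.sin(math.radians(sun_elevation_deg))


def dn_to_radiance(dn, rad_mult, rad_add):
    """Spectral radiance L = M_L * Q_cal + A_L, in W/(m^2 sr um)."""
    return rad_mult * dn + rad_add


# Placeholder coefficients in the style of an MTL file (assumed, not from Table 1)
REFL_MULT, REFL_ADD = 2.0e-5, -0.1
rho = dn_to_toa_reflectance(10000, REFL_MULT, REFL_ADD, sun_elevation_deg=60.0)
print(round(rho, 4))  # equals 0.1 / sin(60 degrees)
```

In practice the same arithmetic is applied per band to the whole DN array before the image is passed to the atmospheric correction.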
In this study, all images were radiometrically calibrated and atmospherically corrected using the FLAASH model in the ENVI 5.3.1 software package. The correction reduced the influence of atmospheric water vapor and particles, producing clearer images, and the corrected pixel spectra were closer to the actual spectra of ground objects and thus better suited to inversion (Figure 3).

Calibration Dataset
After the images were radiometrically calibrated and atmospherically corrected, the measured Chl-a concentrations were matched with the spectral data at the monitoring sites, yielding 117 samples in total (Table 2). Residual clouds, cloud shadows, and haze over the water surface may have produced anomalous values and thus deviations between the inversion results and the field data, so outliers were removed to improve the accuracy of the inversion model.
A boxplot was used to identify abnormal data. The reflectance of each band was calculated for the 117 samples (Figure 4); seven anomalous samples were identified and removed, leaving 110 samples for the subsequent analysis.
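A boxplot flags values beyond its whiskers, conventionally Q1 − 1.5 × IQR and Q3 + 1.5 × IQR. The sketch below applies that rule band by band and drops a sample if any band falls outside the whiskers. The paper does not state the whisker multiplier, the quartile method, or whether one anomalous band is enough to drop a sample, so those choices (Tukey's 1.5 × IQR, Python's "inclusive" quartiles, and the "any band" rule) are assumptions, and the reflectance values are illustrative.

```python
from statistics import quantiles


def iqr_fences(values, k=1.5):
    """Lower/upper boxplot whisker limits: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def filter_boxplot_outliers(samples, bands, k=1.5):
    """Keep only samples whose reflectance lies inside the whiskers in every band."""
    fences = {b: iqr_fences([s[b] for s in samples], k) for b in bands}
    return [
        s for s in samples
        if all(lo <= s[b] <= hi for b, (lo, hi) in fences.items())
    ]


# Illustrative reflectances; sample 6 has an anomalous B1 value (e.g. residual cloud)
samples = [
    {"id": 1, "B1": 0.050, "B2": 0.040},
    {"id": 2, "B1": 0.052, "B2": 0.041},
    {"id": 3, "B1": 0.055, "B2": 0.042},
    {"id": 4, "B1": 0.058, "B2": 0.043},
    {"id": 5, "B1": 0.060, "B2": 0.044},
    {"id": 6, "B1": 0.500, "B2": 0.045},
]
kept = filter_boxplot_outliers(samples, ["B1", "B2"])
print([s["id"] for s in kept])  # prints [1, 2, 3, 4, 5]
```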

GBDT Model
The Gradient-Boosting Decision Tree (GBDT) algorithm was proposed by Friedman in 1999 [40]. It restricts the weak learners to Classification and Regression Tree (CART) models, which are widely used to construct decision trees for both classification and regression problems. When a regression tree is built with CART, splits are generally chosen to minimize the sample variance within the resulting nodes: the larger the variance of a node, the more scattered its data and the lower its purity. A node keeps splitting until its variance falls below a set threshold, and the tree is complete when every node satisfies this condition or another stopping criterion is reached. In GBDT, CART trees are combined through boosting: after each iteration, the residuals are calculated and used as the fitting target of the next iteration, so the loss function is progressively minimized and the fitting accuracy of the model improves. When the residuals reach a minimum or the termination condition is met, the model is complete and the regression result is output. The specific steps of the GBDT regression algorithm are given by the following sequence of equations and in Figure 5.
1. Initialize the learner with a constant: $f_0(x) = \arg\min_{c} \sum_{i=1}^{m} L(y_i, c)$.
2. In round $t$ ($t = 1, 2, \ldots, T$), calculate the negative gradient of each sample: $r_{t,i} = -\left[ \frac{\partial L(y_i, f(x_i))}{\partial f(x_i)} \right]_{f(x) = f_{t-1}(x)}$, $i = 1, 2, \ldots, m$.
3. Fit a CART regression tree $T_t$ to $(x_i, r_{t,i})$, $i = 1, 2, \ldots, m$, whose leaf node regions are $R_{t,j}$, $j = 1, 2, \ldots, J$.
4. Traverse the leaf node regions (i.e., visit each node of the tree once along a fixed search route) and calculate the output value of each leaf, namely the best-fitting value $c_{t,j} = \arg\min_{c} \sum_{x_i \in R_{t,j}} L(y_i, f_{t-1}(x_i) + c)$.
5. Update the learner: $f_t(x) = f_{t-1}(x) + \sum_{j=1}^{J} c_{t,j} I(x \in R_{t,j})$, where $I(\cdot)$ is the indicator function.
6. Repeat steps 2 to 5 until the termination condition is reached; the final strong learner is the sum of the weak learners: $f(x) = f_T(x) = f_0(x) + \sum_{t=1}^{T} \sum_{j=1}^{J} c_{t,j} I(x \in R_{t,j})$.
To avoid over- or underfitting, the GBDT hyperparameters were determined by grid search with 10-fold cross-validation. In 10-fold cross-validation, the dataset is split into 10 subsamples; each subsample in turn serves as the test set while the remaining nine form the training set, and the GBDT model is built and evaluated in each of the 10 rounds. The grid search evaluated every candidate parameter combination in this way and returned the best-performing one. In this study, the mean squared error (MSE) was used as the loss function, the learning rate was 0.1, the number of CART trees was 100, and the maximum tree depth was 3.
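The six steps and the tuning procedure can be sketched in plain Python for squared-error loss. This is a minimal illustration, not the authors' implementation (the paper does not say which software was used for GBDT). Assumptions: leaves store the mean residual, which is the best-fitting leaf value under squared error; every tree's contribution is scaled by the learning rate (0.1 in this study), a shrinkage factor applied to the step-5 update; grid-search candidates are scored by cross-validated RMSE; and all function names and the toy data are hypothetical. A library implementation, for example scikit-learn's GradientBoostingRegressor combined with GridSearchCV, would normally be used in practice.

```python
import random


def _avg(values):
    return sum(values) / len(values)


def _sse(values):
    """Sum of squared deviations from the mean: the CART regression impurity."""
    m = _avg(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, r, depth=0, max_depth=3):
    """Grow a CART regression tree on residuals r (steps 3-4).

    Internal nodes are tuples (feature, threshold, left, right); leaves hold the
    mean residual, the best-fitting leaf value under squared-error loss.
    """
    if depth == max_depth or len(r) < 2:
        return _avg(r)
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X})[:-1]:  # keeps both children non-empty
            left = [i for i, x in enumerate(X) if x[f] <= thr]
            right = [i for i, x in enumerate(X) if x[f] > thr]
            cost = _sse([r[i] for i in left]) + _sse([r[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, thr, left, right)
    if best is None:  # every feature is constant in this node
        return _avg(r)
    _, f, thr, left, right = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [r[i] for i in idx], depth + 1, max_depth)

    return (f, thr, grow(left), grow(right))


def predict_tree(node, x):
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node


def fit_gbdt(X, y, n_trees=100, learning_rate=0.1, max_depth=3):
    f0 = _avg(y)  # step 1: the constant minimising squared error is the mean
    pred = [f0] * len(y)
    trees = []
    for _ in range(n_trees):
        # step 2: for squared error the negative gradient is proportional to y - f
        resid = [yi - pi for yi, pi in zip(y, pred)]
        tree = build_tree(X, resid, 0, max_depth)
        trees.append(tree)
        # step 5, with each tree's contribution shrunk by the learning rate
        pred = [pi + learning_rate * predict_tree(tree, xi) for pi, xi in zip(pred, X)]
    return f0, learning_rate, trees


def predict_gbdt(model, x):
    f0, lr, trees = model  # step 6: initial constant plus all (shrunk) trees
    return f0 + lr * sum(predict_tree(t, x) for t in trees)


def kfold_rmse(X, y, k=10, seed=0, **params):
    """Cross-validated RMSE: each of the k folds is held out once as the test set."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    sq_err = []
    for fold in range(k):
        test = idx[fold::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = fit_gbdt([X[i] for i in train], [y[i] for i in train], **params)
        sq_err += [(predict_gbdt(model, X[i]) - y[i]) ** 2 for i in test]
    return (sum(sq_err) / len(sq_err)) ** 0.5


def grid_search(X, y, grid, k=10):
    """Return the parameter set with the lowest k-fold cross-validated RMSE."""
    return min(grid, key=lambda params: kfold_rmse(X, y, k=k, **params))


# Toy run on synthetic data (5 folds only to keep it fast)
X = [[i / 10] for i in range(20)]
y = [2 * row[0] for row in X]
grid = [{"n_trees": n, "learning_rate": 0.1, "max_depth": d} for n in (20, 50) for d in (1, 2)]
best = grid_search(X, y, grid, k=5)
model = fit_gbdt(X, y, **best)
print(best, round(predict_gbdt(model, [1.0]), 2))
```

For the study's settings the call would be `fit_gbdt(X, y, n_trees=100, learning_rate=0.1, max_depth=3)` on the seven selected features, with `k=10` in the grid search.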
The inversion accuracy of the remote sensing model was assessed using the mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean square error (RMSE), defined as follows:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - f_i \right|$$
$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - f_i}{y_i} \right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - f_i \right)^2}$$
where n is the number of data pairs, the subscript i denotes individual data points, and y and f represent the measured and estimated values, respectively. The coefficient of determination (R²) was also calculated to show how much of the variation in the measured Chl-a concentration is explained by the model. In general, the model with the largest R² and the smallest RMSE gives the best predictions. In this study, only models with R² exceeding 0.7 were selected for verification of the inversion accuracy.
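The three error metrics, together with R², can be computed as below. In this sketch R² is taken as 1 − SS_res/SS_tot, one common definition; the paper may instead have used the squared Pearson correlation between measured and estimated values. MAPE assumes that no measured concentration is zero, and the sample values are illustrative.

```python
import math


def mae(y, f):
    """Mean absolute error, in the units of y (e.g. ug/L)."""
    return sum(abs(a - b) for a, b in zip(y, f)) / len(y)


def mape(y, f):
    """Mean absolute percentage error in %; assumes no measured value is zero."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, f)) / len(y)


def rmse(y, f):
    """Root mean square error, in the units of y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, f)) / len(y))


def r_squared(y, f):
    """Coefficient of determination, 1 - SS_res / SS_tot (one common definition)."""
    y_mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, f))
    ss_tot = sum((a - y_mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


measured = [2.0, 4.0, 5.0, 8.0]   # illustrative in situ Chl-a (ug/L)
estimated = [2.5, 3.5, 5.0, 9.0]  # illustrative model output (ug/L)
print(mae(measured, estimated), mape(measured, estimated),
      round(rmse(measured, estimated), 3), round(r_squared(measured, estimated), 3))
```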
Theil-Sen and Mann-Kendall trend analysis was used to analyze the variation of Chl-a concentrations in the coastal sea surface of the Beibu Gulf of Guangxi. The approach combines Theil-Sen slope estimation with the Mann-Kendall significance test. It does not require the time series to follow a normal distribution or to have a particular correlation structure, it is insensitive to outliers, and it is robust to measurement errors and discrete data.
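For a single pixel, the two statistics can be computed as follows. This is a plain-Python sketch: the Theil-Sen slope is the median of all pairwise slopes, and the Mann-Kendall Z uses the variance formula without tie correction. Treating |Z| > 1.96 as significant corresponds to α = 0.05 (two-sided); the thresholds used to define the five trend classes in Table 6 are not stated in the paper, so that classification is not reproduced. The annual series is hypothetical.

```python
import math
from statistics import median


def theil_sen_slope(series):
    """Median of all pairwise slopes (x_j - x_i) / (j - i) for an evenly spaced series."""
    n = len(series)
    slopes = [(series[j] - series[i]) / (j - i) for i in range(n) for j in range(i + 1, n)]
    return median(slopes)


def mann_kendall_z(series):
    """Mann-Kendall Z statistic (variance without tie correction)."""
    n = len(series)
    # S counts increasing pairs minus decreasing pairs
    s = sum(
        (series[j] > series[i]) - (series[j] < series[i])
        for i in range(n)
        for j in range(i + 1, n)
    )
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0


# Hypothetical annual mean Chl-a (ug/L) for one pixel, 2013-2020
annual = [6.2, 6.5, 6.4, 6.9, 7.1, 7.0, 7.4, 7.6]
slope, z = theil_sen_slope(annual), mann_kendall_z(annual)
direction = "increasing" if slope > 0 else "decreasing" if slope < 0 else "no change"
print(round(slope, 3), round(z, 2), direction, "significant" if abs(z) > 1.96 else "not significant")
```

Applied pixel by pixel to the yearly Chl-a rasters, the slope gives the direction of change and Z its significance.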
The remote sensing images acquired from 2013 to 2020 were input into the trained GBDT model, and the outputs were imported into ArcGIS 10.5 for raster processing. After outliers were removed and the results were interpolated by Inverse Distance Weighted (IDW) interpolation, the spatial and temporal distributions of Chl-a were visualized for analysis.
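The IDW estimator used in the interpolation step weights each known value by the inverse of its distance to the target location raised to a power. The sketch below is a plain-Python illustration of that estimator, not the ArcGIS tool itself: it assumes a power of 2 and uses all input points, whereas ArcGIS by default restricts the calculation to a search neighborhood. Coordinates are assumed to be in projected units, and the values are illustrative.

```python
import math


def idw(points, x, y, power=2.0):
    """Inverse Distance Weighted estimate at (x, y) from (xi, yi, zi) points."""
    num = den = 0.0
    for xi, yi, zi in points:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            return zi  # target coincides with a known point: return its value
        w = 1.0 / d ** power
        num += w * zi
        den += w
    return num / den


# Two illustrative Chl-a values (ug/L) at projected coordinates
pts = [(0.0, 0.0, 1.0), (2.0, 0.0, 3.0)]
print(idw(pts, 1.0, 0.0), round(idw(pts, 0.5, 0.0), 3))
```

A larger power makes the surface more local (nearby pixels dominate), while a smaller power smooths it.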

Performance Assessment
In this study, single bands, single-band ratios, band combinations, and water indices derived from the Landsat 8 OLI images of the study area were used to build the feature library of the GBDT model. The one-sample Kolmogorov-Smirnov (K-S) test in SPSS was used to check whether these variables followed a normal distribution. The asymptotic significance of the tests was greater than 0.05, so the hypothesis of normality could not be rejected, and Pearson correlation analysis was therefore used to assess the importance of the features.
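Feature screening of this kind can be sketched as follows: build candidate band features for each matched sample, compute Pearson's r against the measured Chl-a, and keep those above a threshold. The band set (B1 to B5), the pairwise sums, differences, and ratios, the use of |r| rather than r, and all function names are assumptions for illustration; water indices are left out, and the data are synthetic.

```python
import math
import random


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


BANDS = ("B1", "B2", "B3", "B4", "B5")  # assumed OLI bands in the library


def candidate_features(sample):
    """Single bands, pairwise sums/differences and ratios (hypothetical library)."""
    feats = {b: sample[b] for b in BANDS}
    for i, p in enumerate(BANDS):
        for q in BANDS[i + 1:]:
            feats[f"{p} + {q}"] = sample[p] + sample[q]
            feats[f"{p} - {q}"] = sample[p] - sample[q]
        for q in BANDS:
            if q != p:
                feats[f"{p}/{q}"] = sample[p] / sample[q]
    return feats


def select_features(samples, chl, threshold=0.7):
    """Return feature names with |r| > threshold (strongest first) and all scores."""
    table = [candidate_features(s) for s in samples]
    scores = {name: pearson_r([row[name] for row in table], chl) for name in table[0]}
    chosen = [name for name, r in scores.items() if abs(r) > threshold]
    return sorted(chosen, key=lambda name: -abs(scores[name])), scores


# Synthetic example in which Chl-a depends on B4 only (not study data)
rng = random.Random(1)
samples = [{b: rng.uniform(0.01, 0.10) for b in BANDS} for _ in range(40)]
chl = [60.0 * s["B4"] + 1.0 for s in samples]
selected, scores = select_features(samples, chl)
print(selected[:3])
```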
Features with high importance (correlation coefficients higher than 0.6) are listed in Table 3. The features with correlation coefficients higher than 0.7 (B4, B3 + B4, B3, B1 − B4, B2 + B4, B1 + B4, and B2 − B4) were selected as the input variables. These features were added to the GBDT model one at a time, and the accuracy of the resulting inversions was evaluated and compared (Table 4). Accuracy improved as more variables were added, suggesting that the additional variables substantially improved the GBDT retrieval of the Chl-a concentration; the model with all the selected variables performed best (R² = 0.778). Figure 6 compares the inversion results of the GBDT model built with B4, B3 + B4, B3, B1 − B4, B2 + B4, B1 + B4, and B2 − B4 as input features against the measured values. The model performed well (MAE = 0.998 µg/L, MAPE = 19.414%, RMSE = 1.626 µg/L, and R² = 0.778).
The concentration of Chl-a in the coastal sea surface of the Beibu Gulf of Guangxi was higher in nearshore waters than in offshore waters, and it gradually decreased from north to south (Table 5).
The Nanliu Estuary Bay had the highest Chl-a concentration, with a mean of 11.469 μg/L over the bay. The means of the Dafeng Estuary Bay, Beihai, Qinzhou Bay, Pearl Bay, and Fangcheng Bay were 8.198, 7.461, 6.600, 5.031, and 3.372 μg/L, respectively.

Temporal Variations of Chl-a
The Chl-a concentrations in the four seasons of 2019 are presented in Figure 8. The thermal and dynamic structure of the sea surface in the Beibu Gulf of Guangxi is affected by the subtropical monsoon, and the spatial and temporal differences in Chl-a concentration in the Beibu Gulf are attributed mainly to nutrient transport. The Chl-a concentration in the coastal sea surface showed clear seasonal changes: the mean was highest in summer (8.312 µg/L), intermediate in autumn (7.714 µg/L) and spring (6.954 µg/L), and lowest in winter (6.680 µg/L).

Theil-Sen and Mann-Kendall Trend Analysis
Theil-Sen and Mann-Kendall trend analysis revealed the variations in Chl-a concentration in the coastal sea surface of the Guangxi Beibu Gulf (Figure 9), and the statistics for each trend class are given in Table 6. Of the 4141.8 km² study area, 193.55 km² showed an obvious decrease in Chl-a concentration, 356.72 km² a less obvious decrease, 383.69 km² no obvious change, 724.45 km² a less obvious increase, and 761.49 km² an obvious increase. The remaining 1721.900 km² showed no significant change. Because the area with an increasing trend (1485.94 km²) is much larger than the area with a decreasing trend (550.27 km²), the Chl-a concentration in the coastal sea surface has generally increased in recent years.

Comparison of Different Models
Two machine learning models were used for comparison in this study. An artificial neural network is a machine learning model that can capture the nonlinear relationship between input variables and target data by training and adjusting its interconnected internal processing units (neurons) [41]. Its high prediction accuracy, self-adaptation, and robustness have made it widely used for retrieval in waters with complex optical characteristics [42]. The support vector machine (SVM) is a useful tool for nonlinear statistical learning and regression analysis [43]; its training identifies the support vectors that separate the classes in a multidimensional attribute space (as in the trophic status classification of inland waters based on machine learning and remote sensing data).
The artificial neural network and SVM models were trained in MATLAB. Using the same input variables as the GBDT model, their performance in estimating the Chl-a concentration in the coastal waters of the Beibu Gulf in Guangxi was evaluated.
The accuracies of the different models are compared in Figure 10 and Table 7. The GBDT model achieved the highest inversion accuracy for the Chl-a concentration (RMSE = 1.626 µg/L), indicating that the GBDT-based remote sensing model has an advantage in inversion accuracy.
The GBDT model performed better than the B4/B1 band-ratio model and the other models tested. Although the B4/B1 model estimated the Chl-a concentration reasonably well (R² = 0.706; Figure 10b), it deviated at high and low Chl-a concentrations. The floating algae index (FAI), a water index strongly correlated with Chl-a that was proposed by Hu in 2009 [44], deviated considerably from the in situ data (R² = 0.591, RMSE = 2.226 µg/L, MAPE = 35.884%, and MAE = 1.843 µg/L; Table 7; Figure 10f). Among the machine learning algorithms tested, GBDT performed best and the SVM worst (R² = 0.527, RMSE = 2.986 µg/L, MAPE = 39.784%, and MAE = 2.085 µg/L). With the same input variables as the GBDT model, the neural network achieved higher inversion accuracy than the traditional statistical models (R² = 0.706), showing that it can estimate the Chl-a concentration in coastal water. However, the neural network tended to underestimate the Chl-a concentration, possibly because too few samples were available to train and validate it. In a previous study (Song et al. [45]), an artificial neural network performed well in estimating TSM and Chl-a concentrations but required a large training dataset. Because the GBDT model can be trained on a smaller dataset with faster computation and generalizes better, it is more suitable than the other models for estimating the temporal variation of the Chl-a concentration.
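For reference, the FAI of Hu (2009) is the NIR reflectance minus a baseline linearly interpolated between the red and SWIR bands: FAI = R_NIR − [R_red + (R_SWIR − R_red) × (λ_NIR − λ_red)/(λ_SWIR − λ_red)]. Hu defined it on Rayleigh-corrected MODIS reflectance at 645, 859, and 1240 nm. The sketch below applies the same formula with approximate Landsat 8 OLI band centres for B4, B5, and B6; this band choice is an assumption, since the paper does not state which bands it used, and the reflectance values are illustrative.

```python
def floating_algae_index(red, nir, swir, wl_red=655.0, wl_nir=865.0, wl_swir=1609.0):
    """FAI = R_NIR - R'_NIR, where R'_NIR is interpolated linearly between red and SWIR.

    Default wavelengths (nm) approximate the OLI band centres of B4 (red),
    B5 (NIR) and B6 (SWIR1); this band choice is an assumption.
    """
    baseline = red + (swir - red) * (wl_nir - wl_red) / (wl_swir - wl_red)
    return nir - baseline


# Illustrative reflectances for one water pixel
print(round(floating_algae_index(red=0.030, nir=0.025, swir=0.005), 4))
```

Because FAI measures the NIR "bump" above the red-SWIR baseline, it responds to floating or near-surface algae rather than to Chl-a alone, which is consistent with its weaker fit to the in situ data here.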
The Chl-a concentration in the coastal sea surface of the Beibu Gulf of Guangxi obtained by remote sensing inversion ranged from 0 to 35 µg/L, similar to the field survey results of Yang Bin, Zhong Qiuping, et al. [46]. This indicates that the retrievals of the selected inversion model were close to the field data and that the inversion results are a useful reference. The average distribution of the Chl-a concentration in the coastal waters of the Guangxi Beibu Gulf is shown in Figure 11. Coastal sea surface Chl-a was high in nearshore waters and low in offshore waters, and it gradually decreased from north to south. Nutrients in the Beibu Gulf in Guangxi come mainly from coastal land sources [47], and the pollution input from the Nanliu River is the largest. The Nanliu River has a large basin that collects industrial, agricultural, and urban sewage from adjacent land, which carries large amounts of nutrients. According to the Marine Environment Quality Bulletin of the Guangxi Zhuang Autonomous Region, the CODCr load delivered by the Nanliu River in 2016 was 174,049 t, and the total load of ammonia nitrogen, nitrate nitrogen, and nitrite nitrogen was 9868 t. Estuarine and coastal zones are transitional zones between land and sea with relatively shallow water, where river inflows rich in terrigenous nutrients mix with offshore salt water. The annual average SST is about 20.3–29.9 °C. High light intensity and high SST promote the growth and reproduction of phytoplankton and algae, increasing the Chl-a concentration [48]. The river-diluted water raises the sea level nearshore relative to offshore, so nutrients spread seaward from the estuary as a plume.
The Beibu Gulf in Guangxi has a tropical and subtropical monsoon climate. The monsoon strengthens the alongshore current, which is also deflected by the Coriolis force, so the seaward runoff moves westward along the coast. Therefore, in terms of source, the Chl-a concentration in the coastal waters of the Beibu Gulf in Guangxi was higher nearshore and lower offshore because of the input of terrestrial nutrients. In terms of transport, the nutrients diffuse from the estuaries toward the open sea and, under the influence of the monsoon, flow westward along the coast, so the Chl-a concentration gradually decreases westward.

Temporal Variation of Chl-a
The concentration of Chl-a in the coastal sea surface of the Beibu Gulf in Guangxi showed strong seasonal changes, ranked summer > autumn > spring and winter (Figure 12). The Chl-a concentration in summer and autumn was mainly affected by the climate [49,50]. In summer, the Beibu Gulf receives abundant rainfall. The wet season ranges from July to September, with average monthly precipitation exceeding 100 mm and accounting for approximately 55–70% of the annual total. River runoff peaks in summer, and many coastal rivers flow into the Beibu Gulf, carrying agricultural and industrial wastewater into the gulf. These inflows spread strongly and deliver large amounts of nutrients at high concentrations. As the runoff raises the sea level, the resulting gradient strengthens water exchange and thickens the mixed layer. Summer temperatures are high, with SST reaching up to 34.1 °C. Under these conditions, phytoplankton proliferate, so the Chl-a concentration was highest in summer, and high values were concentrated in the estuarine areas of the region. The southwest monsoon prevails in summer and produces an alongshore wind component that drives a current along the coast [51]. The runoff-raised sea level further promoted the southwest flow along the coast through the sea-level gradient. Therefore, in summer, nutrients are transported from east to west along the Beibu Gulf, and the Chl-a concentration declined gradually from east to west. In autumn, rainfall and runoff into the sea decrease, the river-diluted water retreats toward the shore, and the Chl-a concentration is lower than in summer.
In spring and winter, river runoff was the smallest, terrestrial nutrient input was lower, and SST was the lowest (19.0–24.0 °C), which inhibited phytoplankton growth, so the Chl-a concentration in the sea surface was the lowest. Winds over the Beibu Gulf are light in spring, which greatly weakens monsoon-driven mixing and produces a thinner mixed layer. Together with the reduced river discharge, this keeps the Chl-a concentration generally low in spring and winter in the Beibu Gulf. The climate data at one sampling point (GX10, as an example) on the satellite sensing dates are shown in Table 8.

Conclusions
Using Landsat 8 OLI remote sensing images combined with measured Chl-a concentrations, a GBDT model was developed for the coastal waters of the Beibu Gulf in Guangxi and used to analyze the spatial-temporal distribution of the Chl-a concentration. The main results are as follows:

1. Compared with the other models tested, the GBDT model substantially improved the accuracy of Chl-a concentration inversion, suggesting that it can serve as a new method for remote sensing inversion of water quality parameters. With B4, B3 + B4, B3, B1 − B4, B2 + B4, B1 + B4, and B2 − B4 as its input features, the GBDT model achieved the highest inversion accuracy (MAE = 0.998 µg/L, MAPE = 19.413%, RMSE = 1.626 µg/L, and R² = 0.778).

2. Chl-a concentrations in the Beibu Gulf in Guangxi were highest in nearshore waters and lowest in offshore waters. They were highest in summer, lower in autumn, and lowest in spring and winter. Ranked from high to low, the bays were Nanliu River Estuary Bay, Dafeng River Estuary Bay, Qinzhou Bay, Beihai Pearl Harbor, and Fangcheng Bay.
Because of the satellite revisit period and the quality of the satellite images, the dataset used to train and validate the GBDT model was relatively small, which may have affected the inversion accuracy of the model. To estimate the spatial and temporal distribution of Chl-a concentrations more precisely, future work could use more data by combining multi-source satellite remote sensing data.