Article

Calibration of Sentinel-2 Surface Reflectance for Water Quality Modelling in Binh Dinh’s Coastal Zone of Vietnam

1
Vietnam National Space Center (VNSC), Vietnam Academy of Science and Technology (VAST), 18 Hoang Quoc Viet, Cau Giay, Hanoi 122100, Vietnam
2
Environmental Institute of Technology (EIT), Vietnam Academy of Science and Technology (VAST), 18 Hoang Quoc Viet, Cau Giay, Hanoi 122100, Vietnam
*
Author to whom correspondence should be addressed.
Sustainability 2023, 15(2), 1410; https://doi.org/10.3390/su15021410
Submission received: 6 November 2022 / Revised: 29 December 2022 / Accepted: 3 January 2023 / Published: 11 January 2023

Abstract

Coastal zones are critically important ecosystems that are closely tied to human activities, such as tourism, urbanization, transport, and aquaculture. However, managing and monitoring sea water in the coastal areas is often challenging due to the diversity of the pollution sources. Traditional approaches of onsite measurement and surveys have limitations in terms of cost, efficiency and productivity compared with modern remote sensing methods, particularly for larger and longer observations. Optical remote sensing imagery has been proven to be a good data source for water quality assessment in general and for seawater studies in particular with the use of advanced techniques of data processing such as machine learning (ML) algorithms. However, optical remote sensing data also have their own disadvantages as they are much affected by climatic conditions, atmospheric gas and particles as a source of noise in the data. This noise could be reduced, but it is still unavoidable. This study aims to model seawater quality parameters (total suspended solids (TSS), chlorophyll-a (chla), chemical oxygen demand (COD), and dissolved oxygen (DO)) along a 134 km sea coastal area of the Binh Dinh province by applying the current robust machine learning models of decision tree (DT), random forest (RF), gradient boosting regression (GBR), and Ada boost regression (ABR) using Sentinel-2 imagery. To reduce the atmospheric effects, we conducted onsite measurements of sea surface reflectance (SSR) using the German RAMSES-TriOS instrument for calibration of the Sentinel-2 level 2A data before inputting them to the ML models. Our modeling results showed an improvement of the model accuracy using calibrated SSR compared with the original Sentinel-2 level 2A SSR data. The RF predicted the most accurate seawater quality parameters compared with in situ field-measured data (mean R2 = 0.59 using original Sentinel-2 level 2A SSR and R2 = 0.70 using calibrated SSR). 
Chla was the most precisely estimated parameter (R2 = 0.74 when modelled by the RF model), followed by DO, COD and TSS. In terms of seawater quality estimation, this accuracy is at a good level. The seawater quality distributions were strongly correlated with coastal features, with higher values of TSS, chla, COD, and DO near the river mouths and the urban and tourist areas. These spatial water quality data could be extremely helpful for local governments to make decisions when the modelling is continuously conducted (using big data processing), and the approach is highly recommended for more applications.

1. Introduction

1.1. Seawater Quality Challenges

Coastal waters make up only 0.5% of the ocean’s volume and at most 8–10% of its surface, but they are much more significant in terms of ecological, economic and social values [1]. Economic aspects such as tourism, coastal recreation, transportation, aquaculture, and property values depend on coastal waters [2]. Additionally, coastal waters are essential habitats for numerous marine animals, supporting biodiversity by creating complex ecosystems [3]. Vietnamese marine exclusive economic zones are located on the long coastline of 3260 km and have an area of more than 1 million km². However, this also makes the country extremely vulnerable to natural disasters, particularly when the effects of climate change are expected to increase in the future. Nearly 130 species of fish are of great economic importance among the more than 2000 fish species living in the Vietnamese seas. There is an annual allowable catch of 50,000–60,000 and 60,000–70,000 tons from the production of 1600 species of crustaceans and 2500 species of mollusks, respectively. Nearly 50% of domestic product is made up of marine products of the coastal and maritime provinces, corresponding to approximately half of the country’s GDP [4].
We are now in an “Ocean Era”, where utilizing marine resources and shipping have grown in significance for socio-economic development. The faster economic development in coastal and marine environments accelerates, the greater the damage the ocean receives. Land-based activities, together with the exploration and exploitation of resources on the continental shelf and seabed, can discharge a large amount of pollutants and toxic substances into the marine environment [5]. Wastes of continental origin carried into the seas include domestic suspended solids and hospital waste from urban and concentrated residential areas, mine waste, waste from industrial zones, pesticides from agricultural production, and organic waste from coastal agriculture areas. Most of these wastes have not been properly treated [6] and may be directly discharged into rivers flowing into the sea, carrying sediments, plastic, chemicals, metals, oil residue, and even radioactive substances [5,6,7,8,9]. For example, in Vietnam, according to the estimate of the Institute of Mechanics, the solid waste directly dumped into the sea alone has reached 5200–10,300 tons per day [10]. These thoughtless actions have had an inevitable result: pollution and water quality decline. In more than 80% of the nearby coastal ports, the environmental quality did not meet the standards for the environmental protection of functioning marine zones [5,9].
Pollution by short or longer lasting chemicals with cumulative effects on the coastal marine environment could be extremely damaging to coastal waters [11]. More than 80% of all marine pollution comes from land-based sources, which are mainly from traffic, agriculture and industries [12]. In parts of the world, manufacturing facilities discharge toxic waste into the ocean, including mercury. Sewage, along with plastic items, contributes to ocean pollution when it is negligently dumped into the ocean [13]. The operations associated with aquaculture also result in the release of antibiotic residues as well as nutrients from silage and slurry manure [14]. Numerous minor sources such as septic tanks, automobiles, vehicles, and boats as well as larger ones such as farms, ranches and forested regions are all non-point sources of pollution.
The ocean is one of the most important parts of the human environment with habitat deterioration and public health problems being closely linked to water contamination. In recent years, there has been much focus on the declining water quality caused by a rise in contaminants such as heavy metals, pesticides and infectious and non-communicable diseases. Monitoring these hazards in a timely manner can prevent both habitat destruction and harm to public health. When choosing treatment options, planning, operating, and reusing water or waste, it is necessary to consider the concentration of wastewater elements and indications of infectious diseases. To implement remedial procedures or keep track of public health, it is necessary to monitor the changing concentration of contaminants over time. Scientists have methods to assess and monitor seawater quality [15,16,17,18], but another method using biosensors has gained a lot of attention in recent years due to its effectiveness as an analytical instrument and its superior capabilities for monitoring pollution and public health. This testing is quick, easily implemented, inexpensive, onsite, and only needs a small instrument to monitor multiple pollutants at once. In order to detect targets based on electrical, thermal and other signals, biosensors often use biochemical reactions aided by biological receptors, including bacteria, enzymes and antibodies. In order to identify pollutants in wastewater such as organics, inorganics, hazardous pollutants, microbes, and non-communicable and infectious diseases, biosensors are a crucial method [19,20,21,22,23,24].

1.2. Remote Sensing Approaches in Seawater Quality

In the early 1970s, remote sensing (RS) techniques for monitoring water quality (WQ) began [25], with the spectral and thermal differences in emitted energy from water surfaces being measured with these early techniques. In recent decades, researchers and scientists mainly paid attention to improving inversion and machine learning methods using various remote sensing data sources (multi- and hyperspectral sensors) to strive for high-accuracy estimations of water quality parameters [26]. Remote sensing applications can be applied for inland [27], sea [28,29,30] and brackish water [31], and the spatial resolution of applied RS data normally corresponds to the scales of the study areas, ranging from global [32] to small river tributaries and ponds [33,34]. There are hundreds of WQ indicators; however, the major factors affecting WQ in water bodies are “suspended sediments (turbidity), algae (i.e., chlorophylls, carotenoids), chemicals (i.e., nutrients, pesticides, metals), dissolved organic matter (DOM), thermal releases, aquatic vascular plants, pathogens, and oils” [25], all of which are commonly found in RS applications. The gaps in WQ parameters that can be estimated using remote sensing techniques are gradually narrowed by increasing the parameters that are remotely extracted. However, there is no evidence of synthetic aperture radar (SAR) having a remote sensing ability for WQ parameter estimation. The hyperspectral RS technique has its advantages in WQ parameter detection and has been widely used to evaluate the water quality conditions of various aquatic ecosystems [35], and detecting chla is a very common application of the hyperspectral RS technique.
Nowadays, the RS technique is improving to an advanced level in terms of both data acquisition and data processing techniques [36]. More data are available, including commercial and open sources, with a large number of spatial/temporal and spectral options. This situation creates a favorable condition for research scientists to use these RS data at low or no cost with continuous observations [37] in comparison to traditional point sampling methods. However, the RS technique has several limitations as well. WQ parameters are complex and vary rapidly over time, and the RS technique can detect only those that affect the reflectance of radiation at the water surface. In addition, the well-known RS technique for WQ detection using optical sensors depends on sunlight, so it cannot observe at night, and it loses information under clouds and cloud shadows [38]. Spatial and temporal resolutions of RS data are sometimes not adequate for detecting the changing WQ parameters over space and time, hence requiring large in situ datasets to calibrate the remote sensing-based extractions to ensure the usable accuracy of the study results. It is still difficult to re-apply previous WQ study outcomes because WQ varies quickly from region to region; hence, re-applying other methods and models could lead to larger uncertainties.
Despite these issues, the RS technique has improved in its abilities in WQ parameter estimates and monitoring, particularly in larger areas. However, due to the complexity of coastal marine ecosystems [39] (with hundreds of WQ parameters), only some of the WQ indicators can be currently extracted using the RS technique [40]. Although the RS applied in this field has a long historical development from statistical regression models (linear/multi-linear regression) and parametric regression methods to current machine learning (ML) models, there are still gaps in this field.
Machine learning methods have proven their value in water quality modelling. Common applications of ML approaches are for predicting the concentrations of suspended solids (SS), chlorophyll-a (chla) and turbidity. The commonly used models are the artificial neural network (ANN), random forest (RF), cubist regression (CB), and support vector regression (SVR) models [39]. Peterson et al. (2019) modeled WQ parameters (applied for Landsat imagery) including blue-green algae phycocyanin, total suspended solids (TSS), formazin nephelometric units (FNU), and total dissolved solids (TDS) using the five ML models of multi-linear regression (MLR), partial least-squares regression (PLSR), Gaussian process regression (GPR), support vector regression (SVR), and extreme learning machine regression (ELR) [41].
Sun et al. (2022) applied the back-propagation (BP), GPR, random forest regression (RFR), and SVR models to predict total nitrogen, total phosphorus and chemical oxygen demand (COD) using hyperspectral imagery [42]. Compared with ML techniques, physical-based models depend on the inherent optical properties (IOP) of water and on extensive field data [43], and it is challenging to derive the IOP from satellites [39]. On the other hand, ML models may require fewer data for their training procedure [44]. A current gap in RS data processing lies not in the modelling technique but in the pre-processing step of atmospheric correction, owing to a lack of ground truth data for validation. Hence, we conducted onsite field measurements of sea surface reflectance to support the provided atmospheric correction and hypothesized that the SWQ parameters could then be estimated at higher accuracy. This is a novel aspect of this study.

1.3. Objectives of the Study

To evaluate common remote-sensing based WQ parameters in the near coastal zone, we rely on machine learning models integrated with Sentinel-2 data by performing both RS surface reflectance and WQ parameter calibrations for the 2020–2021 periods. In situ surface reflectance and WQ parameters were measured in the sea corresponding to the time of the Sentinel-2 acquisitions. Section 2 describes the materials and methods, including a description of study area, data sources, data processing, and the used machine learning models. Section 3 presents a discussion of the study results, and Section 4 provides conclusions and remarks.

2. Materials and Methods

2.1. Study Area

Binh Dinh has an area of 6025 km2 with a coast that is 134 km long, stretching north from Quy Nhon city to the Hoai Nhon district. The province has a number of attractive tourist beaches and a unique coast interspersed with many lagoons and bays, which are favorable locations for tourism development and aquaculture. However, due to the effects of rain, floods and typhoons in recent years, the coast of Binh Dinh has had significant erosion, with a serious disruption to people’s lives in the province [45,46]. Figure 1 presents the location of Binh Dinh Province in Vietnam, the sampling locations and the coastal seawater modelled area on a map. In the study area, we collected data from 147 points and classified them into 2 groups, with 2020 as blue circles and 2021 as red circles.
The Binh Dinh province has a complex topography, descending from west to east. Behind the midland and coastal regions, the western most portion of the province is the hilly region on the eastern border of the Nam Truong Son mountain range. High mountain ranges, low hills with small valleys perpendicular to the Truong Son mountain range, basins, and coastal plains split by the mountains are the common topographical formations. The main topographic forms of the province are highland, hill, delta, and coastal area. As one of the gateways to the sea of the Central Highlands provinces, the sea is crucial to people’s lives. With 134 km of coastline, Binh Dinh has many beautiful beaches, some of them being hundreds of hectares wide, still very unspoiled, and having smooth white sand. Its coastal area includes sand dunes; the dunes running along the coast have an average width of about 2 km, with their shapes and extent changing over time. There are river mouths with water exchanges between the rivers and the sea. Hence, the sea not only provides people resources and helps trading with foreigners, but it is also a great resource for the development of marine resorts [45,46]. In recent studies, the sea water quality of Binh Dinh was found to be seriously polluted. There are 61 erosive locations 16,507 m long in the Binh Dinh province. There are 54 points of river bank erosion with a total length of 14.2 km, and the coast has 7 eroded points with a total length of 2315 m [46]. The Binh Dinh coastal areas are economically critical but suffer from serious water pollution, which is the reason for its choice as a case study.

2.2. Data Collection

2.2.1. Remote Sensing Data

The Sentinel-2A satellite was launched on 23 June 2015 and the Sentinel-2B satellite was launched on 7 March 2017 as part of the European Commission’s Copernicus program. The surface reflected radiance values are measured in 13 bands from visible to short-wave infrared bands by multispectral instruments (MSI) mounted in both the Sentinel-2A and B satellites. Land monitoring, climate change, emergency management, and security are the main themes of the Sentinel-2 mission [47]. The water quality measurement was conducted in September and October in 2020 and 2021. We downloaded 8 images from Sentinel-2A and B that were sensed in September, October and November in 2020 and 2021 when the study areas were cloud-free. The image sensing data of the Sentinel-2A and B used for this study are summarized in Table 1, including the information of the central wavelength, bandwidth and spatial resolution of each sensor.

2.2.2. Onsite Measurement of Seawater Quality (TSS, Chla, COD and DO), Sea Surface Reflectance

Field data were collected at 147 sites, with the first stage being in September 2020 and the second stage being in October 2021. The positions of these 147 sampling points are shown on the map in Figure 1. The water quality parameters of total suspended solids (TSS), chlorophyll-a, chemical oxygen demand (COD), and dissolved oxygen concentration (DO) were measured at every station. Water samples were collected at a depth of 0.2 m then filtered on site using Whatman GF/F glass fiber filters (47 mm diameter, nominal pore size 0.45 µm) and using low vacuum pressure with the volume of filtered seawater being 500 mL. The filter was flushed with 50 mL of distilled water to remove residual salt. The DO of the seawater was measured on site with a Hanna HI 9147-04 dissolved oxygen meter.
The TSS concentration was determined by the gravimetric method [49]. The Whatman GF/F filters were pre-prepared in the laboratory by rinsing each filter with 50 mL of distilled water followed by drying at 60 °C in an oven for over 12 h. We stored the filtered TSS samples in a cool dry place and then took them back to the laboratory, where they were dried at 103–105 °C to a constant weight within the tolerance weight limit of 0.001 mg/L and gravimetrically measured.
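The gravimetric determination reduces to a mass difference divided by the filtered volume. The following is a minimal sketch, assuming the conventional form TSS (mg/L) = (dried filter with residue − clean dried filter) × 1000 / sample volume (mL). The function name and the example filter weights are hypothetical; only the 500 mL filtered volume comes from the text.

```python
def tss_mg_per_l(filter_plus_residue_mg, clean_filter_mg, sample_volume_ml):
    """Total suspended solids (mg/L) from dried filter weights.

    Assumes both weights are taken after drying to constant weight and
    that sample_volume_ml is the volume of seawater passed through the filter.
    """
    if sample_volume_ml <= 0:
        raise ValueError("sample volume must be positive")
    residue_mg = filter_plus_residue_mg - clean_filter_mg
    # Convert mg per mL of sample to mg per litre.
    return residue_mg * 1000.0 / sample_volume_ml


# Hypothetical weights; 500 mL is the filtered volume stated in the text.
example_tss = tss_mg_per_l(126.5, 120.0, 500.0)
```

With these illustrative weights, 6.5 mg of residue in 500 mL corresponds to 13 mg/L.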
To determine the chla concentration in the seawater samples, filters were placed in falcon tubes, which were wrapped in aluminum foil and stored in an ice box containing ice packs until analysis. The chlorophyll-a was extracted from the filter with 90% acetone. The extracted liquid was centrifuged and kept still for 12 h, and the supernatant was then measured in a UV-VIS spectrophotometer (model Labomed UVS-2700, Los Angeles, CA, USA) by the spectrophotometric trichromatic method [49]; the chla was calculated using the absorbance at 750 nm, 663 nm, 645 nm, and 630 nm. The absorbance reading at 750 nm is a correction for turbidity. The concentration of chla in the extract (Ca) was calculated by inserting the corrected optical densities into the following equation [50]:
$$C_a = 11.85 \times \mathrm{absorbance}_{664} - 1.54 \times \mathrm{absorbance}_{647} - 0.08 \times \mathrm{absorbance}_{630} \qquad (1)$$
where absorbance 664, 647 and 630 are the corrected optical densities (with a 1 cm light path) at the respective wavelengths.
After determining the concentration of chla in the extract, its amount per unit volume was calculated as follows [50]:
$$\mathrm{chla}\ (\mathrm{mg/m^3}) = \frac{C_a \times \text{extract volume (L)}}{\text{volume of sample (m}^3\text{)}} \qquad (2)$$
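The two chla equations can be chained in a few lines. This is a sketch only, assuming that "corrected optical density" means the reading at each wavelength minus the 750 nm turbidity reading and that Ca is expressed in mg/L. The function names, the absorbance readings and the 10 mL extract volume are hypothetical; the 500 mL (0.0005 m³) sample volume follows the filtered volume given in the text.

```python
def chla_in_extract(abs_664, abs_647, abs_630, abs_750=0.0):
    """Chlorophyll-a concentration in the acetone extract (trichromatic form).

    Each reading is first corrected for turbidity by subtracting the
    750 nm absorbance (assumed correction; 1 cm light path).
    """
    od_664 = abs_664 - abs_750
    od_647 = abs_647 - abs_750
    od_630 = abs_630 - abs_750
    return 11.85 * od_664 - 1.54 * od_647 - 0.08 * od_630


def chla_per_volume(ca, extract_volume_l, sample_volume_m3):
    """Chlorophyll-a per unit volume of seawater (mg/m^3)."""
    if sample_volume_m3 <= 0:
        raise ValueError("sample volume must be positive")
    return ca * extract_volume_l / sample_volume_m3


# Hypothetical readings and a hypothetical 10 mL extract volume.
ca = chla_in_extract(0.120, 0.040, 0.030, abs_750=0.010)
chla = chla_per_volume(ca, extract_volume_l=0.010, sample_volume_m3=0.0005)
```

Keeping the turbidity correction inside the first function means raw spectrophotometer readings can be passed in directly.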
The in situ irradiance within the sea water column was obtained using a RAMSES-TriOS hyperspectral spectroradiometer (TriOS GmbH, Oldenburg, Germany) at the same time of water sampling for the seawater quality parameter determinations. A description of the RAMSES-TriOS and measurement can be found in [51], and in this study, we conducted a similar operation.

2.3. Workflow of Methodology

The main concept is how to connect the remote sensing spectral data (water columns) with the water quality parameter concentrations despite the non-linear relationship between them. We relied on current machine learning models to regress this relation on the Sentinel-2 images and to map the WQ parameters. To accomplish this study goal, we designed the procedures for the methodological approach (Figure 2), which comprised three main parts: (1) the Sentinel-2 data processing, where the Sentinel-2 top-of-atmospheric data are processed to acquire surface reflectance (at the 147 water quality sampling points) using the Sen2cor tool of ESA (gaining level 2A surface reflectance data) and are calibrated using machine learning models and field-measured seawater surface (SSR) reflectance data (gaining calibrated SSR reflectance data); (2) machine learning modelling, establishing the regression of the level 2A surface reflectance and calibrated SSR reflectance data with field-measured water quality parameters (WQP); and (3) simulating WQPs and mapping the seawater quality (SWQ) distribution. As the Sentinel-2 re-processing is a common task reported in the literature, in the following sub-sections, we present the machine learning models we used, as well as the accuracy assessments and WQ mapping tasks.

2.3.1. Machine Learning Models

We utilized the decision tree (DT), random forest (RF), gradient boosting regression (GBR), and Ada boost regression (ABR) algorithms, four commonly used and robust models, for both Sentinel-2 data calibration and SWQ estimation. In all of the ML modelling, we separated 70% of ground truth data for the training phase and 30% for the model testing phase. The details of the ML models are as follows:
  • Decision tree (DT): A decision tree is an algorithm for the solution of classification and regression problems and has its origins in machine learning theory. The basic concept of a DT is to split a complex decision into several simple ones, possibly leading to a solution that is easier to interpret [52]. In a decision tree method, features of data (i.e., Sentinel-2 bands) are predictor variables, while the WQ parameter to be estimated is referred to as the target variable. When the target variable is discrete, it is known as a decision tree classification; when the target variable is continuous, it is known as a decision tree regression [53].
  • Random forest (RF): Breiman (2001) proposed the random forest (RF) classifier as a nonparametric ensemble technique. RF is a “combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest” [54]. Because the model can consist of many decision trees and each tree is established from a random subset of training data with a random subset of predictor variables, the RF is distinguished from traditional statistical methods [55]. Furthermore, compared with other decision tree methods, it has the advantage of using fully grown trees that are not pruned [56], so it is highly recommended for use.
  • Gradient boosting regression (GBR): The weak performances of other machine learners can be boosted by a GBR. It is an ensemble-based decision tree method, and each regression tree learns the residual of each tree conclusion. The main aim of GBR modelling is to reduce the model residual along the gradient direction from previous residuals in the model performance. Mixed data types can be handled. For the final results, the model integrates outputs from all regression trees [54,57].
  • Ada boost regression (ABR): The ABR is a boosting algorithm proposed for regression problems [58]. It came from the idea of filtering out the examples with a relative estimation error that is higher than the pre-set threshold values and afterwards follows the Ada boost procedure [59]. Because Solomatine and Shrestha (2004) recommended the ABR for hydrological modelling, we tested it and compared it with the other above ML algorithms.
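The comparison of the four models with a 70/30 split can be sketched with scikit-learn. This is an illustrative setup, not the authors' implementation: the reflectance matrix and target below are synthetic stand-ins (147 samples, nine bands), and the hyperparameters are library defaults chosen as an assumption.

```python
import numpy as np
from sklearn.ensemble import (
    AdaBoostRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data: 147 stations x 9 bands of surface reflectance,
# and a made-up water quality target that depends non-linearly on two bands.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 0.15, size=(147, 9))
y = 20.0 * X[:, 3] + 5.0 * X[:, 4] ** 2 + rng.normal(0.0, 0.05, size=147)

# 70% of the samples for training, 30% for testing, as described in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

models = {
    "DT": DecisionTreeRegressor(random_state=0),
    "RF": RandomForestRegressor(n_estimators=100, random_state=0),
    "GBR": GradientBoostingRegressor(random_state=0),
    "ABR": AdaBoostRegressor(random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))
```

The same loop could serve both tasks named in the text (reflectance calibration and SWQ estimation) by swapping what is passed as `X` and `y`.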

2.3.2. Accuracy Assessment

To evaluate the quality of model performance, we used four error indicators, namely the coefficient of determination (R2, Equation (3)), the mean square error (MSE, Equation (4)), the root mean square error (RMSE, Equation (5)), and the absolute Bias (Equation (6)), together with the adjusted R2 (R2adj). The R2 varies from minus infinity to 1, and the model performance approaches perfection as the R2 approaches 1. Smaller MSE, RMSE and Bias values indicate better model performance, whereas larger values indicate higher uncertainty [60,61,62,63].
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \qquad (3)$$
$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \qquad (4)$$
$$RMSE = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n}} \qquad (5)$$
$$Bias = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i) \qquad (6)$$
where $\hat{y}_i$ represents the predicted value of $y_i$, $\bar{y}$ is the mean of the observed data, and $n$ is the number of predicted values. In addition, we used the adjusted R2 calculated as in [64] for the evaluation of SSR calibration as the MSE, RMSE and Bias coefficients are not suitable for small SSR values.
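The error indicators can be computed directly from paired observed and predicted values. The sketch below uses only the standard library. The adjusted R2 uses the common form 1 − (1 − R2)(n − 1)/(n − p − 1), which is an assumption here because the exact formula of [64] is not reproduced in the text; the function name and the sample values are hypothetical.

```python
import math


def regression_metrics(observed, predicted, n_predictors=None):
    """Return R2, MSE, RMSE and Bias; add adjusted R2 if n_predictors is given."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / n
    residuals = [y - y_hat for y, y_hat in zip(observed, predicted)]
    ss_res = sum(r ** 2 for r in residuals)              # sum of squared errors
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    mse = ss_res / n
    metrics = {
        "r2": r2,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "bias": sum(residuals) / n,  # signed; take abs() for the absolute Bias
    }
    if n_predictors is not None:
        # Common adjusted-R2 form (assumed, see lead-in).
        metrics["r2_adj"] = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    return metrics


# Hypothetical observed and predicted values for illustration.
result = regression_metrics(
    [1.0, 2.0, 3.0, 4.0, 5.0], [1.1, 1.9, 3.2, 3.8, 5.0], n_predictors=1
)
```

Because MSE and RMSE scale with the magnitude of the data, they shrink toward zero for reflectance values well below 1, whereas R2 and adjusted R2 are scale-free.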

2.3.3. Water Quality Mapping Method

We generated a spatial SWQ distribution for the entire study area based on the best model regression result of the SWQ parameters with the SSR data in the modelling phase. For water and land separation, we used the normalized difference water index (NDWI) by applying a threshold [40]. We used the open-source GIS software QGIS to generate all SWQ maps. As the SWQ can vary sharply in space, the natural break (Jenks) classification was applied for a better map presentation.
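The water and land separation step can be illustrated as follows. This sketch assumes the widely used green/near-infrared form NDWI = (Green − NIR)/(Green + NIR), which for Sentinel-2 would typically use bands B3 and B8, and a threshold of 0. The text does not state the band pair or the threshold value, so both are assumptions, and the function names and reflectance values are hypothetical.

```python
def ndwi(green, nir):
    """Normalized difference water index for one pixel."""
    total = green + nir
    if total == 0:
        return 0.0  # avoid division by zero on empty pixels
    return (green - nir) / total


def water_mask(green_band, nir_band, threshold=0.0):
    """Boolean mask (True = water) for two equally shaped 2D reflectance grids."""
    return [
        [ndwi(g, n) > threshold for g, n in zip(green_row, nir_row)]
        for green_row, nir_row in zip(green_band, nir_band)
    ]


# Hypothetical 1 x 2 scene: a water pixel followed by a vegetated land pixel.
green = [[0.08, 0.05]]
nir = [[0.02, 0.30]]
mask = water_mask(green, nir)
```

Water absorbs strongly in the near infrared, so water pixels give positive index values while vegetation and soil give negative ones.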

3. Results and Discussions

3.1. Machine Learning Model Calibration Using RAMSES-TriOS Measurement

Table 2 shows the goodness of fit of the four ML models (at the calibration phase), expressed as the coefficient of determination (R2) and the adjusted coefficient of determination (Ad. R2) between the Sentinel-2 seawater surface reflectance (SSR) processed at level 2A (the dependent variable) and the field-measured seawater surface reflectance (the independent variable). Generally speaking, the R2 and R2adj values showed high agreement (most of them greater than 0.70) between the Sentinel-2 and the in situ-measured SSR at all wavelengths (B1–B8A). However, each model performed differently. The RF and GBR predicted the SSR slightly more accurately than the DT and ABR for all nine bands. The mean R2 and Ad. R2 values indicate that the models estimated the longer wavelength bands (B5–B8A) more accurately than the shorter wavelength bands (B1–B4, with the lowest accuracy found in B3 and B4). The Ad. R2 values were marginally smaller than the R2 values for all models and bands. The Ad. R2 coefficient was used as it fairly reflects the correlation between the testing and predicted variables and, compared with the R2 statistic, can take negative values [65]. Although ML models are very common now, this experiment is unique because it was very difficult to conduct: the field survey had to obtain SSR data within about one hour of the Sentinel-2 sensing time (10:35 in Binh Dinh province), under suitable weather conditions (e.g., cloud-free, clear sky), while working at sea with expensive instruments. Hence, more studies have been conducted on land than in seawater [66]. In addition, there are many uncertainty indicators (or coefficients), which makes it difficult to choose suitable ones.
In this experiment, we also calculated the MSE, RMSE and Bias; however, they were all zero when rounded to three decimals as they were computed from small SSR values (most being smaller than 0.15). Therefore, the MSE, RMSE and Bias might not be appropriate statistics for small values, whereas the R2 and R2adj coefficients are effective.

3.2. Identification of Atmospheric Effect Remaining in Sentinel-2 Sea Surface Reflectance Level 2A Using RAMSES-TriOS Measurement

Figure 3 presents clear differences between the calibrated Sentinel-2 seawater surface reflectance (SSR) (red dots), which had larger values, and the in situ measured SSR data (dark blue dots) at the 147 stations in the Binh Dinh sea. This likely indicates the effect of the atmosphere on the Sentinel-2 level 2A reflectance of the seawater surface (which can be called residuals remaining after Sen2cor processing). The residuals were approximately estimated as the areas between the red and dark blue polynomial curves of each wavelength band. In the shorter wavelength bands (B1–B4), the areas were larger than in the longer wavelength bands (B5–B8A), with the exception of B6 and B7, which varied more between stations. In general, the so-called residuals of the Sentinel-2 level 2A SSR were around 10% (0.1); however, they were not necessarily removed by the ML models, although the models adjusted the Sentinel-2 level 2A SSR values to generate a higher correlation between the training and predicted data. Therefore, we had similar red and dark blue polynomial curves. These adjusted Sentinel-2 level 2A SSR data are used for modelling the water quality parameters later on. Theoretically, if particles, gases, etc. in the atmosphere did not intercept the radiation received by the S2 sensor, the signals obtained by the S2 sensor would equal the signals recorded at the same wavelengths by optical spectrum instruments placed a very short distance from the seawater surface (we tried to keep this distance at around 30 cm). However, our atmosphere is contaminated [67], even by heavy metals [68], and this will unavoidably affect any satellite sensor to some extent. Even Sen2cor, which is designed specifically for deriving surface reflectance (level 2A) from Sentinel-2 data [69], cannot completely correct the effect of the atmosphere. The Sentinel-2 level 2A data can still be improved [70] for certain investigation purposes.

3.3. Machine Learning Modelling Water Quality Parameters Using the Calibrated Sentinel-2 Level 2A

Figure 4 cross-compares the estimates of the different ML models for total suspended solids (TSS), chlorophyll-a (chla), chemical oxygen demand (COD), and dissolved oxygen (DO) against the field-measured data as scatter plots. In the rows, the random forest (RF) model predicted all WQ parameters more precisely (the orange dots converge more closely on the blue diagonal line) than the other models, with accuracy decreasing in the order GBR, ABR and decision tree (DT). In each column, the different WQ parameters are predicted by the same model. Chla was the most sensitive variable and could be accurately extracted from the Sentinel-2 data, as supported by Quang et al. [40]. TSS diverged most from the blue diagonal line, and COD and DO were moderately correlated with the remote sensing data.
WQ modelling using remote sensing data is now common, but because aquatic environments vary widely, current ML models are limited to estimating the few WQ parameters, among hundreds, that are sensitive to optical spectral reflectance [71]. The amount of data available for ML modelling is critical: model accuracy depends on the size of the training set and on which WQ parameter is predicted. A study by Nasir et al. [72] recommended an ML model with a gradient boosting algorithm for highly accurate WQ estimation. As the accuracy of any prediction is of concern, we evaluated our models with four uncertainty indicators in the following section.
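A comparison of the four regressors named in this section can be sketched with scikit-learn as follows. The data are synthetic, the hyperparameters are library defaults rather than the study's settings, and the 70/30 hold-out split is an assumption, so the ranking printed here says nothing about the ranking reported in Figure 4.

```python
import numpy as np
from sklearn.ensemble import (AdaBoostRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)

# Synthetic stand-in: 147 stations x 9 band reflectances and one WQ target.
X = rng.uniform(0.0, 0.1, size=(147, 9))
y = 40.0 * X[:, 2] + 15.0 * X[:, 3] + rng.normal(0.0, 0.3, size=147)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# The four model families compared in the text, with default settings.
models = {
    "DT": DecisionTreeRegressor(random_state=0),
    "RF": RandomForestRegressor(n_estimators=200, random_state=0),
    "GBR": GradientBoostingRegressor(random_state=0),
    "ABR": AdaBoostRegressor(random_state=0),
}

# Fit each model and score it on the held-out stations.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

for name, r2 in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: R2 = {r2:.2f}")
```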

3.4. Machine Learning Calibrations (R2, MSE, RMSE, Bias) for WQ Parameter Estimation

We evaluated the performance of the ML models using the coefficient of determination (R2), mean square error (MSE), root mean square error (RMSE) and absolute bias, for both the original and the calibrated S2 SSR inputs (Table 3). The RF and Ada boost regression (ABR) models were the best for chla estimation (R2 = 0.9). On the other hand, the TSS model had the worst accuracy (R2 = 0.60 and 0.52 for RF and ABR, respectively) compared with the other WQ parameters. The COD was simulated slightly better than the TSS. The DO was acceptable, with an R2 value of around 0.7 and low values for the other error indicators. Interestingly, most of the model uncertainty indicators improved to some extent when the calibrated S2 SR was used instead of the original S2 level 2A SR, across both ML models and WQ parameters. The modelling was not optimal, but we consider the results acceptable; more importantly, we identified the most appropriate ML model (RF) for WQ modelling among those tested. The chla, COD and DO were more sensitive to the optical remote sensing data, which is why more studies have investigated these parameters [27,31,73]. However, few studies have worked on the TSS parameter [31]. The improvement in WQ parameter estimation using calibrated S2 SSR data is the novel contribution of this study. A recent search of a scholarly search engine found only a few similar studies, such as those by Ansper (2018), Vahtmäe et al. (2021) and Banerjee and Shanmugam (2021) [74,75,76].
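For reference, the four indicators can be computed as in the sketch below. It uses the standard definitions of R2, MSE and RMSE; the text does not define "absolute bias", so taking it as the absolute value of the mean error (predicted minus observed) is an assumption, as is the function name `regression_metrics`.

```python
import math


def regression_metrics(observed, predicted):
    """Return R2, MSE, RMSE and absolute bias for paired observations."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal in length")

    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]

    mse = sum(e * e for e in errors) / n   # mean square error
    rmse = math.sqrt(mse)                  # root mean square error
    bias = sum(errors) / n                 # mean error (signed)

    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    ss_res = sum(e * e for e in errors)
    r2 = 1.0 - ss_res / ss_tot             # coefficient of determination

    return {"r2": r2, "mse": mse, "rmse": rmse, "abs_bias": abs(bias)}


# Small made-up example (not data from the study).
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 2.5, 4.5])
print(metrics)
```

Comparing the same four numbers for models trained on the original and on the calibrated reflectance is what a table such as Table 3 summarises.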

3.5. Variable Importance of S2 Bands

ML and deep learning models allow for a transition away from the limiting white box/black box modelling processes, but these learning models still have difficulty detecting errors [77]. The behavior of the inputs is difficult to assess during modelling. Figure 5 shows the importance of each band, in other words, the sensitivity of the Sentinel-2 bands to the WQ parameters as target features, computed with the feature importance module included in the Scikit-learn package [78]. Band 3 (B3) was the most important band in all ML models, followed by B4, particularly for the TSS estimations. All bands had some relative importance in each model. However, B1 and B4 in the TSS model were not as dominant as B3 in the other models. We assume that good models exploit the most sensitive inputs and ignore unimportant ones to generate accurate outputs, but this should be verified by further experiments. Similar tests of Sentinel-2 band importance were conducted by Pham et al. [79] and Liu et al. [80].
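A band-importance ranking of this kind can be obtained from a fitted tree ensemble through its `feature_importances_` attribute. The sketch below uses synthetic data in which B3 is deliberately made the dominant predictor, so the result only illustrates the mechanics; whether the study used this impurity-based measure or another one from Scikit-learn is not stated in the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

bands = ["B1", "B2", "B3", "B4", "B5", "B6", "B7", "B8", "B8A"]

rng = np.random.default_rng(1)

# Synthetic stand-in data: the target depends mostly on B3 and weakly on B4.
X = rng.uniform(0.0, 0.1, size=(147, len(bands)))
y = 50.0 * X[:, 2] + 10.0 * X[:, 3] + rng.normal(0.0, 0.2, size=147)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances; scikit-learn normalises them to sum to 1.
importance = dict(zip(bands, model.feature_importances_))
ranked = sorted(importance, key=importance.get, reverse=True)

for band in ranked:
    print(f"{band}: {importance[band]:.3f}")
```

Impurity-based importances can be biased when inputs are strongly correlated, as neighbouring spectral bands often are, so a permutation-based check is a reasonable complement.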

3.6. Maps of Water Quality

Figure 6 shows the 2-year (2020–2021) mean spatial distributions of total suspended solids (TSS), chlorophyll-a (chla), chemical oxygen demand (COD), and dissolved oxygen (DO) in Binh Dinh’s coastal sea, produced with the best-performing model (RF regression). The border of the investigation area was set at 10–20 km seaward from the 147 water quality sample locations. The remote sensing coverage extends further out to sea, but we limited the area on the assumption that the model error increases with distance from the 147 sample points. As can be seen in the maps, the TSS values were higher (greater than 21 mg/m3) in the north and south of the map but lower in the middle (lower than 21 mg/m3), and remained at medium-high values. In contrast, the maps of chla, COD and DO indicated lower values in the north, which gradually increased to the south and then varied below 5 mg/m3. In more detail, all four parameters tended to be higher in Quy Nhon Bay, which receives water from three rivers through the Thi Nai lagoon. On the contrary, some areas with contiguous steeper inland terrain (but inclined southwards by about 45 degrees, which might have been affected by the sea current) had very low values for all four parameters. Interestingly, the areas with higher SWQ parameter concentrations were strongly related to river discharge and were closer to urban zones (e.g., Quy Nhon city), where domestic wastewater might be discharged into the sea. Although investigating the causes of the SWQ distribution is outside the scope of this study, comparing the WQ spatial distributions with the highlighted coastal geographic features (beaches, aquaculture and dense residential areas) might reveal an increase in all four SWQ parameters near these features. These results are also supported by Quang et al. and Datta et al. [40,81].
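One way to turn a fitted model into a mean map is to predict every valid pixel of every image and then average over the dates. The sketch below follows that route on synthetic arrays; the array layout `(dates, bands, height, width)`, the use of NaN for masked pixels, and averaging the per-date predictions (rather than predicting from a mean composite) are all assumptions, and `predict_mean_map` is a hypothetical helper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(7)
n_bands = 9

# Train on synthetic station samples (stand-in for the 147 field stations).
X_train = rng.uniform(0.0, 0.1, size=(147, n_bands))
y_train = 40.0 * X_train[:, 2] + rng.normal(0.0, 0.2, size=147)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# Synthetic image stack: (n_dates, n_bands, height, width); NaN marks masked pixels.
stack = rng.uniform(0.0, 0.1, size=(8, n_bands, 20, 30))
stack[0, :, :5, :5] = np.nan  # pretend a cloud covers one corner on the first date


def predict_mean_map(model, stack):
    """Predict each valid pixel on each date, then average over the dates."""
    n_dates, n_b, height, width = stack.shape
    # Move the band axis last so that each row is one pixel's band vector.
    pixels = np.moveaxis(stack, 1, -1).reshape(-1, n_b)
    valid = ~np.isnan(pixels).any(axis=1)
    predicted = np.full(pixels.shape[0], np.nan)
    predicted[valid] = model.predict(pixels[valid])
    # Average over dates, ignoring masked observations.
    return np.nanmean(predicted.reshape(n_dates, height, width), axis=0)


mean_map = predict_mean_map(model, stack)
print(mean_map.shape)
```

Restricting the output to a buffer around the sample stations, as described above, would be an additional masking step applied to `mean_map`.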

4. Conclusions, Limitations and Future Research

This study provides insights into the effective use of optical remote sensing data for estimating seawater quality indicators based on the performance of advanced machine learning models. In addition, the atmospheric effects on optical remote sensors were investigated, and we concluded that even Sentinel-2 data processed to level 2A (with atmospheric corrections already applied) can be further refined when in situ measurements of sea surface reflectance are collected [82,83]. Accuracy varied among the ML models, and the random forest (RF) is highly recommended for use. The maps of SWQ distribution, linked to other geographic features inland, could help local authorities make decisions based on comparisons with national coastal water quality benchmarks (such as the national standards mentioned in Pham et al. (2010) and Linh et al. (2015) [84,85]). Appropriate and rapid decisions are needed to support better coastal water management, not only along the Vietnamese coastline but also in other regions of the world.
One of the main limitations of ML models is that they operate largely as black boxes [86]. Hence, it is difficult to manage and change the model parameters. In addition, supervised ML models require large amounts of input data and computational resources [87]. For a multidisciplinary study, it is increasingly difficult to obtain these inputs. Another limitation is that optical remote sensing data are affected by clouds and atmospheric particles. Gases in the atmosphere are unavoidable, and this problem is more serious in tropical regions with cloud cover for much of the year, such as Vietnam. SAR sensors, which operate at long wavelengths, are not affected by weather and clouds [38], but there is not yet much evidence for the use of SAR data in water quality assessment and estimation.
We recommend further work on big data processing at national and regional scales using ML methods and deep learning models that can draw on numerous remote sensing data sources and on cloud computing platforms such as Google Earth Engine (whose usage is measured in Earth Engine Compute Units, EECU), Google Cloud and Amazon Web Services (AWS). These can enable individuals or small research groups to implement large research projects. For higher accuracy, longer observation periods (10 years or more) and more data for model training and validation are also strongly recommended. Once ML models are well calibrated, they could be used for future water quality monitoring, provided the input data remain continuously available.

Author Contributions

Conceptualization, N.H.Q. and N.T.D. (Nguyen Tran Dinh); methodology, N.H.Q. and L.T.S.; validation, N.H.Q. and N.T.D. (Nguyen Tran Dinh); formal analysis, N.H.Q., L.T.S. and N.T.D. (Nguyen Tran Dien); resources; writing—original draft preparation, N.H.Q., L.T.S. and N.T.D. (Nguyen Tran Dinh); writing—review and editing, N.H.Q. and L.T.S.; project administration, N.T.D. (Nguyen Tran Dien) and L.T.S.; funding acquisition, N.T.D. (Nguyen Tran Dien) and L.T.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Department of Science and Technology of the Binh Dinh province under the research project entitled “Researching on applications of remote sensing technology to support monitoring the seawater quality of Binh Dinh province for local aquaculture and vicinity”, grant number ĐTDLCN.11/20.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data are available from the authors upon request.

Acknowledgments

The authors sincerely acknowledge the financial support from the national project “Researching on applications of remote sensing technology to support monitoring the seawater quality of Binh Dinh province for local aquaculture and vicinity” (grant number: ĐTDLCN.11/20).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Hoepffner, N.; Zibordi, G. Remote Sensing of Coastal Waters. In Encyclopedia of Ocean Sciences, 2nd ed.; Steele, J.H., Ed.; Academic Press: Oxford, UK, 2009; pp. 732–741.
  2. Peng, M.; Oleson, K. Beach recreationalists’ willingness to pay and economic implications of coastal water quality problems in Hawaii. Ecol. Econ. 2017, 136, 41–52.
  3. Freeman, A.M., III. The benefits of water quality improvements for marine recreation: A review of the empirical evidence. Mar. Resour. Econ. 1995, 10, 385–406.
  4. MONRE. Report on Maritime Environment and National Islands in the 2016–2020 Period; Ministry of Natural Resources and Environment: Hanoi, Vietnam, 2021; pp. 1–160. (In Vietnamese)
  5. Wang, Z.; Bu, C.; Li, H.; Wei, W. Seawater environmental Kuznets curve: Evidence from seawater quality in China’s coastal waters. J. Clean. Prod. 2019, 219, 925–935.
  6. Ahuja, S. Monitoring Water Quality: Pollution Assessment, Analysis, and Remediation; Newnes; Elsevier: Oxford, UK, 2013; pp. 1–374.
  7. El Zrelli, R.; Rabaoui, L.; Alaya, M.B.; Daghbouj, N.; Castet, S.; Besson, P.; Michel, S.; Bejaoui, N.; Courjault-Radé, P. Seawater quality assessment and identification of pollution sources along the central coastal area of Gabes Gulf (SE Tunisia): Evidence of industrial impact and implications for marine environment protection. Mar. Pollut. Bull. 2018, 127, 445–452.
  8. Ortiz-Lozano, L.; Granados-Barba, A.; Solís-Weiss, V.; García-Salgado, M.A. Environmental evaluation and development problems of the Mexican Coastal Zone. Ocean. Coast. Manag. 2005, 48, 161–176.
  9. Wang, Z.; Qi, G.; Wei, W. China’s coastal seawater environment caused by urbanization based on the seawater environmental Kuznets curve. Ocean. Coast. Manag. 2021, 213, 105893.
  10. Hạ, T.Đ.; Hòa, N. Đánh giá chất lượng nước vùng cửa sông và biển ven bờ để định hướng giải pháp công nghệ xử lý phù hợp cho mục đích cấp nước sinh hoạt. Tạp Chí Khoa Học Công Nghệ Xây Dựng. 2011, 10, 9–2011.
  11. Abrol, Y.P.; Raghuram, N.; Sachdev, M.S. (Eds.) Agricultural Nitrogen Use and Its Environmental Implications; IK International Pvt Ltd: New Delhi, India, 2007; pp. 1–120.
  12. Cicin-Sain, B.; Balgos, M.; Appiott, J.; Wowk, K.; Hamon, G. Oceans at Rio+ 20, How well Are We Doing in Meeting the Commitments from the 1992 Earth Summit and the 2002 World Summit on Sustainable Development? DE (USA) Global Ocean Forum: Newark, NJ, USA, 2011.
  13. Ofiara, D.D.; Seneca, J. Biological effects and subsequent economic effects and losses from marine pollution and degradations in marine environments: Implications from the literature. Mar. Pollut. Bull. 2006, 52, 844–864.
  14. Mateo-Sagasta, J.J.; Zadeh, S.M.; Turral, H.; Burke, J. Water Pollution from Agriculture: A Global Review. Executive Summary; Food and Agriculture Organization of the United Nations: Rome, Italy; International Water Management Institute on behalf of the Water Land and Ecosystems Research Program: Colombo, Sri Lanka, 2017; pp. 1–35.
  15. Adamo, F.; Attivissimo, F.; Carducci, C.G.C.; Lanzolla, A.M.L. A smart sensor network for sea water quality monitoring. IEEE Sens. J. 2014, 15, 2514–2522.
  16. Bourouhou, I.; Salmoun, F. Sea water quality monitoring using remote sensing techniques: A case study in Tangier-Ksar Sghir coastline. Environ. Monit. Assess. 2021, 193, 1–12.
  17. Melloul, A.; Goldenberg, L. Monitoring of seawater intrusion in coastal aquifers: Basics and local concerns. J. Environ. Manag. 1997, 51, 73–86.
  18. Zompanti, A.; Grasso, S.; Sabatini, A.; Vollero, L.; Pennazza, G.; Santonico, M. A Multi-Sensor System for Sea Water Iodide Monitoring and Seafood Quality Assurance: Proof-of-Concept Study. Sensors 2021, 21, 4464.
  19. Bernhard, K.; Stahl, C.; Martens, R.; Köhler, H.R.; Triebskorn, R.; Scheurer, M.; Frey, M. Two novel real time cell-based assays quantify beta-blocker and NSAID specific effects in effluents of municipal wastewater treatment plants. Water Res. 2017, 115, 74–83.
  20. Chouler, J.; Cruz-Izquierdo, Á.; Rengaraj, S.; Scott, J.L.; Di Lorenzo, M. A screen-printed paper microbial fuel cell biosensor for detection of toxic compounds in water. Biosens. Bioelectron. 2018, 102, 49–56.
  21. Ejeian, F.; Etedali, P.; Mansouri-Tehrani, H.A.; Soozanipour, A.; Low, Z.X.; Asadnia, M.; Taheri-Kafrani, A.; Razmjou, A. Biosensors for wastewater monitoring: A review. Biosens. Bioelectron. 2018, 118, 66–79.
  22. Pasternak, G.; Greenman, J.; Ieropoulos, I. Self-powered, autonomous Biological Oxygen Demand biosensor for online water quality monitoring. Sens. Actuators B: Chem. 2017, 244, 815–822.
  23. Peixoto, P.S.; Machado, A.; Oliveira, H.P.; Bordalo, A.A.; Segundo, M.A. Paper-Based Biosensors for Analysis of Water. In Biosensors for Environmental Monitoring; Rinken, T., Kivirand, K., Eds.; IntechOpen: London, UK, 2019.
  24. Shen, J.; Zhou, X.; Shan, Y.; Yue, H.; Huang, R.; Hu, J.; Xing, D. Sensitive detection of a bacterial pathogen using allosteric probe-initiated catalysis and CRISPR-Cas13a amplification reaction. Nat. Commun. 2020, 11, 1–10.
  25. Ritchie, J.C.; Zimba, P.; Everitt, J. Remote sensing techniques to assess water quality. Photogramm. Eng. Remote Sens. 2003, 69, 695–704.
  26. Wang, X.; Yang, W. Water quality monitoring and evaluation using remote sensing techniques in China: A systematic review. Ecosyst. Health Sustain. 2019, 5, 47–56.
  27. Sharaf El Din, E.; Zhang, Y.; Suliman, A. Mapping concentrations of surface water quality parameters using a novel remote sensing and artificial intelligence framework. Int. J. Remote Sens. 2017, 38, 1023–1042.
  28. Ramdani, F.; Wirasatriya, A.; Jalil, A. Monitoring The Sea Surface Temperature and Total Suspended Matter Based on Cloud-Computing Platform of Google Earth Engine and Open-Source Software. In IOP Conference Series: Earth and Environmental Science; IOP Publishing: Bristol, UK, 2021.
  29. Merchant, C.J.; Embury, O.; Bulgin, C.E.; Block, T.; Corlett, G.K.; Fiedler, E.; Good, S.A.; Mittaz, J.; Rayner, N.A.; Berry, D.; et al. Satellite-based time-series of sea-surface temperature since 1981 for climate applications. Sci. Data 2019, 6, 223.
  30. Harmel, T.; Chami, M.; Tormos, T.; Reynaud, N.; Danis, P.A. Sunglint correction of the Multi-Spectral Instrument (MSI)-SENTINEL-2 imagery over inland and sea waters from SWIR bands. Remote Sens. Environ. 2018, 204, 308–321.
  31. Sent, G.; Biguino, B.; Favareto, L.; Cruz, J.; Sá, C.; Dogliotti, A.I.; Palma, C.; Brotas, V.; Brito, A.C. Deriving Water Quality Parameters Using Sentinel-2 Imagery: A Case Study in the Sado Estuary, Portugal. Remote Sens. 2021, 13, 1043.
  32. Maritorena, S.; Siegel, D.; Peterson, A. Optimization of a semianalytical ocean color model for global-scale applications. Appl. Opt. 2002, 41, 2705–2714.
  33. Abd-Elrahman, A.; Croxton, M.; Pande-Chettri, R.; Toor, G.S.; Smith, S.; Hill, J. In situ estimation of water quality parameters in freshwater aquaculture ponds using hyperspectral imaging system. ISPRS J. Photogramm. Remote Sens. 2011, 66, 463–472.
  34. Oron, G.; Gitelson, A. Real-time quality monitoring by remote sensing of contaminated water-bodies: Waste stabilization pond effluent. Water Res. 1996, 30, 3106–3114.
  35. Koponen, S.; Pulliainen, J.; Kallio, K.; Hallikainen, M. Lake water quality classification with airborne hyperspectral spectrometer and simulated MERIS data. Remote Sens. Environ. 2002, 79, 51–59.
  36. Quang, N.H.; Quinn, C.H.; Stringer, L.C.; Carrie, R.; Hackney, C.R.; Van Hue, L.T.; Van Tan, D.; Nga, P.T.T. Multi-Decadal Changes in Mangrove Extent, Age and Species in the Red River Estuaries of Viet Nam. Remote Sens. 2020, 12, 2289.
  37. Usali, N.; Ismail, M. Use of remote sensing and GIS in monitoring water quality. J. Sustain. Dev. 2010, 3, 228.
  38. Quang, N.H.; Tuan, V.A.; Hao, N.T.P.; Hang, L.T.T.; Hung, N.M.; Anh, V.L.; Phuong, L.T.M.; Carrie, R. Synthetic aperture radar and optical remote sensing image fusion for flood monitoring in the Vietnam lower Mekong basin: A prototype application for the Vietnam Open Data Cube. Eur. J. Remote Sens. 2019, 52, 599–612.
  39. Hafeez, S.; Wong, M.S.; Ho, H.C.; Nazeer, M.; Nichol, J.; Abbas, S.; Tang, D.; Lee, K.H.; Pun, L. Comparison of machine learning algorithms for retrieval of water quality indicators in case-II waters: A case study of Hong Kong. Remote Sens. 2019, 11, 617.
  40. Quang, N.H.; Nguyen, M.N.; Paget, M.; Anstee, J.; Viet, N.D.; Nones, M.; Tuan, V.A. Assessment of Human-Induced Effects on Sea/Brackish Water Chlorophyll-a Concentration in Ha Long Bay of Vietnam with Google Earth Engine. Remote Sens. 2022, 14, 4822.
  41. Peterson, K.T.; Sagan, V.; Sidike, P.; Hasenmueller, E.A.; Sloan, J.J.; Knouft, J.H. Machine learning-based ensemble prediction of water-quality variables using feature-level and decision-level fusion with proximal remote sensing. Photogramm. Eng. Remote Sens. 2019, 85, 269–280.
  42. Sun, X.; Zhang, Y.; Shi, K.; Zhang, Y.; Li, N.; Wang, W.; Huang, X.; Qin, B. Monitoring water quality using proximal remote sensing technology. Sci. Total Environ. 2022, 803, 149805.
  43. Zhang, Y.; Wang, Y.; Wang, Y.; Xi, H. Investigating the impacts of landuse-landcover (LULC) change in the pearl river delta region on water quality in the pearl river estuary and Hong Kong’s coast. Remote Sens. 2009, 1, 1055–1064.
  44. Kim, Y.H.; Im, J.; Ha, H.K.; Choi, J.K.; Ha, S. Machine learning approaches to coastal water quality monitoring using GOCI satellite data. GIScience Remote Sens. 2014, 51, 158–174.
  45. Nguyen, K.L.; Nguyen, L.T.D.; Le, H.T.; Nguyen, D.L.; Vo, N.Q.T.; Le, V.P.; Nguyen, D.N.; Nguyen, T.T.T.; Pham, G.D.; Phuong, D.N.D.; et al. Assessing Impacts of Land Use Change and Climate Change on Water Resources in the La Vi Catchment, Binh Dinh Province. In TORUS 3—Toward an Open Resource Using Services: Cloud Computing for Environmental Data; Wiley: Hoboken, NJ, USA, 2020; pp. 191–210.
  46. Vo, N.D.; Vo, T.; Nguyen, C. Landsat image processing application for Binhdinh Shoreline Change. In International Conference on Asian and Pacific Coasts; Springer: Hanoi, Vietnam, 2019.
  47. Kuc, G.; Chormański, J. Sentinel-2 imagery for mapping and monitoring imperviousness in urban areas. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2019, 42, 43–47.
  48. ESA. “MultiSpectral Instrument (MSI) Overview”. Sentinel Online. European Space Agency. Available online: https://earth.esa.int/web/sentinel/technical-guides/sentinel-2-msi/msi-instrument (accessed on 11 October 2022).
  49. Rice, E.W.; Baird, R.B.; Eaton, A.D.; Clesceri, L.S. Standard Methods for the Examination of Water and Wastewater; American Public Health Association: Washington, DC, USA, 2012; Volume 10.
  50. Sterman, N.T. Spectrophotometric and Fluorometric Chlorophyll Analysis; Experimental Phycology, A Laboratory Manual; Lobban, S.C., Chapman, D.J., Kremer, B.P., Eds.; Cambridge University Press: New York, NY, USA, 1988; pp. 35–39.
  51. Yang, C.; Ye, H.; Tang, S. Seasonal Variability of Diffuse Attenuation Coefficient in the Pearl River Estuary from Long-Term Remote Sensing Imagery. Remote Sens. 2020, 12, 2269.
  52. Xu, M.; Watanachaturaporn, P.; Varshney, P.K.; Arora, M.K. Decision tree regression for soft classification of remote sensing data. Remote Sens. Environ. 2005, 97, 322–336.
  53. Breiman, L.; Friedman, J.; Olshen, R.; Stone, C. Classification and regression trees. Wadsworth Int. Group 1984, 8, 452–456.
  54. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
  55. Noi, P.T.; Degener, J.; Kappas, M. Comparison of multiple linear regression, cubist regression, and random forest algorithms to estimate daily air surface temperature from dynamic combinations of MODIS LST data. Remote Sens. 2017, 9, 398.
  56. Quinlan, J.R. C4.5: Programs for Machine Learning; Morgan Kaufmann Publishers: San Mateo, CA, USA, 2014.
  57. Brodley, C.E.; Utgoff, P. Multivariate Versus Univariate Decision Trees; Department of Computer Science, University of Massachusetts: Amherst, MA, USA, 1992; COINS Technical Report 92-8 January 1992.
  58. Solomatine, D.; Shrestha, D. AdaBoost.RT: A boosting algorithm for regression problems. In Proceedings of the 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No. 04CH37541), Budapest, Hungary, 25–29 July 2004; IEEE: Piscataway, NJ, USA, 2004.
  59. Freund, Y.; Schapire, R. Experiments with a new boosting algorithm. In Machine Learning: Proceedings of the Thirteenth International Conference; ICML: Murray Hill, NJ, USA, 1996.
  60. Menard, S. Coefficients of determination for multiple logistic regression analysis. Am. Stat. 2000, 54, 17–24.
  61. Allen, D.M. Mean square error of prediction as a criterion for selecting variables. Technometrics 1971, 13, 469–475.
  62. Kilgus, C.; Gore, W. Root-mean-square error in encoded digital telemetry. IEEE Trans. Commun. 1972, 20, 315–320.
  63. Voinov, V.G.e.; Nikulin, M. Unbiased Estimators and Their Applications: Volume 1, Univariate Case; Springer Science & Business Media: Berlin, Germany, 2012; Volume 263.
  64. Mittlböck, M. Calculating adjusted R2 measures for Poisson regression models. Comput. Methods Programs Biomed. 2002, 68, 205–214.
  65. Wooldridge, J.M. A note on computing r-squared and adjusted r-squared for trending and seasonal data. Econ. Lett. 1991, 36, 49–54.
  66. Djamai, N.; Fernandes, R. Active learning regularization increases clear sky retrieval rates for vegetation biophysical variables using Sentinel-2 data. Remote Sens. Environ. 2021, 254, 112241.
  67. Dadkhah-Aghdash, H.; Zare-Maivan, H.; Heydari, M.; Sharifi, M.; Lucas-Borja, M.E.; Naidu, R. Air pollution from gas refinery through contamination with various elements disrupts semiarid Zagros oak (Quercus brantii Lindl.) forests, Iran. Sci. Rep. 2022, 12, 1–11.
  68. Uzoekwe, S.A.; Izah, S.C.; Aigberua, A.O. Environmental and human health risk of heavy metals in atmospheric particulate matter (PM10) around gas flaring vicinity in Bayelsa State, Nigeria. Toxicol. Environ. Health Sci. 2021, 13, 323–335.
  69. Main-Knorn, M.; Pflug, B.; Louis, J.; Debaecker, V.; Müller-Wilm, U.; Gascon, F. Sen2Cor for sentinel-2. In Image and Signal Processing for Remote Sensing XXIII; SPIE: Bellingham, WA, USA, 2017.
  70. Tavares, M.H.; Lins, R.C.; Harmel, T.; Fragoso, C.R., Jr.; Martínez, J.M.; Motta-Marques, D. Atmospheric and sunglint correction for retrieving chlorophyll-a in a productive tropical estuarine-lagoon system using Sentinel-2 MSI imagery. ISPRS J. Photogramm. Remote Sens. 2021, 174, 215–236.
  71. Quang, N.H.; Tuan, V.A.; Hang, L.T.T.; Dien, N.T.; Son, L.T.; Minh, N.N. Modelling seawater quality of rach gia bay of vietnam, using sentinel-2 imagery processed in the google earth engine. TNU J. Sci. Technol. 2022, 227, 88–96.
  72. Nasir, N.; Kansal, A.; Alshaltone, O.; Barneih, F.; Sameer, M.; Shanableh, A.; Al-Shamma’a, A. Water quality classification using machine learning algorithms. J. Water Process Eng. 2022, 48, 102920.
  73. Loisel, H.; Vantrepotte, V.; Ouillon, S.; Ngoc, D.D.; Herrmann, M.; Tran, V.; Mériaux, X.; Dessailly, D.; Jamet, C.; Duhaut, T.; et al. Assessment and analysis of the chlorophyll-a concentration variability over the Vietnamese coastal waters from the MERIS ocean color sensor (2002–2012). Remote Sens. Environ. 2017, 190, 217–232.
  74. Ansper, A. Sentinel-2/msi applications for european union water framework directive reporting purposes. Master’s Thesis, Faculty of Science and Technology, University of Tartu, Tartu, Estonia, 2018.
  75. Vahtmäe, E.; Kotta, J.; Lõugas, L.; Kutser, T. Mapping spatial distribution, percent cover and biomass of benthic vegetation in optically complex coastal waters using hyperspectral CASI and multispectral Sentinel-2 sensors. Int. J. Appl. Earth Obs. Geoinf. 2021, 102, 102444.
  76. Banerjee, S.; Shanmugam. Novel method for reconstruction of hyperspectral resolution images from multispectral data for complex coastal and inland waters. Adv. Space Res. 2021, 67, 266–289.
  77. Narwaria, M. Does explainable machine learning uncover the black box in vision applications? Image Vis. Comput. 2022, 118, 104353.
  78. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
  79. Pham, T.D.; Le, N.N.; Ha, N.T.; Nguyen, L.V.; Xia, J.; Yokoya, N.; To, T.T.; Trinh, H.X.; Kieu, L.Q.; Takeuchi, W. Estimating Mangrove Above-Ground Biomass Using Extreme Gradient Boosting Decision Trees Algorithm with Fused Sentinel-2 and ALOS-2 PALSAR-2 Data in Can Gio Biosphere Reserve, Vietnam. Remote Sens. 2020, 12, 777.
  80. Liu, N.; Qing, S.; Wang, F.; Diao, R.; Yue, Y. Quality control based Chlorophyll-a estimation with two-band and three-band algorithms using Sentinel-2 MSI data in a complex inland lake, China. Geocarto Int. 2022, 37, 1–27.
  81. Datta, S.; Karmakar, S.; Islam, M.N.; Karim, M.E.; Kabir, M.H.; Uddin, J. Assessing landcover and water uses effects on water quality in a rapidly developing semi-urban coastal area of Bangladesh. J. Clean. Prod. 2022, 336, 130388.
  82. Hommersom, A.; Kratzer, S.; Laanen, M.; Ansko, I.; Ligi, M.; Bresciani, M.; Giardino, C.; Beltrán-Abaunza, J.M.; Moore, G.; Wernand, M.R.; et al. Intercomparison in the field between the new WISP-3 and other radiometers (TriOS Ramses, ASD FieldSpec, and TACCS). J. Appl. Remote Sens. 2012, 6, 063615.
  83. Zibordi, G.; Ruddick, K.; Ansko, I.; Moore, G.; Kratzer, S.; Icely, J.; Reinart, A. In situ determination of the remote sensing reflectance: An inter-comparison. Ocean. Sci. 2012, 8, 567–586.
  84. Anh, P.T.; Kroeze, C.; Bush, S.R.; Mol, A.P. Water pollution by intensive brackish shrimp farming in south-east Vietnam: Causes and options for control. Agric. Water Manag. 2010, 97, 872–882.
  85. Linh, V.T.T.; Kiem, D.T.; Ngoc, P.H.; Phu, L.H.; Tam, P.H.; Vinh, L.T. Coastal sea water quality of Nha Trang bay, Khanh Hoa, Viet Nam. J. Shipp. Ocean. Eng. 2015, 5, 123–130.
  86. Loyola-Gonzalez, O. Black-box vs. white-box: Understanding their advantages and weaknesses from a practical point of view. IEEE Access 2019, 7, 154096–154113.
  87. Liu, Y.; Bi, S.; Shi, Z.; Hanzo, L. When machine learning meets big data: A wireless communication perspective. IEEE Veh. Technol. Mag. 2019, 15, 63–72.
Figure 1. Study area of the Binh Dinh province and its sea.
Figure 2. Working procedures to accomplish the study goals; NDWI stands for normalized difference water index; SWQ is the acronym for seawater quality.
Figure 3. Scatter plots of the calibrated SR of the nine bands (B1–B8A) vs. RAMSES-TriOS SRs measured at the same wavelengths; SR stands for surface reflectance.
Figure 4. Correlations between machine learning model estimates using the calibrated Sentinel-2 surface reflectance data and the in situ field-measured water quality parameters.
Figure 5. Relative importance variables (Sentinel-2 bands) for the target features (seawater quality parameters: total suspended solids (TSS), chlorophyll-a (chla), chemical oxygen demand (COD), and dissolved oxygen (DO)).
Figure 6. Map of the modelled seawater quality (total suspended solids (TSS), chlorophyll-a (chla), chemical oxygen demand (COD), and dissolved oxygen (DO)) of Binh Dinh’s coastal areas.
Table 1. Spectral bands for the Sentinel-2 sensors (information adopted from [48]) used in this study; band 1 to 8A.
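| Bands | Sentinel-2A Central Wavelength (µm) | Sentinel-2A Bandwidth (µm) | Sentinel-2B Central Wavelength (µm) | Sentinel-2B Bandwidth (µm) | Spatial Resolution (m) |
|---|---|---|---|---|---|
| Band 1 | 0.443 | 0.020 | 0.442 | 0.020 | 60 |
| Band 2 | 0.493 | 0.065 | 0.492 | 0.065 | 10 |
| Band 3 | 0.560 | 0.035 | 0.559 | 0.035 | 10 |
| Band 4 | 0.665 | 0.030 | 0.665 | 0.031 | 10 |
| Band 5 | 0.704 | 0.014 | 0.704 | 0.015 | 20 |
| Band 6 | 0.741 | 0.014 | 0.739 | 0.013 | 20 |
| Band 7 | 0.783 | 0.019 | 0.780 | 0.019 | 20 |
| Band 8 | 0.833 | 0.105 | 0.833 | 0.104 | 10 |
| Band 8A | 0.865 | 0.021 | 0.864 | 0.021 | 20 |
| Band 9 | 0.945 | 0.019 | 0.943 | 0.020 | 60 |
| Band 10 | 1.374 | 0.029 | 1.377 | 0.029 | 60 |
| Band 11 | 1.614 | 0.090 | 1.610 | 0.094 | 20 |
| Band 12 | 2.202 | 0.174 | 2.186 | 0.184 | 20 |

Used images (Sentinel-2A): 9 September 2020; 29 October 2020; 3 November 2021
Used images (Sentinel-2B): 4 September 2020; 14 September 2020; 24 September 2020; 9 September 2021; 29 September 2021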
Table 2. Computed coefficient of determination (R2) and adjusted coefficient of determination (R2adj) by the machine learning models in the process of Sentinel-2 level 2A seawater surface reflectance calibration.
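| Regression | Models | B1 | B2 | B3 | B4 | B5 | B6 | B7 | B8 | B8A |
|---|---|---|---|---|---|---|---|---|---|---|
| R2 | Decision Tree | 0.79 | 0.79 | 0.71 | 0.64 | 0.80 | 0.84 | 0.83 | 0.78 | 0.82 |
| R2 | Random Forest | 0.80 | 0.83 | 0.76 | 0.77 | 0.81 | 0.85 | 0.84 | 0.86 | 0.84 |
| R2 | Gradient Boosting R. | 0.80 | 0.83 | 0.81 | 0.77 | 0.81 | 0.85 | 0.83 | 0.82 | 0.84 |
| R2 | Ada Boost R. | 0.79 | 0.80 | 0.72 | 0.72 | 0.79 | 0.84 | 0.84 | 0.80 | 0.81 |
| R2 | Mean | 0.80 | 0.81 | 0.75 | 0.73 | 0.80 | 0.85 | 0.83 | 0.82 | 0.82 |
| R2adj | Decision Tree | 0.77 | 0.77 | 0.70 | 0.63 | 0.78 | 0.82 | 0.81 | 0.83 | 0.80 |
| R2adj | Random Forest | 0.78 | 0.81 | 0.73 | 0.75 | 0.79 | 0.84 | 0.83 | 0.84 | 0.82 |
| R2adj | Gradient Boosting R. | 0.78 | 0.80 | 0.79 | 0.75 | 0.79 | 0.84 | 0.81 | 0.81 | 0.82 |
| R2adj | Ada Boost R. | 0.77 | 0.78 | 0.70 | 0.70 | 0.77 | 0.83 | 0.82 | 0.79 | 0.79 |
| R2adj | Mean | 0.78 | 0.79 | 0.73 | 0.71 | 0.79 | 0.83 | 0.82 | 0.82 | 0.81 |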
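For readers comparing the R2 and R2adj rows of Table 2, the adjusted coefficient of determination is commonly defined as below. This is the textbook form, given here as an assumption; this section does not state which formula, sample size, or predictor count the authors used, so the symbols n (number of samples) and p (number of predictors) are illustrative.

```latex
\[
  R^2_{\mathrm{adj}} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
\]
```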
Table 3. Accuracy assessment of seawater quality predictions by machine learning models: total suspended solids (TSS), chlorophyll-a (chla), chemical oxygen demand (COD), and dissolved oxygen (DO).
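Column groups: "Original" = original S2 Level 2A SSR; "Calibrated" = calibrated S2 SR using in situ SSR.

| Parameters | ML Models | Original R2 | Original MSE (mg/m3) | Original RMSE (mg/m3) | Original Bias (mg/m3) | Calibrated R2 | Calibrated MSE (mg/m3) | Calibrated RMSE (mg/m3) | Calibrated Bias (mg/m3) |
|---|---|---|---|---|---|---|---|---|---|
| TSS | Decision Tree | 0.24 | 109.70 | 3.31 | 48.96 | 0.57 | 181.28 | 4.26 | 21.15 |
| TSS | Random Forest | 0.53 | 99.32 | 3.15 | 120.83 | 0.60 | 87.93 | 2.97 | 100.58 |
| TSS | Gradient Boosting R. | 0.50 | 107.27 | 3.28 | 102.34 | 0.56 | 93.67 | 3.06 | 80.18 |
| TSS | Ada Boost R. | 0.42 | 118.31 | 3.44 | 96.82 | 0.52 | 109.40 | 3.31 | 75.39 |
| Chla | Decision Tree | 0.64 | 2.61 | 1.61 | 4.17 | 0.84 | 2.29 | 1.51 | 5.34 |
| Chla | Random Forest | 0.75 | 2.62 | 1.62 | 4.94 | 0.90 | 2.21 | 1.48 | 6.36 |
| Chla | Gradient Boosting R. | 0.69 | 2.61 | 1.62 | 4.62 | 0.88 | 2.08 | 1.44 | 5.70 |
| Chla | Ada Boost R. | 0.71 | 2.64 | 1.58 | 4.84 | 0.90 | 2.18 | 1.48 | 6.56 |
| COD | Decision Tree | 0.26 | 12.74 | 3.57 | 12.10 | 0.46 | 13.37 | 3.66 | 12.88 |
| COD | Random Forest | 0.42 | 10.78 | 3.28 | 13.54 | 0.61 | 11.13 | 3.34 | 17.10 |
| COD | Gradient Boosting R. | 0.43 | 10.38 | 3.22 | 12.44 | 0.59 | 11.48 | 3.39 | 17.08 |
| COD | Ada Boost R. | 0.41 | 10.45 | 3.23 | 12.92 | 0.66 | 9.75 | 3.12 | 16.38 |
| DO | Decision Tree | 0.67 | 10.37 | 3.22 | 17.35 | 0.68 | 9.23 | 3.04 | 15.53 |
| DO | Random Forest | 0.67 | 10.63 | 3.26 | 17.92 | 0.70 | 10.77 | 3.28 | 19.01 |
| DO | Gradient Boosting R. | 0.64 | 11.54 | 3.40 | 18.48 | 0.67 | 11.45 | 3.38 | 19.29 |
| DO | Ada Boost R. | 0.54 | 13.23 | 3.64 | 19.22 | 0.74 | 9.79 | 3.13 | 18.61 |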
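The four accuracy measures reported in Table 3 can be illustrated with a minimal Python sketch. It assumes the usual definitions (R2 as 1 minus the ratio of residual to total sum of squares, bias as the mean of predicted minus observed); the section does not state the exact formulas the authors applied, and the function name `regression_metrics` and the sample values are hypothetical, not data from the study.

```python
import math


def regression_metrics(observed, predicted):
    """Return R2, MSE, RMSE and bias for paired observed/predicted values.

    Assumed definitions (not confirmed by the article):
      R2   = 1 - SS_res / SS_tot
      MSE  = mean of squared residuals
      RMSE = sqrt(MSE)
      bias = mean of (predicted - observed)
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [p - o for o, p in zip(observed, predicted)]

    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("observed values are constant; R2 is undefined")

    mse = ss_res / n
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "bias": sum(residuals) / n,
    }


# Hypothetical in situ measurements and model estimates (illustrative only).
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 8.5]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

Computing the same dictionary once for a model trained on the original reflectance and once for a model trained on the calibrated reflectance gives the kind of side-by-side comparison shown in the table.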
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Quang, N.H.; Dinh, N.T.; Dien, N.T.; Son, L.T. Calibration of Sentinel-2 Surface Reflectance for Water Quality Modelling in Binh Dinh’s Coastal Zone of Vietnam. Sustainability 2023, 15, 1410. https://doi.org/10.3390/su15021410

AMA Style

Quang NH, Dinh NT, Dien NT, Son LT. Calibration of Sentinel-2 Surface Reflectance for Water Quality Modelling in Binh Dinh’s Coastal Zone of Vietnam. Sustainability. 2023; 15(2):1410. https://doi.org/10.3390/su15021410

Chicago/Turabian Style

Quang, Nguyen Hong, Nguyen Tran Dinh, Nguyen Tran Dien, and Le Thanh Son. 2023. "Calibration of Sentinel-2 Surface Reflectance for Water Quality Modelling in Binh Dinh’s Coastal Zone of Vietnam" Sustainability 15, no. 2: 1410. https://doi.org/10.3390/su15021410

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
