Mapping Chlorophyll-a Concentrations in the Kaštela Bay and Braˇc Channel Using Ridge Regression and Sentinel-2 Satellite Images

: In this paper, we describe a method for the prediction of concentration of chlorophyll-a (Chl-a) from satellite data in the coastal waters of Kaštela Bay and the Braˇc Channel (our case study areas) in the Republic of Croatia. Chl-a is one of the parameters that indicates water quality and that can be measured by in situ measurements or approximated as an optical parameter with remote sensing. Remote sensing products for monitoring Chl-a are mostly based on the ocean and open sea monitoring and are not accurate for coastal waters. In this paper, we propose a method for remote sensing monitoring that is locally tailored to suit the focused area. This method is based on a data set constructed by merging Sentinel 2 Level-2A satellite data with in situ Chl-a measurements. We augmented the data set horizontally by transforming the original feature set, and vertically by adding synthesized zero measurements for locations without Chl-a. By transforming features, we were able to achieve a sophisticated model that predicts Chl-a from combinations of features representing transformed bands. Multiple Linear Regression equation was derived to calculate Chl-a concentration and evaluated quantitatively and qualitatively. Quantitative evaluation resulted in R 2 scores 0.685 and 0.659 for train and test part of data set, respectively. A map of Chl-a of the case study area was generated with our model for the dates of the known incidents of algae blooms. The results that we obtained are discussed in this paper. and Generalized Additive Models—GAMs) and Landsat-8 imagery. The results showed that GAM outper-Electronics


Introduction
Coastal water quality monitoring is an important activity for retrieving information about the state of a marine ecosystem. In the Republic of Croatia, the degree of sea bathing water quality is regulated by the Regulation on Sea bathing water quality (OG 73/08) and the EU Directive on management of bathing water quality No 2006/7/EC. Consequently, it is necessary to constantly monitor the quality of the sea and the cleanliness of the coast, primarily because of human and animal health, fishing, and also because of tourism as one of the main economic incomes and activities of people on the Adriatic coast, especially in the summer season [1]. This is why responsible authorities schedule in situ measurements of various water quality indicators for continuous, albeit sparse, monitoring.
One of the indicators that depicts the quality of marine and freshwater is the concentration of Chlorophyll-a (Chl-a). Besides merely showing the quality of the water, Chl-a can also be used as a proxy for phytoplankton biomass and that is why it is an indicator of water quality and trophic status [2]. This is why it is important to continuously monitor Chl-a in a body of water.
In this study, we analyzed the possibility of estimating Chl-a in coastal environments, specifically in the Kaštela Bay and the Brač Channel in Croatia, by using satellite remote sensing to complement the existing in situ monitoring with continuous and more frequent assessment.
Characteristics of the coastal environment result in the complexity of the marine ecosystem and pose challenges to the monitoring of water quality indices. Increases in natural or anthropogenic nutrients result in higher concentrations of phytoplankton and marine organisms, which we measure as a concentration of Chl-a [3]. The concentration of Chl-a is usually measured by taking samples of water and making laboratory analysis of collected samples. This way, we can establish the concentration level for only a limited number of points of the research area, which makes this approach not only expensive and weather-dependent but also does not provide a broader picture of the distribution of Chl-a concentration [4].
Thus, in this paper, we offer an approach to calculate Chl-a concentration using the images obtained by Sentinel-2 satellite instruments. With this approach, we can obtain a broader picture of Chl-a distribution and concentrations because Chl-a is an optically active component that has a characteristic spectral signature allowing its detection with optical sensors onboard satellites. Moreover, Chl-a is connected with important processes in the sea (e.g., eutrophication and algae blooms); so, with remote sensing, continuous monitoring of changes in the sea becomes possible.

Related Work
Water quality monitoring using multispectral or hyperspectral remote sensing represents a great challenge in order to find the relationship between in situ data and reflectances obtained by various sensors. There are many optical and nonoptical parameters that affect water quality and can be estimated using satellite data [5,6]. The main research focus is on the optical properties of water such as chlorophyll [7], turbidity [8], total suspended matters [9], and colored dissolved organic matters [10], which affect the reflected radiation of the sea and, thus, can be measured remotely.
Many studies show a different usage of observations and methods for retrieving the concentration of Chl-a for different types of monitored waters. The widely used Ocean color (OC3) algorithm for the calculation of Chl-a concentration based on three bands (B01, B02, and B03), representing blue to green ratios, was proposed in [11]. This algorithm is an empirical, with parameters trained on data originating from ocean and coastal waters. Fitted parameters are published for various satellites. Sentinel-2 parameters are published in [12]. Case-2 Regional/Coast Colour (C2RCC) [13] is a physics-based processor for the calculation of various water parameters (such as Chl-a, total suspended matter (TSM), and colored dissolved organic matter (CDOM)). This algorithm utilizes a neural network trained on a large amount of data for inverting top-of-atmosphere (TOA) radiance to water-leaving reflectance. Implementation for various satellites is available as a part of ESA's Sentinel Toolbox SNAP [14]. Types of water studied by various researches are lakes, rivers, oceans, estuaries, and coastal waters. Studies focus on certain study areas and use only certain satellite data. Methods for estimating parameters vary from fitting a formula parameter to machine learning algorithms. In [15], authors compared Sentinel-2A and Earth Observing-1 (EO1) for estimation of Chl-a and turbidity, where Sentinel-2A showed better results in retrieving concentration for both parameters (Chl R 2 : 0.72 vs. 0.57 and TU R 2 : 0.7 vs. 0.63). Further, the authors used reflectance ratio for mapping Chl-a, which can be seen in several other studies using different reflectance values for developing algorithms [2,16,17]. Sentinel-2 and Landsat-8 have good spatial resolution, which makes them suitable for monitoring the water quality along the coast [12,[18][19][20][21], as opposed to Sentinel-3, SeaWiFS (Sea-Viewing Wide Field-of-View Sensor), and MODIS (Moderate Resolution Imaging Spectroradiometer), which have large spatial resolutions better for monitoring the open sea [22][23][24].
Several studies show that using regression models can have a big potential for retrieving the concentration of Chl-a in coastal environments and lakes [15,[25][26][27][28]. For example, the authors in [3] compared predictive performance on three different regression models (Simple Linear Regression-SLR, Multiple Linear Regression-MLR, and Generalized Additive Models-GAMs) and Landsat-8 imagery. The results showed that GAM outper-formed SLR and MLR, while the MLR performed better in predicting log-transformed Chl-a values. Furthermore, in [29], the authors compared Ordinary Least Square (OLS) with Ridge Regression (OPT) to estimate Chl-a, turbidity, and suspended sediment based on relationship with Landsat reflectance. Ridge Regression estimation showed better results compared with Ordinary Least Square analysis.

Proposed Method
The aim of this paper is to reveal the correlation between in situ Chl-a data and Sentinel-2 satellite band reflectance data using Ridge Regression for estimating the coefficients of Multiple Linear Regression to retrieve Chl-a concentration. The aforementioned studies used only predefined formulas and bands to develop a predictive model, while we present a regression model that takes all of the Sentinel-2 band reflectance data into account. Our modeling approach is purely data-oriented and aimed towards discovering the formula from the available data. The implemented regression model provided coefficients to create a formula used to estimate Chl-a concentration, which will be presented as a map of Chl-a distribution in the area of the Kaštela Bay and the Brač Channel revealing the spatial distribution of the Chl-a parameter and overall state of the aquatorium.
The main contribution of this paper is a novel method for developing a sophisticated and reliable model for predicting Chl-a from Sentinel-2. The developed model is tailored specifically for the study area. This is achieved by augmenting the original data set in two dimensions-horizontally (by performing feature augmentation by transforming the feature values) and vertically (by adding value 0 to the Chl-a measurements for the locations on land without Chl-a). The horizontal augmentation improved the correlation of predicted and measured Chl-a concentrations by providing a more sophisticated model with nonlinear dependencies on the spectral reflectances. The vertical augmentation enriched the data set with new data items. In this way, the regression algorithm receives not only the features of locations with Chl-a concentration but the locations without the Chl-a as well, making the data set more balanced. This method enabled creation of a model that is sophisticated and fitted to the features of the study area. Although the model is precisely fitted for the case study area and past measurements, the evaluation of the model shows that our model does not overfit. The reliability of the model is qualitatively validated on the algae bloom incident. Finally, the two novel contributions of the paper can be summed up as follows: • A method for developing a sophisticated formula for estimation of Chl-a values in coastal waters where we have a limited number of measurements. • A new regression formula for estimating Chl-a concentrations in coastal waters of study area. Figure 1 shows the study area, which is located in the Kaštela Bay and the Brač Channel, near Split-the second largest city in the Republic of Croatia. Both locations are coastal water bodies. Kaštela Bay is the largest bay in central Dalmatia and it is about 14.8 km-long, 6.6 km-wide and up to 60 m-deep [30]. The river Jadro (near the city of Solin) and the stream Pantana (near the city of Trogir) flow into the bay. The Brač Channel is a sea passage between the mainland of Dalmatia and the island of Brač, and is about 50 km-long, 5 to 13 km-wide and up to 78 m-deep. The river Cetina in the city of Omiš flows into the Brač Channel, which deposits sand and creates a shoal that stretches up to 800 m from the shore [31]. The study area location is depicted in Figure 1. Water body of this area is a swimming area for many tourists and local citizens during warm summer periods, and also an area of rich biodiversity for fishery [32]. Due to the high number of people living on the shore of this area, this is also one of the most vulnerable areas when analyzing the risk of pollution. Two rivers in the study area (Jadro and Cetina) bring pollution and nutrients that feed the sea microorganisms responsible for sea blooms. Another cause of higher Chl-a is sewage discharged into the sea, so feces cause contaminants such as the presence of coliform bacteria, e.g., Escherichia Coli and Enterococci. All of these various types of pollution affect animal and human health, and cause economical problems related to tourism and fishing. It is, therefore, necessary to monitor and react timely to these problems, especially in the summertime when it is a swimming season.

In Situ Data Set
The in situ measurements were performed by Hrvatske Vode, a legal entity for water management in the Republic of Croatia. The samples were collected over the study area on thirteen different stations ( Figure 2) and at different depths (0 m, 5 m, 10 m, 20 m, 30 m, and 50 m) during the years 2017, 2018, and 2019. The sea sampling frequency was once a month and performed in the middle of each month on dates depending on weather conditions. Not all thirteen stations were included in each measurement. For 2017 and 2019, each measurement in one month was performed for seven different stations, and for 2018, measurements were performed for six different stations. Besides Chl-a, other water quality parameters were measured from the samples, but in the scope of this paper we will focus only on Chl-a. Table 1 shows the Chl-a concentration statistics for all thirteen measurement stations.
The Chl-a concentration was measured by the fluorimetric method, where the samples were filtered under vacuum up to 65 KPa through Whatman GF/C glass filters, with a pore diameter of about 1 µm. The devices used to determine the concentration of Chl-a were a centrifuge, a TD-700 fluorimeter, and a homogenizer. The sample volume ranges from 200 to 500 mL, depending on the area and sampling season. The total number of samples for all years of measurements, stations, and depths is 480.
It is important to stress that the in situ Chl-a measurement was performed by a team from the national authority for water management, and followed the official standard procedure as prescribed in [33]. The data set consisting of in situ measurements was obtained via official channels and delivered to us via e-mail from a representative of the legal entity that owns the data. In the scope of this paper, we consider the values provided by the in situ Chl-a measurement as ground truth values and have no reason to doubt the validity of measurement. All presented results are based on the assumption that the in situ measurements are accurate and precise.
Since Chl-a is a pigment that is optically active and produces a characteristic spectral signature, we are interested in the concentration of Chl-a that can be detected using optical sensors onboard satellites. We did not only use the value of Chl-a concentration measured on the sea surface, but we also took into account the values measured at different depths.
Remotely sensed values depend on the concentration of Chl-a in the water column that is visible to the sensor. Thus, we used in situ Chl-a measurements on depths lower than measured Secchi depth. The values of the Secchi depth were also measured in situ on the same dates and locations as the Chl-a measurements. After we calculated the mean Chl-a value for each station based on the Chl-a concentration data for different depths that are lower than the measured Secchi depth value, the final set of in situ data was constructed and contained 121 measurement values.

Satellite Data
The Sentinel-2 is an Earth observation mission comprised of a constellation of two identical satellites (Sentinel-2A and Sentinel-2B). It has a revisit time of 5 days of the same area using both satellites. Further, both of these satellites contain a MultiSpectral Instrument (MSI) that can sample 13 spectral bands. The spatial resolutions and central wavelengths [34] of the 13 spectral bands of Sentinel-2 are shown in Table 2.
In this study, we used Sentinel Level-2A atmospherically corrected data products. The Level-2A product provides Bottom of Atmosphere (BOA) reflectance images that are processed from associated Level-1C images with Sen2Cor processor [35]. We did not consider different atmospheric correction procedures nor any other available satellite products indicating the data quality, besides filtering outliers indicating clouds. In the scope of this paper, we only focused on finding the correlations between band values and measured Chl-a values regardless of other parameters. Final images are available from Sentinel Hub API [36].
We obtained a high-quality image where the influence of the atmosphere on the light that reaches the sensor reflects from the Earth's surface. We collected 3024 TIFF (32-bit float) images of all bands except band B10, which was not available through Sentinel Hub at the time of writing of this paper. A total of 252 different scenes covering the study area for 2017, 2018, and 2019 were taken for processing. The cloud coverage of each scene is less than 20%. Images were downloaded from the Sentinel Hub EO Browser [36] page by implementing a script in the Python programming language that uses the API requests (credit: Modified Copernicus Sentinel data 2017/Sentinel Hub). Due to the higher time resolution of the Sentinel-2 satellite, we could not retrieve the satellite image for the exact day of each sampling, so we retrieved the images of the nearest day the image was available, either before or after the sampling date.

Methodology
The methodology includes several stages required to obtain results. Based on the described and collected in situ and satellite data, we created a data set on which data transformations were applied. After building the data set, we applied Ridge Regression on the data set and constructed a Multiple Linear Regression (MLR) formula to estimate Chl-a concentrations in our study area. Figure 3 shows the initial procedure for collecting in situ and satellite data for constructing a data set, as described in the previous sections. Data set of in situ Chl-a values has been constructed by aggregating the mean value of the measured Chl-a for different depths that are less than the measured Secchi depth. In this way, we calculate the mean Chl-a concentration of each measurement at thirteen individual stations. The range of the measured Chl-a value is between 0.1475 µg/L and 2.435 µg/L (average value of 0.596281 µg/L) for all three years of measurement (2017, 2018, and 2019).

Data Set Construction
To retrieve satellite data, we downloaded 32-bit float .tiff Sentinel Level-2A satellite images of each band georeferenced in the WGS 84 coordinate system (EPSG: 4326).
For all in situ measurement stations, we read a value of each Sentinel Level-2A band from the image. Since the image contains 16-bit numbers ranging up to 65,535, we applied a factor of 1/10,000 similar to [37] on pixel values in order to obtain a value capable of storing and manipulation. In situ and satellite data were merged based on the measurement station and the date of the sampling and observation.
For the dates when in situ measurement missions were carried out, we selected Sentinel Level-2A images for each available band and extracted band values at the pixels associated with the location of the measurement station. In this way, we associated 12 band reflectance values with one Chl-a measurement value. Since the Sentinel-2 revisit period for the study area is 5 days, we wanted to check the time difference between the sampling and observation. In our final data set, 82% of images were obtained on the same day as thein situ measurement, 12% with one day difference, and for the rest of the data we allowed up to 10 days difference. We tested the performance by allowing only the smaller difference but no significant improvement of the algorithm was noticed; so, we decided to keep the data of up to 10 day difference in order to feed the machine learning algorithm with more data. The distribution of time differences of the used data set is shown in Figure 4. For comparison, Sentinel 3 revisit time is one day, so we would be able to obtain an image on each day of the measurement but with the cost of lower spatial resolution of the images. This resulted in a data set of a total of 118 records representing band values associated with in situ Chl-a samples. Table 3 contains descriptive statistics of associated in situ Chl-a and band values. The span of minimal and maximal values and standard deviation show that the dataset contains diverse values. This was expected since the measuring stations are located on 13 different geographical locations and the measurements are performed during three years.

Multiple Linear Regression Model
Multiple linear regression (MLR) is a statistical modeling technique that extends simple linear regression. This model was built in order to predict a response variable (y) from multiple explanatory variables (x 1 , x 2 , ..., x n ) [38]. The goal of this model is to represent a linear relationship between many independent predictor variables (bands) and a single dependent variable (Chl-a) as a single functional formula [39]. The equation [40] for the MLR model is as follows (1): where y is a dependent variable, which, in our case, is a concentration of Chlorophyll-a; β 0 is a constant, often referred to as bias or intercept; β i is a regression coefficient for an independent variable X i ; and x i (i=1,2,...,n) are independent variables that are, in our case, the values of Sentinel-2 reflectance bands.
Multiple linear regression modeling is a task of finding the optimal values of β 0 intercept and β i coefficients to make the prediction more accurate.

Ridge Regression
Multicollinearity is the existence of a nearly linear relationship between regression variables, predictor variables, or input/exogenous variables [41]. It may lead to inaccurate estimates of regression coefficients [42] when using traditional regression modeling methods. To avoid this problem, we used a statistical technique called Ridge Regression, which was first introduced by Hoerl and Kennard [43]. The concept of this method is based on the Least Squares Estimator (LSE), reducing the mean square error and retrieving a more stabilized regression coefficients [29]. Ridge Regression Estimator (RRE) of β [41] is defined in (2) for the multiple linear model (1): where β RR n is a regression coefficient; X is an independent variable that presents a design matrix; Y is a response vector; k is a tuning parameter; k∈ [0, ∞ ; and I p is identity matrix.
The equation above shows that the ridge βs change with k-value and become the same as LSE βs if k is equal to zero.

Statistical Evaluation Indices
The evaluation of the predictive performance of Ridge Regression Estimator model was evaluated based on several criteria such as the coefficient of multiple determination (R 2 ) and Root Mean Square Error (RMSE).
The R 2 metric [44], also called Pearson squared correlation, is equal to the square of the correlation coefficient and represents a linear consistency. Thus, proportion of the variation and the in situ and Sentinel Level-2A observations as a linear consistency is explained by the linear regression. This metric can be calculated as follows (3): where y i is the value of the ith sample, y represents the mean of variable y, andŷ i is the estimated value of the ith sample [44]. RMSE is used to measure the difference between the estimated value of the estimator and the actual value. This is the perfect measure of accuracy for comparing the difference in prediction errors from different estimators on a given variable [45]. The equation [46] for this measure can be seen in (4): In the shown equation, x i represents the ith in situ observation, y i represents the ith value predicted by the algorithm based on Sentinel Level-2A observation, and N is the number of match-ups.

Data Set Augmentation
Data set augmentation [47] is a process often used to increase the performance of machine learning algorithms. Performance of the algorithm depends on the data set provided. Feeding algorithms with a larger data set results in more accurate predictions. Since obtaining new data items is not always easy, authors synthetically expand a training set using augmentation techniques. Data set augmentation refers to simple and complex transformation of data items in a dataset in order to obtain a larger set of data and achieve better performance of a machine learning algorithm.
In our approach, we applied data set augmentation in two directions, which is illustrated in Figure 5. The original data set we collected consists of spectral values at the measuring stations measured by instrument on Sentinel-2 satellite (orange part) and Chl-a values measured at the same measuring stations (green part). The original data set is augmented among two axes-horizontal and vertical. With vertical augmentation, we increased the number of rows in the data set by adding fictional measurements on hard land where we are sure that the concentration of Chl-a is zero. With horizontal augmentation, we increased the number of columns in the data set by adding new features that are achieved by transforming the original features.

Vertical Augmentation
Vertical augmentation was performed by adding new fictional measurements to the data set of in situ Chl-a measurements. We fabricated in situ measuring stations for Chl-a on arbitrary locations where we were sure that the Chl-a concentration is always zero. We selected coordinates for these fabricated measuring stations to be located on landmarks such as airports, rooftops, and stone pits in the case study area. We selected a total of six such locations in order to make our data set more balanced.
In further processing, these fictional measurements were treated the same as real measurements with Chl-a concentration measured to be zero. Our intuition was that, besides enriching the data set with more data, these new data items will teach the algorithm the features of locations that do not have Chlorophyll. As we will show later, this resulted in the exclusion of objects on the sea (such as boats) and land from mapping Chl-a concentrations.

Horizontal Augmentation
Some variables do not conform to assumptions of linear relationship, normality, independence, and homoscedasticity of error terms, which may produce misleading results [48]. Since Ridge Regression is limited only on linear relationship of exploratory variables and result, we performed feature augmentation on bands to increase and change the data set. Each band is taken as a special feature in the data set. In this way, our data set is expanded horizontally by adding four different transformations on features values: logarithmic transformation, square transformation, square root transformation, and reciprocal transformation, as can be seen in (5), (6), (7) and (8), respectively: where B i represents each band value (B01-B09, B11-B12, and B8A). Transformation is the application of a mathematical function to each value in a data set, where each set value is replaced by a transformed value, which can generally be written as in (9).
Performed data transformations (vertical and horizontal) result in a data set with overall 60 features denoting bands reflectances and one feature related to Chl-a concentration. In this way, we have enriched the data set, so that the model has enough data to learn and can provide better accuracy by using a more complex prediction function.

Results
Ridge Regression model was developed on the constructed data set in a way of training the model on 80% of the data set, and testing and validating the model on the remaining 20% of the data set. Train set and Test set were constructed by randomly shuffling and splitting items into 80% and 20% of items, respectively. Further, we evaluated the constructed model and results using two approaches: • Quantitative evaluation was performed on training and testing portions of the data set; • Qualitative evaluation was performed by inspecting resulting images of Chl-a distribution for certain dates and discussing in the scope of the known events.
First, we wanted to make sure that the vertical augmentation we used on our data set is a valid approach and will yield better results. We developed a model on the original data set of 12 features and evaluated the model score. Then, we developed a model trained on the 80% of the vertically augmented data set including fabricated chlorophyll-free stations, and evaluated the model score. The results shown in Table 4 indeed show better values of evaluation measures for both train and test portions of the data set for the model trained with vertically augmented data set. The percentage of data taken for training (80%) and testing (20%) is the same for each constructed data set. Furthermore, in order to objectively verify that the horizontal augmentation is a valid approach to our problem, we trained the Ridge Regression model and observed the results for five different variations of features, from original bands to bands with all four data transformations applied.
This way, we ensured that the Ridge Regression model is not overfitted with additional features. Table 5 shows combinations of bands and transformations that we used for incremental Ridge Regression model testing. The model shows the most accurate results by taking into account all of the transformed features and original band values. The R 2 metric for train data is 0.685 and for test data is 0.6599, while RMSE metric is 0.2254 for train and 0.2051 for test data. This shows that Chl-a is quite correlated with transformed reflectance values of Sentinel-2 bands.  Figure 6 shows the relationship between values of in situ and predicted Chl-a values in the form of an error plot. From the error plot, we can notice that for both the test and the train data, the predicted values of Chl-a follow the same trend. Although the prediction is not perfectly accurate, for measurements with higher values of Chl-a, predicted values are higher, while for measurements with low Chl-a, predicted values are lower. For a certain number of measurements denoted as zero Chl-a, we have low predicted Chl-a but not exactly zero since for these points' reflectances are measured on land, which has different behavior than the sea. The predictions are made using a formula based on Equation (10).
Equation (10)  Among selected features, we noticed that three of the transformed features of band B04 ((B04) 2 , √ B04, and log(B04)) have higher coefficients. B04 is usually used in combination with vegetation red edge. Values of bands B05 or B8A are often proposed for retrieving Chl-a in coastal waters [21] or lakes [49]. Further, there is a correlation with bands B06 and B8A in two different features, which are vegetation red edge bands usually used for retrieving various vegetation indices (e.g., NDVI-Normalized difference vegetation index).  As a result, we can conclude that the Ridge Regression gives bias to bands B01, B02, B04, B05, B06, B8A, B11, and B12, which makes them important features in detecting Chl-a concentration in the area of Kaštela Bay and the Brač Channel.
In order to further investigate the validity of the obtained formula, we observed the spatial distribution of the predicted Chl-a value for the whole water body of the study area. Estimated Formula (10) is applied to .tiff band images by using the raster calculator in the free and open-source GIS (Geographic Information System) software-QGIS [50]. Figure 7 shows mapped Chl-a in the Kaštela Bay and the Brač Channel using the newly proposed formula and Sentinel Level-2A satellite data from 30 September 2017. Higher concentration of Chl-a can be spotted at the coastline where the circulation of water is lower. The obtained map confirms our assumptions that Chl-a is higher in the nearby vicinity of the rivers Jadro and Cetina and along the coast.
Furthermore, in Figure 7, it is possible to see how the applied formula separates the land from the sea quite well, as well as the identification of other water surfaces and the concentrations of Chl-a in them (the example of the river Cetina). Estimated formula assigns negative values to land and objects without chlorophyll, such as boats in the sea. We assume that this is the effect of vertical augmentation since the algorithm has learned that the features of the hard objects on land have low values of Chl-a. Of course, there are some pixels that are classified as Chl-a on land, but this is a model error that can be ignored.

Comparison with Other Methods for Chl-a Estimation
As we stressed in the related work section, many scientists have already developed different algorithms for the assessment of Chl-a concentration that are based on remote sensing. In this paper, we selected the two most common algorithms for implementation in order to check the performance of these algorithms over our data set, and to compare them to our algorithm. These two algorithms are as follows: • Ocean color (OC3) algorithm, as proposed in [11], and coefficients adopted from the study in [12]. OC3 was implemented as a function to calculate the Chl-a value having the same band values as our algorithm. The resulting value was evaluated having in situ measurement as ground truth. • Case-2 Regional/Coast Colour (C2RCC) algorithm [13]. We used the C2RCC processor provided in the SNAP processing toolbox from ESA. We collected the Sentinel-2 images from https://scihub.copernicus.eu/ (accessed on 12 October 2021) for several dates of the in situ measurements, processed the whole data set using C2RCC S2-MSI processor, and created a Chl-a map of the area. The processor used default parameters since other algorithms we compared did not use any external data or information. Predicted values of Chl-a concentration were further extracted only on the coordinates of measurement and compared with the in situ measurements.
The results of the evaluation are shown in Table 7. By comparing the shown values, we can conclude that according to both evaluation measures, our model results in better predictions compared with in situ measurement. This is not surprising, since our model is trained with in situ measurement data and tested on a portion of the data set that was not used for training but originated from the same source. However, what surprised us is that the correlation coefficient R 2 was negative for both OC3 and C2RCC. This shows that predictions of these two models are not correlated at all with in situ measurements. Negative R 2 score values means that the model does not follow the trend of the data. Both the OC3 and C2RCC algorithms predict the values of Chl-a that are not correlated at all with the in situ measurements. This means that Chl-a in our study area cannot be predicted with commonly used algorithms but needs a more suitable model, such as the one we developed in this research.

Qualitative Validation
In this section, we will demonstrate the usefulness of mapping Chl-a concentrations. First, we will show the resulting Chl-a map in case of occurrence of unwanted phenomenon related to Chl-a-Algae blooms. Algae blooms occur under certain weather conditions such as changes in temperature and higher rainfall in the spring; moreover, algal blooms are affected by an increased intake of nutrient salts of nitrogen and phosphorus that serve as food for phytoplankton [51].
Some algae can release toxins that negatively affect human health, but also lead to the extinction of fish and negative impacts on all seafood-dependent animals [52]. With algae blooms where the sea is bacteriologically clean, there is no danger to human health but it is an inconvenience to bathers during the summer season.
Detecting algae blooms is challenging, due to the different cell sizes and types of phytoplankton [53]. Chl-a can indicate a potential increase in algae, so maps of Chl-a concentrations calculated from satellite images can provide information on the possible spatial distribution of algae. Figure 8 shows a photo of the beach in Kaštela Bay on 1 April 2021, where the algae blooms can be seen. This incident was reported in the media and discussed on social media. To see if our method for retrieving Chl-a can be used to detect changes in the sea during the algae blooms, we downloaded the first available satellite image taken on 2 April 2021 and applied Formula (10) on image bands (Figure 9). In Figure 9, the same scale of Chl-a concentration range is applied as in Figure 7. The Sentinel Level-2A image (Figure 7) was taken in the period when there were no reported algae blooms in media for our study area. Based on the publicly available data published by the Ministry of Environmental Protection and Energy [54], which is a legal entity of the Republic of Croatia, in the summer and autumn months, the phytoplankton algae Dinophysis caudata from the Dinoflagellata group were recorded in relatively high abundance in almost all areas of the Adriatic sea. Dinophysis caudata algae produces toxic metabolites such as okadaic acid and DTX toxin. Based on these facts, the abundance of the algae population is potentially captured in the image (Figure 7), but we cannot claim this with certainty. First, we can notice that the green color, which represents a higher concentration of Chl-a, is more pronounced at the time of algae blooms in Figure 9 compared with Figure 7 when there were no algae blooms in the sea. Comparing scenes with and without algae blooms can be the first indicator that changes are happening in the sea.
On Figure 9, we marked the location where the image (Figure 8) was taken with the ellipse. It is visible that the whole coast shows the highest concentration of Chl-a when the algae bloom incident occurred. This does not mean that any increase of Chl-a indicates the algae bloom incident but shows that our model detects the Chl-a of algae on the water surface. Nevertheless, our model also excludes objects in the sea such as boats, as shown in Figure 10. In this figure, we can clearly see boats in the sea near the shore at the time the satellite image was taken.

Conclusions and Future Work
This study describes a method for obtaining the Ridge Regression model applied to in situ and Sentinel Level-2A satellite data and discusses the obtained results. The data set is constructed by making feature augmentation and adding arbitrary points representing stations with zero concentration of Chl-a. The best results in the application of Ridge regression had a combination that included all of the transformed features of the bands, both for train (R 2 = 0.685 and RMSE = 0.2254) and test data (R 2 = 0.6599 and RMSE = 0.2051). Obtained coefficients of applied Ridge Regression model are used for the implementation of MLR formula in order to estimate and map Chl-a distribution in the Adriatic sea for the areas of Kaštela Bay and the Brač Channel situated in the Republic of Croatia. In the estimated formula for Chl-a prediction, bands B01, B02, B04, B05, B06, B8A, B11, and B12 in different transformations showed the best correlation with in situ Chl-a data. Predicted concentration of Chl-a is high at the coast and nearby rivers, which are areas usually classified as places with a high Chl-a trend.
One of the undesirable sea phenomena that can be unpleasant and dangerous for human and animal health is algae blooms. They can cause discomfort to bathers in the summer; so, it is useful to monitor and predict changes in the sea. We applied a formula for predicting Chl-a on Sentinel-2 images taken on the date of the incident of algae blooms that occurred in April 2021 in our study area. This showed larger deviations in Chl-a concentration during algae blooms, compared with when there was no algae bloom in this area. Thus, prediction of Chl-a concentration using remote sensing gives us the opportunity to monitor the water quality in an inexpensive way and gives us a broader picture of distribution of Chl-a.
Statistical evaluation of our algorithm shows that it is possible to find correlations of band values and transformations of those values with measured Chl-a concentration for oligotrophic waters with relatively small values of Chl-a concentration. Our model output is more correlated with measurement values than the prediction of OC3 and C2RCC algorithms.
In the future, in addition to chlorophyll, other optical parameters such as turbidity, total suspended matter (TSM), and colored dissolved organic matter (CDOM) can be used to predict water quality. Therefore, to improve the existing model, we could include the listed parameters in order to achieve higher accuracy for water quality prediction. Additionally, for better results, it is necessary to further enrich the data set for training and testing, so we plan to include data from other satellites in our future work, such as Sentinel-3 or Landsat-8. In this way, the data set will be enriched with measurements for which we excluded the data from the Sentinel-2 satellite.

Data Availability Statement:
Restrictions apply to the availability of these data. In situ data is property of -Hrvatske Vodeand is not available from the authors. Satellite data used in this study and computer code are available on request from the corresponding author.