Determination of Water Depth in Ports Using Satellite Data Based on Machine Learning Algorithms

One of the fundamental maintenance tasks in ports is periodic dredging, which is necessary to guarantee a minimum draft that allows ships to access the port safely. Bathymetric surveys are the instrument used to determine the need for dredging and to analyze the behavior of the port bottom over time, so that adequate water depth is maintained. Satellite data processing is increasingly used to predict environmental parameters. Based on satellite data and using different machine learning techniques, this study sought to estimate the seabed in ports, taking into account that port areas are strongly anthropized. The algorithms used were Support Vector Machine (SVM), Random Forest (RF) and Multi-Adaptive Regression Splines (MARS). The study was carried out in the ports of Candás and Luarca in the Principality of Asturias. To validate the results, data were acquired in situ with a single-beam echo sounder. The results show that this type of methodology can be used to estimate coastal bathymetry. When deciding which system was best, however, priority was given to simplicity and robustness. The SVM and RF algorithms outperform MARS: RF performs better in Candás, with a mean absolute error (MAE) of 0.27 m, whereas SVM performs better in Luarca, with an MAE of 0.37 m. This approach is suggested as a simpler, more cost-effective, coarse-resolution alternative to single-beam sonar, which is labor-intensive and polluting, for estimating water depth in turbid port waters.


Introduction
The bathymetry of coastal zones is important for many applications, including navigation, infrastructure maintenance, dredging planning, environmental management, hydrography and coastal engineering [1][2][3][4]. Sediment deposition and erosion occur frequently in these shallow areas because of tides, wave propagation and intensive human activity [5]. Managing and planning these areas requires up-to-date, accurate information, which in turn requires efficient technologies to record these continual changes. Although detailed knowledge of the seabed is essential worldwide for the management of coastal environments, economic and logistical constraints remain. Within the program of specific actions for ports and port facilities, maintenance dredging of docks and navigation channels is one of the actions needed to guarantee navigation within ports and the operation of their infrastructure and facilities. Most ports have dredged channels that experience sedimentation, which reduces the water depth available for navigation [6]. Ports must maintain the minimum draft needed to accommodate ships, and most ports need maintenance dredging at some point, both to facilitate navigation and to develop and maintain infrastructure in marine and fluvial environments [7,8]. Bathymetries must therefore be updated periodically. To minimize unnecessary or excessive dredging and the associated expense [9], seafloor levels must be determined and modeled accurately. Conventional methods of bathymetric data acquisition generally provide accurate depth profiles or point measurements along transects; however, they are limited by their logistical cost and low efficiency, and they are difficult to apply in remote areas. Echo sounders are normally used to measure depths [10,11], and single-beam echo sounders are the most common choice for port studies.
These systems are typically used over a depth range of 0 to 5 m, but they can measure depths greater than 5 m, well beyond what dredging operations require. The echo sounder measures the travel time of an acoustic signal from the transducer to the seabed and back to the receiver. In this way it can measure the depth both of the seabed and of any object below the sea surface. This tool provides valid results in port studies and will continue to do so, provided the surveys are well planned and executed. However, problems of precision and accuracy limit its use, as does the difficulty of operating it in shallow coastal waters. Its operating costs are also high, and its use requires many safety precautions [11]. These drawbacks make alternative techniques such as remote sensing attractive for providing reliable, lower-cost depth estimates [12][13][14][15]. Combining echo-sounder and satellite data offers another way to improve bathymetric estimation. Many topographic studies are carried out with remote sensing, which makes it possible to study depths on temporal and spatial scales that would otherwise be impossible to achieve [16][17][18][19]. Some authors have developed simple methods that use optical images to estimate water depth, including linear regressions on log-transformed reflectance [20][21][22].
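The travel-time principle described above can be illustrated with a short Python sketch. Because the pulse travels down and back, the depth below the transducer is half the sound speed times the two-way travel time. The sketch assumes a constant nominal sound speed of 1500 m/s (in practice the sound speed is measured or calibrated on site) and a known transducer draft; `depth_from_travel_time` is a hypothetical helper, not part of any echo sounder's software.

```python
def depth_from_travel_time(two_way_time_s: float,
                           sound_speed_m_s: float = 1500.0,
                           transducer_draft_m: float = 0.0) -> float:
    """Depth below the water surface from the two-way travel time of a pulse.

    The pulse goes to the seabed and back, so the one-way distance is c * t / 2.
    The transducer draft (its depth below the surface) is then added back.
    """
    if two_way_time_s < 0:
        raise ValueError("travel time must be non-negative")
    return sound_speed_m_s * two_way_time_s / 2.0 + transducer_draft_m


if __name__ == "__main__":
    # A 4 ms echo at the assumed 1500 m/s corresponds to about 3 m below the transducer.
    print(depth_from_travel_time(0.004))
    print(depth_from_travel_time(0.004, transducer_draft_m=0.5))
```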
Machine Learning (ML) techniques for estimating bathymetry from optical sensors have become popular in recent years, owing to the growing number of satellites and rapid advances in algorithm development and data availability [23]. For example, Neural Networks (NN) have been used by numerous researchers in a wide variety of remote sensing applications [24][25][26]. Other researchers have used the Support Vector Machine (SVM) [27] as an alternative to NN to improve the performance of bathymetry retrieval algorithms; SVM works well for nonlinear classification, time-series prediction and regression [28]. Another nonlinear regression algorithm, Random Forest (RF), is suitable for building regression models that relate satellite images to bathymetry data [29][30][31]. More recently, Multi-Adaptive Regression Splines (MARS) has been used as a relatively novel method for modeling and approximating nonlinear bathymetry measurements in shallow coastal areas [32]. These data-driven models are generally regarded as offering greater flexibility and accuracy when satellite images are used to estimate water depth [33].
The literature shows that remotely sensed data hold promise for studying water depth in clear, shallow water [34,35]. However, the inherent conditions of ports, whose waters are often highly polluted and turbid, have frequently compromised the results obtained. This may be related directly to the water's inherent optical properties (attenuation coefficient, scattering, absorption, etc.). This paper compares three approaches to bathymetry estimation at two ports, Candás and Luarca (Spain). The water depth models were built with the Support Vector Machine, Random Forest and Multi-Adaptive Regression Splines methods, applied to Sentinel-2 images and compared with echo-sounder depth data from the two study ports.
Previous studies have investigated the use of SVM techniques to estimate water depth in ports [36]; the main advantages of machine learning methods include their reproducibility and their potential for continuous updating. In the current study, three machine learning techniques (SVM, MARS and RF) were compared for constructing bathymetry maps with geographic information systems and remote sensing techniques, with the aim of identifying the most efficient and simplest depth estimation model based on the accuracy of the resulting models for the ports. The main objective of using these three models was to rely on satellite data sets rather than extensive field surveys. The novelty of this work lies in applying the proposed methodology to port areas. The fundamental difference between the port areas analyzed here and the areas in which similar studies have been carried out lies in the characteristics of the bottom: polluted, darker areas as opposed to light, sandy ones. In addition, bathymetric maps of port areas can identify zones of sediment accumulation, so that areas in need of dredging can be easily detected, and the approach could be applied to other ports in the future. The intention is to provide a fast, operational and low-cost alternative to traditional bathymetry for assessing the need for port maintenance dredging.

Areas of Study and Field Measurements
The sites studied were the Port of Candás (43°35′25″ N, 5°45′43″ W) and the Port of Luarca (43°32′45″ N, 6°32′1″ W), located on the Cantabrian Sea coast (Bay of Biscay) in northern Spain. The Port of Candás (Figure 1) has undergone extensive rebuilding and extension, mainly since the 18th century. In the early 1950s, the dock was expanded, which led to gradual silting and a corresponding decrease in the port's draft. Recently, new extensions to the port's breakwaters have been built to improve this situation. Today the port's traffic consists of cargo vessels and recreational boats. The minimum operating draft varies from 1 m near the docks for small boats to 3.5 m in the navigation channel, and the water in the access area reaches a depth of up to 5 m. The second study site was the Port of Luarca (Figure 1). Since its origin in the 10th century, Luarca has been linked to maritime activity as a fishing enclave, but its outer breakwaters were not built until the 20th century. Because only small boats use the port, the minimum draft ranges from 2 m in the docking area to 3 m in the navigation channel and as much as 12 m in the access area. The docking area and the navigation channel are dredged every year to maintain the necessary water depth.
The bathymetry data for the Candás and Luarca ports were provided by the port service, which checks the condition of the port waters each year, as well as any morphological changes. For both ports, the measurements were made with a Navisound 210 single-beam echo sounder (Reson, Inc.; Slangerup, Denmark), a variable-frequency acoustic profiler operating between 33 kHz and 201 kHz with a vertical precision of 1 cm. Positions were determined with a Differential Global Positioning System (D-GPS). Because the measured depths are affected by the tide, they were referenced to mean sea level (see Table 1). The bathymetric elevations were referenced to the UTM/WGS84 projection, zone 30N. The surveys were carried out on 16 October 2016, 12 March and 29 April 2019 for Candás and on 28 June 2016, 10 May 2018 and 18 May 2019 for Luarca, all in calm weather. They were used as the reference data set and compared with the satellite-derived bathymetry products for each study area.
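The paper does not detail how the tidal reduction was performed. Purely as an illustration, the sketch below assumes a tide-gauge record of water level above mean sea level (MSL), interpolates it linearly to each sounding time and subtracts it from the depth measured below the instantaneous water surface. `reduce_to_msl` and its arguments are hypothetical.

```python
import numpy as np


def reduce_to_msl(measured_depth_m, sounding_time_s, gauge_time_s, gauge_height_m):
    """Refer soundings to mean sea level (MSL).

    gauge_height_m is the tide height above MSL recorded at gauge_time_s.
    It is linearly interpolated to each sounding time and subtracted from the
    depth measured below the instantaneous water surface: when the tide is
    above MSL, the measured depth exceeds the depth below MSL by that amount.
    """
    tide_at_sounding = np.interp(sounding_time_s, gauge_time_s, gauge_height_m)
    return np.asarray(measured_depth_m, dtype=float) - tide_at_sounding


if __name__ == "__main__":
    # Gauge readings 10 min apart; two soundings taken during the interval.
    print(reduce_to_msl([4.0, 2.0], [300.0, 600.0], [0.0, 600.0], [0.5, 1.0]))
```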

Satellite Data Acquisition
Sentinel-2 is the latest generation of Earth observation missions of the European Space Agency (ESA) [37] and part of the Copernicus program, an ambitious Earth observation program designed to provide current, accurate and easily accessible information. Sentinel-2 data were used to predict the water depth in the study ports. Sentinel-2 measures in 13 spectral bands, with spatial resolutions ranging from 10 to 60 m. Four bands have a spatial resolution of 10 m: B2 (blue), B3 (green), B4 (red) and B8 (near infrared). Six bands have a spatial resolution of 20 m: four of them (B5, B6, B7 and B8a, red edge) are used mainly to characterize vegetation, and the other two (B11 and B12) are used for applications such as the detection of snow, ice and clouds. Finally, three bands with a spatial resolution of 60 m are used for atmospheric correction and cloud screening: aerosols (B1), water vapor (B9) and cirrus detection (B10). The characteristics of the spectral bands and their resolutions are shown in Figure 2. The satellite images were selected on the basis of their proximity to the dates of the in situ bathymetric surveys and the least cloud cover at the time of acquisition (Figure 2).
Energies 2021, 14, x FOR PEER REVIEW 5 of 22
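Because the bands described above come at 10, 20 and 60 m, they must share one grid before they can be compared pixel by pixel. The authors used SNAP's S2 Resampling Processor for this; the NumPy sketch below is not that tool but a minimal nearest-neighbour illustration. It assumes the coarse bands are already aligned with the 10 m grid, so that each 20 m pixel maps to a 2 × 2 block and each 60 m pixel to a 6 × 6 block. The names `upsample_nearest`, `stack_at_10m` and `NATIVE_RES_M` are hypothetical.

```python
import numpy as np

# Native resolution (m) of the Sentinel-2 bands used in this study.
NATIVE_RES_M = {"B1": 60, "B2": 10, "B3": 10, "B4": 10, "B5": 20, "B6": 20,
                "B7": 20, "B8": 10, "B8A": 20, "B9": 60, "B11": 20, "B12": 20}


def upsample_nearest(band: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling: each coarse pixel becomes a factor x factor block."""
    return np.repeat(np.repeat(band, factor, axis=0), factor, axis=1)


def stack_at_10m(bands: dict) -> np.ndarray:
    """Resample every band to the 10 m grid and stack them as (rows, cols, n_bands)."""
    layers = [upsample_nearest(arr, NATIVE_RES_M[name] // 10) for name, arr in bands.items()]
    shapes = {layer.shape for layer in layers}
    if len(shapes) != 1:
        raise ValueError(f"bands do not cover the same extent: {shapes}")
    return np.stack(layers, axis=-1)


if __name__ == "__main__":
    b2 = np.arange(36, dtype=float).reshape(6, 6)  # 6 x 6 pixels at 10 m
    b5 = np.arange(9, dtype=float).reshape(3, 3)   # same 60 m x 60 m area at 20 m
    b1 = np.array([[7.0]])                         # same area at 60 m
    print(stack_at_10m({"B2": b2, "B5": b5, "B1": b1}).shape)
```

Nearest-neighbour upsampling keeps the original reflectance values unchanged; smoother interpolators are also possible, at the cost of mixing neighbouring pixels.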

Pre-Processing of Satellite Images
Data captured by the Sentinel-2A satellite, which is currently in orbit, are available free of charge under an open license through portals such as the Copernicus Open Access Hub. Sentinel-2A data (10 m resolution) were visualized and preprocessed with the SNAP (Sentinel Application Platform) software (v7.0.1) [38], an open-source architecture that integrates all of ESA's toolboxes. The Sentinel-2A reflectance bands 1, 2, 3, 4, 5, 6, 7, 8, 8A, 9, 11 and 12 were used to predict water depth at the study ports. All spectral bands were resampled to a common resolution of 10 × 10 m [39,40] with SNAP's S2 Resampling Processor. Because the resulting dataset was not georeferenced, the geographical location (longitude and latitude) of each reference point was defined in SNAP. A coordinate projection based on the WGS84 ellipsoid was then used to obtain the coordinates in ETRS89; the same system was used to project the positions provided by the echo sounder. The ellipsoid projections have an average position error of 1 cm. The resulting data were compared with the projected bathymetry; to do so, a geodetic calculator was used to project the coordinates, which gave the band values associated with Universal Transverse Mercator (UTM) coordinates. The z coordinate was assigned from the annual in situ bathymetries of the study ports provided by the Principality of Asturias Port Service. These bathymetries were acquired with a single-beam echo sounder mounted on a boat, with data recorded every 10 cm along each acquisition track. Digital terrain models were built from the points by triangulation with linear interpolation.
The error introduced by this process is small because the seabed is smooth and lacks significant irregularities. The z coordinates are referred to the port's zero. Each pixel is then assigned its z value according to its x-y location. The analysis and pre-processing of the Sentinel-2 images are shown schematically in Figure 3.
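The digital terrain model step (linear interpolation over a triangulation of the soundings) amounts to barycentric interpolation inside the triangle that contains each pixel centre. The following is a hypothetical pure-NumPy sketch that assumes the triangulation (for example, a Delaunay triangulation) has already been built; the software the authors actually used for this step is not specified.

```python
import numpy as np


def barycentric_weights(p, a, b, c):
    """Weights (wa, wb, wc) such that p = wa*a + wb*b + wc*c and wa + wb + wc = 1."""
    v0, v1, v2 = b - a, c - a, p - a
    det = v0[0] * v1[1] - v1[0] * v0[1]          # twice the signed triangle area
    wb = (v2[0] * v1[1] - v1[0] * v2[1]) / det
    wc = (v0[0] * v2[1] - v2[0] * v0[1]) / det
    return 1.0 - wb - wc, wb, wc


def tin_depth(p, points, depths, triangles, tol=1e-12):
    """Linearly interpolate depth at p inside the triangle that contains it.

    points: (n, 2) array of sounding x-y positions; depths: (n,) array;
    triangles: iterable of vertex-index triples. Returns None outside the TIN.
    """
    for i, j, k in triangles:
        w = barycentric_weights(p, points[i], points[j], points[k])
        if min(w) >= -tol:                        # p lies inside or on an edge
            return float(w[0] * depths[i] + w[1] * depths[j] + w[2] * depths[k])
    return None


if __name__ == "__main__":
    pts = np.array([[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
    z = np.array([1.0, 2.0, 3.0, 2.0])
    tris = [(0, 1, 2), (0, 2, 3)]
    # Depth at a 10 m pixel centre placed at (10, 5).
    print(tin_depth(np.array([10.0, 5.0]), pts, z, tris))
```

In a full workflow each 10 m pixel centre (in UTM coordinates) would be passed as `p`, which is how a depth can be attached to every pixel of the resampled image.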

Support Vector Machines
The support vector machine is a widely used regression technique [41][42][43][44] that provides high accuracy when its inputs are properly selected. SVM is a kernel-based algorithm: once trained, its predictions for new inputs require evaluating the kernel function only on a subset of the training data (the support vectors). The model takes the form of Equation (1), and the challenge is to find the function that minimizes its final error.

$$ y(\mathbf{x}) = \mathbf{w}^{\mathsf{T}}\phi(\mathbf{x}) + b \tag{1} $$
where y(x) is the predicted value, w is the vector of parameters defined by the model, b is the bias and φ(x) denotes the feature-space transformation. In regularized linear regression, the quadratic error function of Equation (2) is minimized, where t_n are the N target values and λ is a regularization coefficient:

$$ \frac{1}{2}\sum_{n=1}^{N}\left(y(\mathbf{x}_n) - t_n\right)^2 + \frac{\lambda}{2}\lVert\mathbf{w}\rVert^2 \tag{2} $$

In SVM regression, this quadratic error is replaced by an ε-insensitive error function (Equation (3)), which is zero when the absolute difference between the predicted and target values is less than ε and increases linearly, as |y(x) - t| - ε, when the difference equals or exceeds ε:

$$ E_{\epsilon}\left(y(\mathbf{x}) - t\right) = \begin{cases} 0, & \text{if } |y(\mathbf{x}) - t| < \epsilon \\ |y(\mathbf{x}) - t| - \epsilon, & \text{otherwise} \end{cases} \tag{3} $$

The regularized error of Equation (4) is then minimized by assigning a cost (C) to the differences between the predicted and target values:

$$ C\sum_{n=1}^{N} E_{\epsilon}\left(y(\mathbf{x}_n) - t_n\right) + \frac{1}{2}\lVert\mathbf{w}\rVert^2 \tag{4} $$
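where ε is the width of the margin within which no penalty is imposed, t represents the target values, C is the penalty (cost) parameter and y(x) is the value predicted by Equation (1). The final function takes the form of Equation (5):

$$ y(\mathbf{x}) = \sum_{n=1}^{N}\left(\alpha_n - \hat{\alpha}_n\right) k(\mathbf{x}, \mathbf{x}_n) + b \tag{5} $$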
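where α_n and α̂_n are the Lagrange multipliers obtained by solving the dual optimization problem with Lagrangian theory, and k(x, x_n) is the kernel function. The Gaussian radial basis function (RBF) is generally regarded as the best kernel, having been reported to give the highest overall accuracy and kappa [45]. The RBF kernel was used in this study (Equation (6)):

$$ k(\mathbf{x}, \mathbf{x}') = \exp\left(-\gamma \lVert \mathbf{x} - \mathbf{x}' \rVert^2\right) \tag{6} $$

where γ > 0 controls the width of the kernel.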
The SVM models were implemented in the R statistical computing environment using the "e1071" package (version 1.7-1, The R Foundation for Statistical Computing, Vienna, Austria) [46].
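The ε-insensitive loss, the regularized objective, the RBF kernel and the kernel expansion used for prediction (Equations (3) to (6)) can each be written in a few lines of NumPy. This is an illustrative sketch only. It does not solve the dual optimization problem (the authors relied on the e1071 package for that), so `svr_predict` expects the dual coefficients (α_n - α̂_n), the support vectors and the bias from an already trained model. All function names are hypothetical.

```python
import numpy as np


def epsilon_insensitive(residual, eps):
    """Equation (3): zero inside the eps-tube, |residual| - eps outside it."""
    return np.maximum(0.0, np.abs(residual) - eps)


def svr_objective(residuals, w, C, eps):
    """Equation (4): C * sum_n E_eps(y(x_n) - t_n) + 0.5 * ||w||^2."""
    return C * epsilon_insensitive(residuals, eps).sum() + 0.5 * np.dot(w, w)


def rbf_kernel(A, B, gamma):
    """Equation (6): k(x, x') = exp(-gamma * ||x - x'||^2) for every row pair of A and B."""
    sq_dist = (np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))  # clip tiny negative round-off


def svr_predict(X_new, support_vectors, dual_coef, b, gamma):
    """Equation (5): y(x) = sum_n (alpha_n - alpha_hat_n) k(x, x_n) + b."""
    return rbf_kernel(X_new, support_vectors, gamma) @ dual_coef + b


if __name__ == "__main__":
    # Residuals inside the tube (|r| < eps) cost nothing; larger ones grow linearly.
    print(epsilon_insensitive(np.array([0.05, -0.3, 0.5]), 0.1))
```

A larger γ makes the kernel narrower, so each support vector influences only nearby points in feature space; C and ε trade model flatness against tolerance to training errors.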

Random Forest
Random Forest is a model developed by Breiman [47] that can be used for both classification and regression. It builds numerous decision trees, each using a randomized subset of the predictors [48]. Each tree is grown from a bootstrap sample of the training data, and at each split only a random subset of the predictor variables is considered, so every tree is unique. The underlying principle is that each individual tree is a weak predictor, but because the trees give very different responses, aggregating the predictions of these uncorrelated trees reduces the prediction variance and improves accuracy [49][50][51]. In this work, the number of trees was set to 100, fifteen randomly selected variables were tried at each node, and the minimum node size was left at its default value. The RF models were implemented with the randomForest R package (v 4.6-2, The R Foundation for Statistical Computing, Vienna, Austria) [52] in R [46] to predict the bathymetry maps.
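To make the bagging-plus-random-subset idea concrete, the following is a deliberately small random-forest regressor in NumPy. It is a simplified sketch, not the randomForest R package the authors used: trees are limited by a maximum depth as well as a minimum node size, every distinct value is tried as a split threshold, and the function names are hypothetical. The defaults `n_trees=100` and `n_try=15` mirror the settings reported above; `max_depth` is an assumption.

```python
import numpy as np


def grow_tree(X, y, n_try, max_depth, min_size, rng, depth=0):
    """Grow one regression tree; every split looks at n_try randomly chosen predictors."""
    if depth >= max_depth or len(y) <= min_size or np.all(y == y[0]):
        return float(y.mean())                       # leaf: mean depth of its samples
    best = None
    for j in rng.choice(X.shape[1], size=n_try, replace=False):
        for thr in np.unique(X[:, j])[:-1]:          # keep both children non-empty
            left = X[:, j] <= thr
            sse = (((y[left] - y[left].mean()) ** 2).sum()
                   + ((y[~left] - y[~left].mean()) ** 2).sum())
            if best is None or sse < best[0]:
                best = (sse, j, thr, left)
    if best is None:                                 # chosen predictors are constant here
        return float(y.mean())
    _, j, thr, left = best
    return (j, thr,
            grow_tree(X[left], y[left], n_try, max_depth, min_size, rng, depth + 1),
            grow_tree(X[~left], y[~left], n_try, max_depth, min_size, rng, depth + 1))


def predict_tree(node, x):
    """Walk from the root to a leaf for a single sample x."""
    while isinstance(node, tuple):
        j, thr, left, right = node
        node = left if x[j] <= thr else right
    return node


def fit_forest(X, y, n_trees=100, n_try=15, max_depth=8, min_size=5, seed=0):
    """Bagging with random predictor subsets: each tree sees a bootstrap sample."""
    rng = np.random.default_rng(seed)
    n_try = min(n_try, X.shape[1])
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(y), size=len(y))  # bootstrap sample, with replacement
        trees.append(grow_tree(X[idx], y[idx], n_try, max_depth, min_size, rng))
    return trees


def predict_forest(trees, X):
    """Average the predictions of all trees (regression forest)."""
    return np.array([[predict_tree(t, x) for t in trees] for x in X]).mean(axis=1)
```

Averaging many decorrelated trees is what reduces variance: a single deep tree overfits its bootstrap sample, while the mean over the forest is much more stable.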

Multi-Adaptive Regression Splines
The Multi-Adaptive Regression Splines algorithm [53] is a nonparametric multiple regression method that uses adaptively selected spline functions [54]. Although it is based on linear relationships, it builds a model whose coefficients change with the level of the predictor variables [45]. The water depth models were built with the earth package [55] in the R environment [46].
The MARS principle is based on the linear basis functions of Equations (7) and (8):

$$ \max(0,\, x - c) \tag{7} $$

$$ \max(0,\, c - x) \tag{8} $$

where c is the knot, the point at which successive splines connect. The MARS model is a linear combination of these basis functions, as in Equation (9) [56]:

$$ f(\mathbf{x}) = \beta_0 + \sum_{i=1}^{n} \beta_i B_i(\mathbf{x}) \tag{9} $$

where B_i(x) are the basis functions, β_0 is the bias and β_i are the coefficients of the basis functions, which are estimated by least squares. n is the number of terms in the model; it is determined in two successive steps: a forward pass that adds basis functions and a backward pass that prunes them.
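The hinge functions of Equations (7) and (8) and the linear combination of Equation (9) can be sketched with NumPy least squares. This is a simplified, hypothetical illustration with the knots fixed in advance. The real MARS algorithm (as implemented in the earth package) chooses the knots in a forward pass and prunes terms in a backward pass, and this sketch omits both steps.

```python
import numpy as np


def hinge_pair(x, c):
    """Equations (7) and (8): the mirrored hinge functions max(0, x - c) and max(0, c - x)."""
    return np.maximum(0.0, x - c), np.maximum(0.0, c - x)


def mars_basis(X, terms):
    """Design matrix: a bias column plus one hinge column per (feature, knot, sign) term."""
    cols = [np.ones(X.shape[0])]
    for j, c, sign in terms:
        pos, neg = hinge_pair(X[:, j], c)
        cols.append(pos if sign > 0 else neg)
    return np.column_stack(cols)


def fit_fixed_knot_mars(X, y, terms):
    """Equation (9) with knots fixed in advance: least-squares beta_0, ..., beta_n."""
    beta, *_ = np.linalg.lstsq(mars_basis(X, terms), y, rcond=None)
    return beta


def predict_mars(beta, X, terms):
    """Evaluate f(x) = beta_0 + sum_i beta_i B_i(x)."""
    return mars_basis(X, terms) @ beta


if __name__ == "__main__":
    x = np.linspace(0.0, 1.0, 21)[:, None]
    y = 2.0 + 3.0 * np.maximum(0.0, x[:, 0] - 0.5)   # slope changes at x = 0.5
    terms = [(0, 0.5, +1), (0, 0.5, -1)]             # hinge pair with its knot at 0.5
    print(fit_fixed_knot_mars(x, y, terms))          # close to [2, 3, 0]
```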

Data Processing
In data mining, data preparation is one of the essential steps in modeling. To validate the models properly, the dataset collected at the two ports was randomly divided into two parts: 80% of the data were used for training and the remaining 20% for testing. This random selection was repeated with five different data sets, and the mean of the five tests was taken as the error. For the port of Candás, the data were divided into 1092 model-generation points and 284 validation points; for Luarca, 1593 points were used for training and 388 for testing. The five randomly generated groups confirmed that the dataset was homogeneous. This validation scheme was used here, although other authors use cross-validation [57,58]. A sound method is needed for deriving bathymetric measurements of an area from optical images. Therefore, to choose the appropriate variables and decide which corrections to apply to the images, a Principal Component Analysis (PCA) was conducted. PCA transforms the original spectral data into a new set of components [59]. It is used in this work to reduce the redundant information within strongly correlated Sentinel-2 bands and to produce a set of linearly uncorrelated variables, known as principal components (PC). PC1 contains the greatest amount of information from the original multispectral image, with the largest share of the variance (74.2% in this case), whereas PC2 explains 14.0% of the total variance. Figure 4 shows the data projected onto the principal components PC1 and PC2. Two clearly differentiated groups of data are apparent: one with data from the port of Luarca (blue points) and one with data from both ports.
This is because the depth range in Candás is much smaller than in Luarca. The points in shallower areas are related to band B9 and the tide, whereas the area of greater depths is more closely related to the remaining bands.
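The band-redundancy analysis can be sketched as follows: PCA through a singular value decomposition of the centred band matrix, plus the matrix of squared correlations between bands of the kind shown in Figure 5. The paper does not state whether the bands were standardized before PCA, so centring without scaling is an assumption here. The function names are hypothetical, and the synthetic example will not reproduce the 74.2% and 14.0% reported above.

```python
import numpy as np


def pca_explained_variance(X: np.ndarray) -> np.ndarray:
    """Share of total variance explained by each principal component.

    X has one row per sample and one column per band. Columns are centred
    (not scaled), and the squared singular values of the centred matrix are
    proportional to the component variances, returned in decreasing order.
    """
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s ** 2
    return var / var.sum()


def band_r2_matrix(X: np.ndarray) -> np.ndarray:
    """Squared Pearson correlation (R^2) between every pair of band columns."""
    return np.corrcoef(X, rowvar=False) ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=100)
    # Two strongly correlated "bands" and one independent band.
    X = np.column_stack([a, 2 * a + 0.01 * rng.normal(size=100), rng.normal(size=100)])
    print(pca_explained_variance(X))
    print(band_r2_matrix(X).round(3))
```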
In addition, Figure 4 shows a strong relationship among all bands except B9; in remote sensing, adjacent Sentinel-2 bands are correlated with one another. The correlation coefficients (R²) between all the variables studied are shown in Figure 5 and indicate a strong statistical relationship among the Sentinel-2 bands. A coefficient of 1.0 indicates a perfect correlation between two variables, whereas a coefficient of 0.0 indicates no correlation at all [60]. The bands with the highest correlation coefficients were therefore chosen for data modeling. Lyzenga [20] used two bands to offset the disadvantages of a single-band linear correlation between reflectance (R(λ_i)) and water depth (Z), assuming that the water column was uniform and the bottom surface homogeneous (Equation (10)).

$$ Z = a \ln R(\lambda_i) + b \tag{10} $$

where a and b are empirically fitted coefficients.
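The ratio algorithm (Equation (11)) estimates depth without requiring the bottom reflectance [61]:

$$ Z = a \frac{\ln R(\lambda_i)}{\ln R(\lambda_j)} + b \tag{11} $$

where R(λ_i) and R(λ_j) are the reflectances in bands i and j. As with the linear algorithm, information from any number of bands in the satellite image can be combined in a multiple linear regression (Equation (12)) [61]:

$$ Z = a_0 + \sum_{i=1}^{k} a_i \ln R(\lambda_i) \tag{12} $$

where k is the number of bands used.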
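The multiple log-linear regression of Equation (12) reduces to an ordinary least-squares problem once the reflectances are log-transformed. The sketch below is a simplified, hypothetical illustration: it omits the deep-water reflectance correction found in Lyzenga's original formulation and assumes strictly positive reflectances.

```python
import numpy as np


def fit_lyzenga(reflectance: np.ndarray, depth: np.ndarray) -> np.ndarray:
    """Least-squares a_0, a_1, ..., a_k in Z = a_0 + sum_i a_i ln R(lambda_i).

    reflectance has one row per pixel and one column per band (all values > 0).
    """
    X = np.column_stack([np.ones(len(depth)), np.log(reflectance)])
    coef, *_ = np.linalg.lstsq(X, depth, rcond=None)
    return coef


def predict_lyzenga(coef: np.ndarray, reflectance: np.ndarray) -> np.ndarray:
    """Evaluate Equation (12) for new pixels."""
    return coef[0] + np.log(reflectance) @ coef[1:]


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    R = rng.uniform(0.01, 0.2, (40, 2))              # two bands, synthetic reflectance
    z = 2.0 - 1.5 * np.log(R[:, 0]) + 0.5 * np.log(R[:, 1])
    print(fit_lyzenga(R, z))                          # close to [2, -1.5, 0.5]
```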
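Stumpf et al. [61] noted that the linear model does not always reveal a linear relationship between water depth and satellite data, and that it is better to examine the relationship between depth (Z) and a non-linear function of the band ratio (Equation (13)), in which m_1 and m_0 are constants fitted to reference depths:

$$ Z = m_1 \frac{\ln\left(n R_w(\lambda_i)\right)}{\ln\left(n R_w(\lambda_j)\right)} - m_0 \tag{13} $$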
where R_w is the observed reflectance at wavelength λ in bands i and j, and n is a fixed constant. Each algorithm (SVM, RF and MARS) was trained with two different options for the input variables. In the first option, the bands (B_i) were used as input variables to obtain bathymetry maps from the water-leaving reflectance, an approach used previously by several authors [30,31,42]. In the second option, a band ratio method was used to estimate water depth, with two radiance bands combined through the relationship of Equation (13) as input variables; this technique has been used by many researchers [27,62]. To date, the results obtained with these two options have not been compared. Analyzing the results makes it possible to select the best option for the input variables of each algorithm and to obtain the simplest model.
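The second input option can be sketched as follows: compute the log ratio of Equation (13) for every ordered pair of bands, keep the pair whose ratio has the highest R² against the echo-sounder depths, and calibrate m_1 and m_0 by least squares. The value n = 1000 is an assumption (a common choice that keeps both logarithms positive for typical reflectances); the function names are hypothetical, and this is not the authors' code.

```python
import numpy as np
from itertools import permutations


def stumpf_ratio(r_i, r_j, n=1000.0):
    """Log-ratio predictor of Equation (13): ln(n R_w(lambda_i)) / ln(n R_w(lambda_j))."""
    return np.log(n * r_i) / np.log(n * r_j)


def best_band_pair(reflectance, depth, band_names, n=1000.0):
    """Return (R^2, band_i, band_j) for the ordered band pair whose ratio best fits depth."""
    best = None
    for i, j in permutations(range(reflectance.shape[1]), 2):
        ratio = stumpf_ratio(reflectance[:, i], reflectance[:, j], n)
        r2 = np.corrcoef(ratio, depth)[0, 1] ** 2
        if best is None or r2 > best[0]:
            best = (float(r2), band_names[i], band_names[j])
    return best


def calibrate_stumpf(ratio, depth):
    """Least-squares m1 and m0 for Z = m1 * ratio - m0."""
    m1, intercept = np.polyfit(ratio, depth, 1)
    return m1, -intercept


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    R = rng.uniform(0.02, 0.1, (60, 3))               # synthetic B2, B3, B4 reflectances
    z = 10.0 * stumpf_ratio(R[:, 0], R[:, 1]) - 3.0   # depth built from the B2/B3 ratio
    print(best_band_pair(R, z, ["B2", "B3", "B4"]))
```

Both orientations of each pair are tried because ln(n R_i)/ln(n R_j) and its reciprocal are not linearly related, so they can fit depth differently.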

Results
Three statistical metrics were used to compare the accuracies of the SVM, RF and MARS models: the Mean Absolute Error (MAE), the Root Mean Squared Error (RMSE) and the adjusted coefficient of determination (adjusted R²). They are calculated by Equations (14) to (16).
where Z_Sentinel are the depths predicted by the three proposed methodologies (SVM, RF and MARS) from the satellite images, Z_echo are the in-situ echo-sounding depths, and N is the number of data points.
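Because Equations (14) to (16) are referred to but not written out here, the sketch below assumes their standard definitions: MAE and RMSE over the N paired depths, R² = 1 − SS_res/SS_tot with respect to the echo-sounder mean, and adjusted R² = 1 − (1 − R²)(N − 1)/(N − p − 1), where p (the number of input variables) is an assumption. The depth values are illustrative only, not data from this study.

```python
import math


def mae(z_sentinel, z_echo):
    """Mean absolute error between predicted and echo-sounder depths."""
    return sum(abs(p - o) for p, o in zip(z_sentinel, z_echo)) / len(z_echo)


def rmse(z_sentinel, z_echo):
    """Root mean squared error between predicted and echo-sounder depths."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(z_sentinel, z_echo)) / len(z_echo))


def r2(z_sentinel, z_echo):
    """Coefficient of determination 1 - SS_res / SS_tot, using the echo-sounder mean."""
    mean_echo = sum(z_echo) / len(z_echo)
    ss_res = sum((o - p) ** 2 for p, o in zip(z_sentinel, z_echo))
    ss_tot = sum((o - mean_echo) ** 2 for o in z_echo)
    return 1.0 - ss_res / ss_tot


def adjusted_r2(z_sentinel, z_echo, n_predictors):
    """Adjusted R² = 1 - (1 - R²)(N - 1)/(N - p - 1), with p input variables."""
    n = len(z_echo)
    return 1.0 - (1.0 - r2(z_sentinel, z_echo)) * (n - 1) / (n - n_predictors - 1)


# Illustrative values only (not data from this study).
z_echo = [-1.0, -2.0, -3.0, -4.0]
z_sentinel = [-1.2, -1.8, -3.3, -3.9]
print(mae(z_sentinel, z_echo), rmse(z_sentinel, z_echo), adjusted_r2(z_sentinel, z_echo, 1))
```

For these illustrative depths the script prints approximately 0.2, 0.212 and 0.946 (MAE, RMSE and adjusted R² with one predictor).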
For the Port of Candás, the results for the testing dataset appear in Table 2 (the best results are in bold). The RF and SVM methods achieved good predictive performance, and their depth predictions were considerably better than those of MARS. All models were implemented, and their R², MAE and RMSE values were analyzed for both options: using the bands as input variables (Bands, Bi) and using as input variables the spectral band pairs with a high coefficient of determination estimated by Equation (13) (Ratios, LBi/LBj). Table 2 shows that the most consistent method according to R² was RF-Bands (0.92), followed closely by RF-Ratios (0.87) and by SVM (0.85 for Ratios and 0.74 for Bands). MARS (0.62 for Ratios and 0.69 for Bands) gave the poorest predictive performance. For MAE, the best result of each algorithm was obtained with RF-Bands (0.27 m), SVM-Ratios (0.34 m) and MARS-Ratios (0.50 m), in that order. According to RMSE, the random forest models were the most robust, with RF-Bands performing best (0.33 m), followed by SVM-Ratios (0.44 m); MARS had the largest RMSE (0.59 m). Figure 6 shows the bathymetry map of Candás created from the echo sounder measurements. Figure 7 shows the bathymetry maps of Candás obtained with band ratios (LBi/LBj) as input variables, comparing the three proposed methodologies: the SVM algorithm (Figure 7a), the RF algorithm (Figure 7b) and the MARS algorithm (Figure 7c). Figure 8 shows the water depth maps of Candás produced with the bands (Bi) as input variables and the SVM algorithm (Figure 8a), the RF algorithm (Figure 8b) and the MARS algorithm (Figure 8c). Figures 7b and 8b show that the RF algorithm is very effective in predicting depths from satellite images.
The RF-Bands model produced the fewest errors. Figure 7b shows fewer isolated low or high points, which corresponds to reality: as seen in the echo-sounder bathymetry (Figure 6), the transitions and slopes of the actual seabed are smooth. The best graphical results are therefore those of Figures 7b and 8b, both obtained with Random Forest. Of the two, Figure 8b, obtained by modeling with the bands, is the most similar to the echo-sounder map (Figure 6).
The testing dataset results for the Port of Luarca appear in Table 3 (the best results are in bold). As in Candás, the RF and SVM methods provided good predictive performance compared with MARS. Table 3 shows that the most consistent method according to R² was RF-Bands (0.974), followed closely by SVM-Ratios (0.973), RF-Ratios (0.96) and SVM-Bands (0.96), whereas MARS-Ratios (0.95) gave the poorest predictive performance. For MAE, the best results were achieved with RF-Bands (0.37 m) and SVM-Ratios (0.37 m), followed by MARS-Bands (0.48 m). According to RMSE, the SVM models were the most robust, especially with Ratios (0.46 m), followed closely by RF-Bands (0.47 m); MARS had the largest RMSE (0.59 m). Figure 9 shows the bathymetry map of Luarca created from the echo sounder measurements, and Figure 10 shows the bathymetry maps of Luarca obtained with band ratios (LBi/LBj) as input variables, comparing the three proposed methodologies: the SVM algorithm (Figure 10a), the RF algorithm (Figure 10b) and the MARS algorithm (Figure 10c).
Figure 11 shows the bathymetry maps of Luarca obtained with the bands (Bi) as input variables and the SVM algorithm (Figure 11a), the RF algorithm (Figure 11b) and the MARS algorithm (Figure 11c). Figures 9 to 11 show that the SVM and RF algorithms are very effective predictors of depth from satellite images. The maps from the best-performing methods (RF and SVM) show smooth depth contours running largely parallel to the beach.
The research presented in this work suggests that RF models detect depths in port waters well and can replace echo-sounded bathymetry measurements of port depths. All methods provided good predictive performance. Tables 2 and 3 show that the R² values are high, ranging from 0.95 to 0.974 in Luarca and from 0.62 to 0.92 in Candás; the lowest R² (0.62) was obtained with the MARS algorithm in Candás. MARS gave the largest errors, even though it is the algorithm best suited to generating surfaces. This is because the position of the points was not used as an input variable, since the aim was a model that is as general as possible; indeed, it is general enough that a single model was created for two ports with different characteristics. In all of the bathymetric maps for all models (Figures 7, 8, 10 and 11), the algorithms correctly identified the deepest areas, the shallowest areas with their smooth transitions, and the coastal contour lines. Very shallow, complex areas were also identified correctly.
To identify the factors that may influence the errors, the relationship between error and depth was analyzed, pooling the errors of Candás and Luarca. The results appear in Table 4, which shows that, for the models studied, the error did not increase with depth.
Figure 12 and Table 4 show that the RF-Bands technique gives better results at depths between +2 m and −2 m and between −8 m and −10 m, whereas the SVM-Ratios algorithm gives better results between −4 m and −8 m and between −10 m and −12 m. In the range between −2 m and −4 m, the MAE values of the two algorithms are very similar.
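The depth-stratified error analysis summarized in Table 4 and Figure 12 can be sketched as follows. The sketch assumes half-open ranges (lower, upper] on the echo-sounder depth, because the text does not state how boundary depths were assigned; the function names are hypothetical. Given the predictions of several models, it also names the model with the lowest MAE in each range, which is the comparison described above for RF-Bands and SVM-Ratios.

```python
def mae_by_depth_range(z_pred, z_obs, edges):
    """MAE of predicted depths grouped by echo-sounder depth range.

    edges: descending depth limits in metres, e.g. [2, -2, -4, -8, -10, -12].
    Each range is (lower, upper], i.e. lower < z_obs <= upper (assumed convention).
    Returns {(upper, lower): MAE, or None when the range has no samples}.
    """
    result = {}
    for upper, lower in zip(edges, edges[1:]):
        errors = [abs(p - o) for p, o in zip(z_pred, z_obs) if lower < o <= upper]
        result[(upper, lower)] = sum(errors) / len(errors) if errors else None
    return result


def best_model_per_range(predictions, z_obs, edges):
    """Name the model with the lowest MAE in each depth range (None if no data).

    predictions: {model_name: list of predicted depths aligned with z_obs}.
    """
    per_model = {name: mae_by_depth_range(z, z_obs, edges) for name, z in predictions.items()}
    best = {}
    for depth_range in zip(edges, edges[1:]):
        scored = [(maes[depth_range], name) for name, maes in per_model.items()
                  if maes[depth_range] is not None]
        best[depth_range] = min(scored)[1] if scored else None
    return best
```

Applied to the RF-Bands and SVM-Ratios test predictions, such a per-range table could support the combined use of both techniques that the Conclusions propose for future work.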

Discussion
In machine learning, no single algorithm or solution suits all data, so it is common to work with several algorithms to find the best-fitting solution. To our knowledge, no previous study has compared SVM, RF and MARS for bathymetric mapping to determine depth in anthropized water areas, including ports that experience contamination and accretion processes. Bathymetric mapping requires highly precise design characteristics. The research presented in this work suggests that RF models detect depths in port waters well and can replace echo-sounded bathymetry measurements of port depths. The models presented in this study produced fewer errors in Candás than in Luarca. It should also be considered that the bottom of the port of Candás is sandy, whereas Luarca has deeper areas but also more rocky outcrops. The bathymetry obtained reveals the state of the port areas and where there is less draft and, therefore, greater sediment deposition. This is why it is important to have this system for obtaining the bathymetry: it reveals the areas where dredging is necessary for proper operation of the port. Thus, the main advantage of the technique implemented in this study is a better understanding of the topography of the seabed in ports. The technique offers a high level of precision, applicability in turbid and shallow waters, speed of use and flexibility. Finally, the proposed method is economical. Thus, the results of this work provide valuable information for managing port maintenance dredging.
To compare the results obtained with those of other authors, many variables, such as depth range, nature of the bottom, image quality and water quality, should be considered [63]. Many authors [39,56,64] have applied SVM and RF methods to estimate bathymetry in shallow water from satellite imagery. The errors they found are lower than those in this study. The cause could be the color and turbidity, because the bottoms of the study areas were muddy and contaminated and absorbed more light than the sandy bottoms under highly transparent waters that many studies have examined. However, the errors obtained in this work are lower than those obtained in [32], where the authors proposed empirical approaches to bathymetry estimation in different locations, including a silt-sand bottom area and a high-turbidity, clay bottom area. The use of free, open-access satellite data, whose resolution is coarser than that of the satellites used by other authors, may also have affected these results. Unlike the studies cited above, however, the proposed methodology is applied to anthropized water areas, not to coastal areas with cleaner bottoms and clearer waters. It is precisely in these areas that this system is most useful, since it allows the sedimentation process that occurs in ports to be analyzed.
This study confirms the viability of machine learning models based on Sentinel-2 images; we have proposed a methodology to build the best-performing model that could be applied to different anthropized water areas. Sentinel-2 images can be used effectively to determine bathymetry in the study area, and this methodology could be extended to other ports in the Principality of Asturias with similar water characteristics, as well as to other ports in the world. In future work, it could be interesting to analyze the composition of the seabed and to look for different algorithms suited to each composition.

Conclusions
This approach brings a new perspective to the determination of water depth in ports using remote sensing technology, which is considered a time-effective, low-cost and wide-coverage solution. It also supplements and improves traditional bathymetric measurement methods and techniques. This study compared the SVM, RF and MARS methods of bathymetry prediction using Sentinel-2 images, in order to propose a simple and robust model for bathymetry mapping in anthropized water areas. This depth estimation is needed for dredging processes, especially for maintaining the free draft and for adequate port management; in addition, analyzing the behavior of port bottoms provides valuable information on how they respond to littoral dynamics. The algorithms were applied in two different ports, Candás and Luarca (Asturias, Spain), with different numbers of available data points. The estimated depths were compared with in-situ echo sounder measurements. The three proposed approaches used bands and log ratios of bands as input variables. The errors obtained were acceptable, since the seabed oscillations caused by storms are an order of magnitude greater than the errors of the models.
At Candás, the RF method provided the best bathymetry predictions. Its results were also the most consistent according to R² (RF-Bands, 0.92), and its best MAE was achieved with RF-Bands (0.27 m). The Random Forest model was the most robust according to RMSE, especially with the Bands option (0.33 m). At Luarca, R² was very high for RF-Bands (0.974), closely followed by SVM-Ratios (0.973). For MAE, the best performance was achieved with RF-Bands and SVM-Ratios (0.37 m). For RMSE, the SVM models were the most robust, especially with Ratios (0.46 m), closely followed by RF-Bands (0.47 m).
The results of the RF and SVM algorithms exceeded those of the MARS algorithm. In addition, the RF method produced more accurate results in Candás, and the SVM method in Luarca. The difference in results between the two models is very small. It should be noted that the best RF result was obtained with the Bands, and the best SVM result with the Ratios. Therefore, if a single model must be chosen, RF is considered best owing to its simplicity and its need for fewer input variables. The validation method used in this work relies on randomly chosen training and testing sets, which ignores spatial autocorrelation (SAC) in the data. This may lead to an overoptimistic assessment of model predictive power [58]. We intend to address SAC properly in future studies. In future work, a very interesting option could also be to use both techniques, applying one or the other depending on the previous depth result.
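One common way to reduce the optimism caused by SAC is spatial blocking: the study area is divided into square cells, and whole cells, rather than individual soundings, are assigned to the training or testing set. The following is a minimal sketch of that idea, not the procedure used in this work. It assumes sounding positions in projected coordinates in metres; `block_size` (which should exceed the range of spatial autocorrelation) and the other names are hypothetical. Positions are used only to partition the data, so the models can still be trained without them as input variables.

```python
import random
from collections import defaultdict


def spatial_block_split(points, block_size, test_fraction=0.3, seed=0):
    """Assign whole grid cells of soundings to the training or testing set.

    points: list of (x, y) positions in metres (assumed projected coordinates).
    block_size: cell side length in metres (hypothetical; should exceed the
    spatial autocorrelation range). Returns sorted (train_idx, test_idx).
    """
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(int(x // block_size), int(y // block_size))].append(idx)

    keys = sorted(cells)
    random.Random(seed).shuffle(keys)  # reproducible random choice of test cells
    n_test = max(1, round(test_fraction * len(keys)))
    test_cells = set(keys[:n_test])

    train_idx, test_idx = [], []
    for key, idxs in cells.items():
        (test_idx if key in test_cells else train_idx).extend(idxs)
    return sorted(train_idx), sorted(test_idx)
```

The resulting indices can be used to fit and score SVM, RF and MARS exactly as with a random split; only the way the soundings are partitioned changes.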