Quantifying Marine Macro Litter Abundance on a Sandy Beach Using Unmanned Aerial Systems and Object-Oriented Machine Learning Methods

Unmanned aerial systems (UASs) have recently proven to be valuable remote sensing tools for detecting marine macro litter (MML), with the potential to support pollution monitoring programs on coasts. Very low altitude images, acquired with a low-cost RGB camera onboard a UAS over a sandy beach, were used to characterize the abundance of stranded macro litter. We developed an object-oriented classification strategy for automatically identifying marine macro litter items on a UAS-based orthomosaic. Three automated object-oriented machine learning (OOML) techniques are compared, namely random forest (RF), support vector machine (SVM), and k-nearest neighbor (KNN). Overall, detection was satisfactory for all three techniques, with mean F-scores of 65% for KNN, 68% for SVM, and 72% for RF. A comparison with manual detection showed that RF was the most accurate OOML macro litter detector, as it returned the best overall detection quality (F-score) with the lowest number of false positives. However, because the number of tuning parameters varied among the three techniques and the three resulting abundance maps correlated similarly with the manually produced abundance map, the simplest classifier, KNN, was preferred to the more complex RF. This work contributes to advances in remote sensing surveys of marine litter on coasts by optimizing automated detection on UAS-derived orthomosaics. MML abundance maps produced by UAS surveys can assist coastal managers and authorities in environmental pollution monitoring programs. In addition, they support the search for and evaluation of mitigation measures and can improve clean-up operations in coastal environments.


Introduction
The amount of anthropogenic debris in marine and coastal environments is increasing dramatically and constitutes a global issue. Monitoring and characterizing the abundance of anthropogenic marine debris (or marine litter) is essential to identify its main sources [1,2] and to design effective mitigation measures [3,4]. In particular, as marine litter is present in large quantities on […] segmentation technique is directed by the relative object heterogeneity and internal homogeneity criteria, weighted by its spectral and shape characteristics [33].
The main objective of this work was to propose and evaluate a simple and cost-effective UAS-based approach for automatically generating MML abundance maps of sandy beaches. To this end, we evaluated the performance of three commonly used object-oriented machine learning (OOML) classifiers, namely support vector machine (SVM), k-nearest neighbor (KNN), and random forest (RF), for automatically detecting MML items on an orthomosaic derived from a UAS flight. In addition, this work contributes to advances in remote sensing MML surveys by optimizing automated MML detection on UAS-derived orthomosaics. The MML abundance maps produced by the UAS surveys can assist environmental pollution monitoring programs and contribute to the search for and evaluation of mitigation measures. Furthermore, these MML maps can improve the clean-up activities in coastal environments carried out by governmental authorities in close partnership with all stakeholders, including non-governmental organizations, municipalities, local communities, and the private sector.

Materials and Methods
A simple, cost-effective, UAS-based framework was used to generate MML abundance maps of sandy beaches, in compliance with European Directives [34]. This framework, described in Figure 1, comprised four operational steps. First, a very low altitude UAS flight was planned, and the corresponding ultrahigh resolution images were acquired over the target area. Then, the image block was processed with a Structure from Motion and Multi-View Stereo (SfM-MVS) workflow to generate the digital surface model (DSM) and the orthomosaic. In the third step, the MML items were detected in the orthomosaic by a supervised OOML classifier requiring minimal training effort. In the last step, abundance maps were created from the centroids of the macro litter objects classified in the previous step.
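The last step of the framework, turning classified centroids into an abundance map, can be sketched as binning centroids into a regular grid and dividing counts by cell area. This is a minimal illustration under assumptions, not the authors' implementation: the text does not say how the maps were gridded, and the cell size, grid origin, coordinates, and the helper names `abundance_grid` and `density_per_m2` are hypothetical.

```python
import math
from collections import Counter


def abundance_grid(centroids, cell_size, x0=0.0, y0=0.0):
    """Count MML centroids per square grid cell.

    centroids: iterable of (x, y) map coordinates in metres.
    Keys of the returned Counter are (column, row) indices measured from
    the grid origin (x0, y0); rows increase with y.
    """
    counts = Counter()
    for x, y in centroids:
        col = math.floor((x - x0) / cell_size)
        row = math.floor((y - y0) / cell_size)
        counts[(col, row)] += 1
    return counts


def density_per_m2(counts, cell_size):
    """Convert per-cell item counts to items per square metre."""
    area = cell_size ** 2
    return {cell: n / area for cell, n in counts.items()}


# Hypothetical centroids (metres) of objects classified as MML
items = [(0.5, 0.5), (0.7, 0.2), (2.5, 0.5)]
grid = abundance_grid(items, cell_size=1.0)
print(dict(grid))
print(density_per_m2(abundance_grid(items, cell_size=2.0), cell_size=2.0))
```

Only occupied cells appear in the result; a raster export would fill the remaining cells with zero.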
Remote Sens. 2020, 12, x FOR PEER REVIEW 3 of 19

Study Area
Cabedelo Beach (40°08′12.8″N and 8°51′47.5″W, Figure 2) is a sandy beach located on the western Portuguese coast facing the North Atlantic Ocean (OSPAR area 5, Iberian coast [34]), south of the Mondego River estuary (Figueira da Foz). The beach is backed by a stabilized dune whose crest height varies between 5 and 10 m moving southwards (see Figure 2).

Field Data Acquisition and Unmanned Aerial System (UAS) Survey
Aerial images were acquired with a Phantom 4 Pro quadcopter (Figure 3a) on 15 February 2019 at 12:30, on a sunny day with clear sky and light wind. This aerial platform was chosen because the aircraft needed to be deployed from very small areas and flown at a very low cruise speed. This rotary-wing UAS, which is significantly more affordable than most rotary-wing platforms, was equipped with a one-inch, 20-megapixel CMOS (complementary metal oxide semiconductor) sensor (camera model FC6310, 24 mm full-frame equivalent) with a mechanical shutter. The camera was mounted on a three-axis brushless gimbal that smooths the angular movements of the camera, dampens vibrations, and maintains the camera in a predefined position [35]. This component was essential to ensure good stabilization during image acquisition and to avoid blur in the very low altitude images.

Concerning the image acquisition strategy, and taking into account current practices in UAS-based environmental monitoring [36], three main issues were considered: mission planning, UAS georeferencing accuracy, and camera settings. The mission planning must include all the parameters that allow the UAS to perform the flight autonomously. For nadir image acquisition, the most important parameters are (i) nominal flight height, (ii) image overlap, (iii) geometry of the surveyed area, and (iv) camera settings. On the basis of these parameters, the flight mission software computes, for the given camera model, the expected ground sampling distance (GSD) and the flight path (waypoints) to follow.
In this work, mission planning was carried out with the freeware mobile application DroneDeploy (Figure 3b). The drone was set to fly at an altitude of 20 m, with the camera gimbal set to −90° to capture nadir photos, i.e., pointing straight down, perpendicular to the flight direction (Figure 3b). The images, with a resolution of 4864 × 3648 pixels (aspect ratio 4:3), were acquired with 80% front overlap and 70% side overlap. The resulting nominal spatial resolution (GSD) of the images was 5.5 mm.
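The 5.5 mm GSD follows from the pinhole relation GSD = pixel pitch × flight height / focal length, and the overlaps fix the spacing between exposures and between flight lines. The sketch below reproduces that arithmetic under assumed nominal FC6310 values: a pixel pitch of about 2.41 µm (13.2 mm sensor width over 5472 px) and a physical focal length of 8.8 mm. Neither value is given in the text, the along-track orientation of the short image side is also an assumption, and DroneDeploy's own computation may differ.

```python
# Assumed nominal FC6310 values (not given in the text)
PIXEL_PITCH_M = 2.41e-6   # about 13.2 mm sensor width / 5472 px
FOCAL_LENGTH_M = 8.8e-3   # physical focal length (24 mm full-frame equivalent)


def ground_sampling_distance(altitude_m, pixel_pitch_m=PIXEL_PITCH_M, focal_m=FOCAL_LENGTH_M):
    """Pinhole relation: ground size of one pixel at nadir, in metres."""
    return pixel_pitch_m * altitude_m / focal_m


def spacing(footprint_m, overlap):
    """Distance between consecutive exposures (or flight lines) for a given overlap."""
    return footprint_m * (1.0 - overlap)


altitude = 20.0
gsd = ground_sampling_distance(altitude)
footprint_w = 4864 * gsd   # image width on the ground
footprint_h = 3648 * gsd   # image height on the ground
# Assumption: the short image side is oriented along the flight direction
along_track_spacing = spacing(footprint_h, 0.80)
line_spacing = spacing(footprint_w, 0.70)
print(f"GSD = {gsd * 1000:.1f} mm; footprint = {footprint_w:.1f} x {footprint_h:.1f} m")
print(f"photo spacing = {along_track_spacing:.1f} m; line spacing = {line_spacing:.1f} m")
```

Under these assumptions each image covers roughly 27 m × 20 m on the ground, so consecutive photos are about 4 m apart and flight lines about 8 m apart.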
In general, the positioning and image georeferencing accuracies of a UAS are driven by the quality of its on-board Global Navigation Satellite System (GNSS) sensors. Using the waypoints computed by the mission planning software, the UAS performs an autonomous flight and records digital images with the specified camera settings at the indicated geographic positions. During the flight, the camera position and attitude are also recorded by the UAS navigation sensors. However, the Phantom 4 Pro navigation sensors are not accurate enough for a correct georeferencing of the derived geospatial products. Therefore, ground control points (GCPs) are needed to georeference the digital surface model (DSM) and the orthomosaic in a specific cartographic coordinate system and, possibly, to refine the auto-calibrated camera model. Along with GCPs, it is recommended to acquire further points to be used as independent check points (CHPs) for assessing the geometric accuracy of the derived geospatial products. To keep the approach low-cost and simple, we acquired only five GCPs for georeferencing and two CHPs for assessing the horizontal and vertical accuracy of the generated orthomosaic and DSM, respectively (Figure 3c).
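Check-point accuracy is commonly summarized as a root mean square error (RMSE): horizontally from the x and y differences and vertically from the z differences. A minimal sketch, assuming coordinates in metres in the same cartographic system; the helper name `checkpoint_rmse` and the two check-point coordinates are hypothetical and are not the survey's results.

```python
import math


def checkpoint_rmse(measured, reference):
    """Horizontal and vertical RMSE (metres) between product and surveyed check points.

    measured, reference: equal-length sequences of (x, y, z) tuples.
    """
    if not measured or len(measured) != len(reference):
        raise ValueError("need non-empty, equal-length point lists")
    n = len(measured)
    sq_h = sum((m[0] - r[0]) ** 2 + (m[1] - r[1]) ** 2 for m, r in zip(measured, reference))
    sq_v = sum((m[2] - r[2]) ** 2 for m, r in zip(measured, reference))
    return math.sqrt(sq_h / n), math.sqrt(sq_v / n)


# Hypothetical CHP coordinates (metres): surveyed vs. read from the orthomosaic/DSM
surveyed = [(100.00, 200.00, 5.00), (150.00, 250.00, 6.00)]
from_products = [(100.03, 200.04, 5.05), (150.00, 250.00, 5.95)]
rmse_h, rmse_v = checkpoint_rmse(from_products, surveyed)
print(f"RMSE_xy = {rmse_h:.3f} m, RMSE_z = {rmse_v:.3f} m")
```

With only two CHPs, such RMSE values describe the accuracy at those two locations rather than across the whole block.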
Regarding camera settings, the overall exposure of each image has a significant impact on the geometric and radiometric quality of the final UAS-based geospatial products [37]. ISO, aperture, and shutter speed are the three fundamental camera settings that determine image exposure. In this work, ISO, shutter speed, and aperture were set to 100, 1/1250 s, and f/3.2, respectively, to adapt the exposure to the daytime illumination conditions and to obtain sharp, well-exposed images.
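How the three settings trade off can be made concrete with the standard photographic exposure value, EV = log2(N²/t) − log2(ISO/100), where N is the f-number and t the exposure time in seconds. The short sketch below, which is not part of the original workflow (`exposure_value` is a hypothetical helper), evaluates EV for the settings used here and checks that opening the aperture by one stop while halving the exposure time leaves the exposure unchanged.

```python
import math


def exposure_value(f_number, shutter_s, iso=100):
    """Exposure value referenced to ISO 100: EV = log2(N^2 / t) - log2(ISO / 100)."""
    return math.log2(f_number ** 2 / shutter_s) - math.log2(iso / 100)


ev_used = exposure_value(3.2, 1 / 1250, iso=100)
# One stop wider aperture (N / sqrt(2)) combined with half the exposure time
ev_equivalent = exposure_value(3.2 / math.sqrt(2), 1 / 2500, iso=100)
print(f"{ev_used:.2f} {ev_equivalent:.2f}")
```

Equivalent settings give the same exposure, but a shorter exposure time reduces motion blur from a moving platform, while a wider aperture reduces depth of field.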

Structure from Motion and Multi-View Stereo (SfM-MVS) Processing
Generating a DSM (and the subsequent orthomosaic) from a block of overlapping images processed with an SfM-MVS photogrammetric workflow requires that every part of the surface be imaged from two or more different positions [38,39]. The first step of this process consists of detecting distinctive features (keypoints) in each image and assigning each of them a unique descriptor that identifies it regardless of image perspective and scale. The external orientation of the images (i.e., camera position and attitude) and the coordinates of the tiepoints (i.e., scene geometry) are then reconstructed simultaneously through the automatic identification of matching keypoints (tiepoints) in multiple images. These features, tracked across the image pairs of the whole image block, allow the initial camera positions and the object coordinates of the tiepoints to be estimated. These initial values are then optimized simultaneously in a bundle block adjustment (BBA), which minimizes the overall residual error and produces a self-consistent three-dimensional (3D) model with the associated camera parameters.
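The requirement that every surface point be seen from at least two positions comes from triangulation. Once the camera orientations are known, the 3D coordinates of a tiepoint follow from its image coordinates in two or more images. The sketch below shows linear (DLT) triangulation with NumPy for two idealized, distortion-free cameras. The camera matrices, intrinsics, and point are hypothetical, and Metashape's actual pipeline (keypoint matching and BBA with lens distortion) is far more elaborate.

```python
import numpy as np


def project(P, X):
    """Project a 3D point X (shape (3,)) with a 3x4 camera matrix P to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one tiepoint observed in two images."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The homogeneous solution is the right singular vector of the smallest singular value
    _, _, vt = np.linalg.svd(A)
    X_h = vt[-1]
    return X_h[:3] / X_h[3]


# Hypothetical intrinsics: focal length in pixels, principal point at the image centre
K = np.array([[3650.0, 0.0, 2432.0],
              [0.0, 3650.0, 1824.0],
              [0.0, 0.0, 1.0]])
# Camera 1 at the origin; camera 2 shifted 4 m along x (P = K [I | -C])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-4.0], [0.0], [0.0]])])

X_true = np.array([1.5, -2.0, 20.0])  # point 20 m in front of both cameras
x1, x2 = project(P1, X_true), project(P2, X_true)
X_est = triangulate_dlt(P1, P2, x1, x2)
reprojection_error = np.linalg.norm(project(P1, X_est) - x1)
print(X_est, reprojection_error)
```

With noise-free observations the point is recovered exactly. With real, noisy tiepoints the reprojection error becomes nonzero, and the BBA minimizes it jointly over all points and cameras.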
Agisoft Metashape (v. 1.5.3 [40]) was used as the SfM-MVS processing software to produce the DSM and the related RGB orthomosaic. The processing strategy was divided into the following steps:

1. Photo alignment. Using the keypoints detected on each image, this step computes the internal camera parameters (e.g., lens distortion) and the external orientation parameters of each image, and generates a sparse 3D point cloud.
2. Georeferencing. The 3D point cloud is assigned to a specific cartographic (or geographic) coordinate system.
3. Camera optimization. The camera calibration (interior orientation parameters) is refined by an optimization procedure that minimizes the sum of re-projection errors and reference coordinate misalignments. In this step, the sparse point cloud is statistically analyzed to remove poorly located points and to find the optimal re-projection solution.
4. Dense matching. The MVS dense matching technique generates a dense 3D point cloud from multiple images using the optimized internal and external orientation parameters.
5. DSM and orthomosaic generation. The DSM is interpolated from the dense 3D point cloud, and the orthomosaic is then generated from this DSM. Because the imaged scene had low height variation relative to the flying height, the additional time-consuming steps of mesh generation and 3D texture mapping were not necessary to generate the orthomosaic.
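Metashape's DSM interpolation is not described in detail here. As a rough illustration of turning a dense point cloud into a raster surface, the sketch below bins (x, y, z) points into grid cells and keeps the highest elevation per cell. Empty cells are left as NaN where a real workflow would interpolate them. The function name, grid parameters, and points are hypothetical.

```python
import math


def grid_dsm(points, cell_size, x0, y0, n_cols, n_rows):
    """Rasterize (x, y, z) points into a DSM, keeping the highest z per cell.

    (x0, y0) is the upper-left corner of the grid; rows increase southwards.
    Cells that receive no point stay NaN (a real workflow would interpolate them).
    """
    dsm = [[math.nan] * n_cols for _ in range(n_rows)]
    for x, y, z in points:
        col = math.floor((x - x0) / cell_size)
        row = math.floor((y0 - y) / cell_size)
        if 0 <= row < n_rows and 0 <= col < n_cols:
            current = dsm[row][col]
            if math.isnan(current) or z > current:
                dsm[row][col] = z
    return dsm


# Hypothetical dense-cloud points (metres)
cloud = [(0.2, 0.8, 1.0), (0.3, 0.7, 1.5), (1.2, 0.2, 0.5)]
dsm = grid_dsm(cloud, cell_size=1.0, x0=0.0, y0=1.0, n_cols=3, n_rows=1)
print(dsm)
```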

Classification Preprocessing, Nomenclature, and Training Areas
Before classification, the UAS-based orthomosaic was cropped to a manually digitized outline of the monitored beach area where MML was present. The aim of this preprocessing step was both to simplify the beach cover nomenclature and to minimize the negative influence of non-beach areas (dune, rocks, and walkways) on the classification procedure.
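Cropping to a digitized outline amounts to masking every pixel whose centre falls outside a polygon. The sketch below is a minimal even-odd (ray casting) version in pixel coordinates. In practice, the outline's map coordinates would first be converted to pixel coordinates with the orthomosaic's geotransform. The helper names and the outline are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd (ray casting) test: is (x, y) inside the polygon?

    polygon: list of (x, y) vertices in order; the last edge closes the ring.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def crop_mask(n_rows, n_cols, polygon):
    """Boolean mask: True where the pixel centre lies inside the outline."""
    return [
        [point_in_polygon(col + 0.5, row + 0.5, polygon) for col in range(n_cols)]
        for row in range(n_rows)
    ]


# Hypothetical outline in pixel coordinates on a 4 x 4 raster
outline = [(1, 1), (3, 1), (3, 3), (1, 3)]
mask = crop_mask(4, 4, outline)
print(sum(v for row in mask for v in row))
```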
Considering that we were interested in mapping MML abundance on a sandy beach, a nomenclature (classification scheme) was carefully selected and defined (Table 1 and Figure 4), ensuring that its classes were (i) mutually exclusive, (ii) exhaustive, and, if necessary, (iii) hierarchical [41]. The previously mentioned literature indicates that at least four factors (image segmentation, training samples, feature space, and tuning parameters) can have a significant impact on classification accuracy and efficiency [42,43]. Collecting adequate training data is a time-consuming task that demands expertise. However, as we wanted to propose a simple, easy, and accessible OBIA classification approach, we decided to use a rectangular training area, outlined manually over the orthomosaic, in which the variability of each beach cover class was well represented. After careful inspection of the orthomosaic, this training area was placed in the southern part of the study area and covered only one-third of the total surface area to be classified. Within this training area, several polygons representing each class were manually digitized in a GIS environment (Figure 4).

Feature Space and Data Normalization
Implementing a successful OBIA classification requires careful selection of suitable discriminating features (or variables), such as spectral signatures, vegetation indices, transformed images, and textural and contextual information [44]. In our case, the spectral dimensionality was restricted to the RGB wavelengths of the low-cost onboard UAS camera, which is sensitive to illumination intensity. The RGB bands are highly correlated and mix color and intensity information, and this color space is generally not perceptually uniform [27]. To overcome these limitations, and considering that MML is generally characterized by strong manufactured colors, we used transformed image features from three additional color spaces (see Figure 5): hue-based (HSV), perceptually uniform (CIE-Lab), and luminance-based (YCbCr) [45]. Each of these color spaces describes color differently from the additive RGB color model [46]. In HSV (hue, saturation, and value), the color information is contained in the hue channel, separately from intensity (value). In CIE-Lab, the color information is contained in two chromaticity layers, i.e., the red-green axis (a) and the blue-yellow axis (b). In YCbCr, the intensity or luminance (Y) is easily separated from the two chrominance components: blue (Cb) and red (Cr).
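The transformed features can be derived per pixel from the RGB values. The sketch below stacks 12 values (RGB, HSV, Lab, YCbCr) for one pixel under these assumptions: 8-bit sRGB input, the D65 white point for CIE-Lab, and the full-range ITU-R BT.601 (JPEG-style) definition of YCbCr. The text does not state which conversion formulas or software were used, so these choices and the function names are illustrative. HSV comes from Python's standard `colorsys` module.

```python
import colorsys


def srgb_to_linear(c):
    """Undo the sRGB gamma for one channel in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def rgb_to_lab(r, g, b):
    """sRGB in [0, 1] -> CIE L*a*b* with a D65 reference white."""
    rl, gl, bl = (srgb_to_linear(c) for c in (r, g, b))
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl
    xn, yn, zn = 0.95047, 1.0, 1.08883  # D65 white point

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 (JPEG-style) YCbCr; Cb and Cr are centred on 0.5."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 0.5 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 0.5 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def pixel_features(r8, g8, b8):
    """12 features for one 8-bit pixel: R, G, B, H, S, V, L, a, b, Y, Cb, Cr."""
    r, g, b = r8 / 255, g8 / 255, b8 / 255
    return ((r, g, b) + colorsys.rgb_to_hsv(r, g, b)
            + rgb_to_lab(r, g, b) + rgb_to_ycbcr(r, g, b))


red = pixel_features(255, 0, 0)
print([round(v, 2) for v in red])
```

The resulting features have very different ranges (Lab L* spans 0 to 100, while most other values lie in [0, 1]), which is why normalization matters for distance-based classifiers.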
Remote Sens. 2020, 12, 2599 8 of 19
Because the color space transformations produced a mixture of original spectral bands (RGB) and synthetic bands with different value ranges, data normalization was needed so that some classifiers would treat each band equally. For the SVM and KNN classifiers, whose decisions depend on distances in feature space, the bands were linearly scaled to the range from zero to one.
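Linear (min-max) scaling maps each band to [0, 1] with (v − min)/(max − min). The sketch below is a minimal version that assumes each band is given as a flat list of pixel or object values. The function name is hypothetical, and constant bands are mapped to zero to avoid division by zero.

```python
def minmax_normalize(bands):
    """Linearly rescale each band (a list of values) to the range [0, 1]."""
    normalized = []
    for band in bands:
        lo, hi = min(band), max(band)
        span = hi - lo
        normalized.append([0.0 if span == 0 else (v - lo) / span for v in band])
    return normalized


print(minmax_normalize([[10, 20, 30], [5, 5, 5]]))
```

If a trained model were applied to another orthomosaic, reusing the minimum and maximum computed on the training data would keep the scaling consistent between the two images.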

Image Segmentation
Segmentation is the process of dividing an image into non-overlapping image objects that are spatially and spectrally homogeneous. Because segmentation is the first and most critical step of OBIA classification [47], its quality has a significant impact on classification accuracy. Over-segmented objects, which contain only part of a target object, and under-segmented objects, which contain more than one target class, both degrade the class signatures used for prediction [48].
In this study, the synthetic remote sensing image was segmented with the multiresolution image segmentation algorithm (MRIS) available in Trimble eCognition Developer® (commonly known as eCognition) [49]. MRIS is a bottom-up region-growing technique driven by three main parameters: scale, shape, and compactness. The most important is the scale parameter, which controls the average size in pixels of the resulting image objects (a higher value results in larger objects). Shape and compactness define the object homogeneity and are weighted from zero to one. Shape controls how much the segmentation is influenced by the object shape information versus the spectral (color) information (a higher value means a lower influence of color). Compactness also controls the object shape (a higher value yields more compact but less spectrally homogeneous objects) [47]. The values of these three parameters were selected through an iterative trial-and-error process combined with a visual analysis performed by an experienced operator. To find a single segmentation scale that would best separate the four cover classes, and based on similar research on OBIA of ultrahigh-resolution (sub-decimeter) UAS imagery [50], we first fixed the shape and compactness parameters at 0.1 and 0.5, respectively. Then, the training area was segmented at seven segmentation scales, starting at 10 and ending at 80 with scale increments of 10 (see Figure 6). Scale 30 performed best because it retained the individual marine litter items (Figure 6c); at a coarser scale (Figure 6b), these items were very often merged into broader image objects such as vegetation debris.
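MRIS is commonly described as merging neighboring objects pairwise for as long as the resulting increase in heterogeneity stays below a threshold set by the squared scale parameter. The sketch below follows the widely cited Baatz and Schäpe formulation of that fusion criterion to show how the shape (0.1) and compactness (0.5) weights used here enter the merge cost. It is an illustration under that assumption, not eCognition's code. `ObjectStats`, `fusion_cost`, and `should_merge` are hypothetical names, and the statistics of the merged object must be supplied by the caller.

```python
import math
from dataclasses import dataclass


@dataclass
class ObjectStats:
    """Summary statistics of an image object (hypothetical helper type)."""
    n: int                   # area in pixels
    std: list                # per-band standard deviation of pixel values
    perimeter: float         # border length l
    bbox_perimeter: float    # perimeter of the bounding box b


def fusion_cost(o1, o2, merged, shape=0.1, compactness=0.5, band_weights=None):
    """Heterogeneity increase caused by merging o1 and o2 into `merged`."""
    w = band_weights or [1.0] * len(merged.std)
    h_color = sum(
        w[c] * (merged.n * merged.std[c] - (o1.n * o1.std[c] + o2.n * o2.std[c]))
        for c in range(len(merged.std))
    )

    def cmpct(o):  # area-weighted compactness l / sqrt(n)
        return o.n * o.perimeter / math.sqrt(o.n)

    def smooth(o):  # area-weighted smoothness l / b
        return o.n * o.perimeter / o.bbox_perimeter

    h_cmpct = cmpct(merged) - (cmpct(o1) + cmpct(o2))
    h_smooth = smooth(merged) - (smooth(o1) + smooth(o2))
    h_shape = compactness * h_cmpct + (1.0 - compactness) * h_smooth
    # The shape weight trades spectral against geometric heterogeneity
    return (1.0 - shape) * h_color + shape * h_shape


def should_merge(cost, scale):
    """Merge only while the heterogeneity increase stays below scale squared."""
    return cost < scale ** 2


# Two adjacent single pixels (perimeter 4) merging into a 1 x 2 object (perimeter 6)
pixel = ObjectStats(n=1, std=[0.0, 0.0, 0.0], perimeter=4.0, bbox_perimeter=4.0)
pair_same = ObjectStats(n=2, std=[0.0, 0.0, 0.0], perimeter=6.0, bbox_perimeter=6.0)
pair_diff = ObjectStats(n=2, std=[0.3, 0.3, 0.3], perimeter=6.0, bbox_perimeter=6.0)
cost_same = fusion_cost(pixel, pixel, pair_same)
cost_diff = fusion_cost(pixel, pixel, pair_diff)
print(round(cost_same, 4), round(cost_diff, 4), should_merge(cost_diff, scale=30))
```

With a shape weight of 0.1, the spectral term dominates the cost, which is consistent with the goal of keeping brightly colored litter items separate from the surrounding sand.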

Classifiers and User-Defined Parameters
In the context of detecting MML items on an orthomosaic with ultrahigh (sub-centimeter) resolution, the following three supervised, non-parametric, object-oriented machine learning classifiers were evaluated: (1) RF, a decision-tree-based ensemble algorithm; (2) SVM, a statistical learning algorithm; and (3) KNN, an instance-based learning algorithm.

Random Forest
RF is an ensemble classifier that uses a large number of decision trees and assigns an unknown object to the class that receives the majority of the votes cast by the individual trees [51]. Each tree is constructed and trained automatically using a random subset (in general, about two thirds) of the training data, referred to as in-bag samples, and a random subset of the variables [43]. The remaining training data (in general, about one third), which are not used to build a given tree, are known as out-of-bag (OOB) samples; they are used in an internal cross-validation technique that provides an independent estimate of the overall accuracy of the RF classification [52]. To generate a prediction model, two important user-defined parameters need to be set: the number of decision trees to be generated (ntree) and the number of variables randomly selected at each node to split it (Mvar). The published literature has highlighted that the RF classifier is more sensitive to the Mvar parameter than to the ntree parameter [53]. Because RF is computationally efficient and does not overfit as more trees are added, the error usually stabilizes before 500 trees are reached, and this number is therefore commonly assigned to ntree [43,52]. For Mvar, the square root of the total number of variables is the value commonly used in classification problems [43]. However, in some software implementations (e.g., eCognition), the RF algorithm also exposes the parameters of its individual decision trees (DT). These parameters include: (i) depth (Dep), which regularizes each tree by limiting its growth and thus prevents overfitting; (ii) the minimum number of samples (Ns) that a node must contain to be considered for splitting; (iii) the maximum number of categories used to cluster the possible values of a categorical variable; and (iv) the use (or non-use) of surrogates to handle missing data [49].
Additional eCognition parameters are: (i) active variables (Mvar); (ii) forest accuracy, i.e., the desired level of accuracy; and (iii) termination criteria, which can be set to the maximum number of trees, the forest accuracy, or both.
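The "two thirds / one third" split described above follows from bootstrap sampling. When N samples are drawn with replacement, each sample is left out with probability (1 − 1/N)^N, which tends to 1/e ≈ 0.368, so about 63.2% of the distinct samples end up in-bag. The sketch below checks this numerically and also shows the square-root default for Mvar and the final majority vote. It illustrates these ingredients only and is neither a full random forest nor eCognition's implementation. The helper names, the 1000 training objects, and the 12 features are hypothetical.

```python
import math
import random
from collections import Counter


def expected_in_bag_fraction(n):
    """Expected share of distinct samples in a bootstrap sample of size n."""
    return 1.0 - (1.0 - 1.0 / n) ** n


def simulated_in_bag_fraction(n, trials=200, seed=0):
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        in_bag = {rng.randrange(n) for _ in range(n)}  # n draws with replacement
        total += len(in_bag) / n
    return total / trials


def default_mvar(n_features):
    """Common default for classification: floor(sqrt(number of variables))."""
    return max(1, math.isqrt(n_features))


def majority_vote(tree_predictions):
    """Final RF label: the class predicted by most trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


n_train = 1000  # hypothetical number of training objects
print(round(expected_in_bag_fraction(n_train), 4))
print(round(simulated_in_bag_fraction(n_train), 3))
print(default_mvar(12), majority_vote(["litter", "sand", "litter"]))
```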

Support Vector Machine
According to statistical learning theory, the SVM constructs an optimal hyperplane (i.e., a decision surface) that separates the dataset into a discrete, predefined number of classes in a way consistent with the training examples [54]. The amount of training data that can be misclassified (i.e., fall on the wrong side of the hyperplane) is controlled by a positive user-defined parameter C (the cost parameter). A large C value decreases the number of misclassified training objects but can produce an overfitted model that may not classify new data adequately [55]. When the classes cannot be separated linearly, kernel functions are used to project the input data into a higher-dimensional feature space in which the classes become more separable [56]. The kernel functions most commonly used in remote sensing are linear, polynomial, and radial basis function (RBF), the latter controlled by the gamma (γ) parameter [52]. Adjusting γ changes the shape of the decision boundary: smaller values yield a smoother boundary, whereas higher values yield a more complex one. In eCognition, the SVM classifier is implemented with the following configurable parameters: (i) C, (ii) kernel function (linear or RBF), and (iii) γ (for RBF only). The optimal values of C and γ are often determined with the grid search method (also known as exhaustive search), which evaluates every pair of parameter values within a predefined search interval and selects the pair with the highest classification accuracy [57].
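The RBF kernel is K(x, z) = exp(−γ‖x − z‖²), so a smaller γ makes similarity decay more slowly with distance and yields smoother boundaries. The grid search below evaluates every (C, γ) pair on a logarithmic power-of-two grid. This grid is a common convention, not the one used in this study. `score_fn` is a hypothetical callback that, in practice, would return the cross-validated accuracy of an SVM trained with that pair; here a toy function stands in for it.

```python
import math
from itertools import product


def rbf_kernel(x, z, gamma):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def grid_search(score_fn, c_values, gamma_values):
    """Exhaustive search: score every (C, gamma) pair and return the best one."""
    best = None
    for c, gamma in product(c_values, gamma_values):
        score = score_fn(c, gamma)
        if best is None or score > best[0]:
            best = (score, c, gamma)
    return best


# Coarse logarithmic grids (a common convention, not the study's values)
C_GRID = [2.0 ** k for k in range(-5, 16, 2)]
GAMMA_GRID = [2.0 ** k for k in range(-15, 4, 2)]


def toy_score(c, gamma):
    """Hypothetical stand-in for cross-validated accuracy, peaking at C=8, gamma=0.5."""
    return -(math.log2(c) - 3) ** 2 - (math.log2(gamma) + 1) ** 2


print(rbf_kernel((0, 0), (1, 1), gamma=0.1), rbf_kernel((0, 0), (1, 1), gamma=1.0))
print(grid_search(toy_score, C_GRID, GAMMA_GRID))
```

A coarse grid like this is usually followed by a finer grid around the best pair.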

K-Nearest Neighbor
KNN is a relatively simple instance-based learning approach. An object is classified based on the weighted average value of the class attributes of its k spectrally nearest neighbors (e.g., k = 5) in the training set [58]. The performance of this classifier is mainly influenced by the key parameter k [55]. In eCognition, KNN is implemented with only one configurable parameter, k.
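The weighted voting described above can be sketched in Python as follows. Inverse-distance weighting is one common way to weight the neighbors and is an assumption here, as is the use of Euclidean distance in the feature space; eCognition's exact weighting is not documented in this section. Because the method relies on distances, the features should be on comparable scales (in this study, the 12 bands were normalized for SVM and KNN).

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, class label)


def knn_classify(query: Sequence[float], training: List[Sample], k: int) -> str:
    """Label `query` by an inverse-distance-weighted vote of its k nearest training objects."""
    if not 1 <= k <= len(training):
        raise ValueError("k must be between 1 and the number of training samples")
    neighbours = sorted(training, key=lambda s: math.dist(query, s[0]))[:k]
    votes: Dict[str, float] = defaultdict(float)
    for features, label in neighbours:
        d = math.dist(query, features)
        # Closer neighbours weigh more; the small epsilon avoids division by zero.
        votes[label] += 1.0 / (d + 1e-9)
    return max(votes, key=votes.get)
```

With `k = 5`, a query lying close to two litter samples but far from three sand samples can still be labeled as litter, because the closer neighbors carry more weight; with plain majority voting it would be labeled as sand.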

Tuning the Primary Classifier Parameters
The classifiers were tuned by modifying each primary parameter one at a time while keeping the others fixed. For RF, we started from the default values and successively modified ntree, Dep, and Ns. For SVM, we also started from the default values and successively modified γ and C. For KNN, only the number of neighbors (K) was tuned, since it is the only parameter implemented.
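The one-at-a-time strategy amounts to a coordinate-wise search. The following Python sketch assumes a hypothetical `evaluate(params)` callback that runs a classification with the given parameters and returns the mean F-score over the validation areas; the candidate values for each parameter are left to the analyst.

```python
from typing import Callable, Dict, List, Sequence


def tune_one_at_a_time(evaluate: Callable[[Dict[str, float]], float],
                       defaults: Dict[str, float],
                       candidates: Dict[str, Sequence[float]],
                       order: List[str]) -> Dict[str, float]:
    """Sweep each parameter in `order` over its candidates while the others stay fixed.

    After each sweep, the best value found is frozen before moving on to the next parameter.
    """
    params = dict(defaults)
    for name in order:
        best_value, best_score = params[name], evaluate(params)
        for value in candidates[name]:
            score = evaluate({**params, name: value})
            if score > best_score:
                best_value, best_score = value, score
        params[name] = best_value
    return params
```

For RF, one would call it with `defaults={"ntree": 50, "Dep": 0, "Ns": 0}` and `order=["ntree", "Dep", "Ns"]`; for SVM, with `order=["gamma", "C"]`. This greedy procedure needs far fewer runs than a full grid search, at the cost of possibly missing interactions between parameters.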

Performance Assessment
To obtain a reference for evaluating the detection performance of the classifiers, an operator visually screened the RGB orthomosaic and processed it manually in a GIS environment. For each object recognized as a marine litter item, the operator marked the approximate center of the item's shape. For further details about the manual procedure and the types of MML encountered at Cabedelo beach, please refer to Gonçalves et al. [26,27].
The automated detection performance was evaluated with the F-score statistic. The centroids of all objects labeled as MML by the algorithms were compared with the centroids of the MML objects delineated manually in the testing areas. When the distance between two centroids was smaller than 20 cm (the set threshold), the detection was counted as a true positive (TP); otherwise, it was counted as a false positive (FP).
Finally, all marine litter items not detected by the automated algorithm were counted as false negatives (FN). The precision (P) measures the ability of a method to avoid false positives and is defined as:
$$P = \frac{TP}{TP + FP}$$
The recall (R) measures the sensitivity of each method, i.e., its ability to avoid false negatives, and is given by:
$$R = \frac{TP}{TP + FN}$$
The F-score (F) is a measure of the overall quality of the method and combines P and R as their harmonic mean:
$$F = \frac{2 \cdot P \cdot R}{P + R}$$
Like P and R, F varies between 0% and 100%, where 0% means that no predicted MML item matches an observed one and 100% means a perfect classification (i.e., a perfect match).
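The matching rule and the three metrics can be sketched in Python as follows. The source does not say how several detections near the same reference item are handled; the sketch assumes a greedy one-to-one match, in which each reference centroid can be claimed by at most one detection (the nearest unmatched one within the threshold), so results may depend on the order of the detections. Coordinates are assumed to be in meters, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def match_detections(detected: List[Point], reference: List[Point],
                     threshold: float = 0.20) -> Tuple[int, int, int]:
    """Count (TP, FP, FN) by matching detected centroids to reference centroids."""
    unmatched = list(reference)
    tp = fp = 0
    for p in detected:
        best_i, best_d = None, threshold
        for i, q in enumerate(unmatched):
            d = math.dist(p, q)
            if d < best_d:  # strictly closer than the 20 cm threshold
                best_i, best_d = i, d
        if best_i is None:
            fp += 1
        else:
            tp += 1
            unmatched.pop(best_i)  # each reference item is matched at most once
    fn = len(unmatched)  # reference items never matched
    return tp, fp, fn


def precision_recall_f(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and F-score as fractions in [0, 1]."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

For example, `precision_recall_f(*match_detections(detected, reference))` returns P, R, and F as fractions; multiply by 100 to obtain percentages.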

Quantifying Macro Litter Abundance
Quantifying and mapping the abundance of MML in coastal areas is important for understanding the dynamics of its deposition, computing accumulation rates, and identifying spatial distribution patterns over time, in order to improve the planning of clean-up operations [28,29]. In this study, kernel density estimators (KDE) were used to quantify MML abundance. First, the polygonal macro litter items detected by a given OOML method were converted to point features using their centroids. Then, using a KDE function, these point events (i.e., the centroids of the macro litter items) were transformed into a continuous surface representing the point density (i.e., the number of MML items per square meter) in two-dimensional (2D) space [59]. The two key parameters of a planar KDE function are the kernel function and the search bandwidth. However, there is consensus that the choice of bandwidth, which determines the smoothness of the density surface, is more important than the choice of kernel function [60]. In this work, the quartic kernel function was used to estimate the MML density at each cell of the orthomosaic image, and a sufficiently large bandwidth of 10 m was chosen to generate a smooth MML abundance map.
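The two steps above (polygon to centroid, then centroids to a density surface) can be sketched in Python without any GIS library. The centroid uses the standard shoelace (area-weighted) formula. The kernel is the quartic (biweight) kernel in the form popularized by Silverman, K(d) = 3/(πh²)·(1 - d²/h²)² for d < h and 0 otherwise, which integrates to exactly one item per point; that this matches ArcGIS's implementation in every detail is an assumption. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def polygon_centroid(vertices: Sequence[Point]) -> Point:
    """Area-weighted centroid of a simple polygon (vertices in order, ring not closed)."""
    a = cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5  # signed area; the sign cancels for clockwise rings
    return cx / (6.0 * a), cy / (6.0 * a)


def quartic_density(x: float, y: float, points: List[Point], bandwidth: float) -> float:
    """Quartic-kernel density (items per squared map unit) at location (x, y).

    Each point within `bandwidth` contributes 3/(pi*h^2) * (1 - (d/h)^2)^2.
    """
    h2 = bandwidth * bandwidth
    total = 0.0
    for px, py in points:
        d2 = (x - px) ** 2 + (y - py) ** 2
        if d2 < h2:
            total += (1.0 - d2 / h2) ** 2
    return 3.0 * total / (math.pi * h2)
```

With h = 10 m, an isolated item yields a peak density of 3/(100π) ≈ 0.0095 items/m² at its centroid, decreasing smoothly to zero at 10 m, which is why a large bandwidth produces the smooth abundance surfaces described above.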

Georeferencing Accuracy
The geometric accuracy of the SfM-MVS processing workflow was evaluated using the reprojection error of the tie points (0.2 pixels), the RMSE of 5 GCPs (1.0 cm in XY and 2.5 cm in Z), and the RMSE of 2 CHPs (1.5 cm in XY and 3.4 cm in Z). Using these two CHPs, we also assessed the accuracy of the orthomosaic (0.4 cm in XY) and the DSM (2.8 cm in Z). Overall, the accuracy of the two geospatial products exported from Agisoft Metashape (the orthomosaic and the DSM, with spatial resolutions of 5.5 and 7.6 mm, respectively) is at the same level as that of the NTRIP-GNSS method used for georeferencing, and both products are therefore suitable for mapping MML abundance.
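The RMSE values above follow from per-point residuals (surveyed minus estimated coordinates). The Python sketch assumes the common convention in which the horizontal (XY) error combines both components, sqrt(mean(dx² + dy²)); the exact formula used by the processing software is not stated in this section, and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def rmse_xy_z(residuals: Sequence[Tuple[float, float, float]]) -> Tuple[float, float]:
    """Horizontal and vertical RMSE from per-point (dx, dy, dz) residuals.

    XY RMSE combines both horizontal components: sqrt(mean(dx^2 + dy^2)).
    """
    n = len(residuals)
    if n == 0:
        raise ValueError("need at least one control or check point")
    xy = math.sqrt(sum(dx * dx + dy * dy for dx, dy, _ in residuals) / n)
    z = math.sqrt(sum(dz * dz for _, _, dz in residuals) / n)
    return xy, z
```

The residuals must share a single unit (e.g., meters); the function returns the XY and Z RMSE in that same unit.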

Effects of the Tuning Parameters on the F-Score
For each machine learning classifier, the tuning parameters had a different impact on the F-score (Figure 7). Running RF with the default parameters (ntree = 50, Dep = 0, and Ns = 0) yielded a mean F-score of 59% for the two validation areas (A1 and A2). Running SVM with the default values (γ = 0 and C = 2) yielded a mean F-score of 22%, and running KNN with the default value (K = 1) yielded a mean F-score of 30%. These findings agree with those presented in [55] and indicate that the default parameters were not appropriate for this work and had to be tuned.
Regarding the RF optimization, with the default values of Dep and Ns, the best F-score was achieved for ntree = 500. Setting ntree = 500 and Ns = 0 (default) and varying Dep, the best F-score was obtained for Dep = 5. Setting ntree = 500 and Dep = 5, the best F-score was obtained for Ns = 5. For SVM, the default C = 2 was first kept fixed while γ was varied, and the best F-score was obtained for γ = 0.1; then, with γ = 0.1, C was varied, and the best F-score was obtained for C = 5. For KNN, the procedure was straightforward, with the best F-score obtained for K = 10 (Figure 7).

Comparisons of the Classifiers for Mapping Marine Litter
In the previous section, 42 beach cover maps were generated to derive the optimized parameters of each machine learning classifier, i.e., 21 for RF, 14 for SVM, and seven for KNN. Details of the classification maps obtained with the optimized parameters of each classifier are shown in Figure 8.
Although the classification maps are visually quite similar, a detailed analysis of the detection performance for the MML class revealed differences among the three machine learning algorithms (see Table 2). First, for each validation area, the detection performance obtained by each machine learning method was very similar. Second, RF registered the highest number of TP, with a mean recall of 67%. Third, SVM returned the lowest number of FP and, thus, the highest precision (77% on average). Fourth, KNN had the worst performance in terms of F-score, although its results did not differ significantly from those of SVM and RF. Overall, the mean F-score varied only slightly among the three machine learning techniques: 65% for KNN, 68% for SVM, and 72% for RF. For SVM and KNN, the 12 bands were normalized.

Mapping Marine Litter Abundance
The MML class objects detected by each optimized OOML classifier were exported as 2D polygon shapefiles and converted to point geometries using the polygon centroids in a GIS environment. Each point layer was then converted into a density map representing MML abundance using the planar KDE function available in ArcGIS (Figure 9). To evaluate the performance of each classifier in mapping MML abundance, an experienced operator manually screened the orthomosaic to produce a reference dataset of the centroids of the MML items present on the beach. This reference dataset was then used to generate the reference MML abundance map, which in turn was used to evaluate each classifier visually and quantitatively. Figure 9 shows the centroids of the MML items manually screened in the orthomosaic and the MML abundance maps obtained manually and automatically using the three OOML classifiers (RF, SVM, and KNN). All three OOML classifiers returned MML accumulation patterns that were visually consistent with the manual method, identifying two main MML clusters on the beach. In addition, we found a strong correlation between the manual abundance map and each OOML abundance map, with R² values of 0.79 (RMSE 0.028 items/m²) for RF, 0.76 (RMSE 0.027 items/m²) for SVM, and 0.83 (RMSE 0.026 items/m²) for KNN.
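The section does not state how R² was computed; a common choice, assumed here, is the squared Pearson correlation between the cell values of two co-registered density rasters, reported together with their RMSE. A dependency-free Python sketch, with a hypothetical function name:

```python
import math
from typing import Sequence, Tuple


def compare_maps(reference: Sequence[float], predicted: Sequence[float]) -> Tuple[float, float]:
    """Squared Pearson correlation (R^2) and RMSE between two co-registered density rasters.

    Both inputs are the cell values of the maps flattened in the same order (e.g., items/m^2).
    """
    if len(reference) != len(predicted) or len(reference) < 2:
        raise ValueError("maps must have the same number of cells (at least two)")
    n = len(reference)
    mr = sum(reference) / n
    mp = sum(predicted) / n
    cov = sum((r - mr) * (p - mp) for r, p in zip(reference, predicted))
    var_r = sum((r - mr) ** 2 for r in reference)
    var_p = sum((p - mp) ** 2 for p in predicted)
    if var_r == 0 or var_p == 0:
        raise ValueError("R^2 is undefined for a constant map")
    r2 = cov * cov / (var_r * var_p)
    rmse = math.sqrt(sum((p - r) ** 2 for r, p in zip(reference, predicted)) / n)
    return r2, rmse
```

The two statistics are complementary: a map that is a linearly rescaled copy of the reference obtains R² = 1 but a nonzero RMSE, so reporting both, as done above, is informative.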

UAS Type and Flight Mission
In environmental monitoring, aerial images are commonly acquired with two categories of UAS platforms: multirotor and fixed wing. These two systems differ in takeoff capabilities, payload, flight time, cruise speed, and stability of image acquisition. Fixed-wing UASs have very good flight endurance and high cruise speeds and can cover large areas in one flight. However, they require a suitable landing area and piloting skills to take off and land softly without damaging the aircraft and payload sensors. Multirotor UASs are easy to fly, including takeoff and landing, and their cruise speed can be as low as necessary. In this work, the multirotor option was chosen because of its low cruise speed and its vertical takeoff and landing (VTOL) capabilities, which allowed the UAS to be deployed from very small areas of the beach. Nevertheless, the short operational time of the multirotor battery limits the flight time and restricts the extent of the beach area that can be surveyed. In this study, one battery allowed an operational flight time of about 27 min. With the current flight planning settings (Section 2.2), we were able to scan a beach area of ~2 ha (370 × 65 m) with eight parallel flight lines. To extend the scanned area, it has been suggested to fly multi-battery missions using specialized UAS flight mission and autopilot software with a resume feature (e.g., DroneDeploy and DJI MapPilot) [36]. In addition, the recent availability of off-the-shelf UASs incorporating onboard RTK-GNSS sensors has enabled highly accurate and precise geospatial products [61], removing the time-consuming steps of placing GCPs and CHPs before the flight and acquiring their coordinates after it.

Object-Oriented Machine Learning Methods
In contrast to previous studies by other authors [22,23,62], in this work MML was detected on the ultrahigh spatial resolution orthomosaic rather than on single UAS images, because this made the subsequent generation of a georeferenced abundance map a straightforward step. The OBIA classification approach based on the proposed nomenclature was efficient for extracting the MML class and proved well suited for transferring the process to other areas of the orthomosaic. In fact, the mean values of the 12 composite bands collected for all object classes in the training area could be applied directly to the other orthomosaic areas without any editing. Using the tiling and stitching approach implemented in the server version of eCognition, the proposed classification approach could be used to detect MML items over larger orthomosaic areas, as long as the beach substrate remains similar to that of the training area where the class statistics were collected. The comparison among the three OOML classifiers showed that RF obtained better results than SVM and KNN. However, when the automated abundance maps were compared with the one produced manually, the map generated by the KNN classifier achieved the highest correlation (R² = 0.83).
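The 12 composite bands are not itemized in this section. Assuming they are the three channels of each of the four color spaces named in the Conclusions (RGB, CIE-Lab, HSV, and YCbCr), the per-object mean values that were transferred from the training area could be computed along the lines of the following Python sketch. The specific conversions (sRGB with a D65 white point for CIE-Lab, full-range BT.601 for YCbCr, and the standard-library `colorsys` HSV on a 0 to 1 scale) are assumptions, not necessarily the variants used in eCognition; the function names are hypothetical.

```python
import colorsys
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]  # 8-bit sRGB pixel


def _srgb_to_linear(c: float) -> float:
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _lab_f(t: float) -> float:
    delta = 6.0 / 29.0
    return t ** (1.0 / 3.0) if t > delta ** 3 else t / (3.0 * delta ** 2) + 4.0 / 29.0


def twelve_bands(pixel: RGB) -> List[float]:
    """R, G, B, L*, a*, b*, H, S, V, Y, Cb, Cr for one 8-bit sRGB pixel."""
    r8, g8, b8 = pixel
    r, g, b = r8 / 255.0, g8 / 255.0, b8 / 255.0
    # CIE-Lab via linear sRGB -> XYZ, normalized by the D65 reference white.
    rl, gl, bl = (_srgb_to_linear(c) for c in (r, g, b))
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl
    fx, fy, fz = _lab_f(x / 0.95047), _lab_f(y / 1.0), _lab_f(z / 1.08883)
    lab = [116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)]
    # HSV with all components in [0, 1], from the standard library.
    hsv = list(colorsys.rgb_to_hsv(r, g, b))
    # Full-range BT.601 YCbCr (JPEG convention) on the 8-bit scale.
    ycbcr = [0.299 * r8 + 0.587 * g8 + 0.114 * b8,
             128.0 - 0.168736 * r8 - 0.331264 * g8 + 0.5 * b8,
             128.0 + 0.5 * r8 - 0.418688 * g8 - 0.081312 * b8]
    return [float(r8), float(g8), float(b8)] + lab + hsv + ycbcr


def object_mean_bands(pixels: Sequence[RGB]) -> List[float]:
    """Mean of each of the 12 bands over the pixels of one image object (segment)."""
    rows = [twelve_bands(p) for p in pixels]
    return [sum(col) / len(rows) for col in zip(*rows)]
```

Because the bands have very different ranges (0 to 255, roughly 0 to 100, and 0 to 1), they would need to be normalized before being fed to distance-based classifiers such as SVM and KNN.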
Because color was the key element for detecting MML, items with colors similar to that of the sand were not detected; they were included in the sand class. In this context, the use of the volume of the MML items derived from the DSM was expected to decrease the number of false positives [63]. However, the heights of the MML objects were too small to be informative, and hence, the DSM was not used for this purpose. Misclassification was also due to the low payload capability of the low-cost UAS, which meant that the images were acquired with an inexpensive off-the-shelf camera whose low radiometric quality and low spectral resolution were strongly affected by lighting and atmospheric conditions. To mitigate the impact of lighting and atmospheric conditions on the accuracy of litter detection and to maximize the contrast between the sand and colored litter items, it has been suggested to fly under similar illumination geometry and sunny conditions [31,64]. Future work should explore the feasibility of using multispectral sensor products to automatically categorize marine litter items, which are expected to have a unique reflectance response depending on their color and material (e.g., [65]).
It is worth mentioning that we were interested in mapping only one beach land cover class (i.e., the MML class). A supervised multiclass classifier may not necessarily be an appropriate approach for this task, because such classifiers require considerable effort to produce an exhaustive and mutually exclusive training dataset that includes all classes present in the area of interest [66,67]. However, when only two classes were considered (litter and non-litter), the labeling accuracy of the RF classifier decreased significantly (approximately 35%; F-score of 44%).
In comparison with previous works [22,23,26], the OOML classifiers presented here were simpler to use for inexperienced analysts, achieved a similar (sometimes slightly better) F-score, and did not show any particular issues with shadows or transparent objects. Although the optimal segmentation and OOML parameter values may differ for other spatial resolutions (i.e., flying heights) and for beaches with different substrates, we expect the proposed classification workflow to offer reliable guidance for selecting and tuning these parameters.

Conclusions
This study showed that a consumer-grade UAS combined with SfM-MVS methods can be used effectively to generate an ultrahigh resolution (sub-centimeter) orthomosaic and to monitor sandy beaches polluted by MML. The low spectral resolution of the orthomosaic was compensated for by combining four color spaces (RGB, CIE-Lab, HSV, and YCbCr) with an OBIA approach, which proved highly suitable for extracting MML objects from ultrahigh resolution imagery.
After optimal tuning, the three compared object-oriented machine learning (OOML) classifiers, namely random forest (RF), support vector machine (SVM), and k-nearest neighbor (KNN), showed quite similar performances (F-score) for detecting colored MML objects. Although RF had more parameters to tune and therefore appeared more complex to optimize, the number of trees (ntree) was the most influential parameter. In contrast, KNN, which had only one parameter to tune, achieved a slightly lower F-score than the other two classifiers. Nevertheless, the MML abundance map generated with KNN was well correlated with the abundance map produced manually. This suggests that this OOML classifier can be used effectively by nonexpert remote sensing analysts in a simple MML abundance mapping framework.
The synergistic use of small UAS with OOML classifiers is a major step towards cost-effective and efficient operational programs for monitoring MML abundance and detecting hotspots on sandy beaches, as they can be easily implemented by local, municipal, and national environmental agencies.
Future research should focus on the use of one-class classifiers with minimal labeling effort to generate abundance maps from an ultrahigh resolution orthomosaic obtained by a consumer-grade UAS incorporating RTK-GNSS sensors.