Designing Unmanned Aerial Survey Monitoring Program to Assess Floating Litter Contamination

Monitoring marine contamination by floating litter can be particularly challenging, since debris is continuously moved over a large spatial extent by currents, waves, and winds. Assessments of floating litter contamination have mostly relied on opportunistic surveys from vessels, modeling and, more recently, remote sensing with spectral analysis. This study explores how a low-cost commercial unmanned aircraft system (UAS) equipped with a high-resolution RGB camera can be used as an alternative to conduct floating litter surveys in coastal waters or from vessels. The study compares different processing and analytical strategies and discusses operational constraints. Collected UAS images were analyzed using three different approaches: (i) manual counting (MC), using visual inspection and image annotation with object counts as a baseline; (ii) pixel-based detection (PBD), an automated color analysis process to assess overall contamination; and (iii) machine learning (ML), automated object detection and identification using state-of-the-art convolutional neural networks (CNNs). Our findings illustrate that MC remains the most precise method for classifying different floating objects. ML still performs heterogeneously in correctly identifying different classes of floating litter; however, it shows promising results in detecting floating items, which can be leveraged to scale up monitoring efforts and to automate the analysis of large sets of imagery when assessing relative floating litter contamination.


Introduction
Over the last few decades, marine litter has increasingly captured the attention and concern of scientists, decision makers, and civil society [1,2]. The persistent nature of plastic materials and their increasing global presence in both aquatic [3] and terrestrial ecosystems [4] have resulted in the conception of a new era, "the Plasticene" [5]. The incessant and growing delivery of plastic litter and debris to our oceans has become one of the most significant forms of marine pollution [6,7]. While bans on single-use plastics (e.g., straws) and improved recycling practices have been introduced, the COVID-19 pandemic has resulted in an immediate increase in personal protection equipment (e.g., discarded face masks) further polluting aquatic environments [8,9]. Indeed, marine litter has become critical to global sustainability, as it affects marine ecosystems and human health [10-12].
studies is constantly growing. Still, the open challenge in automating imagery analysis is to reduce the labor time spent identifying and classifying target objects and, ultimately, to gain a better understanding of the distribution and sources of marine litter items. However, when compared to human inspection and annotation, automated object detection and classification by AI often lack flexibility in contextual interpretation and rely on well-established, predetermined object categories. In addition, the computing power and technical skills required to implement automated, AI-based object detection can be considerably more demanding than those required for human-supervised imagery annotation.
The growing availability and development of inexpensive commercial off-the-shelf (COTS) drones and other advanced unmanned aerial systems (UASs) are making high-tech aerial imagery platforms more accessible [47-49]. The use of custom-designed UASs is becoming increasingly popular for recreational, industrial, topographic surveying, monitoring, and research purposes due to their relatively low cost, operational flexibility, and simplicity [50-54]. With low-altitude flight, UASs produce aerial imagery with higher resolution than that achieved by current satellites or by manned aerial platforms [37,55,56]. Additionally, most modern UASs include automated flight capabilities, pre-planned mission controls, high-resolution camera systems, and geotagged logs that enhance their operational capabilities and range of applications [57,58].
The use of UAS-based remote sensing has already demonstrated a variety of research applications in coastal areas [47,49,51,59-71]. Operational flexibility and simplicity make UASs promising platforms for developing remote sensing protocols and monitoring litter using systematic approaches. A growing number of studies have focused on the use of UAS-based remote sensing and AI to monitor litter pollution; however, most of them have focused on beached litter [51,54,65,67,70,72-75], and only a few have explored their use for floating litter [38,46,73,76,77]. A recent critical review of beached litter survey studies using UAS remote sensing [65] summarizes the findings of recent studies and outlines basic guidelines for developing and implementing monitoring programs. Although some of their conclusions are transferable to floating litter monitoring, these studies do not account for the dynamic nature of open waters, the lack of matching references to construct orthophotos, and differences in image background contrast and complexity. Other studies have focused on the use of UAS aerial images to monitor floating litter using color-based image processing [78], deep learning [45,79], and other remote sensing analytical techniques [39]. However, these studies describe technical advances using a single approach or focus on comparing different AI algorithms and classifiers; they lack an overall evaluation of how to implement a floating litter monitoring program that relies on UAS aerial imagery and miss a critical comparison of different image analysis strategies and options. As such, this study fills some of the current gaps by outlining some of the specificities of floating litter monitoring, including UAS operational constraints, a comparison of MC, PBD, and ML image analysis, and overall guidelines for designing and implementing a monitoring program.
Detecting and monitoring floating litter using aerial photography and UAS-based remote sensing poses specific challenges. On land, structure-from-motion photogrammetry uses unique and discrete references in overlapping images to construct a mosaic and estimate position, slope, and other topographic features along the survey area [80,81]. Over open water, the lack of discrete or unique reference points, the homogeneity of images, and the dynamic surface make it virtually impossible to reconstruct orthophotomosaics systematically [82,83]. In theory, one could fly at a high enough altitude to simultaneously include land features (as discrete matching points) and coastal waters. However, this is generally not a practical solution: the survey area would be greatly constrained to near-shore areas for safety reasons, resolution decreases with altitude, and regulations prohibit or limit the maximum altitude for UAS operations [84,85].
To tackle the limitations of photogrammetric mosaic generation from overlapping aerial images over the ocean, we explore the use of a COTS UAS platform to collect multiple (non-overlapping) individual aerial images to assess floating litter contamination. Leveraging specific flight altitudes and information on the camera field of view and sensor dimensions, it is possible to estimate the surface area covered by each individual image. This strategy allows one to conduct aerial surveys, collecting multiple images that can be processed and analyzed independently to produce overall assessments over meaningful spatial extents (e.g., 1 km²).
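The per-image surface area estimate can be sketched with simple pinhole-camera geometry. The sensor and focal-length values below are assumptions for a 1-inch-sensor camera of a Mavic 2 PRO-class quadcopter (not taken from this study) and should be replaced with the specifications of the actual platform:

```python
# Sketch: estimate ground sampling distance (GSD) and sea-surface footprint
# of a single nadir image from flight altitude and camera geometry.
# SENSOR_* and FOCAL_MM are assumed values for a 1-inch sensor; IMG_* matches
# the image resolution reported in the study.

SENSOR_W_MM = 13.2              # sensor width (mm), assumed
SENSOR_H_MM = 8.8               # sensor height (mm), assumed
FOCAL_MM = 10.26                # real focal length (mm), assumed
IMG_W_PX, IMG_H_PX = 5472, 3648 # image resolution (px)

def gsd_cm_per_px(altitude_m: float) -> float:
    """Ground sampling distance (cm/px) of a nadir image at a given altitude."""
    return (SENSOR_W_MM * altitude_m * 100.0) / (FOCAL_MM * IMG_W_PX)

def footprint_m2(altitude_m: float) -> float:
    """Sea-surface area (m^2) covered by one nadir image."""
    width_m = SENSOR_W_MM * altitude_m / FOCAL_MM
    height_m = SENSOR_H_MM * altitude_m / FOCAL_MM
    return width_m * height_m

for alt in (10, 20, 30):
    print(f"{alt} m: GSD = {gsd_cm_per_px(alt):.2f} cm/px, "
          f"area = {footprint_m2(alt):.0f} m^2")
```

With these assumed specifications, the 10-30 m altitude range yields GSD and footprint values close to the ranges reported in this study (0.26-0.7 cm/px; 117-988 m²).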
To assess the feasibility of such a strategy for floating litter monitoring, we designed an experimental trial in which floating litter items were deployed and multiple individual aerial images were collected with a UAS to compare three imagery processing and analysis strategies: (i) manual counting (hereinafter MC), image inspection with supervised object identification and annotation; (ii) pixel-based detection (hereinafter PBD), an automatic color-based detection of pixels belonging to floating items; and (iii) machine learning (hereinafter ML), automated object detection and classification. The general objective of this experimental trial was to answer two main questions: (i) can floating litter items be detected in RGB aerial imagery collected by a UAS? (ii) are automated image processing and analysis strategies practical, reliable solutions that can replace human image inspection and litter item classification? Ultimately, this study assesses the operational advantages and disadvantages of different aerial imagery processing strategies for floating litter detection and provides guidelines for optimizing and implementing floating litter monitoring programs that rely on UAS-based remote sensing using low-cost COTS quadcopters equipped with high-resolution RGB cameras.

Data Collection
Conducted in the coastal waters of Madeira Island (Portugal), this study was designed to assess the feasibility of using UAS aerial imagery for detecting and monitoring floating litter through an experimental trial with dummy floating litter objects. Preliminary test flights were conducted (using a DJI Phantom 2 Vision+ and a DJI Mavic 2 PRO) from land and vessels to test and assess UAS flight capabilities (e.g., wind speed limit, range, flight time), optimal take-off and landing techniques, and optimal imagery sensor settings (compiled in Supplementary Materials S1). Once flight capabilities and operations were tested, an experimental trial was carried out flying a Mavic 2 PRO quadcopter from the sea vessel and using selected "dummy" litter items.
During the experiment, common floating litter items were deployed from a boat while flying a UAS at 10-30 m of altitude, set up to collect images (5472 × 3648 px) of the sea surface where litter items had been deployed every 10 s (Figure 1A). There was a total of 28 objects, the majority made of floating plastic. These objects were categorized into nine classes: Cleaner Bottles and Containers (one item); Drink Bottles-Green (six items); Drink Bottles-Transparent (two items); Drink Bottles-Large (>5 L) (two items); Floating Fishing Gear (seven items); Other Containers (one item); Other Floating Debris (no items); Plastic Bags (five items); and Tetra Pak (four items) (see Supplementary Materials S2, Table S1). As litter items scattered across the water, the vessel was repositioned outside of the image frame (Figure 1A), and the UAS position was adjusted to capture as many items as possible inside the live feed frame. Deployed objects naturally drifted at different speeds and in different directions; after some hovering time collecting imagery (5-10 min), the UAS (Figure 1B) was recovered, and all litter items were successfully collected. The procedure was repeated using different exposure settings, specifically a normal exposure (EV 0) and a low exposure (EV −3), to produce two sets of images, a Blue Set and a Dark Set, respectively (Figure 1C,D). The two image sets were used to compare object detectability under two contrasting exposure conditions. The collection of imagery with low exposure values (EVs) was included to enable a major reduction in light backscatter on the sea surface (i.e., homogenizing the background) while maintaining the ability to detect floating objects by visual inspection and based on RGB profiles.
A total of 148 individual images, with objects and no vessel in the frame, were selected for analysis and divided into two collections: a "Blue Set" (Figure 1C) of 74 normally exposed images, with a blue background and normal sun glint and backscatter; and a "Dark Set" (Figure 1D) of 74 underexposed images, with a dark background and reduced sun glint and backscatter. For the selected individual images, ground sampling distance (GSD) ranged between 0.26 and 0.7 cm/px, with estimated areas of 117 to 988 m², respectively. The two collections were compiled and labeled for individual image analysis using three different strategies to assess floating litter contamination: (i) a visual inspection with manual annotation of detected litter items; (ii) a pixel-based detection color analysis; and (iii) the use of CNNs for automated object detection.

Comparison of Analytical Procedures
To assess the pros and cons of different analytical and classification approaches, and to inform the design of floating litter monitoring programs using UAS-based remote sensing that are feasible under different conditions, levels of training, and available resources, we compared the three methods considering (i) the average time required to inspect and process each image; (ii) the ability to adequately assess floating litter contamination; and (iii) the skills and logistical requirements for implementing a monitoring program using each method.
Visual inspections and annotations of single images were taken as reference data to assess and compare the performance of the automated methods. Simple descriptive statistics were applied to compare the outputs of the three methods tested, including processing times, correlations, and standard metrics for assessing deep learning classification performance.

Visual Inspection and Manual Classification
Two independent annotations were performed: one to identify and count all floating objects, labeling them with an all-inclusive category, "floating litter item"; and a second in which floating objects were classified and labeled using nine different categories (see Supplementary Materials S2, Table S1). All images from both the Blue and Dark datasets were visually inspected and annotated using DotDotGoose (available online: https://biodiversityinformatics.amnh.org/open_source/dotdotgoose/, accessed on 25 July 2022) [86]. For each image, all objects were identified/classified, and collected data were exported as .CSV files and compiled into a summary table that included the image file identification, image dataset (Blue or Dark), number of floating items, number of items per category, the time for inspection and annotation using a single object class, and the time for inspection and annotation using the nine classes of floating items.
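The compilation of per-image annotation exports into a summary table can be sketched as follows. The column names ("image", "class") are hypothetical and should be adapted to the fields actually present in the exported .CSV files:

```python
# Sketch: compile per-image annotation .CSV exports into one summary of
# item counts per image and per class. Column names are assumptions, not
# the actual DotDotGoose export schema.
import csv
from collections import Counter

def summarize_annotations(csv_paths):
    """Return {image_file: Counter(class -> item count)} across all exports."""
    summary = {}
    for path in csv_paths:
        with open(path, newline="") as fh:
            for row in csv.DictReader(fh):
                counts = summary.setdefault(row["image"], Counter())
                counts[row["class"]] += 1
    return summary
```

Each `Counter` then yields both the total number of floating items per image (`sum(counts.values())`) and the per-category breakdown used in the summary table.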

Color-and Pixel-Based Detection Analysis
Images of both the Blue and Dark datasets were compiled for analysis using pixel color differences to estimate overall floating debris in each image, applying a color- and pixel-based analysis [78,87] to detect pixels with color profiles different from the background (e.g., seawater color) (see Supplementary Materials S2, Figure S1). The method consists of generating an image of the color difference between the debris and surrounding water in the CIELuv color space and detecting the debris pixels from the color difference image [78]. The color difference is expressed by the Euclidean distance between two points in the CIELuv color space [78]. The fundamental steps for extracting the "debris pixels" from the color difference images were as follows: (i) generating a smoothed version of each original image using a median box filter with a 200 × 200 px window; (ii) computing the color difference between the denoised and smoothed images in the CIELuv color space converted from the RGB color space; and (iii) extracting the pixels of floating macro debris using an appropriate constant threshold value. In this study, the threshold value was set at 60 by trial and error during empirical tests. The percentage of "debris pixels" was calculated for each image and included in the compiled summary table. Performance was assessed using linear regressions, assuming that the automated selection of pixels was proportional and correlated with the number of litter items in each image.
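The three steps above can be illustrated with a minimal sketch. For brevity, this simplified version computes the per-pixel Euclidean color difference against a globally averaged background directly in RGB, rather than using the 200 × 200 px median filter and CIELuv conversion of the actual method; the overall structure (smooth, difference, threshold) is the same:

```python
# Simplified sketch of pixel-based debris detection: estimate the background
# (water) color by smoothing, compute a per-pixel Euclidean color difference,
# and threshold it. The study works in CIELuv with a 200x200 median filter;
# here a global RGB mean stands in for the smoothed background.
import numpy as np

def debris_pixel_fraction(img: np.ndarray, threshold: float = 60.0) -> float:
    """img: HxWx3 array of RGB values (0-255). Returns the fraction of
    pixels whose color difference from the background exceeds the threshold."""
    # (i) coarse background estimate (stand-in for the median-filtered image)
    background = img.mean(axis=(0, 1))
    # (ii) per-pixel Euclidean color difference to the background
    diff = np.linalg.norm(img - background, axis=2)
    # (iii) threshold the difference image (60 was found by trial and error)
    return float((diff > threshold).mean())
```

On a synthetic frame of uniform "water" with a small bright patch, the returned fraction equals the patch's share of the image, which is the "debris pixel" percentage compiled in the summary table.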

Machine Learning for Automated Object Detection and Classification
The Blue and Dark image sets were also used for automated object detection and classification using a state-of-the-art CNN architecture combining MobileNetV2 [88] with the Single-Shot Detection (SSD) algorithm [89]. All images from each dataset were visually inspected and manually annotated with Supervise.ly, an online image annotation tool dedicated to model training (Supervisely: Unified OS for Computer Vision, available online: https://supervise.ly/, accessed on 25 July 2022). Target litter items were identified with bounding boxes and classified within the nine litter categories previously established (see Supplementary Materials S2, Table S1).
Two models were trained to classify floating objects into the nine pre-established categories (see Supplementary Materials S2, Table S1): one using the Blue Set (normally exposed imagery), and a second using the Dark Set (underexposed imagery). The Dark Set model used a total of 4041 images, while the Blue Set model used 7597 training images after applying traditional data augmentation techniques (flip, noise, blur) [90]. All 74 images from both the Dark and Blue datasets (at original full-size resolution) were used for model inference. Training and testing procedures involved single- and multiclass identification using object detection, based on ground truth annotations (bounding boxes made by the research authors as annotators) and the bounding boxes predicted by the models. Both models were trained in 12 h using an NVIDIA Tesla P100 PCI-E 16GB GPU on Google Colab, using TensorFlow 1.15.2. Model training ran for 200k epochs using default hyperparameters, the ReLU6 activation function, and an initial learning rate of 0.004. For performance, a batch size of 12 images was used with down-sampled imagery of 300 × 300 px. Overall model performance was assessed by computing model precision (P), recall (R), and F1 score (F1) [91]. For each model, a stopwatch was used to measure the time for data upload (ground truth imagery, annotations, trained model), the runtime of the model inference script using Jupyter Notebook, the computation resource allocation time on the free GPU instances, and the results download time. For each image, information from both models (i.e., number of items per category, object classification time) was included in the compiled summary table (see above) for comparison and analysis. Performance was assessed using linear regressions, assuming that the number of overall items classified as litter objects would be proportional and correlated with the number of manually labeled items. Additionally, average over- and underestimations of the ML automated classification for each of the nine categories were computed for each of the image sets (i.e., Blue and Dark Sets), in order to assess the ability of ML to correctly classify floating items in each of the nine selected categories. Standard deviations were also calculated to assess the variance in differences between the reference data (i.e., number of items manually classified) and the number of items detected by ML for each of the nine categories (see Supplementary Materials S2, Figure S2).
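The precision, recall, and F1 metrics used to score each model can be computed from true-positive, false-positive, and false-negative counts (e.g., matched vs. unmatched bounding boxes at a chosen overlap threshold); a minimal sketch:

```python
# Sketch: standard detection metrics from TP/FP/FN counts.
# tp = predicted boxes matched to ground truth, fp = unmatched predictions,
# fn = ground-truth boxes with no matching prediction.
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Return (precision, recall, F1), guarding against empty denominators."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

F1, as the harmonic mean of precision and recall, penalizes models that trade one metric heavily against the other, which is why it complements the per-category over-/underestimation analysis.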

Performance Assessment
Visual inspections and manual classification were assumed to have 100% detectability and were used as reference data to assess the performance of the automated approaches. An inspection of a linear regression using the number of identified objects in each image illustrates that color difference selection of pixels from normally exposed imagery was inadequate for estimating floating litter contamination, with poor correlation with the actual number of items in each image (Figure 2, top-left panel). The method's performance improved when using underexposed imagery, with less backscatter and sun glint (Figure 2, bottom-left panel). However, it still lacked a strong linear correlation with the number of floating items in each image, as one would expect if the automatically selected pixels corresponded to floating debris. Automated floating object detection using ML had a good overall performance in matching human detection and labeling, especially with normally exposed imagery (Figure 2, top-right panel). Assuming the null hypothesis that the two samples (MC and ML for the Blue dataset) have equal variances, a two-tailed t-test using the critical value 1.9763 did not show statistical significance (p > 0.05). This result indicates that the machine learning method performs at a similar level to manual counting when predicting on the Blue dataset. Overall, the lack of strong collinearity with the number of floating items renders the color difference detection of debris pixels from RGB imagery an unreliable method for estimating contamination by floating debris.
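The regression-based comparison can be sketched as follows; the counts below are made up for illustration, where x would be the manual (reference) count and y the automated count per image:

```python
# Sketch: least-squares regression of automated counts against manual counts,
# with the Pearson correlation coefficient used to judge how well automated
# pixel/object counts track the reference data.
import numpy as np

def regress(x, y):
    """Return (slope, intercept, Pearson r) of y against x."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r

# Hypothetical per-image counts: manual (x) vs. automated (y)
slope, intercept, r = regress([3, 5, 8, 12], [2, 6, 7, 13])
```

A slope near 1, an intercept near 0, and r close to 1 would indicate that the automated method tracks the manual reference; a weak r, as found here for pixel-based detection, indicates an unreliable estimator.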
ML automated object classification performed differently in discriminating litter categories across the two datasets (Figure 3). Using the manual counts as a reference, the automated object classification using ML had average under- and overestimations ranging between −2.95 and 3.32 objects in the Blue Set (Figure 3, left panel) and between −4.18 and 7.43 in the Dark Set (Figure 3, right panel). Unlike the pixel-based detection of marine debris (Figure 2), automated classification of floating items overall performed better in normally exposed images (Blue Set) than in underexposed images (Dark Set). An inspection of the average under- or overestimation values and respective standard deviations (Figure 3) illustrates that in normally exposed imagery (Blue Set), the classes Cleaner Bottles and Containers, Green Drinking Bottles, Floating Fishing Gear, and Plastic Bags were underestimated, whereas Transparent Drinking Bottles, Other Containers, and Other Floating Debris were overestimated (Figure 3, left panel). The categories Cleaner Bottles and Containers and Large Drink Bottles were the most accurately detected, with low average differences and low standard deviations, whereas the category Other Floating Debris appears to be the most challenging, with an average overestimation of 3.22 and the highest standard deviation. In underexposed imagery (Dark Set), ML was more successful in correctly classifying items in the categories Cleaner Bottles and Containers and Transparent Drink Bottles, underestimating these with low average differences and relatively low standard deviations (Figure 3, right panel). Floating items in the categories Drink Bottles-Large (>5 L), Other Containers, and Other Floating Debris were significantly overestimated with relatively high standard deviations, illustrating a poor performance of ML in classifying items in these categories in underexposed imagery.
Average differences and respective standard deviations illustrate to what degree ML can accurately detect and classify a floating object. With lower averages and variances, ML has a better overall performance in classifying floating litter items in normally exposed images (Figure 3). However, it is noteworthy that, for some specific categories (e.g., Transparent Drink Bottles, Floating Fishing Gear), underexposed imagery outperformed normally exposed imagery.


Comparing Processing Times and Requirements
One additional relevant aspect in the automation of litter detection and/or classification relates to processing times (Table 1). On average, visual inspection and user annotation took 26 s to detect and 52 s to classify all visible objects in a single image. Interestingly, user annotation was slightly faster when inspecting underexposed images (Figure 4). Color- and pixel-based detection had comparable processing times, averaging 43 s to process normally exposed images and 26 s to process underexposed images (Figure 4). As expected, image processing times for ML object classification and detection using deep learning were significantly longer than those of the remaining methods (i.e., visual inspection and color- and pixel-based detection). Interestingly, and similar to the other methods, processing times were faster when dealing with underexposed images (Figure 4).

Table 1.
Summary comparison of different performance indicators for the use of manual counting, pixel-based detection, and machine learning to detect and assess floating litter contamination using UAS-based remote sensing to collect aerial imagery (Blue and Dark Sets). Legend: µ, average; σ, standard deviation; Precision, the ratio of correctly segmented classes that are positive for each class; Recall (sensitivity), the ratio of correctly classified positive classes; F1, the harmonic mean of precision and recall, indicating the extent of alignment between the predicted boundary and the ground truth boundary. For Precision, Recall, and F1, higher values indicate better performance.

(Table 1 columns: Manual Count; Pixel-Based Detection; Machine Learning.)

Discussion
Based on a custom-designed experimental trial, this case study assesses the detectability of floating litter items from aerial imagery, appraises the advantages and disadvantages in different imagery processing strategies, and sets guidelines for optimizing and implementing floating litter monitoring programs relying on UAS-based remote sensing.
There are numerous challenges in operating UASs for systematic monitoring of the sea surface; namely, the unpredictability of weather conditions (e.g., wind, clouds); limits on flight operations (e.g., geozones and the maximum radio operating range of the drone); the risk of losing the drone (e.g., when piloting a UAS from vessels); and the way varying light (e.g., sun glint, cloud shadows) and sea conditions (e.g., waves) can affect the collected imagery, subsequently affecting image processing and litter detection. Flat ocean conditions offer a homogeneous background against which floating items are easily identified [92]. One important factor that can be compounded by sea conditions relates to lighting and light backscatter [83]. Ideal light conditions include clear skies, during a period when the sun is at a low angle (high angles increase backscatter in nadir imagery), and flat sea conditions without bright elements (white caps, foam, waves, and ripples) that also influence light backscatter on the water surface. Choosing the time of day (from 8 to 10 a.m. and/or 4 to 6 p.m.) and the direction of flight paths helps enhance image quality by minimizing sun reflection and backscatter over the sea [83]. In turn, this minimizes the spots of undefined shapes that create visual noise and hamper manual and automated analysis. Overcast conditions, a high sun, waves, and floating items partially submerged in the water column can easily decrease image quality for object detection and potentially lead to the need to discard a large portion of each image. The use of multispectral sensors can reduce some of the negative impacts of poor conditions, as some channels (e.g., infrared, near-infrared) generally produce outputs less sensitive to light backscatter over the sea surface [36,77]. Thermal sensors can also be adequate for detecting large objects with a large air-exposed proportion [93]; however, they are typically unable to detect objects that are frequently submerged and cooled by waves and sea spray. Another constraint of UAS-based remote sensing relates to flight range and the compromise between surveyed area and image resolution. Operational range can vary greatly depending on the UAS. Fixed-wing drones have the advantage of covering larger areas (more battery and longer radio signal range) [94]. However, they tend to require specific conditions for take-off and landing, which makes them less suitable for monitoring surveys from small vessels.
Similar to other studies using UAS aerial imagery to monitor litter, the flight parameters selected for this case study influenced the final result and the detection capability, since flight height, light exposure, and even the orientation of the camera in relation to the light source (among other factors) affect the image quality and the perception of some physical characteristics of the objects to be classified, including (i) color reflectance (translucent vs. opaque objects, or the reflected spectral profile of the material); (ii) the definition of object contours (well-defined vs. blurred); and (iii) the "size" of objects (number of pixels). As such, most parameters were kept constant, with the exception of altitude, which varied between 10 and 30 m (providing a range of GSD from 0.26 to 0.7 cm/px and a range of surface area covered from 117 to 988 m²), ensuring that objects were visually identifiable in all selected images. Exposure was purposely set to capture normally exposed images (EV set to 0) and underexposed images (EV set to −3) to enable sun glint and backscatter reduction, and to assess whether it affected object detectability. The main reason for carrying out this experiment with two image sets using different light exposures (Blue Set vs. Dark Set) was to understand how differences in exposure and contrast affect the reliability of the automated pixel selection and object detection models. Indeed, one of the biggest problems with nadir images collected over the ocean is glare from sunlight backscatter, resulting in "specks" of high reflectance that can be misidentified as white floating objects [95]. These contrasting exposure settings were expected to have a major influence on color- and contrast-based analysis and identification of floating items due to the homogenization of the background (i.e., seawater) in underexposed images [36,78].
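As an illustration of how altitude drives GSD and per-image coverage, the sketch below computes both from basic nadir camera geometry. The sensor dimensions, focal length, and image width are assumed values typical of a 1-inch DJI Phantom-series camera, not the study's actual calibration, so the resulting figures only approximate the ranges reported above.

```python
# GSD and ground footprint from flight altitude at nadir, assuming a
# Phantom 4 Pro-like camera (illustrative values, not the study's):
# 13.2 x 8.8 mm sensor, 8.8 mm focal length, 5472 px image width.
SENSOR_W_MM, SENSOR_H_MM = 13.2, 8.8
FOCAL_MM, IMG_W_PX = 8.8, 5472

def gsd_cm_per_px(altitude_m):
    """Ground distance covered by one pixel at nadir."""
    return altitude_m * 100 * (SENSOR_W_MM / IMG_W_PX) / FOCAL_MM

def footprint_m2(altitude_m):
    """Ground area covered by a single nadir image."""
    width = altitude_m * SENSOR_W_MM / FOCAL_MM
    height = altitude_m * SENSOR_H_MM / FOCAL_MM
    return width * height

for h in (10, 30):
    print(f"{h} m: GSD {gsd_cm_per_px(h):.2f} cm/px, "
          f"footprint {footprint_m2(h):.0f} m^2")
```

With these assumed specs, 10 m yields roughly 0.27 cm/px over about 150 m², the same order of magnitude as the ranges above; the exact values depend on the actual sensor model and any in-camera cropping.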
The conducted experimental trial also allowed us to ascertain how well two different autonomous analytical methods (i.e., color- and pixel-based detection and ML object classification) could assess floating litter contamination in comparison to human-supervised annotation of aerial images. Theoretically, the pixel-based detection method would allow one to estimate the percentage of general contamination of a given area based on the number of "debris" pixels. This method could be useful in scenarios where it is necessary to find concentration areas or sources of contamination by marine litter in a large volume of images and/or across different areas. However, our findings illustrate that the use of color-difference "debris" pixel selection to detect floating litter still requires significant improvement. Sun glint and wave crests greatly affect the accuracy of this method and result in numerous false positives; even though it performs better in underexposed images, the computed correlation between selected pixels and number of litter items was still rather low (Figure 2). Ultimately, the use of color-difference debris pixel detection requires additional optimization and development to reduce error; namely, by integrating additional multispectral or hyperspectral data, and/or by reducing false-positive pixels in each image by masking all items with bounding boxes automatically detected by the machine learning technique (see Supplementary Materials S2, Figure S2).
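A minimal sketch of the color-difference idea is shown below: pixels whose color deviates strongly from the dominant water color are flagged as candidate "debris" pixels. The median-background approximation and the threshold rule are our own simplifications for illustration, not the processing chain used in the study.

```python
import numpy as np

def debris_pixel_fraction(rgb, k=4.0):
    """Fraction of pixels whose color deviates strongly from the
    dominant water color (here approximated by the image median)."""
    background = np.median(rgb.reshape(-1, 3), axis=0)
    dist = np.linalg.norm(rgb.astype(float) - background, axis=2)
    threshold = dist.mean() + k * dist.std()  # simple adaptive cutoff
    return float((dist > threshold).mean())

# Synthetic example: uniform bluish "sea" with one small white "item"
sea = np.full((100, 100, 3), (20, 60, 120), dtype=np.uint8)
sea[40:44, 40:44] = (240, 240, 240)
print(debris_pixel_fraction(sea))  # 16 flagged pixels out of 10,000
```

The same rule that flags the white item would equally flag bright sun-glint specks and wave crests, which is precisely the false-positive problem described above.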
Similar to other studies [45,73,76,79], the automated classification of floating objects using ML in this case study also showed promising results in detecting floating items (Figure 2). However, similar to previous studies, it showed mixed results in accurately discriminating different types of floating items (Figure 3). The categories Drink Bottles-Green and Plastic Bags showed comparable underestimation in both datasets. The similarity in light reflectance spectra between the blue sea and the translucent green of the bottles could have hampered the detection and classification of these items [34,35,96,97]. For the class Plastic Bags, as the items present different shapes in each image, automated detection may have been negatively affected, as the shape of an object can be a relevant criterion for classification success [98]. The flexibility and mutable shape of Plastic Bags create a handicap for the automatic detection of this item class. The Other Containers and Other Floating Debris categories were overestimated in both datasets, with many false positives being classified within these two categories. The Other Containers overestimation may be an artefact of the use of a single object within this category, a black container that would float just under the sea surface. The use of a single object, combined with the lensing effect of water over the partially submerged object, may have contributed to the misclassification of shadows and high-contrast areas of images as objects. The Other Floating Debris category was created to enable the algorithms to identify floating objects that could not be classified in one of the existing categories. It generated false positives, mostly produced by high-reflectance backscatter that creates white "false objects" at the sea surface. The transparency of the objects classified as Drink Bottles-Transparent has likely influenced the overestimation of these objects in the Blue Set, as differences in light and color profiles are
reduced by transparency. The use of low-exposure images (i.e., the Dark Set) appears to produce some mitigating effects, producing a lower underestimation than the overestimation obtained with imagery and training sets from the Blue Set. The comparison in performance and accuracy between the Blue and Dark Sets also highlights relevant findings concerning the type of object vs. the environment in which it exists. For some object categories, such as Cleaner Container Bottles and Fishing Gear, using low-exposure, high-contrast images in training and analysis seems to perform better and produce lower errors (under- or overestimation) than normally exposed imagery. These findings suggest that further research is needed to combine the use of multiple sensors producing contrasting exposure images or multiple spectral data to increase the accuracy in discriminating different objects and materials.
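The per-category under/overestimation shown in Figure 3 reduces to the mean and standard deviation of per-image count differences between ML and manual counting. The counts below are invented for illustration only, not the study's data.

```python
import numpy as np

# Hypothetical per-image counts for one category (e.g., "Plastic Bags")
manual_counts = np.array([5, 3, 4, 6])  # human-annotated baseline
ml_counts     = np.array([3, 2, 4, 4])  # detections by the model

diff = ml_counts - manual_counts  # < 0: underestimation (false negatives)
                                  # > 0: overestimation (false positives)
print(f"mean difference: {diff.mean():+.2f} +/- {diff.std():.2f}")
```

A negative mean, as in this toy example, corresponds to the systematic underestimation reported above for Drink Bottles-Green and Plastic Bags.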
One additional and essential indicator for determining the adequate method and analytical approach is the average time required to process each image with the different methods. Despite the short times required by the manual counting method (visual inspection and classification of floating objects), this method requires human supervision throughout the whole process with 100% dedication of the user, which may hamper scaling up imagery collection and analysis efforts. The time users must dedicate increases proportionally with the number of images to process. However, despite being a tedious and repetitive task, the level of expertise required of the user is minimal, as it only requires the user to inspect each image and tag visible litter items. Color-difference pixel selection average processing times are comparable to those required for human-supervised annotation (Figure 4); however, it requires more expertise from the user (i.e., advanced image processing and familiarity with programming) and it lacks accuracy in the automated outputs (Figure 2). With the machine learning method, the model requires considerably more time to provide information on the number of different objects than a user takes to visually classify an image and tag the multiple objects (Figure 4); however, an important consideration is the fact that the classification process can mostly run with no human supervision required. Indeed, AI algorithms have already been used to automate marine litter recognition from aerial imagery, where the common algorithms applied are typically based on random forest algorithms [64,68,99] or deep learning approaches [45,46,79,98]. The main factor that encourages the development of AI algorithms for automatic identification of floating marine litter is that, after the initial effort of classification and validation, the process can be replicated in future studies without human supervision, which creates less
time-consuming workflows. The time and effort dedicated, and the knowledge and skills required to optimize and routinely apply machine learning, are often compensated for by being a single initial effort to acquire knowledge and train the model. After this laborious process, the model can be continuously retrained, fed by the new images the user provides, provided the classification classes are kept constant. This is one of the most significant differences to be highlighted between the compared methods.
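This trade-off between a fixed up-front training effort and near-zero per-image attention can be illustrated with a toy effort model. All figures below (setup hours, minutes of attention per image) are illustrative assumptions, not times measured in this study.

```python
def total_human_hours(n_images, setup_hours, minutes_per_image):
    """One-off setup effort plus per-image human attention."""
    return setup_hours + n_images * minutes_per_image / 60.0

# Illustrative assumptions: manual counting needs ~1 min of full
# attention per image; ML needs a large one-off training effort but
# only occasional spot checks afterwards.
for n in (100, 1_000, 10_000):
    manual = total_human_hours(n, 0, 1.0)
    ml = total_human_hours(n, 40, 0.05)
    print(f"{n:>6} images: manual {manual:6.1f} h, ML {ml:6.1f} h")
```

Under these assumptions the break-even point falls in the low thousands of images: below it, manual counting is cheaper in total human hours; above it, the one-off training cost of ML is amortized.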

Conclusions
Overall, the obtained results suggest that UAS remote sensing can be effectively used for floating litter monitoring in two ways: (i) by visually inspecting each image and identifying or classifying litter items, or (ii) by using deep learning to detect floating items without classifying them. Our findings suggest that ML can be used to process large numbers of images autonomously for assessing contamination with acceptable error; however, when implementing a floating-litter-dedicated monitoring program, it is important to consider the level of output detail required. The European Marine Strategy Framework Directive and OSPAR litter monitoring standards require monitoring activities to report litter items classified according to extensive standardized lists. The highly detailed categories of these standard lists are often challenging to discriminate, making such automated classification a target for the future. Automated classification will most certainly become more accurate and reliable as research and development progress, and with the introduction of multicamera and multispectral systems, optimizing model training and creating multistep classification workflows. Many of the constraints regarding the use of UAS-based remote sensing to detect, map, or monitor litter contamination are related to the aerial imagery processing requirements. Georeferenced individual images or mosaics, collected with regular RGB cameras or with additional channels, require processing and analysis to manually or autonomously detect litter or assess contamination levels. Careful visual inspection of imagery with manual annotation is the simplest solution, but also more laborious, especially when dealing with large numbers of images and long-term programs. Conversely, automated object detection can potentially reduce the need for user interaction, but typically requires higher computational power and programming expertise. In fact, the success of any monitoring program relying on remote
sensing will greatly depend on the analysis process and the associated operational costs, processing times, accuracy, and reliability, which can be substantiated by the findings of this case study.
Monitoring programs that aim to use UAS-based remote sensing in the near future should also consider the frequency and total number of images that will be processed when selecting the analytical method that suits them best. Special consideration should also be given to available human resources and their skillset. Annual programs with up to 1000 images to process can consider using visual inspection and manual identification or categorization, as these require low expertise and a total processing time of 9-18 h.

Figure 1. From top to bottom, left to right: (A) example of an aerial image of the experimental trial where floating litter objects were deployed from the boat to collect aerial imagery; (B) a UAS operator using the commercial DJI Phantom Series UAV; (C,D) examples of the two types of collected aerial images: normal exposure (Blue Set, (C)) and low-EV exposure (Dark Set, (D)).

Figure 2. Linear regressions between the number of items per image (i.e., visually identified) and automated analysis using pixel-based detection (left panels) and automated object detection using machine learning (right panels) for the Blue Set (top) and Dark Set (bottom) of images. Machine learning shows a stronger correlation (R² = 0.88 and R² = 0.64 for the Blue and Dark Sets, respectively) than pixel-based detection (R² = 0.002 and R² = 0.25 for the Blue and Dark Sets, respectively).

Figure 3. Average differences and respective standard deviations per category in the number of classified objects between manual counting and machine learning, using the Blue Set of normally exposed images (A) and the Dark Set of underexposed images (B). Negative values represent overall underestimation driven by false negatives (i.e., the average number of non-detected objects), and positive values represent overall overestimation driven by false positives (i.e., the average number of falsely identified objects).

Figure 4. Average processing times during identification and classification for the three methods tested, using images with normal exposure (Blue Set) and low exposure (Dark Set).
