A Combination of Machine Learning Algorithms for Marine Plastic Litter Detection Exploiting Hyperspectral PRISMA Data

Abstract: A significant share of the solid waste reaching the oceans is made of plastics. The amount of plastic debris in the ocean and coastal areas is steadily increasing and is now a major global environmental issue. Ground-based monitoring systems and field campaigns for marine plastic litter are time-consuming and expensive, require great organisational effort, and provide very limited information on the spatial and temporal dynamics of marine debris. Earth Observation (EO) by satellite can contribute significantly to marine plastic litter detection. In 2019, a new hyperspectral satellite, called PRISMA, was launched by the Italian Space Agency. The high spectral resolution of PRISMA may allow for better detection of floating plastic materials. At the same time, Machine Learning (ML) algorithms have the potential to find hidden patterns and identify complex relations among data and are increasingly employed in EO. This paper presents the development of a new method of identifying floating plastic objects in coastal areas by exploiting pan-sharpened hyperspectral PRISMA data, based on the combination of unsupervised and supervised ML algorithms. The study consisted of a configuration phase, during which the algorithms were trained in a fully controlled test, and a validation phase, in which the pre-trained algorithms were applied to satellite data collected at different sites and in different periods of the year. Despite the limited input data, results suggest that the tested ML approach, applied to pan-sharpened PRISMA data, can effectively recognise floating objects and plastic targets. The study indicates that increasing input datasets can help achieve higher-quality results.


Introduction
The year 1950 is commonly considered to mark the beginning of plastic mass production [1]. Since then, 8.3 billion tons of virgin plastic materials have been manufactured [2]. Between 1950 and 2015, an estimated 6.3 billion tons of plastic waste were produced globally [2], corresponding to almost 76% of all virgin plastics produced since the 1950s. Eight million tons of plastic items spill into the ocean every year [3]. Plastic items reach the ocean from various sources and in many ways and never fully biodegrade, thus threatening aquatic species, marine and coastal ecosystems, and also human beings, as plastic debris enters our food chain [4][5][6][7]. The abundance of biodiversity and precious resources for humans and other species calls for efficient technologies to monitor marine pollution caused by plastic litter. Ground-based monitoring systems and field campaigns provide precise information on the quantity and quality of marine litter, but present several limitations.

Study Area
For this investigation, satellite data were collected for two study areas: Tsamakia Beach in Mytilene (Lat: 39.108406°; Long: 26.565948°) and Geras Gulf (Lat: 39.046606°; Long: 26.526732°), both located on the Greek island of Lesvos (Figure 1). Six controlled experiments were set up to simulate real-world situations at both sites. The study sites offered an unobstructed space in which to run the controlled experiments, with no interference from any touristic or commercial activities. Moreover, the seabed was sufficiently deep and dark to simulate deep waters effectively, as the spectral response of clear deep water has a unique characteristic in the blue part of the electromagnetic spectrum and becomes insignificant and practically null in the Near-Infrared (NIR) and Short-Wave Infrared (SWIR) portions of the spectrum. The configuration phase of this study was conducted on Tsamakia Beach, whilst the validation phase was performed in Geras Gulf.

Field Data
Twelve floating plastic targets were built for the controlled experiments (Figure 2). The targets were square in shape and made in three different sizes. For each size, four targets were built with four different plastic materials: three targets were made of high-density polyethylene (HDPE) (tarps in white, yellow, and green); three targets were made using polyethylene terephthalate (PET) (transparent water bottles, green oil bottles); three targets were made using polystyrene (PS) (sheets for building insulation, in cyan); and three targets were composed of all the above materials over an equal surface area. The specifications of the 12 plastic targets are reported in Table 1 and fully described in [17]. The different sizes were defined based on the spatial resolution expected from pan-sharpening the PRISMA images, and the smallest accumulation size detectable with these input data was identified. The various plastic materials were chosen to cover the most widespread plastic materials found in the marine environment. The 12 targets were placed offshore and onshore during four PRISMA passages over the study sites (Figure 1). The Global Positioning System (GPS) coordinates of the plastic targets were collected during the controlled experiments.

[Figure caption, truncated] ... October 2020 (sensing date), as well as details of the plastic targets during the construction stage. Due to their small size, the four small floating plastic targets (T-3x) were not clearly visible.

Table 1. Specifications of the 12 plastic targets: three targets were made using high-density polyethylene (HDPE); three targets were made using polyethylene terephthalate (PET); three other targets were made using polystyrene (PS); and the last three targets were realised with all the above materials over an equal surface area [17].

In this study, data acquired by the new hyperspectral satellite PRISMA were used. PRISMA was developed by the Italian Space Agency and launched in 2019. It records data in the 400-2500 nm spectral window with 239 bands (66 bands in the VNIR and 173 in the SWIR range), with a spectral resolution of less than 12 nm and a spatial resolution of 30 m. The satellite also records a single panchromatic (PAN) band in the 400-700 nm spectral window at a spatial resolution of 5 m. PRISMA's relook time is approximately 29 days. The technical characteristics of PRISMA are reported in Table 2. L1 products were exploited, as the atmospheric correction of L2D products affects image radiometry over water bodies.
All collected PRISMA data were pre-processed with image fusion techniques to obtain pan-sharpened images with higher spatial resolution than the initial high-spectral-resolution images, fully exploiting PRISMA's panchromatic band at a 5 m spatial resolution. Image fusion techniques were applied to increase the sensor detectability of marine plastic litter (Figures 3 and 4). The pan sharpening was performed using the Principal Component Analysis (PCA) substitution method, reaching a spatial resolution of 5 m [18]. Bands with a low signal-to-noise ratio were removed. The removed bands were affected by high atmospheric absorption between 1350 and 1470 nm and between 1800 and 1970 nm. Thus, the final pre-processed data consisted of 175 bands (from 239) and a spatial resolution of 5 m. More details on data acquisition and pre-processing of the data utilised are reported in [17]. A summary of all PRISMA acquisitions is reported in Table 3.
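The band-removal step described above can be illustrated with a short sketch. It assumes a hypothetical, evenly spaced wavelength grid rather than PRISMA's real band table, and it only masks the two atmospheric absorption windows named in the text; it does not attempt to reproduce the 175 bands retained in the study, which also reflect the signal-to-noise screening. The function name `keep_band_mask` is introduced here for illustration only.

```python
import numpy as np

def keep_band_mask(wavelengths_nm,
                   absorption_windows=((1350.0, 1470.0), (1800.0, 1970.0))):
    """Return a boolean mask that is False for bands inside any absorption window."""
    wl = np.asarray(wavelengths_nm, dtype=float)
    keep = np.ones(wl.shape, dtype=bool)
    for low, high in absorption_windows:
        keep &= ~((wl >= low) & (wl <= high))
    return keep

# Hypothetical band centres (assumption): evenly spaced, NOT the real PRISMA table.
wavelengths = np.linspace(400.0, 2500.0, 239)

# Synthetic cube laid out as (rows, columns, bands).
cube = np.random.default_rng(0).random((20, 20, wavelengths.size))

mask = keep_band_mask(wavelengths)
clean_cube = cube[:, :, mask]          # drop the masked bands
kept_wavelengths = wavelengths[mask]   # band centres that survive the mask
```

With real data, `wavelengths` would come from the band-centre metadata shipped with the product, and any additional low-SNR bands would be removed with a second mask.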

The study consisted of two phases: configuration and validation. The configuration phase was performed on Tsamakia Beach (Figure 1). During this phase, plastic targets were placed offshore and onshore. Four PRISMA images were collected: two images with targets offshore and two images with targets onshore. The offshore target images were used as input data to train ML algorithms to detect and recognise the spectral behaviour of plastic pixels. No information from onshore targets was used as input data; however, these two images were used as a preliminary crosscheck to ensure that no plastic pixels were detected offshore by the ML algorithms.
The validation phase was conducted in Geras Gulf (Figure 1). New PRISMA data were collected with plastic targets placed offshore. The ML algorithms trained during the configuration phase were run on the new images during the validation phase.
For both ML methodologies, all parameters were calibrated on a subset of the input data, covering the plastic targets and nearby pixels, extracted from each of the two satellite images acquired with targets offshore.
The two subsets were normalised using a master image (i.e., the PRISMA image acquired on 18 September 2020) and concatenated (Figure 5). As the main goal of this study was to verify the possibility of distinguishing the spectral signals of plastic targets from other signals, the first step was the normalisation of the two PRISMA subsets using a histogram normalisation algorithm [19]. With this technique, the histogram of each band of the second image (slave) is modified to follow the histogram shape of the corresponding band of the first image (master). Thus, the digital numbers of the two datasets were more comparable and less affected by local or temporal features. Because of the high number of correlated bands in the input PRISMA images, and to help the unsupervised algorithm distinguish efficiently between different spectral behaviours, the K-Means was applied after reducing the dimensionality of each pixel. Conversely, the LGBM was applied to the entire spectral information, given the availability of ground truth data (i.e., pixels containing plastic materials), as the GPS coordinates of offshore targets were known. Before running the K-Means algorithm, different combinations of pre-processing steps to reduce correlated bands were applied.
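The histogram normalisation step can be sketched as band-by-band histogram matching. The exact algorithm of [19] is not spelled out in the text, so the version below is an assumption: it uses the common approach of mapping each slave quantile to the master value at the same quantile. The function names and the synthetic arrays are illustrative only.

```python
import numpy as np

def match_band(slave_band, master_band):
    """Remap one slave band so its empirical CDF follows the master band's CDF."""
    _, s_index, s_counts = np.unique(
        slave_band.ravel(), return_inverse=True, return_counts=True)
    m_values, m_counts = np.unique(master_band.ravel(), return_counts=True)
    s_cdf = np.cumsum(s_counts) / slave_band.size
    m_cdf = np.cumsum(m_counts) / master_band.size
    # For each slave quantile, look up the master value at the same quantile.
    remapped = np.interp(s_cdf, m_cdf, m_values)
    return remapped[s_index].reshape(slave_band.shape)

def match_cube(slave, master):
    """Apply histogram matching independently to every band of a (rows, cols, bands) cube."""
    bands = [match_band(slave[..., b], master[..., b]) for b in range(slave.shape[-1])]
    return np.stack(bands, axis=-1)

# Synthetic stand-ins for the two image subsets (assumption: same number of bands).
rng = np.random.default_rng(1)
master = rng.normal(100.0, 20.0, size=(30, 30, 3))
slave = rng.normal(50.0, 5.0, size=(30, 30, 3))

normalised = match_cube(slave, master)
```

Because the mapping is monotonic within each band, the relative ordering of pixel values in the slave image is preserved while its value distribution is brought in line with the master.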
Two different pre-processing options were tested for K-Means: feature extraction using PCA, and feature selection using a subset of the spectral bands. With PCA, the data can be described by the first four Principal Components, which together account for 99% of the variance. With feature selection, one of every four bands is retained, which removes highly correlated bands while preserving the shape of the spectral signatures.
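The two reduction options can be sketched as follows. This is a minimal illustration on synthetic spectra, not the study's pipeline: the helper names `pca_reduce` and `subsample_bands` are hypothetical, and the synthetic data are built from three latent factors, so the number of components needed here differs from the four reported for the PRISMA data.

```python
import numpy as np

def pca_reduce(pixels, variance_target=0.99):
    """Project (n_pixels, n_bands) data onto the fewest PCs reaching variance_target."""
    centred = pixels - pixels.mean(axis=0)
    # SVD of the centred data: squared singular values are proportional to PC variances.
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    ratio = s**2 / np.sum(s**2)
    cumulative = np.cumsum(ratio)
    n_components = min(int(np.searchsorted(cumulative, variance_target)) + 1, s.size)
    scores = centred @ vt[:n_components].T
    return scores, ratio[:n_components]

def subsample_bands(pixels, step=4):
    """Keep one band out of every `step` bands."""
    return pixels[:, ::step]

# Synthetic, highly correlated spectra: 3 latent factors mixed into 175 bands.
rng = np.random.default_rng(2)
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 175))
pixels = latent @ mixing + 0.01 * rng.normal(size=(500, 175))

scores, explained = pca_reduce(pixels)   # feature extraction
selected = subsample_bands(pixels)       # feature selection (1 band in 4)
```

Either `scores` or `selected` would then be passed to the clustering step in place of the full spectrum.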

Machine Learning Methodologies
Two different ML algorithms were used to detect artificial plastic targets offshore. The first is an unsupervised clustering algorithm, K-Means [20]. K-Means is an incremental approach to clustering: it groups pixels with similar behaviour into clusters according to their nearness to a cluster centre (centroid) under a specified metric. Because it does not rely on ground truth, the K-Means keeps the method applicable even to plastic targets that differ from the ones made for this study. The major issue in using K-Means is finding the correct value of the K parameter, i.e., the optimal number of clusters that accurately describes the variability of the data. Silhouette analysis [21] was used to identify the correct number of clusters. The second ML algorithm is a supervised, decision-tree-based algorithm, the Light Gradient Boosting Machine (LGBM) [22]. Due to the small number of pixels representing plastic materials in the collected PRISMA data, an unsupervised method was preferred. Unsupervised methods can extract hidden patterns directly from raw input data without the need for ground truth.
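Selecting K by silhouette analysis can be sketched with scikit-learn's `KMeans` and `silhouette_score`. The choice of library is an assumption (the text does not name an implementation), the helper `choose_k` is hypothetical, and the data are three synthetic, well-separated blobs, so the value of K found here is unrelated to the eight clusters selected in the study.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def choose_k(features, candidates=range(2, 7), seed=0):
    """Fit K-Means for each candidate K and return the K with the highest mean silhouette."""
    scores = {}
    for k in candidates:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(features)
        scores[k] = silhouette_score(features, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores

# Synthetic features: three compact groups in a 4-dimensional space.
rng = np.random.default_rng(3)
centres = np.array([[0.0, 0.0, 0.0, 0.0],
                    [10.0, 10.0, 10.0, 10.0],
                    [-10.0, 10.0, -10.0, 10.0]])
features = np.vstack([rng.normal(loc=c, scale=0.5, size=(60, 4)) for c in centres])

best_k, silhouette_by_k = choose_k(features)
```

On image data, `features` would be the reduced per-pixel vectors (principal components or sub-sampled bands); computing the silhouette on a random subset of pixels keeps the cost manageable for large scenes.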
Nevertheless, the accuracy obtained with unsupervised algorithms is lower than that reached by supervised methodologies, which rely on labelling. On the other hand, supervised methods can automatically identify complex relationships between input data and ground truth. Thus, both methodologies were applied to produce the final probability mask of plastic presence and increase the accuracy of the results. For the supervised approach, four labelling classes were considered: land, shallow waters, deep waters, and plastic targets. The four classes were manually identified through photo interpretation, and the plastic pixels were extracted using only medium and large targets (Table 4). For the latter, GPS coordinates collected during the controlled experiments were exploited. In the first step of the workflow, the K-Means was applied, and the optimal number of clusters was set to eight through silhouette analysis. The K-Means was applied twice: in the first case, after dimensionality reduction of the input images through PCA; in the second case, after feature selection through band sub-sampling (retaining one of every four bands). In the first instance, the K-Means detected 8 of 12 targets (medium-size T-2x to large T-1x size), while in the second the K-Means extracted 9 of 16 targets (7 large T-1x and 2 medium T-2x from the concatenated images). In both cases, the K-Means was not capable of detecting small targets, and issues arose in distinguishing plastic targets from shallow waters. Thus, preliminary masking of land and shallow waters was required. In the second step of the workflow, the LGBM was applied. The algorithm was trained on 80% of the dataset, whilst validation was performed on the remaining 20% to compute accuracy and avoid overfitting.
The final probability map was produced by combining the results of the K-Means with band sampling and of the LGBM, weighted according to their accuracy as follows:

$$w_{KM} = \frac{co_{KM}}{co_{KM} + oa_{LGBM}} \qquad (1)$$

$$w_{LGBM} = \frac{oa_{LGBM}}{co_{KM} + oa_{LGBM}} \qquad (2)$$

where $co_{KM}$ is the internal consistency of the K-Means; $oa_{LGBM}$ is the overall accuracy of the LGBM; $w_{KM}$ is the weight assigned to the K-Means; and $w_{LGBM}$ is the weight assigned to the LGBM. The entire workflow is illustrated in Figure 6.
Each algorithm was independently applied to the images, and the final map was generated by summing their outputs according to these weights (Figure 7).
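Equations (1) and (2) and the weighted combination can be sketched as below. The per-pixel scores of each algorithm are assumed to be already expressed in the range 0 to 1; how the K-Means output is converted to such a score is not detailed in the text, so the input maps and the numerical values of the internal consistency and overall accuracy are purely hypothetical.

```python
import numpy as np

def combination_weights(co_km, oa_lgbm):
    """Weights of Equations (1) and (2); they sum to one by construction."""
    total = co_km + oa_lgbm
    return co_km / total, oa_lgbm / total

def combine_maps(p_km, p_lgbm, co_km, oa_lgbm):
    """Weighted sum of the two per-pixel score maps."""
    w_km, w_lgbm = combination_weights(co_km, oa_lgbm)
    return w_km * p_km + w_lgbm * p_lgbm

# Hypothetical per-pixel scores in [0, 1] from each algorithm.
p_km = np.array([[1.0, 0.0],
                 [0.0, 0.0]])
p_lgbm = np.array([[1.0, 1.0],
                   [0.2, 0.0]])

# Hypothetical quality figures, chosen only to make the arithmetic easy to follow.
w_km, w_lgbm = combination_weights(co_km=0.25, oa_lgbm=0.75)
probability_map = combine_maps(p_km, p_lgbm, co_km=0.25, oa_lgbm=0.75)
```

Because the weights sum to one, the combined map stays within 0 to 1 whenever both inputs do, which is what allows a single probability threshold to be applied afterwards.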
In a later stage, the pre-trained K-Means and LGBM algorithms were applied to the other two satellite images collected during the configuration phase, with the plastic targets onshore (Table 3). This served as a preliminary test of the trained algorithms, which correctly detected no false-positive plastic pixels offshore. It is important to highlight that the normalisation of input data was essential to obtain meaningful and comparable results. The PRISMA images with the onshore targets were normalised using the same master image employed in the training phase.

Remote Sens. 2022, 14, x FOR PEER REVIEW

Results
During the configuration phase of the study, the output probability map was able to highlight plastic targets offshore. The computed LGBM overall accuracy referred to the ground truth samples only and not to the whole map. To perform a quantitative analysis, a threshold was set to binarise the map, and the overall accuracy was computed over the entire map. Based on the test results, the threshold was set to 0.6. Thus, if a pixel value of the output map was greater than or equal to 0.6, the pixel was assigned to "Class 1" ("Plastic"); otherwise, it was assigned to "Class 0" ("No Plastic"). To compare the results, the ground truth was built as follows: five pixels of Class 1 were selected around the GPS coordinates of large targets, and one pixel of Class 1 was selected around the GPS coordinates of medium targets, with the help of photo interpretation. The other pixels of the ground truth map were assigned to Class 0. The true-positive results are shown in Table 5, where overall accuracy was 72.92%.

The true-positive results show that the proposed method can effectively detect floating objects offshore. In fact, considering only the central points of the targets as ground truth, 13 of 16 objects were highlighted. On the other hand, there were some commission errors: several points on the map were classified as "Class 1", but these were mostly isolated points with low probability. The score from the unsupervised method contributed little compared with that from the supervised method. Nevertheless, when the proposed method is applied in different zones, the contribution of the K-Means can help achieve high accuracy in the presence of a previously unidentified and different object. To confirm these promising results, the workflow was applied to another location (validation phase) where the exact position of the target was unknown (Figures 1 and 8), and two new PRISMA images were collected with the offshore targets (Table 3).

During the validation phase, two more large circular targets were placed in the Geras Gulf, one made of wood and the other of plastics [23]. The new data were acquired in a different season and under different light conditions, and showed different histograms and spectral characteristics from the images collected and exploited in the configuration phase.

Two tests were conducted within the validation phase. For the first test, the supervised and unsupervised ML algorithms trained during the configuration phase were applied to the new PRISMA images. Figure 9 shows the output of the first test run on the satellite data collected on 23 June 2021. In this case, the pre-trained K-Means and LGBM detected three floating objects (probably boats) on the surface in Figure 9 and two targets (one plastic and one wooden) at the bottom. It is worth noting that several false positives could be removed using the probability values. Domain experts can move the probability threshold to highlight the desired output. Preliminary masking of land and shallow waters was required to overcome a few open issues near the coastline.

Figure 10 shows the output of the first test run on the satellite data collected on 29 June 2021. In this case, no relevant results were obtained. The probability map (Figure 10a) did not highlight significant floating objects. This could be related to the ML architecture: the new data might have values far different from the data values collected during the configuration phase. Moreover, the new data values might not show enough variability compared to the data values of the configuration phase. Furthermore, atmospheric conditions might have played a significant role.

A second test was performed to solve the issues that arose with the first test and to better investigate the effect of training data augmentation on the final results. For the second test, the K-Means and LGBM were re-trained with three images: the two images collected during the configuration phase plus a third from the validation phase, collected on 23 June 2021. The third image was used to increase the number of plastic pixels available for training the LGBM algorithm. The LGBM was trained on 80% of the dataset, whilst validation was performed on the remaining 20% to compute accuracy and avoid overfitting. Figure 11 shows notable improvement compared to Figure 10.
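The binarisation and whole-map accuracy computation used for the configuration-phase evaluation can be sketched as follows. The probability values and the ground truth below are toy numbers chosen for illustration, not data from the study; only the threshold of 0.6 and the "greater than or equal to" rule come from the text.

```python
import numpy as np

def binarise(probability_map, threshold=0.6):
    """Assign Class 1 (Plastic) where probability >= threshold, else Class 0 (No Plastic)."""
    return (probability_map >= threshold).astype(np.uint8)

def overall_accuracy(predicted, truth):
    """Fraction of pixels whose predicted class equals the ground truth class."""
    return float(np.mean(predicted == truth))

# Toy probability map and ground truth (illustrative values only).
probability_map = np.array([[0.90, 0.60, 0.20],
                            [0.59, 0.10, 0.70]])
ground_truth = np.array([[1, 1, 0],
                         [0, 0, 0]], dtype=np.uint8)

predicted = binarise(probability_map)
accuracy = overall_accuracy(predicted, ground_truth)

# Pixels correctly flagged as plastic, and pixels wrongly flagged (commission errors).
true_positives = int(np.sum((predicted == 1) & (ground_truth == 1)))
commission_errors = int(np.sum((predicted == 1) & (ground_truth == 0)))
```

Raising the threshold trades commission errors for missed targets, which is the adjustment the text leaves to domain experts.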

Discussion
Two different phases were composed in this work, the configuration phase and the validation phase. In the configuration phase, the methodology was set to build as general a method as possible using a combination of two ML methods. The configuration phase shows the capability of the proposed method to detect floating objects and distinguish the spectral behaviour of shallow water. Furthermore, despite the small size of the medium targets (~2.4 m × 2.4 m) compared to the sensor resolution (5 m × 5 m), the proposed method was able to detect six of eight targets.
Hence, it is clear that during the configuration phase, the unsupervised method alone was not enough to reach high accuracy. The results suggest that the supervised method (LGBM) is sufficient in the presence of more ground truth data; in fact, LGBM's overall accuracy was about 96%, and all plastic targets (from medium to large) were efficiently detected. Nevertheless, with the two methods combined, system operability was always guaranteed and independent of ground truth availability. If only the K-Means was used, several false positives would have been generated. The combination of the K-Means and LGBM helped us to reach accurate results.
The validation phase was used to understand whether the proposed method was sufficiently general in terms of applications. To check for overfitting, the method was applied to an independent area. The results show that floating objects were correctly detected in the second test. In re-training, the third acquisition was used to increase the available information on plastic behaviour. Finally, the re-trained algorithms were applied to the satellite image of the validation phase collected on 29 June 2021, and the final probability map was output as previously described for the configuration phase. Some false positives remained, but floating objects (Figure 11a) and bigger targets (Figure 11b) were detected with higher accuracy than in the first test. It appears that using more data to train the ML algorithms improves the detection of generic floating objects and plastic targets.

Conclusions
The remotely sensed detection of accumulated plastic litter in the marine environment remains a challenge due to the paucity of available data and the limited spatial and spectral resolutions. Remote sensing applied to marine plastic litter detection is still in its early stages, but it is an active research topic. Nevertheless, detecting plastic accumulation and its spatial distribution can be essential for effective environmental monitoring by regional and national agencies within the framework of domestic and international regulations. It can represent the starting point for identifying areas prone to plastic litter accumulation and evaluating the status of plastic pollution in marine areas.
The in situ detection of plastic accumulation over large surfaces raises difficult issues: it applies a narrow perspective to a global environmental problem, marine plastic pollution is extremely dynamic in space, and considerable financial resources are required. Satellite data can be of help in this context. However, the limited availability of satellite imagery that captures actual plastic accumulation at the proper spatial and spectral resolutions, is cloud-free, and was collected under good sea and weather conditions is the main drawback of remotely sensed optical data. Moreover, this drawback slows the pace of research and development activities on this topic.
The availability of new hyperspectral satellites, such as PRISMA, designed by the Italian Space Agency, that collect data at high spectral resolution (i.e., 239 hyperspectral bands plus a panchromatic band) and medium spatial resolution (i.e., 30 m for the hyperspectral cube and 5 m for the panchromatic band) together with ML algorithms creates room for improvement.
This work aimed to develop a new method, based on a combination of two ML techniques, one unsupervised (K-Means) and the other supervised (LGBM), to detect 12 plastic targets offshore by exploiting pan-sharpened PRISMA hyperspectral data. K-Means alone detected eight of twelve targets, from 2.4 m to 5.1 m in size, while LGBM detected all plastic targets (from 0.6 m to 5.1 m), reaching an overall accuracy of 96%. Finally, the two methods were combined to guarantee operability and to extend the capability of detecting the different spectral behaviours that the same object may show under the varying conditions likely to occur during satellite acquisition. Furthermore, the combination of K-Means and LGBM helped to enhance the distinction between floating objects and shallow water.
The results show the capability of the proposed method to detect floating objects offshore. Furthermore, the combination of unsupervised and supervised algorithms was able to reduce false positives, making the method a potential support tool for domain experts.
Despite the small amount of satellite input data, the study showed that the new approach applied to PRISMA hyperspectral data can effectively identify floating marine plastic objects larger than 2.4 m. Furthermore, the study suggests that training ML algorithms with a larger satellite dataset containing plastic materials can improve the performance of this novel method, reducing false positives such as boats or those caused by sunglint. Enlarging the satellite dataset with floating plastic material would also allow the exploration of Deep Learning methodologies, such as Generative Adversarial Networks, and the implementation of different ML algorithms.