Sensors, Features, and Machine Learning for Oil Spill Detection and Monitoring: A Review

Abstract: Remote sensing technologies and machine learning (ML) algorithms play an increasingly important role in the accurate detection and monitoring of oil spill slicks, assisting scientists in forecasting their trajectories, developing clean-up plans, taking timely and urgent actions, and applying effective treatments to contain and alleviate adverse effects. This study reviews and analyzes different sources of remotely sensed data and the various components of ML classification systems for oil spill detection and monitoring. More than 100 publications in the field of oil spill remote sensing, published in the past 10 years, are reviewed. The first part of this review discusses the strengths and weaknesses of different sources of remotely sensed data used for oil spill detection. Necessary preprocessing and preparation of data for developing classification models are then highlighted. Feature extraction, feature selection, and widely used handcrafted features for oil spill detection are subsequently introduced and analyzed. The second part of this review explains the use and capabilities of different classical and state-of-the-art ML techniques for oil spill detection. Finally, an in-depth discussion of the limitations, open challenges, and considerations of oil spill classification systems that use remote sensing and state-of-the-art ML algorithms is presented, along with conclusions and insights into future directions.


Introduction
Oil spills are generally characterized as the release of liquid petroleum hydrocarbons into the environment due to human activities [1]. Spillage commonly occurs in water, on ice, or on land during oil exploration, production, transportation, refining, storage, and distribution [2]. For instance, oil spillage may originate from offshore oil platforms, refineries, pipelines, chemical plants, and treatment facilities, from transportation accidents and deliberate oil discharges from ships, and from oil disposal during energy production and operational errors. Accidents generally account for the largest oil spill incidents worldwide. Shipping accidents, particularly mishaps caused by oil tankers, release a significant amount of oil and pose a substantially higher threat to water ecosystems than other

Remotely Sensed Data
Remotely sensed data have been extensively used to detect and monitor oil spills in the past few decades. These data are generally acquired by passive and active systems. Passive sensors record naturally reflected and/or emitted radiation from the observed object, whereas active sensors use their own energy source to illuminate sensed targets and record the energy backscattered from the target. Visible and infrared multispectral, hyperspectral, thermal, microwave, and laser fluorosensor systems are some relevant remote sensing techniques for oil spill detection, monitoring, type characterization, and thickness estimation. Given that each technique has its own advantages and shortcomings, acquiring the information essential for timely and effective oil spill management from one source of data can be challenging [28,29]; thus, a trade-off exists when selecting among potential techniques. The advantages and shortcomings of optical and microwave remote sensing data for oil spill detection and monitoring are discussed in the following sections.

Optical Data
Optical images are less widely used in oil spill studies than microwave images owing to their dependence on weather conditions and daylight. Although cloud cover and the lack of sunlight hinder the use of optical sensors, these sensors have unique spectral characteristics that can fill the spatial and temporal gaps for a synoptic coverage of oil spills, provide valuable information to differentiate between oil spills and water surface features (e.g., algal blooms) [14,30,31], potentially identify oil spills at fine levels [32], and provide relative information for oil spill thickness estimation [28]. Passive remotely sensed data acquired at different regions of the electromagnetic spectrum, including the ultraviolet (100-400 nm), visible (VIS) (400-700 nm), and near-infrared (NIR) (750-1400 nm) regions, are used in detecting oil spills [30,32-40] and estimating oil spill surface thickness [31,41-45]. The utilization of multispectral data for oil spill detection is growing, and various satellite data with different resolutions are used in different studies, as shown in Table 1. Such studies used the Moderate Resolution Imaging Spectroradiometer (MODIS) [38,46-48], the Medium Resolution Imaging Spectrometer (MERIS) [49], Sentinel-2 [36], Landsat [30,50,51], KOMPSAT-2 [52], and Gaofen-1 [51,53], among others. Figure 2 shows different oil spill incidents captured by various optical satellites, such as Sentinel-2 (Figure 2a).
Spectral characteristics of oil spills may vary from one source to another depending on the physical characteristics of the oil, film thickness, weather and illumination conditions, and optical properties of the water column [36]. Although oil exhibits relatively higher reflectance than water in the visible region of the spectrum (approximately 400-700 nm), it has no unique reflection/absorption features that enable straightforward discrimination between oil and the background [15,54]. Nevertheless, in some special cases (e.g., Figure 2c), oil may look silvery with a greater reflectance than the background [54]. Moreover, heavy oil may look brown and peak between 600 and 700 nm, whereas mousse appears red-brown and peaks near 700 nm [55]. However, the detectability of oil spills in images produced by visible sensors depends on the contrast between oil and water, which is affected by several factors, including illumination-view geometry (satellite and solar zenith angles), cloud coverage, sea state (wind speed), oil spectral properties (refractive index and absorption coefficient), and oil condition and thickness [15,54,56].
The use of NIR bands (750-1400 nm) from sun-glittered satellite images for oil spill detection has been investigated in a few studies. For instance, Pisano et al. [57] detected marine oil spills from MODIS near-infrared sun-glittered radiance imagery. Adamo et al. [58] reported that the NIR bands of MODIS and MEdium Resolution Imaging Spectrometer (MERIS) images show better oil/non-oil class separability than the bands in the visible range. Moreover, the absorption features in the NIR region have been used as a proxy for estimating the thickness of oil spills [41,42,49].
Hyperspectral remote sensing, also known as imaging spectroscopy, records reflected and/or emitted radiation in a large number (hundreds) of contiguous narrow spectral bands ranging from 350 nm to 2500 nm. The absorption features of oil and other materials can be detected in hyperspectral images because a continuous and detailed spectrum is measured for each pixel. The high spectral resolution is vital given the spectral similarity between diverse oil types, such as crude, diesel, lubricant, and kerosene, whose subtle differences can be distinguished using hyperspectral remote sensing technology [81]. Several studies on oil spills use hyperspectral systems, including the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) [22,23,82-87], Hyperion [81,88], and analytical spectral devices [89]. However, the advanced processing and analysis that hyperspectral images require are a setback for real-time monitoring [15]. In addition, hyperspectral data are relatively expensive and less publicly available than multispectral and microwave images.
Oil, as a type of optically thick object, demonstrates different thermal characteristics (i.e., thermal conductivity, thermal inertia, and heat capacity) in comparison with the surrounding water [90]. Given that oil absorbs and re-emits part of the solar radiation as thermal energy, depending on the thickness of the oil slick, it has a higher thermal infrared emissivity than water, largely in the long-wavelength infrared spectral region (8000-14,000 nm) [54]. Temperature differences resulting from the variations in emissivity enable the recognition of oil spills on sea surfaces. For instance, during sunny days, thick oil (greater than 500 µm thick) appears radiometrically hotter (brighter) in infrared images than the surrounding water because it absorbs a greater amount of solar radiation; intermediate oil appears cooler (darker), while sheen (thin oil) and rainbow (very thin oil) are not detected [79,91]. Conversely, thick oil can appear cooler than the surrounding water at night because oil loses heat faster than the surrounding water [91,92]. Considering that early morning and late afternoon are in-between periods, earlier in the afternoon could be a suitable time for oil film detection by thermal infrared sensors [93]. Various studies have investigated the potential of using infrared bands for oil spill detection and monitoring from different remotely sensed data, including Landsat [94,95], Advanced Very High-Resolution Radiometer (AVHRR) [74,78,79,96], MODIS [97,98], Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) [73,74], Environmental Satellite Advanced Along-Track Scanning Radiometer (ENVISAT-AATSR) [99], and thermal infrared imager [100]. One shortcoming of oil spill detection from thermal infrared images is that natural objects, such as shorelines, sediments, and organic matter, may appear like oil, which may cause detection errors [15,16]. In addition, the resolution of satellite-based thermal images is low, and thermal images are often noisy and blurry [101].

SAR Data
Active microwave sensors are frequently used remote sensing systems for oil spill detection and monitoring due to their broad coverage and their capability to collect data day and night under all weather conditions. Two main types of radar imaging are used in the detection and monitoring of oil spills, namely, synthetic aperture radar (SAR) and side-looking airborne radar (SLAR) systems. SAR (satellite-based) and SLAR (airborne) systems transmit microwave pulses and record the energy backscattered from the target surface to produce two-dimensional images of the scene. Both systems share the same side-looking imaging geometry, although SLAR relies on a real aperture, whereas SAR synthesizes a longer aperture from the motion of the platform. The usefulness and effectiveness of satellite-based SAR data for oil spill detection have been established in the reviewed literature. Table 2 lists the satellite-based SAR systems utilized in the reviewed literature on oil spills along with their frequencies and polarizations.
The presence of oil in the sea typically reduces the intensity of the backscattered energy because oil dampens small-scale sea surface capillary waves and short gravity waves [102,103]. Consequently, oil spills appear dark in SAR images. For example, Figure 3 depicts different oil spill incidents captured in various SAR images acquired in the Interferometric Wide (IW) swath mode and generated in the high-resolution Level-1 ground range detected (GRD) format, which entails radar observations projected onto a regular 10 × 10 m grid. However, one challenge in using SAR images for oil spill detection is that oil spills are only one of several phenomena, manmade or natural, that can reduce the backscatter and appear dark in SAR images. These phenomena are known as lookalikes and may include natural surface films produced by plankton or fish, grease, floating algae, internal waves, low-wind areas, plant oil, ship wakes, and convergence zones.
The appearance of oil spills may vary in SAR images because the radiometric characteristics of radar imaging differ with the choice of wavelength, frequency, and polarization. L (wavelength of 24 cm), C (wavelength of 6 cm), and X (wavelength of 3 cm) are commonly used microwave bands in oil spill detection. This review found that the C-band is the most widely used in radar imagery for detecting oil spills, followed by the X- and L-bands. SAR systems operate in different polarization schemes (VV, HH, VH, and HV), which enables the extraction of unique information for oil spill detection and monitoring. For instance, Sentinel-1 and Radarsat-2 can provide dual-polarized SAR data, that is, HH (horizontal transmitting and receiving) + HV (horizontal transmitting and vertical receiving) or VV+VH. Polarization modes can be single (e.g., HH or HV), dual (e.g., HH/HV or VV/VH), or quad (HH, HV, VH, and VV). However, some features can only be observed using specific polarizations. For example, an oil spill incident can be seen on the VV band of Sentinel-1 data, but it may not be visible in the corresponding VH band.
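The band wavelengths quoted above can be related to radar frequencies through f = c / λ. The following is a minimal arithmetic sketch, not part of the reviewed literature; the function and variable names are illustrative, and the wavelengths are the nominal values given in the text.

```python
# Convert the nominal SAR band wavelengths quoted in the text to frequencies
# using f = c / wavelength. Names are illustrative, not from any library.
SPEED_OF_LIGHT_M_S = 299_792_458.0


def wavelength_to_ghz(wavelength_cm: float) -> float:
    """Return the frequency in GHz for a wavelength given in centimetres."""
    return SPEED_OF_LIGHT_M_S / (wavelength_cm / 100.0) / 1e9


# Nominal wavelengths (cm) for the L-, C-, and X-bands as stated in the text.
BANDS_CM = {"L": 24.0, "C": 6.0, "X": 3.0}
BAND_GHZ = {band: wavelength_to_ghz(cm) for band, cm in BANDS_CM.items()}

for band, ghz in BAND_GHZ.items():
    print(f"{band}-band: {BANDS_CM[band]:.0f} cm corresponds to about {ghz:.2f} GHz")
```

Under these nominal values, the L-, C-, and X-bands fall near 1.25, 5, and 10 GHz, respectively; actual sensor centre frequencies differ slightly from these rounded figures.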

Optical Images
Given the inherent radiometric and geometric errors of optical and SAR sensors and the strong effect of environmental conditions, the preprocessing of remotely sensed data is a fundamental step in enhancing data quality and improving the accuracy of the developed oil spill classification systems. The preprocessing chain of optical data may vary on the basis of the quality of the data source and the level of processing required for the analysis (e.g., levels 0, 1B, and 1C). This process can generally be categorized into five main steps, namely, radiometric calibration, atmospheric correction, geometric correction, image enhancement, and masking. The radiometric correction of optical images is required to mitigate atmospheric effects and thereby improve the identification of oil spills [53,66], and to remove radiometric sensor aging effects and radiometric discrepancies among sensors (e.g., Landsat TM and ETM+) [166]. Atmospheric correction reduces atmospheric effects by eliminating the influence of atmospheric molecules and aerosol scattering [71], which improves the extraction of real surface parameters from satellite images (i.e., surface reflectance, emissivity, and surface temperature) [167]. This correction is considered in different oil spill studies [30,32,47,49,53,62,64,168].
Considering that oil spills can be monitored by different satellite-based or airborne imaging systems, geometric correction might need to be applied prior to image analysis if an image is to be compared with multitemporal and multisensor images or with existing vector data/maps. Image data should then be projected to a local or common projection system (e.g., Universal Transverse Mercator). Image enhancement, such as contrast enhancement, is applied to each image scene to enhance oil slick visibility. Masking clouds, smoke, shadow, and land pixels can enhance oil spill visualization and improve the discrimination of oil spills from the complex surrounding features [32,63].
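The masking and contrast enhancement steps described above can be sketched as follows. This is a minimal illustration that assumes a single band stored as nested lists and a precomputed boolean mask (True for cloud, shadow, or land pixels); the function name `mask_and_stretch` is hypothetical, and a simple min-max stretch stands in for whatever enhancement a given study used.

```python
def mask_and_stretch(image, mask, out_max=255):
    """Linearly stretch unmasked pixels to [0, out_max]; masked pixels become None.

    image: 2-D list of band values; mask: 2-D list of booleans (True = exclude).
    """
    valid = [v for row, mrow in zip(image, mask) for v, m in zip(row, mrow) if not m]
    if not valid:
        raise ValueError("all pixels are masked")
    lo, hi = min(valid), max(valid)
    span = (hi - lo) or 1.0  # avoid division by zero on a flat scene
    return [
        [None if m else (v - lo) / span * out_max for v, m in zip(row, mrow)]
        for row, mrow in zip(image, mask)
    ]


# Toy 2 x 2 band: the bright pixel (200) is flagged as cloud and excluded,
# so the stretch is driven only by the remaining water/oil pixels.
band = [[10, 20], [30, 200]]
cloud_mask = [[False, False], [False, True]]
stretched = mask_and_stretch(band, cloud_mask)
```

Computing the stretch limits from unmasked pixels only keeps bright clouds or land from compressing the contrast between oil and water.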

SAR Images
Preprocessing of SAR data can generally be divided into four main steps, namely, radiometric calibration, geocoding, filtering, and masking. First, radiometric correction and calibration of SAR images are essential procedures to eliminate or reduce radiometric distortions and ensure that pixel values in SAR images are linked to the backscattering coefficient (sigma naught, measured in dB) of the reflecting surface [169]. Calibration thus restores quantitative measurements (backscattered microwave energy from ground targets) from the digital number values of image pixels, so that the characteristics of an object in multitemporal SAR images acquired with different SAR sensors and modes can be compared [170,171].
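The link between digital numbers and sigma naught in dB can be illustrated with a short sketch. It assumes a Sentinel-1-style relation, sigma0 = DN² / A², where A is the per-pixel calibration lookup value supplied with the product; other sensors use different calibration equations, and the function name is hypothetical.

```python
import math


def dn_to_sigma0_db(dn: float, cal_lut: float) -> float:
    """Convert a digital number to sigma naught in dB.

    Assumes sigma0 = DN^2 / A^2 (Sentinel-1-style), where cal_lut is the
    calibration lookup value A for the pixel. This is an assumption made
    for illustration; other sensors define calibration differently.
    """
    if dn <= 0 or cal_lut <= 0:
        raise ValueError("dn and cal_lut must be positive")
    sigma0_linear = (dn * dn) / (cal_lut * cal_lut)
    return 10.0 * math.log10(sigma0_linear)


# Example values are made up: a pixel whose DN is ten times smaller than
# the lookup value has a linear sigma0 of 0.01, i.e. -20 dB.
example_db = dn_to_sigma0_db(100.0, 1000.0)
```

Working in dB makes the low backscatter of oil-covered water easy to compare across scenes, since a fixed ratio in linear units becomes a fixed offset.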
The presence of geometric distortions in SAR images (i.e., foreshortening, layover, and shadow) can limit the usefulness of SAR data and impede information extraction in various applications. Thus, geocoding of SAR data is required to minimize geometric distortions so that the location of any pixel in SAR imagery can be connected directly to its location on the ground [172]. In addition, the geocoding of SAR data facilitates the integration of geospatial data collected from different sources to improve the monitoring and classification of oil spills. However, a high-resolution digital elevation model and additional knowledge of the orbit and sensor platform are needed for accurate terrain geocoding [173].
Unlike images from passive sensors, SAR images contain a certain degree of dark and light multiplicative noise known as speckle. Speckle is caused by the random interference of the waves returned to the sensor from the many elementary scatterers within a single ground resolution cell (or pixel) [174]. The noise may reduce the efficiency of information extraction techniques, human interpretation, automated scene analysis, and the analysis of multiple SAR observations [175]. Thus, speckle filtering is a crucial step in preparing SAR images for oil spill classification systems. The optimal filtering technique should preserve the useful radiometric information and avoid the loss of scene features, such as the local mean of backscatter, texture, linear features, edges, and point targets [176]. Previous studies utilized various filter types with different kernel sizes to reduce speckle and enhance SAR images for oil spill detection. Lee, enhanced Lee, Frost, Kuan, median, Lopez, boxcar, and non-local mean filters are some examples. The most commonly used despeckling techniques for SAR images in the reviewed literature are the Lee [26,64,66,109,136,161] and enhanced Lee filters [112,149,154]. These filters are selected in oil spill studies due to their ability to minimize speckle noise while preserving edge sharpness and the important features in SAR images. The final preprocessing step is masking out land and shorelines from SAR images. This process prevents the land from interfering with the detection of oil spills while reducing the computational cost of processing the image [129]. In addition to land and shorelines, weed beds and algae infestations can also be masked out [15].
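As an illustration of how a Lee-type filter balances smoothing against feature preservation, the sketch below implements one common textbook form: each pixel is replaced by mean + W (pixel − mean), with W = 1 − Cu²/Ci², where Cu² = 1/ENL is the squared noise coefficient of variation (for intensity data) and Ci² is the squared local coefficient of variation. The window size, the ENL value, and the pure-Python data layout are assumptions for illustration, not settings taken from the reviewed studies.

```python
def lee_filter(image, size=3, enl=1.0):
    """Apply a basic Lee speckle filter to a 2-D list of intensity values.

    size: odd window width; enl: equivalent number of looks (assumed known).
    Windows are truncated at the image border.
    """
    half = size // 2
    rows, cols = len(image), len(image[0])
    cu2 = 1.0 / enl  # squared noise coefficient of variation for intensity data
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            ]
            mean = sum(window) / len(window)
            var = sum((v - mean) ** 2 for v in window) / len(window)
            if mean == 0 or var == 0:
                out[r][c] = mean  # homogeneous window: nothing to preserve
                continue
            ci2 = var / (mean * mean)  # squared local coefficient of variation
            weight = min(1.0, max(0.0, 1.0 - cu2 / ci2))
            out[r][c] = mean + weight * (image[r][c] - mean)
    return out


# Toy scene: a bright point target on a flat background.
speckled = [[1, 1, 1], [1, 10, 1], [1, 1, 1]]
filtered = lee_filter(speckled, size=3, enl=4.0)
```

In homogeneous areas the weight tends to zero and the filter averages; near edges and point targets the local variation dominates, the weight approaches one, and the original value is largely retained.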

Feature Extraction
Feature extraction, a critical stage in oil spill detection systems, produces the set of input features used to distinguish oil spills from lookalikes (natural phenomena, such as algal blooms, biogenic slicks, currents, and low-wind areas) and other targets on the water. The incorporation of features with reliable discriminatory power contributes to the improvement of classification accuracy in oil spill detection. Various studies have attempted to determine the optimal combination of different features for detecting and classifying oil spills [110,145,177-181]. However, the lack of systematic research on the extraction and combination of various sets of features (i.e., SAR polarimetric, textural, geometrical, and other features) and their influence on classification accuracies has generally contributed to the arbitrary selection of features as inputs to numerous classification systems [177,180]. The majority of the reviewed studies from the last decade explored the extraction of multiple features to detect oil spills from SAR and optical images. The following section describes the widely used handcrafted (shallow) features in oil spill detection, while the automatic extraction of deep features is discussed in Section 5.2.
Considering that SAR data are extensively used in oil spill studies, various feature categories are extracted and utilized to differentiate oil spills and lookalikes. Commonly used feature categories and standard associated features for oil detection from SAR images are listed in Table 3. The frequency of adopting feature categories in differentiating oil spills and lookalikes from SAR images is shown in Figure 4.
Table 3. Common features extracted from SAR imagery.

| Feature Category | Feature | Description | References |
|---|---|---|---|
| Geometric | Spreading (S) | Ratio between an object's width and length | |
| Geometric | Shape factor | Measure of an image object border smoothness | [109-111,118,129,134,177,180,186,188] |
| Geometric | Hu moment invariant [189] | Invariant moments used to characterize object patterns | [110,113,114,132,139,180,190] |
| Geometric | Circularity | Measure of an image object compactness | [104,139,180,190,191] |
| Geometric | Perimeter to area ratio | Ratio of the perimeter to the area (P/A) | [110,113,129,155,192,195-197] |
| Textural | Dissimilarity GLCM | GLCM dissimilarity value computed for an image object | |
| SAR polarimetric | Entropy | Polarimetric parameter used to measure the degree of randomness of the scattering mechanism | [103,128,198-200] |
| SAR polarimetric | Alpha angle | Polarimetric parameter used to characterize the scattering mechanism of the reflection | |
| SAR polarimetric | Degree of polarization | Physical quantity used to characterize the degree of polarization of the signal | |
| SAR polarimetric | Conformity coefficient | Evaluates whether surface scattering is dominant among all the scattering mechanisms [198]; it can discriminate surface, double-bounce, and volume scattering [201] | |
| SAR polarimetric | Correlation coefficient | Measure that reflects the averaged phase difference among scattering coefficients in co-polarized channels (i.e., HH, VV) [202] | [103,119,198,200,203] |
| SAR polarimetric | Anisotropy | Measure of the relative values of the second and third eigenvalues [204] | |
| SAR polarimetric | Pedestal height | Measure of the amount of unpolarized backscattered energy [205] | |
| SAR polarimetric | Standard deviation of CPD (co-polarized phase difference) | Introduced by [206] to differentiate oil and biogenic slicks | |
| Contextual | Number of neighboring targets in the same image | Number of adjacent targets to oil slicks in the same scene | [115,185] |
| Contextual | Distance to ship/rig | Distance from oil slick objects to ships, rigs, and oil platforms in the surroundings | [110,142] |
| Contextual | Mean wind speed | Mean wind speed over the image object | [132,139] |

Most oil spill studies combine various SAR features from different categories rather than rely on a single feature category. Geometric and statistical features are the most frequently used, followed by textural, SAR polarimetric, and contextual features. Several studies in which a feature selection process was conducted indicate that geometric features, which are simple and easy to extract, demonstrate higher discrimination power than other feature types [106,178]. For instance, a spill released from a moving ship can appear in a SAR image as an elongated object of a particular width and length (Figure 3d). The ratio of the width to the length of the spillage could be used as a discriminative feature to differentiate between oil spills and their lookalikes. Area, perimeter, complexity, and spreading are geometric features used in 39-49% of the reviewed studies describing handcrafted features. Given that oil spills and their lookalikes can appear in diverse shapes under different and even similar environmental conditions, geometric features are usually combined with other feature categories. Object standard deviation, object mean value, background standard deviation, and maximum contrast, for instance, are common statistical features utilized in 27-43% of the reviewed studies discussing handcrafted features.
The values and thresholds of the statistical features of an oil slick object extracted from optical or SAR imagery (e.g., mean, maximum, and standard deviation of a spectral band or backscattering coefficient) may vary from one data source or event to another owing to differences in oil characteristics, environmental conditions, sensor types, and specifications (e.g., wavelength, polarization, and incidence angle). The texture of oil spills is continuous, smooth, and delicate, while that of their lookalikes is scattered, rough, and discontinuous [197]. Most textural features utilized in oil spill detection are based on the gray-level co-occurrence matrix (GLCM). Contrast, homogeneity, and entropy are common GLCM-based features employed in over 23% of the studies that use handcrafted features. However, GLCM features are computationally intensive compared with other feature types. Different SAR polarizations can help differentiate special features of the target and are considered in different studies to discriminate between oil spills and lookalikes. More than 21% of the studies rely on variations of SAR polarimetric features. Entropy (H), alpha angle (α), degree of polarization (DoP), and conformity coefficient are commonly used polarimetric features. Contextual features, which include information on the distance from the oil spill to a possible source, such as ships or oil rigs, are the least used features among the five categories [110,111,207]. Other contextual features may include weather and environmental data, such as wind speed, water depth, upwelling, atmospheric conditions (rain, dense fog, and aerosols), eddies, and river inflow, as well as the location and direction of oil and gas pipelines, platforms, and vessels [64].
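To make the geometric and GLCM-based features more concrete, the following sketch computes a few of them for a toy segmented object. Several choices are assumptions made for illustration: the perimeter is counted along 4-connected pixel edges, complexity uses one common definition, P / (2√(πA)), spreading is approximated by the width-to-length ratio of the axis-aligned bounding box, and the GLCM uses a single horizontal offset on an already quantized image. The function names are hypothetical.

```python
import math
from collections import Counter


def shape_features(mask):
    """Geometric features of the object pixels (truthy cells) in a 2-D mask."""
    rows, cols = len(mask), len(mask[0])
    cells = [(r, c) for r in range(rows) for c in range(cols) if mask[r][c]]
    if not cells:
        raise ValueError("mask contains no object pixels")
    area = len(cells)
    perimeter = 0
    for r, c in cells:  # count pixel edges that face background or the border
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            inside = 0 <= rr < rows and 0 <= cc < cols
            if not inside or not mask[rr][cc]:
                perimeter += 1
    height = max(r for r, _ in cells) - min(r for r, _ in cells) + 1
    width = max(c for _, c in cells) - min(c for _, c in cells) + 1
    return {
        "area": area,
        "perimeter": perimeter,
        "perimeter_to_area": perimeter / area,
        "complexity": perimeter / (2.0 * math.sqrt(math.pi * area)),
        "spreading": min(height, width) / max(height, width),  # bounding-box proxy
    }


def glcm_features(image, offset=(0, 1)):
    """Contrast, dissimilarity, homogeneity, and entropy from a normalized GLCM."""
    dr, dc = offset
    rows, cols = len(image), len(image[0])
    counts = Counter()
    for r in range(rows):
        for c in range(cols):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                counts[(image[r][c], image[rr][cc])] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("offset leaves no pixel pairs")
    probs = {pair: n / total for pair, n in counts.items()}
    return {
        "contrast": sum(p * (i - j) ** 2 for (i, j), p in probs.items()),
        "dissimilarity": sum(p * abs(i - j) for (i, j), p in probs.items()),
        "homogeneity": sum(p / (1 + (i - j) ** 2) for (i, j), p in probs.items()),
        "entropy": -sum(p * math.log(p) for p in probs.values()),
    }


# Toy elongated slick: a 2 x 6 rectangle inside a 4 x 8 scene.
slick = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
geom = shape_features(slick)
smooth = glcm_features([[0, 0], [1, 1]])   # identical horizontal neighbours
striped = glcm_features([[0, 1], [0, 1]])  # alternating horizontal neighbours
```

The geometric features need only one pass over the object pixels, whereas the GLCM must be accumulated over every pixel pair and, in practice, over several offsets and window positions, which is consistent with the higher computational cost noted above.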

Feature Selection Techniques
Determining the optimal set of features for oil spill classification is often based on the experience of researchers [180]. This step can be subjective and case-specific because the degree of importance of features may vary based on the data source, the nature of the oil spill, and the complexity of surrounding surface features. The use of an excessive number of features in a classification scheme may introduce redundant features, increase processing time, reduce classification accuracy, and impair the generalizability of the model [208,209]. Thus, feature selection (FS) techniques, which are dimensionality reduction strategies used to select relevant features and overcome these issues, are widely utilized as a critical step in classification tasks of remotely sensed data [210,211]. FS methods can typically be grouped into three main categories, namely, filter, wrapper, and embedded methods [212,213]. Filter methods utilize statistical measures (e.g., correlation coefficients and variance) to evaluate and rank features based on their degree of importance [213,214], without involving any classification algorithm. In contrast, wrapper methods use the performance of a specific classification algorithm to select the relevant features that lead to the best classification result [215]. However, wrapper-based methods are computationally intensive and prone to overfitting, particularly when small training samples are used to train the adopted classification model [216]. Embedded methods are a trade-off between the two approaches and aim to optimize classification performance while decreasing the number of selected features [211,217]. The selection of the relevant subset is performed as part of the learning process of a classifier without an additional evaluation of the selected feature subset [218].
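A filter method of the kind described above can be sketched in a few lines. The example ranks features by the absolute Pearson correlation between each feature column and a binary oil/lookalike label, with no classifier involved; the feature names and values are made up for illustration, and real studies may use other statistics such as information gain or Relief.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant feature carries no ranking information
    return cov / (sx * sy)


def rank_features(samples, labels, names):
    """Rank features by |correlation with the label|, highest first."""
    scores = {}
    for j, name in enumerate(names):
        column = [row[j] for row in samples]
        scores[name] = abs(pearson(column, labels))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical feature vectors (one row per dark object); label 1 = oil spill.
feature_names = ["complexity", "constant_band", "noise"]
samples = [
    [0.9, 5.0, 0.2],
    [0.8, 5.0, 0.7],
    [0.2, 5.0, 0.3],
    [0.1, 5.0, 0.6],
]
labels = [1, 1, 0, 0]
ranked = rank_features(samples, labels, feature_names)
```

Because each feature is scored independently of any classifier, this kind of ranking is fast but cannot detect features that are only useful in combination, which is the gap that wrapper and embedded methods address.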
The utilization of FS techniques in oil spill detection systems is limited given the lack of systematic studies focused on extracted features and their contribution to the classification results in oil spill detection and monitoring [180]. Only a few studies use feature selection techniques to evaluate the effectiveness of different features and select optimal ones for oil spill detection. For instance, Mera et al. [180] employed five feature selection methods, covering filter and embedded approaches, to choose a concise and relevant set of features for improving oil spill detection systems. Correlation-based feature selection, consistency-based filter, information gain, Relief, and recursive feature elimination for support vector machines (SVMs) were applied to a 141-input vector comprising features collected from prominent oil spill studies. The selection of relevant features expedited the feature extraction step without reducing classification accuracy. Chehresa et al. [110] used and evaluated eight different evolutionary algorithms (e.g., genetic algorithm and particle swarm optimization) to select optimum feature subsets from 74 different types of features. The features selected most frequently across 30 independent repetitions of three evolutionary algorithms (genetic algorithm and fast and classical evolutionary programming) were chosen as the optimum set of features for classification.

Machine Learning
ML, a subset of artificial intelligence, refers to the ability of machines to learn and understand relationships between inputs and outputs from a full set of representative training samples, from which predictive and empirical classification models can be constructed without assuming any data distribution. ML can address specific issues even when the theoretical understanding of a particular problem remains inadequate despite the availability of a massive number of observations. Given the increasing availability of high-dimensional remotely sensed data and the complexity of pattern recognition tasks, ML techniques have been adopted for a full spectrum of Earth observation applications, such as oceanography [219-221], natural disasters [222-225], agriculture [226,227], land use [208,228,229], and environmental monitoring [230-232].
Various ML models have been proposed in the past decades to detect oil spills and distinguish between oil slicks and lookalikes, in which optical and SAR images are used to provide efficient monitoring solutions to mitigate the impact of oil spills. ML methods for oil spill detection are categorized in this review into traditional ML techniques and deep learning (DL) models. The succeeding subsections discuss and analyze various types of classical and advanced ML models for recognizing, identifying, and detecting oil spills obtained from remotely sensed data.

Artificial Neural Network
Inspired by the functionality of the biological nervous system, artificial neural networks (ANNs) are computing systems comprising a set of algorithms that work together to simulate the structure and functions of the human brain. The relationship between input parameters and their output responses is derived using highly interconnected processing units (artificial neurons), a training or learning algorithm, and activation functions [244,245]. The basic ANN topological structure consists of three types of layers, namely, input, hidden (possibly more than one), and output layers. The training procedure of ANNs involves the determination and adjustment of the weights associated with connections in three main stages, namely, the feedforward of input data, the calculation of the error between the predicted and target outputs, and the adjustment of weights [246]. Upon the completion of the training and accuracy evaluation phases, the developed neural network model can predict the presence or absence of oil spills in unseen data with similar feature characteristics.
Feedforward ANNs trained with the backpropagation algorithm are widely used ML algorithms for oil spill classification and account for almost 27% of the reviewed literature in this work (a total of 29 studies). A key challenge in the utilization of ANNs for oil spill classification is the determination of optimal combinations of hyperparameters, such as the number of hidden units, batch size, training iterations, learning rate, and momentum, because poor choices may negatively impact accuracy and computing performance. A trial-and-error strategy is commonly used to determine and evaluate the appropriateness of multiple combinations of these parameters. For instance, Park et al. [52] implemented an ANN architecture to classify oil spills from optical images with the following settings: 1000 epochs, a learning rate of 0.01, and a hidden layer of eight neurons. Chen et al. [103] implemented an ANN to classify oil spills from SAR images with the following parameter settings: 100 epochs, a learning rate of 1.0, and two hidden layers of eight and six neurons. Reported accuracies of ANNs in oil spill studies range from 72% to 99%.
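The three training stages of a feedforward network (feedforward, error calculation, and weight adjustment) can be sketched with a small pure-Python model. The defaults mirror the settings reported for Park et al. [52] (eight hidden neurons, learning rate of 0.01, 1000 epochs), but everything else is an assumption for illustration: the sigmoid activations, the cross-entropy loss, the per-sample updates, and the made-up two-feature training data. It is not the implementation used in any reviewed study.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyANN:
    """One-hidden-layer feedforward network trained by backpropagation."""

    def __init__(self, n_inputs, n_hidden=8, learning_rate=0.01, seed=0):
        rng = random.Random(seed)
        self.lr = learning_rate
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
                  for ws, b in zip(self.w1, self.b1)]
        output = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, output

    def train_step(self, x, target):
        hidden, output = self.forward(x)   # stage 1: feedforward
        delta_out = output - target        # stage 2: output error (sigmoid + cross-entropy)
        for j, h in enumerate(hidden):     # stage 3: weight adjustment
            delta_h = delta_out * self.w2[j] * h * (1.0 - h)
            self.w2[j] -= self.lr * delta_out * h
            for i, xi in enumerate(x):
                self.w1[j][i] -= self.lr * delta_h * xi
            self.b1[j] -= self.lr * delta_h
        self.b2 -= self.lr * delta_out

    def loss(self, samples, targets):
        eps = 1e-12  # guards against log(0)
        total = 0.0
        for x, t in zip(samples, targets):
            _, y = self.forward(x)
            total -= t * math.log(y + eps) + (1 - t) * math.log(1 - y + eps)
        return total / len(samples)


# Hypothetical normalized features per dark object; label 1 = oil spill.
X = [[0.1, 0.9], [0.2, 0.8], [0.15, 0.7], [0.8, 0.2], [0.9, 0.3], [0.7, 0.1]]
y = [1, 1, 1, 0, 0, 0]

net = TinyANN(n_inputs=2)
initial_loss = net.loss(X, y)
for _ in range(1000):  # epochs
    for sample, target in zip(X, y):
        net.train_step(sample, target)
final_loss = net.loss(X, y)
```

Comparing `initial_loss` with `final_loss` for different learning rates and hidden-layer sizes is a small-scale version of the trial-and-error hyperparameter search described above.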

Support Vector Machine
SVM [247], a nonparametric supervised ML technique based on the principle of structural risk minimization from statistical learning theory, has been successfully used in a wide range of remote sensing applications. The popularity of SVM in the field of oil spill classification (almost 17% of the reviewed literature uses SVM) can be attributed to its ability to handle high-dimensional feature spaces and achieve satisfactory classification results with a limited number of training samples. SVMs specifically focus on the samples closest to the borders between classes in the feature space, which are called support vectors, and aim to determine the location of a separating hyperplane (decision boundary) that produces the optimal separation of classes to minimize misclassifications and achieve satisfactory generalization capability [248]. SVM was originally developed for binary classification by identifying the optimal hyperplane in linearly separable cases; kernel tricks were then introduced to address this limitation by implicitly mapping the data into a high-dimensional feature space and constructing an optimized separating hyperplane that deals with nonlinear decision surfaces [249][250][251]. Various kernel functions, including linear, polynomial, sigmoid, and radial basis function (RBF) kernels, are used to reduce the computational cost of dealing with high-dimensional feature spaces [248]. RBF [22,112,114,115,155,185,198] and polynomial [194] kernels are commonly used in oil spill studies. However, the selection of the appropriate kernel type and its parameter configuration should be considered carefully when adopting SVM for oil spill classification. Reported accuracies of SVM in oil spill studies range from 71% to 97%.
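As a minimal sketch of the kernels named above, the following code writes out the standard linear, polynomial, sigmoid, and RBF kernel formulas together with the kernelized decision function f(x) = sum_i a_i K(s_i, x) + b. The support vectors, dual coefficients, bias, and parameter defaults (gamma, degree, coef0) are hypothetical values chosen only for illustration; a trained SVM would obtain them by solving the underlying optimization problem.

```python
import math


def linear_kernel(a, b):
    return sum(x * y for x, y in zip(a, b))


def polynomial_kernel(a, b, degree=3, gamma=1.0, coef0=1.0):
    return (gamma * linear_kernel(a, b) + coef0) ** degree


def sigmoid_kernel(a, b, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * linear_kernel(a, b) + coef0)


def rbf_kernel(a, b, gamma=1.0):
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * squared_distance)


def svm_decision(x, support_vectors, dual_coefs, bias, kernel):
    """Signed score: positive -> oil spill, negative -> lookalike.

    Each dual coefficient already folds in the class sign of its
    support vector (alpha_i * y_i).
    """
    return sum(c * kernel(sv, x)
               for sv, c in zip(support_vectors, dual_coefs)) + bias


# Hypothetical model with one support vector per class.
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
dual_coefs = [-1.0, 1.0]
bias = 0.0

score_near_oil = svm_decision([0.9, 0.9], support_vectors, dual_coefs,
                              bias, rbf_kernel)
score_near_lookalike = svm_decision([0.1, 0.1], support_vectors, dual_coefs,
                                    bias, rbf_kernel)
print(score_near_oil, score_near_lookalike)
```

The kernel is evaluated only between the input and the support vectors, which is why the explicit high-dimensional mapping never has to be computed.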

Decision Tree
DT is a simple and straightforward nonparametric ML classifier that recursively divides the input dataset into branches of data subsets; each subset is described by a set of features, thresholds, and a class label [252,253]. Compared with ANN and SVM, DT can be trained and executed rapidly, and analysts can easily interpret the output of the model. DT is widely used to aid the development of rulesets for the object-based classification of remotely sensed data due to its ability to handle nonlinear relationships between features and classes as well as features with different scales or ranges of values. Topouzelis and Psyllos [177] highlighted that tree size plays a significant role, especially because the tree deals with two classes, namely, oil spills and lookalikes. A tree that is neither excessively large nor excessively small will correctly represent feature vectors. Moreover, tree classifiers are very sensitive to small changes in the training dataset; thus, the careful development of a training dataset is required to distinguish between oil spills and lookalikes successfully [129,177]. Compared with other traditional ML classifiers, DTs and fuzzy logic have been adopted in fewer oil spill studies (10% of the reviewed studies). The reported accuracies of DT and fuzzy logic in oil spill studies range from 80% to 96%.
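The recursive division described above rests on repeatedly choosing one feature and one threshold. The sketch below shows a single such step using the Gini impurity, which is one common splitting criterion and an assumption here, since the reviewed studies may use others. The function names (gini, best_split) and the two toy features with deliberately different scales are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (1 = oil spill, 0 = lookalike)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p_oil = sum(labels) / n
    return 1.0 - p_oil ** 2 - (1.0 - p_oil) ** 2


def best_split(samples, labels):
    """Return (feature_index, threshold, weighted_gini) of the best binary split."""
    best = None
    n = len(samples)
    for f in range(len(samples[0])):
        values = sorted(set(row[f] for row in samples))
        # Candidate thresholds are midpoints between consecutive feature values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2.0
            left = [y for row, y in zip(samples, labels) if row[f] <= threshold]
            right = [y for row, y in zip(samples, labels) if row[f] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, threshold, score)
    return best


# Hypothetical dark-spot objects: [area, contrast]; no rescaling is needed
# because thresholds are chosen separately for each feature.
X = [[12.0, 0.30], [15.0, 0.80], [14.0, 0.55],
     [3.0, 0.60], [2.0, 0.35], [4.0, 0.75]]
y = [1, 1, 1, 0, 0, 0]
print(best_split(X, y))  # feature 0 at threshold 8.0 separates the classes
```

A full tree applies best_split recursively to each resulting subset; limiting the recursion depth is one way to keep the tree from becoming excessively large.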
Differences in oil spill classification results reported in the literature are likely due to various factors affecting the performance of ML classifiers, including variations in the data source, data preprocessing techniques, training sample size, number and quality of selected features, and choice of classification algorithms and their parameter settings. A comparison of ML models for oil spill detection that does not unify all of these influencing factors remains case-specific. Thus, comparing the accuracy of traditional ML algorithms for oil spill detection across different datasets is problematic [104]. Several studies have compared the performance of numerous traditional ML models for oil spill detection using the same data source [115,139,198]. Zhang et al. [198], for example, compared three widely used supervised classifiers (i.e., ANN, SVM, and the maximum likelihood classifier) for oil spill classification using complete and compact polarimetric SAR images. SVM, followed by ANN, outperformed the maximum likelihood classifier when sufficient polarimetric information (i.e., quad polarization) was available. Mera et al. [139] studied the performance of 428 classifiers of 41 families, including ensembles, SVM, ANN, Bayesian, DT, RF, and many others, for oil spill detection using 47 ENVISAT ASAR images. The group's experiments showed that the best classification accuracies are achieved by the rotation forest ensemble of multilayer perceptron base classifiers. Yang et al. [53] evaluated the performance of ANN, SVM, minimum distance, and maximum likelihood classifiers for oil spill extraction from GF-1 images using an object-based approach. The results of single classifiers demonstrated that ANN and SVM are superior to the other classifiers, while the results of multiple classifiers (decision-level fusion) revealed that the classification accuracy of SVM-ANN is slightly higher than that of ANN.

Deep Learning Techniques
Inspired by the structure and function of the human brain, DL algorithms are a family of deep neural networks (DNNs) that automatically learn complex discriminative features from considerably large amounts of data in a hierarchical manner and extract information through multiple high-level layers of abstraction [254,255]. These algorithms have demonstrated remarkable capabilities and achieved notable success in various fields of remote sensing and geoscience. Unlike traditional ML approaches, DL is completely data-driven: relationships between input and output data are automatically constructed, and feature representations are learned solely from the data [256]. Therefore, the feature extraction step, which depends on expert knowledge to construct handcrafted features prior to the oil spill classification phase, is eliminated. Various DL models have exhibited outstanding performance in detecting oil spills from SAR and optical images through the automatic extraction of discriminative learned features to distinguish between oil spills and lookalikes. Moreover, the generalization ability of these models can address the case-specific nature of traditional techniques. The number of oil spill-related studies that adopted DL models has increased since 2017. These models were used to perform diverse tasks, such as oil spill detection and recognition [257,258], image patch-based classification [83,103,197], and semantic segmentation [259][260][261][262][263][264][265]. DL models vary in terms of their architecture, components, and tasks, and can consist of convolution layers, activation functions, pooling layers, fully connected layers, memory cells, gates, encoders/decoders, and others [266]. Commonly used DL models include convolutional neural network (CNN), autoencoder (AE), recurrent neural network (RNN), deep belief network (DBN), and generative adversarial network (GAN).
Table 5 lists the different DL models adopted in oil spill studies for performing various tasks. The following subsections review some of these models in the context of oil spill identification and detection.

Convolutional Neural Network
CNNs [267] are widely used DL techniques in image recognition due to their weight-sharing network structure, which allows images to be fed directly into the deep network [258]. The underlying architecture of CNNs consists of a set of convolutional layers, activation functions, pooling layers, and fully connected layers (Figure 6). CNNs are typical feedforward DNN architectures that can learn highly abstract features from original representations of images through a set of convolutions and mathematical operations, which preserve the spatial relationship among pixels and reduce the effective number of learnable parameters [268]. Convolutional layers perform feature extraction by applying several learnable convolutional kernels or filters to small areas of the input data, the extent of which is determined by the kernel size. The result of each convolution undergoes a nonlinear transformation via an activation function (e.g., rectified linear unit, sigmoid, hyperbolic tangent, and softmax) to obtain nonlinear convolved features, the so-called feature maps (i.e., multiple maps of neurons), and increase the nonlinear fitting capability of the CNN. Feature maps and the input image have similar sizes. The dimensionality of feature maps is reduced by subsequent pooling layers (e.g., max and average pooling layers) to acquire features that are insensitive to the precise locations of targets, thereby ensuring that effective features can still be learned by the network [281]. Feature maps are generally downsampled by half using pooling layers to increase the abstraction of extracted features and reduce their spatial dimensionality while maintaining their depth, which lowers the computational cost and mitigates overfitting by cutting the number of learnable parameters [282]. Additional pooling operations, such as stochastic [283], spatial pyramid [284,285], and atrous spatial pyramid pooling [286], are also used by several studies.
Feature maps extracted through convolutional and pooling layers are converted into a one-dimensional vector by a flattening layer, which connects the outputs of the convolutional and pooling layers with the fully connected layers.
The fully connected part of the network, which is mounted at the top of the architecture, is composed of one or more hidden layers and computes the score of each class using convolved features from former layers. The output layer of the fully connected part is known as the classification layer. The classification results are derived through activation functions, such as sigmoid (for binary classification, which indicates the existence or absence of an oil spill) and softmax (for assigning probabilities across multiple classes, such as oil spills, ships, and lookalikes). Forward propagation and backpropagation are the two primary processes for training and learning the weights between the input and output of the network. Forward propagation transmits feature information through the network, whereas backpropagation iteratively optimizes the weights of the learnable parameters to minimize the value of a defined cost function [261].
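The sequence of operations described above (convolution, activation, pooling, flattening, and a fully connected classification layer) can be traced on a toy input. The sketch below is an illustration under stated assumptions and not the architecture of any reviewed study: the 6 x 6 image, the hand-set edge kernel, and the dense weights are hypothetical, nothing is learned, and the convolution is unpadded ("valid"), so the feature map is slightly smaller than the input, whereas padded convolutions would keep the input size.

```python
import math


def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding (cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def relu(feature_map):
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool2x2(feature_map):
    """Halve each spatial dimension by keeping the maximum of 2 x 2 blocks."""
    return [[max(feature_map[r][c], feature_map[r][c + 1],
                 feature_map[r + 1][c], feature_map[r + 1][c + 1])
             for c in range(0, len(feature_map[0]) - 1, 2)]
            for r in range(0, len(feature_map) - 1, 2)]


def flatten(feature_map):
    return [v for row in feature_map for v in row]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy 6 x 6 tile: bright sea (1.0) on the left, dark slick (0.0) on the right.
image = [[1.0, 1.0, 1.0, 0.0, 0.0, 0.0] for _ in range(6)]
edge_kernel = [[1, 0, -1]] * 3  # responds to bright-to-dark transitions

feature_map = relu(conv2d_valid(image, edge_kernel))  # 4 x 4
pooled = max_pool2x2(feature_map)                     # 2 x 2
vector = flatten(pooled)                              # length 4

# Hypothetical dense layer with a sigmoid output for the binary case.
weights, bias = [0.5, 0.5, 0.5, 0.5], -3.0
p_oil = sigmoid(sum(w * v for w, v in zip(weights, vector)) + bias)

# Softmax for a multiclass output (e.g., oil spill, ship, lookalike).
class_probabilities = softmax([2.0, 0.5, 1.0])
print(pooled, p_oil, class_probabilities)
```

In a trained CNN, the kernel entries and dense weights are the learnable parameters that backpropagation adjusts; here they are fixed by hand purely to make each intermediate result easy to follow.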
CNNs are widely used as DL models in oil spill detection due to their outstanding performance and versatility in object detection (e.g., a label and a bounding box are produced from given image tiles to show the oil spill location in each image), image classification (e.g., a label from the given image tiles is used to indicate the content of each tile), and semantic segmentation (e.g., a segmentation or probability map is created for predefined classes from given image tiles). Different tasks and architectures of CNN for oil spill detection are discussed in the following subsections.

Patch-Based Image Classification
Patch-based CNN models are constructed based on equally spaced tiles (patches) selected from remotely sensed images, and each tile corresponds to one class label. The CNN model can be designed to output, for every tile, the probability of the presence or absence of an oil spill in that tile. The determination of the optimum size of the input patch for CNN classification can be considered a critical factor. For example, the selection of a small input image patch size may hinder the CNN model from learning discriminative features, whereas the selection of a large one adds computational burden on the network and increases the overall processing time [197]. Various CNN structures were used to classify oil spills from patches of remotely sensed data. Yaohua et al. [258] presented a densely connected CNN structure based on DenseNet to distinguish oil slicks from lookalikes in ERS-2 SAR data of the China Sea. A total of 148 images that represent 86 oil slick and 62 lookalike samples were selected from ERS-2 data to develop the DenseNet model. Considering the limited number of samples, the mixing of the oil and oil-like slicks was ignored. Zeng and Wang [270] developed a deep oil spill CNN based on the VGG-16 model by designing and adjusting the CNN architecture and hyperparameters using a large dataset comprising SAR dark patches. A total of 4843 oil slick and 18,925 lookalike samples were generated through manual labeling and data augmentation techniques and subsequently utilized to develop the model.
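The tiling step that precedes patch-based classification can be sketched as follows. The function name extract_patches and the toy 8 x 8 image are hypothetical; the sketch only shows how equally spaced tiles are cut from an image and how the patch size and stride determine the number of tiles that a classifier would have to process.

```python
def extract_patches(image, patch_size, stride):
    """Cut equally spaced square tiles from a 2-D image (list of rows).

    Returns a list of ((row, col), tile) pairs, where (row, col) is the
    upper-left corner of the tile in the original image.
    """
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(0, rows - patch_size + 1, stride):
        for c in range(0, cols - patch_size + 1, stride):
            tile = [row[c:c + patch_size] for row in image[r:r + patch_size]]
            patches.append(((r, c), tile))
    return patches


# Toy 8 x 8 image whose pixel values encode their own position.
image = [[r * 8 + c for c in range(8)] for r in range(8)]

non_overlapping = extract_patches(image, patch_size=4, stride=4)
overlapping = extract_patches(image, patch_size=4, stride=2)
print(len(non_overlapping), len(overlapping))
```

Halving the stride roughly quadruples the number of tiles, which illustrates the computational side of the trade-off noted above; each tile would then receive a single class label or probability from the classifier.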
The incorporation of parametric SAR and optical features to improve the performance of patch-based classification was investigated by different researchers. For instance, Guo et al. [197] utilized SAR polarimetric features (i.e., entropy, alpha, and single-bounce eigenvalue relative difference) extracted from C-band SAR data to develop a CNN that can differentiate crude oil, plant oil, and oil emulsion. The CNN model was trained using 5400 samples and achieved a recognition rate of 91.33%. Song et al. [271] extracted deep features of SAR polarimetric data using a CNN model, which was accompanied by dimensionality reduction through principal component analysis and followed by an SVM classifier with radial basis kernel to identify oil spills. Liu et al. [22] extracted spectral indices from AVIRIS hyperspectral images and used a one-dimensional CNN to mine spectral feature information deeply and extract oil films automatically. The CNN model outperformed traditional ML models, such as SVM and RF.

Object Detection
CNN-based object detection techniques contain a two-stage mechanism, where shared discriminative feature maps are initially extracted using CNNs and candidate region proposals are subsequently generated to localize object(s) within an image and output corresponding categories [287]. Various generic object detection techniques based on deep CNNs (DCNNs) can detect, localize, and predict the label of the target to deliver state-of-the-art detection performance [288]. Faster region-CNN (R-CNN) [289], mask region-based CNN (mask R-CNN) [290], you only look once [291], and single-shot multibox detector [292] are examples of models that achieve satisfactory performance in object detection. Previous studies adopted DCNNs to perform object detection of targets in water surfaces for detecting ships [293][294][295][296]. Few studies focused on investigating oil spills. Nieto-Hidalgo et al. [272] presented a system for detecting ships and oil spills from SLAR images using a two-stage CNN. Huang et al. [273] applied faster R-CNN to locate and classify the spill of floating hazardous and noxious substances from optical images. Jiao et al. [257] constructed and optimized a DCNN model using faster R-CNN on the basis of a pretrained network on ImageNet to detect oil spills on lands using unmanned aerial vehicle-based data. The results showed that the cost of inspecting oil spills reduces by 57.2% compared with the cost incurred in the traditional manual inspection process.

Semantic and Instance Segmentation
Image segmentation, based on pixel-wise classification using DNNs, can be categorized into semantic and instance segmentation [297]. Semantic segmentation, a widely used concept in computer vision that has the same meaning as the per-pixel classification used among remote sensing communities, conducts pixel-level classification to assign a category to every pixel in a remotely sensed image. Sea surface areas, ships, and oil spill areas can be accurately classified through semantic segmentation, which can also provide comprehensive knowledge of an image [259]. Unlike patch-based and object detection methods, semantic segmentation accurately delineates the boundary and position of the target of interest, which renders it suitable for processing remote sensing images [298]. Numerous semantic segmentation models, including fully convolutional network [299], fully convolutional DenseNet [300], U-Net [301], pyramid scene parsing network [302], SegNet [303], RefineNet [304], pyramid attention network [305], DeepLab series [286,306], and discriminative feature network [307], were proposed and adopted in the field of computer vision.
By comparison, instance segmentation models are hybrid approaches that incorporate semantic segmentation and object detection algorithms to localize objects and deliver their per-pixel classification simultaneously. Yekeen et al. [277] recently developed an instance segmentation mask R-CNN model to localize and segment oil spills and different elements within the surrounding areas of oil spill incidents. The developed model combines the feature pyramid network architecture (used for extracting features at different scales) and a transfer learning approach through ResNet-101 pretrained on the COCO dataset. The performance of mask R-CNN was compared with that of approaches in other studies that utilized classical ML [106,109,139,180,185,308] and DL models [260,270,278]. The reported results indicated that mask R-CNN outperformed other CNN models in the literature. However, a comprehensive evaluation of the performance of diverse CNNs and of the influence of various factors, such as the number of samples, hyperparameter settings, optimization algorithms, and transfer learning, on the performance of oil spill detection is lacking.
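To make the per-pixel labeling performed by semantic segmentation concrete, the sketch below converts per-class score maps into a label map by taking, at every pixel, the class with the highest score. The score values, the class order (0 = sea surface, 1 = oil spill, 2 = ship), and the function name segment are assumptions for illustration; in a real model the score maps would be the output of the segmentation network.

```python
def segment(score_maps):
    """Assign each pixel the index of the class with the highest score.

    score_maps[k][r][c] is the score of class k at pixel (r, c).
    """
    n_classes = len(score_maps)
    rows, cols = len(score_maps[0]), len(score_maps[0][0])
    return [[max(range(n_classes), key=lambda k: score_maps[k][r][c])
             for c in range(cols)]
            for r in range(rows)]


# Hypothetical 2 x 2 score maps for three classes.
sea_scores = [[0.8, 0.1], [0.7, 0.2]]
oil_scores = [[0.1, 0.7], [0.2, 0.1]]
ship_scores = [[0.1, 0.2], [0.1, 0.7]]

label_map = segment([sea_scores, oil_scores, ship_scores])
print(label_map)
```

Unlike a patch-based classifier, which would return one label for the whole tile, the label map retains the position and outline of each class within the tile.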

Autoencoder
Hinton and Zemel [309] introduced the AE, which is a feedforward neural network trained to reproduce its input at its output. This reconstruction is achieved through an encoder-decoder structure trained in an unsupervised manner. The encoder transforms the characteristics of the input data into a low-dimensional space, whereas the decoder takes the extracted representative characteristics as its input and attempts to reconstruct the original input. AE training aims to make the output as close to the original input as possible [310]. This outcome is achieved by adjusting the parameters of the network and consistently comparing the input and output through backpropagation until a minimal discrepancy between the input and output is achieved. Several AE architectures, such as multilayer, stacked (SAE), sparse, denoising, adversarial, variational, convolutional, and vanilla AEs, have been proposed to solve different types of problems. Additional information on different types of AEs is provided in [310,311].
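The encoder-decoder training loop described above can be sketched with a deliberately small linear AE. The code is an illustration under stated assumptions and not one of the cited architectures: it has no activation functions, uses a mean squared reconstruction error with per-sample gradient updates, and operates on four hypothetical feature vectors. The names train_autoencoder, enc, and dec are likewise hypothetical.

```python
import random


def train_autoencoder(data, n_code=2, learning_rate=0.05, epochs=200, seed=0):
    """Train a linear autoencoder by minimizing the reconstruction error."""
    rng = random.Random(seed)
    n_in = len(data[0])
    enc = [[rng.uniform(-0.3, 0.3) for _ in range(n_in)] for _ in range(n_code)]
    dec = [[rng.uniform(-0.3, 0.3) for _ in range(n_code)] for _ in range(n_in)]
    history = []  # mean squared reconstruction error per epoch
    for _ in range(epochs):
        total = 0.0
        for x in data:
            # Encoder: project the input into the low-dimensional code.
            code = [sum(w * xi for w, xi in zip(row, x)) for row in enc]
            # Decoder: reconstruct the input from the code.
            recon = [sum(w * cj for w, cj in zip(row, code)) for row in dec]
            err = [r - xi for r, xi in zip(recon, x)]
            total += sum(e * e for e in err) / n_in
            # Backpropagate the error (constant factors are folded into the
            # learning rate); the code gradient uses the decoder weights
            # before they are updated.
            grad_code = [sum(err[i] * dec[i][j] for i in range(n_in))
                         for j in range(n_code)]
            for i in range(n_in):
                for j in range(n_code):
                    dec[i][j] -= learning_rate * err[i] * code[j]
            for j in range(n_code):
                for i in range(n_in):
                    enc[j][i] -= learning_rate * grad_code[j] * x[i]
        history.append(total / len(data))
    return enc, dec, history


# Hypothetical 4-dimensional feature vectors with roughly 2-dimensional structure.
data = [[1.0, 0.9, 0.1, 0.0], [0.9, 1.0, 0.0, 0.1],
        [0.1, 0.0, 1.0, 0.9], [0.0, 0.1, 0.9, 1.0]]
enc, dec, history = train_autoencoder(data)
print("first epoch error:", history[0], "last epoch error:", history[-1])
```

No class labels are used at any point, which is what makes the procedure unsupervised; the two-element code is the reduced representation that a subsequent classifier could take as input.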
Motivated by the lack of studies using DL for feature optimization in oil spill detection, Chen et al. [103] utilized SAEs and DBNs to reduce the dimensionality of and optimize SAR polarimetric features in an unsupervised manner and used the optimized features as input to a supervised classification procedure to classify marine oil spills and biogenic lookalikes. SAEs and DBNs achieved more accurate classification results than classical algorithms given a limited number of samples. Liu et al. [83] used hyperspectral data and proposed a spatial-spectral jointed SAE (SSAE) to extract and classify oil slicks on the sea surface. The performance of the proposed SSAE was compared with that of SAE, SVM, and BPNN (a multilayer feedforward network trained with error backpropagation); the results indicated that the proposed model remarkably outperformed the other models. Two recent studies [263,278] adopted two different AE architectures to segment oil spills from an airborne SLAR dataset. Gallego et al. [278] utilized a selectional AE (SelAE) with very deep residual encoder-decoder networks to segment and identify oil spills from the SLAR dataset. Bazine et al. [263] developed a SelAE with convolutional long short-term memory to segment oil spills and other maritime classes (ship, lookalike, coast, central noise, lateral turns, and water) from scanlines of SLAR airborne images simultaneously.

Other Deep Learning Models
CNNs, followed by AEs, are the most commonly used DL models for identifying oil spills from remotely sensed data; only a few studies have adopted other DL models, such as DBN, RNN, and GAN. Chen and Guo [195] proposed a DBN model to distinguish oil spills, lookalikes, and water in three SAR images from a small sample space database. Chen et al. [103] analyzed and compared the performance of SAE, DBN, and several classical algorithms to identify the presence of oil spills from a limited number of samples. Both DBN and SAE achieved better performance than classical ML algorithms.
A DL model based on RNN, which is a network where connections form directed cycles designed for processing sequential data, was presented in [279] to identify candidate oil spills from SLAR scanned sequences rapidly. The RNN model achieved better performance compared with a multilayer perceptron neural network. An encoder-decoder structure based on adversarial learning with an f-divergence minimization function was introduced in [280] to segment oil spills from SAR images automatically. The deep networks are structured such that a generator first produces a segmented instance of the input image and a regressor then minimizes the f-divergence between the ground truth and the generated segmentation result. One advantage of this model is the ability to segment irregular oil spills even in extremely noisy conditions given the comprehensiveness of the f-divergence and its capability to address rigorous situations. However, this method is limited to one-class segmentation (i.e., oil spill) and does not fully exploit the pixel-wise classification delivered by semantic segmentation methods.

Discussion and Conclusions
Oil spills on seas and oceans, a major source of maritime and ocean pollution due to anthropogenic activities and the growing demand for oil and maritime transport capacity, have a deleterious effect on aquatic life and wildlife, maritime tourism, aquaculture, and commerce. Constant monitoring and early intervention are crucial and urgently needed to minimize the impacts of oil spills on the environment and on the economies of coastal states. The capability of monitoring, detecting, and managing oil spills remotely is vital due to persistent dangers posed to marine biodiversity, wildlife, and habitats. The past decade has shown remarkable advances in the field of oil spill detection due to the increasing availability of remotely sensed data, growth of computation power, availability of cloud computing infrastructure, and development and adoption of state-of-the-art ML algorithms.
Satellite and airborne remote sensing techniques have been extensively used to detect, monitor, characterize, and estimate the thickness of oil spills. These techniques include the use of visible and infrared multispectral, hyperspectral, thermal, and microwave sensors and laser fluorosensors. Oils in seas and oceans exhibit different characteristics at various wavelengths across the spectrum. Figure 2a,b and Figure 3a,b show the differences in the appearance and information of oil spillage events captured by Sentinel multispectral and SAR sensors near the coasts of Mauritius and Kuwait. Microwave satellite-based SAR data are more widely used than other data sources for oil spill detection due to their independence from sunlight and cloud cover and their availability in all types of weather conditions. The use of satellite-based multispectral data is increasing owing to their growing availability, synoptic coverage, and unique spectral characteristics. These optical features assist the differentiation between oil spills and lookalikes. The utilization of other sensors, such as ultraviolet and laser fluorosensors, in oil spill detection systems remains limited. Each remote sensing technique has its own advantages and shortcomings (e.g., cloud contamination and the presence of shadows in optical data). Leveraging multisource data can provide valuable information, fill temporal gaps, and enable timely and effective oil spill monitoring and management.
Similarities between oil spills and other natural or manmade regions (lookalikes) in optical and SAR images affect the accuracy of oil spill detection systems. The inclusion and combination of various feature categories (e.g., statistical, geometric, texture, and SAR polarimetric features) with robust discriminatory power help discriminate between oil spills and lookalikes and improve the accuracy of oil spill classification models. A wide range of oil spill studies rely on the manual extraction and incorporation of different feature categories based on analysts' experience. However, few studies utilize feature selection techniques (e.g., filter and embedded methods) to select salient features with high discriminative power for improving the classification accuracy and generalization ability of oil spill classification systems. Considerable efforts should be exerted in evaluating the efficiency of features extracted from optical and SAR images and their contribution to the classification results in oil spill classification systems.
The acquisition and selection of adequate and high-quality representative training samples (ground truth samples) are critical factors that control the performance of classification algorithms. Maxwell et al. [312] argued that information regarding the minimum number of samples required by ML classifiers is still unknown. Ample high-quality representative training samples of oil spills and lookalikes are essential in selecting discriminatory features and developing accurate and reliable classification systems. The collection of accurately labeled oil spill samples is a challenging task that requires meticulous attention considering similarities between natural phenomena known as lookalikes, which produce a signal similar to oil spills. In some circumstances, a human expert may have difficulty in determining whether dark regions on the image are oil spills or lookalikes. These uncertainties may result in the introduction of some false positive and negative errors in the process.
Moreover, oil spill incidents spatially cover a small percentage of the entire data, and training datasets are collected from multiple time-series images acquired at different locations with varied oil characteristics; thus, the expected dissimilarities between samples may ultimately affect the training and generalization capability of the developed classification approach [14,313]. The scarcity of in situ oil spill data, uncertainties encountered during the selection of oil spill samples from SAR images, and class imbalance between oil spills and lookalikes are among the challenges that may affect the development of accurate ML classification models with high generalization capabilities.
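One common way of softening the class imbalance mentioned above is to weight each class inversely to its frequency during training. The sketch below uses the widely used "balanced" heuristic, weight = n_samples / (n_classes x class_count); its use here is an assumption and is not reported by the reviewed studies. Purely for illustration, the class counts are the 4843 oil slick and 18,925 lookalike samples reported for Zeng and Wang [270] earlier in this review; the function name balanced_class_weights is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count)
            for cls, count in counts.items()}


# Illustrative label list built from the sample counts cited for [270].
labels = ["oil"] * 4843 + ["lookalike"] * 18925
weights = balanced_class_weights(labels)
print(weights)
```

With these weights, an error on the minority oil class contributes roughly four times as much to a weighted loss as an error on a lookalike, so both classes carry the same total weight.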
Various classical and advanced ML models have been adopted in the past decade for oil spill detection and classification. Different classical ML models were used in 72% of the reviewed studies. The generic framework for developing automatic oil spill detection systems from SAR images using traditional ML models may include preprocessing of remotely sensed data, identification/segmentation of dark spots, extraction of discriminative features, and classification of image pixels/objects with various classification models. ANN, SVM, DT and fuzzy logic are among the widely used classical ML models for oil spill detection. However, the literature presented varied classification results for these methods. A thorough comparison between various classical ML classifiers under the same data source, preprocessing techniques, size of training samples, number and quality of selected features, and choice of parameter settings of classification algorithms is ideal.
Overfitting occurs when a classifier achieves high accuracy on the training dataset while failing to generalize well to unseen data. This condition is a common concern when ML models are developed with limited training samples. Thus, evaluating the performance of a classifier on a new dataset that is not used during the training phase is imperative.
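A minimal sketch of such an evaluation is a stratified hold-out split, in which a fixed share of every class is set aside before training and used only for scoring. The function names (stratified_holdout, accuracy), the 30% test fraction, and the toy label list are assumptions for illustration; cross-validation or an independent scene would be stricter alternatives.

```python
import random


def stratified_holdout(labels, test_fraction=0.3, seed=0):
    """Split sample indices into train/test sets, class by class."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def accuracy(predictions, labels):
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical imbalanced label list: 10 oil spills (1) and 30 lookalikes (0).
labels = [1] * 10 + [0] * 30
train_idx, test_idx = stratified_holdout(labels)
print(len(train_idx), len(test_idx))
```

A large gap between the accuracy computed on train_idx and that computed on test_idx is the usual symptom of the overfitting described above.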
Versatile DL models (accounting for 28% of the reviewed literature) recently demonstrated remarkable success in detecting oil spills from SAR and optical images by automatically extracting discriminative features to differentiate among oil spills, lookalikes, and other targets. The generalization capability of DL models addresses the case-specific nature of classical ML techniques. DL algorithms with different architectures were used to perform diverse tasks, including object detection, patch-based classification, and semantic and instance segmentation of oil spills. CNNs, AEs, RNNs, DBNs, and GANs are commonly used DL models for oil spill classification. CNNs and AEs are utilized more than other DL models for oil spill detection and segmentation. Although the adoption of various DL models and architectures for the identification and detection of oil spills/slicks has increased and achieved promising results, the following challenges still exist:

• The process of preparing considerable amounts of labeled data to train a DL model is laborious and time-consuming. Given the similarities between oil spills and lookalikes (i.e., dark spots created by natural phenomena, such as regions with low wind speed, wave shadows, and biogenic slicks/films) in SAR images, the process of defining training samples is challenging and susceptible to human error.

• Accessible open-source annotated datasets comprising oil spill/slick images collected from various multisensor sources at different locations with diverse environmental variations and oil characteristics are limited or absent.

• The fine-tuning of DL model hyperparameters (e.g., number of filters, batch size, learning rate, momentum, and weight decay) requires extensive trial-and-error experimentation to determine optimum configurations. A wide variety of hyperparameters should be considered and investigated for practical use.

• A thorough investigation of the performance and generalizability of DL models in detecting oil spills from unseen datasets collected from different environments is lacking in the literature.

• A detailed classification of oil spills/slicks, including oil type, thickness, or other chemical properties, via DL models is lacking in the literature.
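The trial-and-error experimentation noted in the hyperparameter challenge above is often organized as an exhaustive grid search. The sketch below enumerates every combination of a small, hypothetical search space and keeps the best-scoring one. The candidate values and the function names (grid, grid_search) are assumptions, and toy_validation_score is only a placeholder for training a model and measuring its accuracy on validation data.

```python
from itertools import product


def grid(search_space):
    """Enumerate every combination of the candidate hyperparameter values."""
    names = list(search_space)
    return [dict(zip(names, combo))
            for combo in product(*(search_space[name] for name in names))]


def grid_search(search_space, evaluate):
    """Return the configuration with the highest evaluation score."""
    best_config, best_score = None, float("-inf")
    for config in grid(search_space):
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


# Hypothetical search space; the values are illustrative only.
search_space = {
    "learning_rate": [0.1, 0.01, 0.001],
    "batch_size": [16, 32],
    "momentum": [0.0, 0.9],
    "weight_decay": [0.0, 1e-4],
}


def toy_validation_score(config):
    # Placeholder: a real evaluation would train a DL model with `config`
    # and return its accuracy on held-out validation data.
    return (-abs(config["learning_rate"] - 0.01)
            - abs(config["batch_size"] - 32) / 100)


best_config, best_score = grid_search(search_space, toy_validation_score)
print(len(grid(search_space)), best_config)
```

Even this small space yields 24 training runs, and every additional hyperparameter multiplies that number, which is why the search becomes costly for deep models.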
Considering the continuous development in remote sensing technologies, cloud computing services, and computer vision along with the increasing accessibility of publicly annotated remotely sensed data (e.g., SpaceNet [314] and the CleanSeaNet service [315]), the aforementioned challenges can be mitigated in the future. The development of real-time monitoring and detection systems of oil spills with unmanned aerial vehicles (UAVs) is inevitable due to the miniaturization of sensor technologies and advances in UAV technology.