1. Introduction
An oil spill is an environmental threat that has adverse effects on bodies of water, land, and air [1]. It can pollute sea surfaces and harm birds, fish, and other aquatic creatures. Oil spills are primarily caused by accidents involving oil tankers, ships, and pipelines, in which crude oil, gasoline, fuel, and oil by-products are released into the water. Removal of oil slicks is crucial to maintaining a safe and clean environment and protecting aquatic life.
Synthetic Aperture Radar (SAR) is usually mounted on aircraft or satellites to obtain images of sea and land surfaces [1]. SAR sensors emit radio waves that are reflected off the surface, allowing for a visual representation of the target area. Captured SAR images may include sea and land surfaces, oil spills, ships, and look-alikes. Look-alikes may represent a vast range of environmental phenomena, including low-wind-speed areas, sea wave shadows, and grease ice. Radio waves reflected by oil spills or look-alikes appear as dark or black spots in SAR images, making the identification of oil spills and their discrimination from look-alikes a challenge.
Several approaches have recently been investigated to classify and segment oil spills using deep learning and neural network classifiers [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28]. Although most of these approaches are promising, they either rely on handcrafted feature extraction (i.e., semi-automatic classification) or classify oil spills with only modest accuracy or Jaccard scores.
In this study, we introduce a two-stage convolutional network to classify and segment images with oil spills. The first stage is realized using a novel 23-layer Convolutional Neural Network (CNN) that classifies patches into those with less than 1% and those with more than 1% oil spill pixels, achieving five-fold as well as ten-fold accuracy, sensitivity, and specificity of almost 99%. The second stage takes the patches with a significant oil spill presence (i.e., oil spill instances constituting more than 1% of the patch) and segments them using a five-stage U-Net with an optimized generalized Dice loss. The proposed framework detected oil spill pixels with an accuracy of 92%, a precision of 84%, and a Dice score of 80%, providing improved precision and Dice score compared to a state-of-the-art architecture.
The remainder of this paper is organized as follows. Section 2 provides a review of the related work, followed by a description of the dataset and the proposed approach in Section 3. The experimental study and results are discussed in Section 4. Finally, the conclusion and summary are presented in Section 5.
2. Related Work
Several works have adopted semantic segmentation and deep convolutional neural networks to detect oil slicks on the sea surface [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28]. Del Frate et al. introduced a multilayer perceptron with an 11-8-8-1 topology that classifies instances into oil spills and look-alikes based on a feature vector extracted from the dataset after specifying the area of interest, which is usually a dark spot [1]. Overall, 18% of oil spill instances and 10% of look-alikes were misclassified. De Souza et al. introduced a similar approach with an area of interest specified by a human operator [2]. Stathakis et al. introduced a feature selection approach based on a genetic algorithm that searches for the best set of features and the number of activation maps generated by the hidden layers of a neural network classifier [3]. The proposed approach provided a classification accuracy of 84.8%, an improvement of 7.6% over the standard approach, which uses all features as inputs to the neural network classifier.
Topouzelis et al. introduced two artificial neural networks for oil spill detection [4]. The first network, with a 1-3-1 topology, detects dark formations with an accuracy of 94%. A set of ten features was then extracted from the dark formations, and a fully connected multilayer perceptron with a 10-51-2 topology was applied to these features, providing an accuracy of 89% for discriminating oil spills from look-alikes. Singha et al. also proposed two neural networks in sequence to detect oil spills [5]. The first network, with a 3-6-2 topology, segments the SAR images into dark formation or background, while a second network with a 14-14-5-1 topology identifies each pixel as an oil spill or look-alike. The proposed networks correctly classified 91.6% of oil spill pixels and 98.3% of look-alike pixels.
Song et al. applied an optimized wavelet neural network to a selected set of fully polarimetric features extracted from SAR images [6], with overall accuracies of 96.55% and 97.67% on two different datasets. Chen et al. introduced Stacked Autoencoders (SAE) and deep belief networks on polarimetric SAR features to detect oil spill instances [7]. These deep-learning approaches proved promising compared with Support Vector Machines (SVM) and typical artificial neural networks, with a testing error below 1.4% when the SAE was used. Gallego et al. introduced deep autoencoders to address the oil spill segmentation problem [8,13]. The authors used two deep autoencoders with 4 and 6 layers, respectively, on Side-Looking Airborne Radar images [8]. Five-fold cross-validation was used, with 80% of the samples used for training and 20% for testing. The authors claimed that one of the deep autoencoders achieved a Jaccard score of 1 and an f-1 score of 0.93, while the other achieved a Jaccard score of 0.92 and an f-1 score of 0.89.
Orfanidis et al. introduced the DeepLab model with ResNet-101 as a base semantic segmentation network [9] for pixel-wise classification into oil spill, look-alike, and background. The authors applied localization, cropping, radiometric calibration, speckle filtering, and a linear transformation from dB to actual luminosity values prior to segmentation. The model achieved a Jaccard score of 0.41 for oil spills. A patch-based image segmentation was introduced in which patches were picked under certain constraints, elevating the Jaccard score for oil spill patches to 0.86 for 3 × 3 patches and 0.89 for 5 × 3 patches. Hidalgo et al. introduced a two-stage CNN for coarse detection of ships, oil spills, and coasts and precise detection at the pixel level [10], with the highest accuracy, precision, recall, and f-1 score being 99%, 65%, 86.8%, and 71%, respectively.
Yu et al. introduced a deep neural network that minimizes the f-divergence between the ground truth and predicted segmentation masks [11]. For the two-oil-spill-region cases, the proposed approach provided a 15% accuracy increase and a 25% Region Fitting Error (RFE) decrease for Gamma noise relative to the Generative Adversarial Network (GAN). For the three-oil-spill-region cases, the proposed method outperformed the GAN in terms of RFE, with decreases of 25.5% to 33% for Gamma noise, 46.5% to 50.7% for Rayleigh noise, and 49% to 51% for log-normal noise.
Liping et al. investigated and provided long-term predictions for the drifting path of the oil-contaminated water resulting from the collision of the Sanchi oil tanker in the East China Sea, using the oceanic surface wave-tide-circulation coupled forecasting system (OFS) developed by the First Institute of Oceanography, State Oceanic Administration (SOA), China [12]. Further, Qia et al. proposed a 3D model for short-term and long-term oil spill paths caused by the Sanchi tanker [19]. Guo et al. introduced the use of SegNet to segment oil spills represented as dark spots in SAR images [14], with an accuracy of 93% under high noise. Li et al. introduced the use of polarimetric SAR filters (e.g., Boxcar, Refined Lee, and Lopez filters) to extract the respective polarimetric SAR features and feed them to a stacked autoencoder [15]. The authors were able to distinguish between crude oil, biogenic slicks, and clean seawater using the Lopez filter and autoencoder with an overall accuracy of 99.38%.
Jiao et al. introduced a pre-trained deep convolutional neural network based on VGG-16 for classifying oil spill instances, together with the Otsu algorithm to reduce the false positive rate and the Maximally Stable Extremal Regions (MSER) algorithm to locate the oil spill by generating a detection box [16]. The VGG-16 achieved a recall of 99.5% and an f-measure of 98.5%. The Otsu algorithm improved the precision from 97.7% to 98.3%, while the oil spill was marked using the MSER method at a proper threshold setting. Zhu et al. experimented with SVM, fully connected neural networks, SAE, and CNN on hyperspectral remote sensing images for oil film thickness classification [17]. The CNN provided better accuracy and performance than the SAE, with an improvement of almost 5%. Krestenitis et al. generated a pre-processed SAR image dataset acquired by the Sentinel-1 satellites of the European Space Agency (ESA) [18]. UNet, LinkNet, PSPNet, DeepLabV2, and DeepLabV3+ with MobileNetV2 as a base network were trained on the dataset and provided modest Jaccard scores for the oil spill class (i.e., 0.54, 0.52, 0.4, 0.26, and 0.53, respectively).
Yang et al. proposed a seven-layer CNN applied to features extracted at various scales using the wavelet transform of ALSA+ hyperspectral remote sensing images [20]. The proposed approach achieved an accuracy above 85%. Park et al. introduced the use of artificial neural networks to detect oil leaks in optical PlanetScope satellite images acquired near the town of Ras Al Zour in Kuwait [21]. Sun glint effects and dust were subtracted from the images, which were then provided to an artificial neural network to classify the target pixels into three types of oil leaks and sea surfaces, with an overall accuracy of 82%. Li et al. introduced a one-dimensional CNN that classifies the oil film based on the detection of the associated spectral bands, with an overall accuracy of 83% [22]. Zeng et al. introduced an oil spill CNN for oil spill detection in spaceborne SAR images [23]. The proposed approach is based on VGG-16 applied to dark patches obtained by a dark patch generation algorithm from SAR images. The proposed network achieved an accuracy of 94%, a recall of 83.5%, a precision of 85.7%, and an f-measure of 84.6%.
Yekeen et al. [24,25] introduced the use of a mask-region-based CNN to distinguish between ships, oil spills, and look-alikes, where a pre-trained ResNet-101 and a feature pyramid network were used for feature extraction, a regional proposal network was deployed for region-of-interest extraction, and the mask-region-based CNN was used for semantic segmentation. The proposed model achieved classification accuracies of 96% and 92% for oil spills and look-alikes, respectively. Bianchi et al. proposed an oil-based fully convolutional network in which the weighted cross-entropy loss is minimized in order to segment, detect, and categorize oil spills into one of 12 categories [26]. Zhang et al. introduced a semantic segmentation approach based on a CNN, which receives different polarized parameters generated via four channels processed by the Refined Lee filter and Simple Linear Iterative Clustering (SLIC) superpixels [27]. The proposed approach attained a mean intersection over union of 90.5% when the Yamaguchi parameters were extracted as feature sets. Baek et al. addressed the problem of detecting oil spills in dual-polarized TerraSAR-X images using artificial and convolutional neural network regression models, with f1-scores of 0.83 and 0.82 and AUCs of 0.986 and 0.987, respectively [28].
4. Results
A total of 210 images with a resolution of 1250 × 650 × 3 were split into patches of 64 × 64 × 3 by cropping and scanning the images with a 64 × 64 × 3 kernel and a stride of 1. The generated patches were then screened to separate patches with less than 1% oil spill pixels from those with more than 1%, allowing the identification of patches with a significant oil spill distribution. A total of 199,990 patches were selected, half with less than 1% oil spill and the other half with more than 1%. The proposed novel 23-layer CNN model, as described in Table 1, was trained and validated on the patched dataset via five-fold and ten-fold cross-validation. Model hyperparameters were selected based on continuous monitoring of the training loss and training accuracy to avoid model overfitting. The maximum number of training epochs was 40, the learning rate was 0.00005, and the batch size was 500 patches.
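The patch generation and screening step can be illustrated with a minimal sketch. The function below is a hypothetical reimplementation (the function name, the NumPy-based sliding window, and the synthetic data in the usage are ours, not the authors' code), parameterized by the 1% oil-pixel threshold used in the study.

```python
import numpy as np

def extract_patches(image, mask, patch=64, stride=1, oil_threshold=0.01):
    """Scan the image with a patch-sized window and split the patches by
    their oil-pixel fraction (hypothetical sketch of the screening step)."""
    low, high = [], []
    h, w = mask.shape
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            tile = image[y:y + patch, x:x + patch]
            # Fraction of oil pixels in the binary ground-truth mask window
            frac = mask[y:y + patch, x:x + patch].mean()
            (high if frac > oil_threshold else low).append((tile, frac))
    return low, high
```

With a stride of 1 and a binary ground-truth mask, `low` collects patches with at most 1% oil pixels, while `high` collects the patches that would be passed on to the segmentation stage.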
Table 3 shows the accuracy, weighted Kappa score, sensitivity, specificity, and AUC for the five-fold and ten-fold cross-validation models. Both the mean and standard deviation of these performance measures were calculated. Based on Table 3, it is evident that the proposed CNN model achieves superior performance, with a validation accuracy, sensitivity, specificity, and weighted Kappa score of 99%. This ensures that all patches with high oil distributions are detected and can be further segmented by the next-stage U-Net.
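For reference, the per-fold measures above follow directly from each fold's confusion matrix. The sketch below uses made-up counts (not the study's actual confusion matrices) purely to show the arithmetic and the fold-wise aggregation.

```python
import numpy as np

def fold_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity (recall), and specificity from one fold's counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    return accuracy, sensitivity, specificity

# Mean and standard deviation over k folds, as reported in Table 3
# (the counts are illustrative only).
folds = np.array([fold_metrics(990, 8, 992, 10) for _ in range(5)])
means, stds = folds.mean(axis=0), folds.std(axis=0)
```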
Patches with more than 1% oil spill are provided to the five-stage U-Net described in Figure 4 to produce a segmentation mask. To train the U-Net and avoid pixel-wise classification bias towards the most frequent class (i.e., background), we created patches of 64 × 64 × 3 with 40–60% oil spill pixels to maintain balanced pixel-wise classes. As a result, 215,394 patches were extracted from 168 images (80% of the dataset) for U-Net training, while testing was done on 684 patches from the remaining 42 images (i.e., 20% of the dataset). There is no overlap between the training and testing datasets.
Table 4 and Table 5 show the results of the semantic segmentation of the training and testing patches, respectively.
Based on Table 5, the proposed framework segmented patches of 64 × 64 × 3 with an accuracy of 92%, a precision of 84%, and a Dice score of 80%. Comparing Table 4 and Table 5, the testing performance is quite close to the training performance, indicating a robust, generalized model with minimal overfitting. We also compared the proposed framework with semantic segmentation models applied directly to the dataset, where a five-stage U-Net and SegNet [34] with a VGG-19 encoder were trained on patches with 40–60% oil spill pixel concentrations with the Generalized Dice Loss (GDL) minimized.
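A common formulation of the generalized Dice loss (Sudre et al.), with per-class weights set to the inverse squared class volume, can be sketched as follows; whether the study uses exactly this weighting is an assumption on our part.

```python
import numpy as np

def generalized_dice_loss(y_true, y_pred, eps=1e-7):
    """Generalized Dice loss for one-hot masks of shape (pixels, classes);
    inverse squared class volumes down-weight the dominant background class."""
    w = 1.0 / (y_true.sum(axis=0) ** 2 + eps)              # per-class weights
    intersect = (w * (y_true * y_pred).sum(axis=0)).sum()
    union = (w * (y_true + y_pred).sum(axis=0)).sum()
    return 1.0 - 2.0 * intersect / (union + eps)
```

A perfect prediction drives the loss to (near) zero, while the class weighting keeps rare oil-spill pixels from being swamped by the background class.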
Table 6 presents the performance measures of the proposed framework and the aforementioned semantic segmentation methods.
As shown in Table 6, although the U-Net and SegNet improved on the accuracy and recall of the proposed framework, mainly due to the use of GDL minimization to update the model parameters, their false positive rates were high, leading to low precision and Dice scores. In addition, we varied the patch size (i.e., 32 × 32 and 128 × 128 instead of 64 × 64) and the loss function (i.e., recall loss and Jaccard loss [32], where the model's sensitivity loss and intersection-over-union loss, respectively, were minimized instead of the GDL), as shown in Table 7.
It is evident from Table 7 that GDL minimization provides a slight improvement over recall loss optimization in terms of precision and Dice scores, while providing the same performance as Jaccard loss minimization. This is because the Jaccard and Dice scores are directly related. Also, from Table 7, the 64 × 64 patch size produced by the proposed patch generator is the optimum size, offering higher precision and Dice scores than 32 × 32 and 128 × 128 patches. It is also observed that as the patch size decreases, the sensitivity of the model improves. This is explained by the smaller deviation between the number of pixels representing the two classes when smaller patches are used.
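The relation between the two scores can be made explicit: the Dice score D and the Jaccard index J satisfy D = 2J/(1 + J) and, equivalently, J = D/(2 − D), so one is a monotonic function of the other and minimizing either loss ranks segmentations identically. The two helper functions below simply encode this identity.

```python
def dice_from_jaccard(j):
    """Dice (F1) score from the Jaccard index: D = 2J / (1 + J)."""
    return 2 * j / (1 + j)

def jaccard_from_dice(d):
    """Jaccard index from the Dice score: J = D / (2 - D)."""
    return d / (2 - d)
```

For example, the reported Dice score of 0.80 corresponds to a Jaccard index of 0.8 / 1.2 ≈ 0.67.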
Furthermore, the proposed model offers significantly better precision and Dice scores than the two-stage convolutional neural network introduced by Hidalgo et al. [10]. It also generated more accurate predictions than the mere use of a segmentation network to predict the oil spill mask in a severely imbalanced dataset, as studied by Krestenitis et al. [18]. The proposed model's performance is comparable to that of Zeng et al. [23] in both precision and Dice score. Table 8 highlights the differences between the proposed framework and the work introduced in [10,23].
Figure 5 shows examples of labels predicted by the proposed deep-learning framework under different class imbalance conditions, along with their corresponding ground truth masks. As shown in Figure 5, the proposed model was able to detect oil spills in patches with relatively balanced pixel labels. However, it could not accurately segment patches with a relatively higher background distribution, as in Figure 5e, or a relatively higher oil spill distribution, as in Figure 5g.
5. Discussion
In this study, we have addressed the challenge of detecting and segmenting irregularly sized oil spill instances that constitute a small portion of low-resolution spaceborne SAR images using deep-learning structures. Semantic segmentation using fully convolutional, U-Net, and other segmentation architectures has failed to accurately detect the spills described by Krestenitis et al. [18], where the highest Jaccard score of 0.54 was obtained by the mere application of a U-Net to the SAR image dataset. Achieving the balance between the oil spill and background classes needed for accurate segmentation of oil spill instances may not be feasible. Most of the successful solutions in the related work [1,2,3,4,5,6], which achieved oil spill detection accuracies of over 92%, relied on the extraction of specific handcrafted features from SAR images, such as the object standard deviation, object power-to-mean ratio, and background standard deviation.
Recent deep-learning-based methods [9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28] that used CNN structures for both automated feature extraction and classification of SAR images have relied on patches to reduce the background concentration in the tested images. Pre-trained models, such as ResNet-101, VGG-16, and GAN networks, as in [10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28], or multi-level CNN networks, as in [9], were introduced to classify patches with modest performance (i.e., precision and Dice score). In [23], the authors obtained improved results by introducing VGG-16 and a dark patch generation algorithm on spaceborne SAR images. However, we believe that the direct application of deep-learning structures may not be sufficient for accurate and sensitive predictions due to the unbalanced nature of the pixel-wise classes, even when pre-processing and image enhancement are adopted. This was also experimentally shown in this study, where the direct use of U-Net and SegNet models provided poor precision and Dice scores.
Our approach relied on patch generation from SAR images, with an emphasis on creating balanced patches and reducing bias towards the background class. We noticed that by discarding patches with a very low oil spill presence using the proposed novel CNN structure, we were able to improve the outcome of the semantic segmentation framework. Training the five-stage U-Net on balanced patches with 40–60% oil spill presence and testing on patches flagged by the 23-layer CNN as having a significant oil pixel distribution (i.e., more than 1% of the patch labelled as oil spill) elevated the precision to 84% and the f-1 score to 80%.
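The two-stage control flow described above can be sketched as follows; `classifier` and `segmenter` are hypothetical stand-ins for the trained 23-layer CNN and the five-stage U-Net, and the threshold parameter reflects the 1% significance criterion. This is an illustration of the pipeline's logic, not the authors' code.

```python
import numpy as np

def two_stage_segment(patches, classifier, segmenter, oil_threshold=0.01):
    """Stage 1 flags patches whose predicted oil fraction exceeds the
    threshold; only those reach the stage-2 segmenter, while the rest
    receive an all-background mask (illustrative sketch)."""
    masks = []
    for patch in patches:
        if classifier(patch) > oil_threshold:   # stage 1: significant-oil check
            masks.append(segmenter(patch))      # stage 2: pixel-wise mask
        else:
            masks.append(np.zeros(patch.shape[:2], dtype=np.uint8))
    return masks
```

In this sketch, the expensive segmentation network is only ever invoked on the minority of patches the classifier deems significant, which is what lets the segmenter be trained on class-balanced data.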
The limitations of the proposed framework are as follows: (1) The proposed two-stage deep-learning network provides improved pixel-wise classification into oil spill and background, but it is not applicable to a multi-class problem including other targets such as ships and look-alikes. However, most of the related work, as well as ours, focused on identifying oil spills, as this is of greater importance than detecting ships and look-alikes: it enables the removal of oil leaks and the protection of marine ecosystems. (2) Although the proposed method improved the overall performance of deep-learning-based semantic segmentation of oil spills in SAR images, it does not consider the segmentation of patches with an insignificant oil spill concentration. Nevertheless, we were able to identify oil spill pixels and patterns with high accuracy and precision, which is potentially sufficient for dispersants, booms, and skimmers to clean up the detected oil leaks. We plan to address this problem in future work. (3) Although several handcrafted features that support accurate prediction by a neural network classifier have been presented in the literature, the use of deep learning to detect oil spills does not offer a justification for the classification decision. Accordingly, we will use feature visualization techniques and heat maps to present the unique features on which the model based its predictions. (4) Unsupervised learning (e.g., autoencoders) has been investigated and adopted for efficient classification of oil spill pixels. Therefore, we plan to study the application of an unsupervised autoencoder that generates unique compressed codes for pixels of different classes prior to a supervised semantic segmentation network such as U-Net, which will potentially provide superior performance compared with either unsupervised or supervised learning alone.