1. Introduction
With the rapid development of marine transportation and the development of marine oilfields, the incidence of oil spill accidents is increasing. Marine oil spills can cause great harm to the surrounding marine environment, threatening animals and plants in the ocean, and causing huge economic losses to human production and life. Therefore, the detection and identification of the marine oil spill is particularly important. Marine oil spills are usually large in scope and not fixed in shape, so it is difficult to see all oil spills by manual means. Remote sensing can easily solve this problem.
Remote sensing technology can accurately observe a wide range of ground objects in a short time. Optical remote sensing based on the spectral response difference of ground objects can identify, determine and quantitatively extract the information of oil spill pollution, which can greatly improve the response ability and governance ability of oil spill events at sea. However, optical remote sensing images are strongly affected by clouds, which may cover the area to be observed, preventing researchers from obtaining effective information in time and hindering the rapid handling of offshore oil spill accidents. SAR (Synthetic Aperture Radar) satellite imaging is one of the most effective means of real-time oil spill monitoring, since the satellite can observe the ocean in all weather, day and night, producing high-resolution images over wide areas. Such imagery is less affected by cloud and fog.
SAR can penetrate the atmosphere and clouds. Moreover, it can see through camouflage and cover, which makes it well suited to identifying oil spill regions. These advantages are not achievable with visible-light and infrared sensors. Therefore, SAR has become one of the effective sensor technologies for detecting marine oil spills and has been widely used for this purpose.
After an oil spill accident, the spilled oil forms an oil film on the sea surface. The oil film changes the surface tension and damps the short Bragg waves in the sea-surface wavenumber spectrum, which weakens the backscattered echo signal received by SAR. Therefore, the oil film appears as a dark, low-scattering region on SAR images, quite different from the surrounding rough seawater. This is the theoretical basis for the detection and extraction of marine oil spills using SAR images. However, it is inefficient to manually label the location of oil spills in a large number of remote sensing images. Machine learning methods can solve this problem: with deep learning, the model segments the oil spill area automatically, and only a small part of the process is manual.
The improvement of SAR image segmentation can benefit from the domain adaptation method. Farahani [
1] proposed a method based on autoencoders that fuses the features of synthetic aperture radar (SAR) and optical images to benefit from their complementary information. The method aligned multi-temporal features while reducing spectral and radiometric differences, thus improving the accuracy of change detection. Stan [2] proposed a domain adaptation algorithm based on unsupervised learning. The algorithm uses an intermediate multimodal prototype distribution to minimize the cross-domain distribution differences in a shared embedding space. The proposed scheme remains competitive with unsupervised domain adaptation (UDA) algorithms based on joint-domain model training. Zhang [3] proposed a domain adaptive neural network for change detection in heterogeneous optical and SAR remote sensing images. The scheme extracted heterogeneous deep features with a pseudo-Siamese structure with non-shared weights. In order to bridge the gap between the source domain and the target domain in UDA, Liu et al. [4] designed a novel margin-preserving self-paced contrastive learning (MPSCL) model for cross-modal medical image segmentation. In that scheme, contrastive learning was introduced for the first time to cope with the challenge of unsupervised domain adaptation in medical image segmentation. All the above work is useful for domain adaptation from optical images to SAR images.
The research on marine oil spill detection can be divided into three categories: traditional image segmentation algorithms, machine learning and shallow neural network algorithms, and deep neural network algorithms. The most widely used method is based on threshold segmentation. The threshold is determined according to the bimodal histogram generated by the oil-free surface and the oil-film-covered surface. Based on the threshold, the oil film is separated from the oil-free surface. The intensity-based threshold algorithm is computationally efficient. Nirchio et al. [
5] derived the wind intensity from the whole SAR image and the distance from the coastline to test the ability of SAR to reveal oil spills. Benito-Ortiz [
6] determined the potential area of dark spots based on texture parameters, and used sea clutter statistical information to determine the adaptive threshold. Alattas [
7] improved the threshold method of Gamma distribution, and combined the minimum cross entropy with a Gamma distribution function to form a double-layer threshold method to detect the oil spill of SAR images. Fan et al. [
8] obtained the high frequency components of global features by a threshold segmentation method to weaken the influence of point noise in SAR images, and then superimposed these features to the downsampling layers, so that the model can make more accurate decisions. However, these threshold methods are easily affected by the sea surface speckle noise, resulting in a low accuracy of oil film segmentation.
With the support of a large data volume, the detection algorithm based on neural networks can usually achieve better detection results. However, these networks need many parameter settings. Liu et al. [
9] proposed a texture index calculated from the four texture features of the gray level co-occurrence matrix for texture analysis, and used a machine learning method to extract the crude oil leakage area. Lyu [
10] used a gray level co-occurrence matrix and Tamura features to extract the required features from SAR images, and improved the accuracy of oil spill detection with the help of an extreme learning machine model. Yekeen et al. [
11] used a novel deep learning instance segmentation model for marine oil spill detection based on the Mask-RCNN model, which has a better performance than other traditional machine learning models and semantic segmentation models. Baek [
12] used the support vector machine (SVM), random forest (RF) and deep neural network (DNN) models to compare and analyze the performance of oil spill classification in different polarization modes of X-band synthetic aperture radar (SAR) images. Taravat [
13] used a pulse coupled neural network and a multi-layer perceptron for image segmentation, and subsequently used a filter based on the Weibull multiplicative model to filter out false targets and improve the performance of the model. Ronci [14] used an adversarial loss function to train convolutional neural networks, and achieved promising results. According to the geometric characteristics of an oil spill, Wang et al. [15] used a long short-term memory (LSTM) network to process the memory information, thus obtaining the relationship between characteristics and influencing factors. Using these factors, they established an initial system model and an oil spill behavior monitoring model.
Shaban et al. [
16] proposed a new deep learning framework based on a 23-layer convolutional neural network and a five-stage U-Net structure for oil spill recognition on highly imbalanced datasets. This set-up improved the accuracy and the Dice score. The above methods process the original SAR oil spill images without any correction for inhomogeneous regions. SAR images are highly speckled, so the inhomogeneous areas in the images make the oil spill characteristics unclear and hard to detect. The first step of an oil spill monitoring scheme is usually the detection of dark spots [17]. If the spill area is missed in this step, it is difficult to detect in subsequent steps.
Due to the complexity of the marine environment and the multi-interpretation of oil spill characteristics, many marine phenomena may also lead to the enhancement or attenuation of SAR image echo signals. This in turn creates interference, resulting in the misclassification of oil spill detection in SAR images. The phenomenon is usually shown as too bright or too dark regions in SAR images. Low backscatter pixel values may be associated with sea clutter, ground clutter, floating oil or analogues [
17], whereas high backscatter pixel values may lead to background blurring.
SAR data is highly speckled and inhomogeneous areas widely appear in various SAR images. The inhomogeneous areas will have a great impact on the identification and segmentation of marine oil spill areas. The Gamma correction method can enhance or weaken the image gray value by changing the parameter gamma value, so as to increase the image contrast. Logarithmic correction is better in enhancing images with low overall contrast and low gray value. Therefore, this paper proposes a general module for adaptive adjustment of inhomogeneous SAR images. This module will perform Gamma correction and Logarithmic correction on image features, so as to improve the image segmentation effect. The segmentation effect is verified on six commonly used networks, including UNet [
18], UNet++ [
19], and Attention-UNet [
20], etc.
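As a rough illustration of the two pointwise corrections named above, the following sketch applies the classic power-law (Gamma) and logarithmic transforms to an image scaled to [0, 1]. It is not the proposed module itself: the function names, the [0, 1] scaling and the normalization of the logarithm are assumptions made for this example only.

```python
import numpy as np


def gamma_correction(img, gamma):
    """Power-law transform: gamma < 1 brightens, gamma > 1 darkens."""
    img = np.clip(np.asarray(img, dtype=np.float64), 0.0, 1.0)
    return img ** gamma


def log_correction(img):
    """Logarithmic transform, rescaled so that 0 maps to 0 and 1 maps to 1."""
    img = np.clip(np.asarray(img, dtype=np.float64), 0.0, 1.0)
    return np.log1p(img) / np.log(2.0)


# Toy 2x2 "image" with mostly low gray values (hypothetical data).
dark = np.array([[0.0, 0.04], [0.25, 1.0]])
brightened = gamma_correction(dark, 0.5)  # lifts the dark values
darkened = gamma_correction(dark, 2.0)    # suppresses the dark values
log_lifted = log_correction(dark)         # stretches the low gray values
```

With gamma below 1 the dark values are stretched apart, which is the contrast effect the text attributes to Gamma correction; the logarithmic transform has a similar lifting effect that is strongest at low gray values.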
3. Results and Discussion
The experiments in this paper were executed in the PyTorch environment on an NVIDIA 2080 Ti GPU. In Figure 6, we compare the prediction results obtained with the Gamma-Log correction method against those obtained with Gamma correction alone. From the images in the second, third, fourth and fifth columns, it can be seen that, across different images, using only Gamma correction gives worse results than using the Gamma-Log correction. After adding the Gamma-Log Net architecture to multiple network frameworks, the trained models are tested, and the results are shown in Figure 7. It can be observed from Figure 7 that the whole original test image is dark, and the boundaries between the background and some oil spill areas are difficult to distinguish. This is consistent with the appearance of inhomogeneous SAR images. As can be observed from Figure 7, in UNet, Attention-UNet and FCN8s the wrongly detected areas have been significantly reduced. This shows that the segmentation accuracy has been greatly improved. In
Figure 8, after adding Gamma-Log Net, it can be seen that the difference between the model predicted image and the ground truth in
Figure 7 has decreased in each network structure. After adding the Gamma-Log Net architecture to multiple network frameworks, the trained model is tested, and the results are demonstrated in
Figure 9. It can be observed from
Figure 9 that the background is bright. UNet, UNet++ and FCN8s with Gamma-Log Net can identify more oil spill areas than the models without Gamma-Log Net. After adding the Gamma-Log Net architecture to R2UNet, the wrongly detected area is confined to a small part of the image and the segmentation accuracy is improved slightly. Equation (8) illustrates how the image of the difference between the prediction and the ground truth is generated. The inputs to Equation (8) are the model prediction results and the ground truth, and O denotes the output image. In Equation (8), the union and the intersection of the model prediction results and the ground truth are calculated; the output is the difference between the predicted results and the ground truth. The ground truth and the prediction result of each model in Figure 7 and Figure 9 are the inputs to Equation (8), and the corresponding outputs are shown in Figure 8 and Figure 10, respectively. In Table 2 and Table 3, the column named pixel gives the number of pixels that differ between the predictions and the ground truth. These areas are marked as black in
Figure 8 and
Figure 10. The percentages are the proportions of these pixels in the entire image. In
Table 2, we can observe comparisons of the pixels and the percentages before and after adding Gamma-Log Net. In UNet, R2UNet and FCN32s with Gamma-Log Net, the area of differences between the model predicted image and the ground truth has been significantly reduced. The area of differences in UNet++, Attention-UNet and FCN8s has also been reduced slightly. Figure 10 presents the differences between the model predicted image in
Figure 9 and the ground truth before and after adding Gamma-Log Net. In
Table 3, we can see comparisons of the pixels and the percentages before and after adding Gamma-Log Net. In UNet, UNet++, Attention-UNet and FCN8s with Gamma-Log Net, the area of differences between the model predicted image and the ground truth has been reduced. The area of differences in FCN32s has been reduced slightly. The differences in R2UNet are slightly increased.
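The difference images and the pixel statistics discussed above can be sketched as follows. The sketch assumes that the difference is the union of the prediction and the ground truth with their intersection removed, which is how the text describes Equation (8); the function name and the toy masks are hypothetical.

```python
import numpy as np


def difference_map(pred, truth):
    """Return the mask of disagreeing pixels, their count and their share.

    Assumption: difference = (pred OR truth) minus (pred AND truth).
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    union = np.logical_or(pred, truth)
    intersection = np.logical_and(pred, truth)
    diff = np.logical_and(union, np.logical_not(intersection))
    n_pixels = int(diff.sum())                  # the "pixel" column
    percentage = 100.0 * n_pixels / diff.size   # share of the whole image
    return diff, n_pixels, percentage


# Toy 2x4 masks (1 = oil spill, 0 = background), purely illustrative.
pred = np.array([[1, 1, 0, 0], [0, 1, 0, 0]])
truth = np.array([[1, 0, 0, 0], [0, 1, 1, 0]])
diff, n_pixels, percentage = difference_map(pred, truth)
```

In this toy case one pixel is a false detection and one is a missed spill pixel, so two of the eight pixels differ.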
In order to evaluate the oil spill segmentation effect accurately, the metrics MIoU (Mean Intersection over Union), aver-Dice (average Dice coefficient) and aver-HD (average Hausdorff Distance-95) are introduced. Intersection over Union (IoU) quantifies the overlap between the target mask and the prediction mask. Specifically, it is the ratio of the number of pixels in the common area of the target mask and the prediction mask to the number of pixels in their union. MIoU is the average of the IoU over all categories.
The calculation of the above metric is performed in two steps. In the first step, the ratio of intersection to union is calculated for each category. The numerator is the number of pixels correctly predicted as this category; the denominator additionally counts the pixels of this category predicted as other categories and the pixels of other categories predicted as this category. The second step is to average the results over all categories.
MIoU measures the average intersection over union between the predicted masks and the ground-truth masks. The general definition of MIoU is shown in Equation (10), where TP is the number of true positives, FP is the number of false positives, TN is the number of true negatives and FN is the number of false negatives.
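The two-step calculation can be sketched for the two-class case (oil spill and background). This is a minimal sketch under the assumption that MIoU averages the IoU of the two classes computed from the confusion counts; the helper names and the convention of returning 1.0 for an empty class are choices made here for illustration.

```python
import numpy as np


def confusion_counts(pred, truth):
    """Count TP, FP, FN and TN for binary masks (True = oil spill)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    tn = int(np.sum(~pred & ~truth))
    return tp, fp, fn, tn


def mean_iou(pred, truth):
    """Step 1: IoU per class. Step 2: average over the two classes."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    denom_oil = tp + fp + fn
    denom_bg = tn + fp + fn
    # Assumed convention: a class absent from both masks scores 1.0.
    iou_oil = tp / denom_oil if denom_oil else 1.0
    iou_bg = tn / denom_bg if denom_bg else 1.0
    return (iou_oil + iou_bg) / 2.0


# Toy 2x4 masks, purely illustrative.
pred = np.array([[1, 1, 0, 0], [0, 1, 0, 0]])
truth = np.array([[1, 0, 0, 0], [0, 1, 1, 0]])
counts = confusion_counts(pred, truth)
miou = mean_iou(pred, truth)
```

Here the oil class has IoU 2/4 and the background class has IoU 4/6, so the mean is 7/12.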
The MIoU results before and after adding the Gamma-Log Net architecture to six networks are shown in
Table 4. Among the six network models, UNet++ and Attention-UNet perform well, with MIoU values higher than 95%. After adding the Gamma-Log Net architecture, UNet improved the most, with its MIoU value increasing by 1.45%. Attention-UNet was the second best, with its MIoU value increasing by 0.52%. R2UNet, FCN32s and UNet++ also improved, with their MIoU values increasing by 0.43%, 0.4% and 0.37%, respectively. Finally, the MIoU value of FCN8s increased by 0.14%. Since the average value alone does not reflect the spread of a set of data, the standard deviation is also reported. Comparing the standard deviation before and after adding the Gamma-Log Net shows how the outliers in the overall data change. After adding the Gamma-Log Net architecture to the six networks, the standard deviations of UNet and FCN8s decreased the most, by 0.0238 and 0.0211, respectively. The MIoU standard deviations of FCN32s, Attention-UNet, UNet++ and R2UNet also decreased, by 0.0105, 0.0097, 0.0066 and 0.0052, respectively.
The Dice coefficient is a metric for measuring the similarity of two sets. Its value ranges between 0 and 1. The general calculation formula of the Dice coefficient is shown in Equation (11).

$$\mathrm{Dice} = \frac{2\,|X \cap Y|}{|X| + |Y|} \qquad (11)$$

X represents the prediction samples and Y represents the ground truth. $|X \cap Y|$ is the number of elements in the intersection between X and Y, that is, between the predicted results and the ground truth. $|X|$ and $|Y|$ represent the number of elements of X and Y. The coefficient of the numerator in Equation (11) is 2 because the denominator counts the common elements of X and Y twice; multiplying the numerator by 2 keeps the value of the Dice coefficient between 0 and 1. According to the above definition, the following formula can be obtained by expressing Equation (11) in terms of the confusion matrix, as shown in Equation (12):

$$\mathrm{Dice} = \frac{2\,TP}{2\,TP + FP + FN} \qquad (12)$$
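The equivalence of the set form and the confusion-matrix form of the Dice coefficient can be checked numerically. The sketch below uses the standard definitions, 2|X∩Y| / (|X| + |Y|) and 2TP / (2TP + FP + FN); the function names, the toy masks and the value returned for two empty masks are assumptions made for illustration.

```python
import numpy as np


def dice_from_sets(pred, truth):
    """Set form: 2 * |X and Y| / (|X| + |Y|)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    intersection = int(np.sum(pred & truth))
    total = int(pred.sum()) + int(truth.sum())
    # Assumed convention: two empty masks are treated as a perfect match.
    return 2.0 * intersection / total if total else 1.0


def dice_from_confusion(tp, fp, fn):
    """Confusion-matrix form: 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2.0 * tp / denom if denom else 1.0


# Toy 2x4 masks: |X| = 3, |Y| = 3, |X and Y| = 2, so TP = 2, FP = 1, FN = 1.
pred = np.array([[1, 1, 0, 0], [0, 1, 0, 0]])
truth = np.array([[1, 0, 0, 0], [0, 1, 1, 0]])
dice_sets = dice_from_sets(pred, truth)
dice_conf = dice_from_confusion(tp=2, fp=1, fn=1)
```

Both forms give 4/6 for this example, because |X| = TP + FP and |Y| = TP + FN.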
The aver-Dice coefficient results before and after adding the Gamma-Log Net architecture to six networks are shown in Table 5. Among the six network models, the highest aver-Dice values are achieved by UNet++ and Attention-UNet, reaching over 96%. After adding the Gamma-Log Net architecture, UNet improves the most, with its aver-Dice coefficient increasing by 1.09%. Attention-UNet and FCN32s had the second best improvements, with their aver-Dice coefficients increasing by 0.4% and 0.42%, respectively. Finally, the aver-Dice coefficients of R2UNet, UNet++ and FCN8s increased by 0.33%, 0.27% and 0.16%, respectively. After adding the Gamma-Log Net architecture, the Dice standard deviations of UNet and FCN8s decreased the most, by 0.0204 and 0.0245, respectively. The Dice standard deviations of FCN32s, Attention-UNet, UNet++ and R2UNet also decreased, by 0.0127, 0.0082, 0.0046 and 0.0048, respectively.
HD (the Hausdorff Distance-95) is an indicator of the accuracy of boundary segmentation. The Dice coefficient is sensitive to the interior filling of the segmentation, while the Hausdorff distance is sensitive to the segmented boundary. The general definition of HD (Hausdorff-95) is shown in Equation (13). The Hausdorff distance measures the difference between the predicted target region boundary and the real target region boundary. The 95% HD is similar to the maximum HD, but it is based on the 95th percentile of the distances between boundary points rather than on their maximum. The purpose of using this percentile is to eliminate the influence of a very small subset of outliers. It can be inferred from Equation (13) that HD (Hausdorff-95) can be used to characterize the largest deviation, excluding such outliers, between the predicted boundary and the ground truth boundary of the segmentation in the image.
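The effect of the 95th percentile can be seen in a small sketch that compares it with the maximum Hausdorff distance. Several variants of the 95% Hausdorff distance exist; the one assumed here takes the 95th percentile of the nearest-neighbour distances in each direction and keeps the larger value. The function names and the toy boundaries are hypothetical.

```python
import numpy as np


def directed_distances(a, b):
    """For each point in a, the Euclidean distance to its nearest point in b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)


def hausdorff(a, b):
    """Maximum Hausdorff distance between two boundary point sets."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(max(directed_distances(a, b).max(),
                     directed_distances(b, a).max()))


def hd95(a, b):
    """Assumed variant: larger of the two directed 95th percentiles."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(max(np.percentile(directed_distances(a, b), 95),
                     np.percentile(directed_distances(b, a), 95)))


# Toy boundaries: 100 points on a line; the prediction matches the ground
# truth except for one stray point placed 10 pixels away.
truth_boundary = np.array([(0, i) for i in range(100)])
pred_boundary = truth_boundary.copy()
pred_boundary[-1] = (10, 99)

max_hd = hausdorff(pred_boundary, truth_boundary)
robust_hd = hd95(pred_boundary, truth_boundary)
```

The single stray point dominates the maximum Hausdorff distance but has no effect on the 95th percentile, which is the behaviour the text describes as eliminating the influence of a very small subset.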
The aver-HD results before and after adding Gamma-Log Net architecture to six networks are shown in
Table 6.
After adding the Gamma-Log Net architecture to the six network frameworks, the aver-HD of most network models is reduced to some extent. This suggests that the Gamma-Log Net architecture can produce prediction results closer to the ground truth. Among them, the aver-HD value of UNet decreased the most, by 0.51. Furthermore, the aver-HD of R2UNet, Attention-UNet and FCN32s decreased by 0.126574, 0.011 and 0.042, respectively.
However, the HD values of UNet++ and FCN8s increased. This is because when the corrected feature blocks are combined together, there is a large gap in the characteristics at the splicing edge of different feature blocks. This will form the basis of our further work on the problem.
In our experiments, six network structures were used to verify the improvement of the oil spill segmentation task due to adding the Gamma-Log module. The prediction results of six network models before and after adding the Gamma-Log correction module are shown in
Figure 7, and the prediction results are compared with the ground truth.
MIoU is a commonly used metric in semantic segmentation. After adding the Gamma-Log correction module, the MIoU of every model improved, which means that the networks segment the oil spill images more accurately. At the same time, the Dice coefficient increased, which means the segmentation results are more similar to the ground truth. After the Gamma-Log correction module is added, the Hausdorff-95 value is reduced for most networks, which means that the outliers at the boundary of the segmentation result are reduced and the segmentation accuracy at the boundary is improved. Therefore, the changes in the above three indicators show that adding the Gamma-Log correction module improves the segmentation results of marine oil spill images.