Oil Spill Detection in Quad-Polarimetric SAR Images Using an Advanced Convolutional Neural Network Based on SuperPixel Model

Oil spill detection plays an important role in marine environment protection. Quad-polarimetric Synthetic Aperture Radar (SAR) has proven to have great potential for this task, and different SAR polarimetric features offer advantages for distinguishing oil spill areas from look-alikes. In this paper we propose an oil spill detection method based on a convolutional neural network (CNN) and Simple Linear Iterative Clustering (SLIC) superpixels. Experiments were conducted on three Single Look Complex (SLC) quad-polarimetric SAR images obtained by Radarsat-2 and the Spaceborne Imaging Radar-C/X-Band Synthetic Aperture Radar (SIR-C/X-SAR). Several groups of polarimetric parameters, including the H/A/Alpha decomposition, Single-Bounce Eigenvalue Relative Difference (SERD), correlation coefficients, conformity coefficients, Freeman 3-component decomposition and Yamaguchi 4-component decomposition, were extracted as feature sets. Among all considered polarimetric features, the Yamaguchi parameters achieved the highest performance, with a total Mean Intersection over Union (MIoU) of 90.5%. The SLIC superpixel method is shown to significantly improve the oil spill classification accuracy on all polarimetric feature sets. The classification accuracy of all target types improved, and the largest increase in mean MIoU across all feature sets was on emulsions, by 21.9%.


Introduction
The marine environment plays a crucial part in global ecosystems. Oil spills are among the main sources of marine pollution and cause serious damage to ocean ecology and resources. In 2010, the Gulf of Mexico oil spill accident lasted for about three months; beaches and wetlands in many states of the United States were destroyed and local marine organisms were devastated [1]. It is therefore necessary to monitor the sea surface and detect oil spills. Remote sensing plays a crucial role in achieving this goal, and relevant methods have been effectively applied to oil spill detection.
Space-borne Synthetic Aperture Radar (SAR) is widely applied to oil spill detection due to its all-weather, all-time capability and wide area coverage. Fully polarimetric SAR data provides four channels according to the transmit and receive modes of the radar signal: HH, HV, VH and VV. The clean sea can be regarded as a rough surface, while a smooth oil layer floating on the water appears as dark spots, since it dampens capillary waves, short gravity waves and Bragg scattering [2]. The general steps of oil spill detection are: (1) dark spot extraction, (2) feature extraction, (3) classification [3]. Early research mainly focused on textural information of the dark spot areas. Several textural features, including the first invariant planar moment and the power-to-mean ratio, were extracted from SAR data and combined with statistical models or machine learning to perform oil spill detection [4,5]. Experiments were also carried out on SAR images of different bands [6,7]. Yongcun Cheng et al. used VV channel data acquired by COSMO-SkyMed to monitor an oil spill and simulate a model [8]. M. Migliaccio et al. proposed a multi-frequency polarimetric SAR processing chain to detect oil spills in the Gulf of Mexico, which was applied successfully [1]. These methods can distinguish oil spill areas from the sea surface and are known as mature classification algorithms.
However, several environmental phenomena, including low-speed winds, internal waves and biogenic films, also appear as dark spots in SAR images [9]; they are called look-alikes. The most challenging part of oil spill detection from SAR images is to distinguish oil spill areas from these look-alikes. This is the main obstacle of early research on texture analysis of single-pol SAR data: an oil spill area may experience complex deformation on the sea surface, which is easily confused with look-alikes, and texture analysis requires a large amount of data. These problems became major hindrances for high-accuracy oil spill detection. With the development of SAR satellites in recent years, the research focus of oil spill detection has shifted to dual-pol and quad-pol SAR images, and the derived compact-pol SAR [10], which not only retain the texture characteristics of dark spot areas but also provide rich polarimetric information. Polarimetric decomposition essentially reflects the scattering modes of microwaves on the sea surface, which highlights subtle differences between ocean objects [11]. Many polarimetric parameters extracted from different SAR channels have been proved to enable high-accuracy oil spill detection [12][13][14][15]. From the perspective of polarimetric features, S. Skrunes used the k-means classification method to detect oil spill areas on several polarimetric parameters [16]. With the rise of machine learning algorithms in recent years, neural networks have also been applied to oil spill detection. Yu Li et al. performed comparative experiments between different machine learning classifiers based on multiple polarimetric parameters [17], and the differences between fully and compact polarimetric SAR images were explored [18].
Meanwhile, as a classical feedforward neural network, the convolutional neural network (CNN) is widely used in image classification and recognition. Since it was proposed in 1989, the CNN has undergone many improvements, deriving several classic network structures such as Inception, ResNet and CliqueNet [19][20][21]. Min Lin et al. used a global average pooling (GAP) layer to replace the fully connected layer and reduce the number of parameters in 2014 [22]; Andrew Howard et al. put forward the depthwise separable convolution in MobileNet [23,24], which maintains high accuracy even when the amount of parameters and computation is reduced.
In 2015, Jonathan Long et al. proposed the fully convolutional network (FCN) with transposed convolution for image semantic segmentation [25]. End-to-end operation is implemented with an encoder-decoder structure, and a classification prediction is given for each pixel of the image. The concept of dilated convolution was introduced into semantic segmentation in 2016 and greatly improved classification accuracy [26]. Following that, advanced models such as U-Net, LinkNet and the DeepLab series were developed for high-precision segmentation [27,28]. The encoder-decoder structure of CNN-based semantic segmentation models has been used for oil spill detection in recent studies [29,30] and achieved high accuracy. With the application of TerraSAR-X and other SAR satellites, dual-polarized SAR images have also been introduced into oil spill detection. Daeseong Kim et al. extracted polarimetric parameters from dual-pol TerraSAR-X images and successfully mapped oil spill areas with artificial neural networks [31].
The superpixel is an image segmentation concept proposed in 2003 [32]. It refers to an irregular pixel block with a certain visual significance, composed of adjacent pixels with similar texture, color, brightness and other characteristics. The similarity of features between pixels is used to group them, so that an image can be expressed by a small number of superpixels. Superpixels greatly reduce the complexity of image post-processing and are used as a pre-processing step for image segmentation algorithms. Simple Linear Iterative Clustering (SLIC) is a widely used superpixel segmentation method [33] and has been introduced into some SAR scenes. Some researchers also use multi-chromatic analysis to perform target detection and analysis on SAR images [34,35].
Many current oil spill detection methods only divide images into oil and non-oil areas, which may cause false alarms and cannot recognize every target on the sea surface. Classification methods combining neural networks also struggle to distinguish oil spills from look-alikes, while the flexible structure of the CNN provides the possibility to solve these problems: it allows a variety of input parameters, and can simultaneously handle dark spot extraction and classification and identify every target on the sea surface. In this paper we propose an oil spill detection method using SLIC superpixels and a CNN-based semantic segmentation algorithm, combining several convolution kernels including dilated and depthwise separable convolutions. It allows multiple-parameter input and realizes pixel-level oil spill classification, which further improves accuracy. We carried out experiments on five groups of polarimetric parameters extracted from SLC quad-polarimetric SAR data of Radarsat-2 and the Spaceborne Imaging Radar-C/X-Band Synthetic Aperture Radar (SIR-C/X-SAR), and evaluated the classification results of superpixel segmentation combined with polarimetric parameters. The experimental results show that our method can effectively extract and classify dark spots in a SAR image. SLIC superpixels further improve the classification accuracy of oil spill areas, and the Yamaguchi 4-component decomposition combined with SLIC superpixel classification is considered the most suitable parameter set for oil spill detection in our case.

Overall Framework
The flowchart of our oil spill detection method is illustrated in Figure 1. The four-channel data was first processed by a refined Lee filter, and different polarimetric parameters were then extracted from the four channels. All polarimetric parameters used in our experiments are divided into five groups according to different scattering principles and calculation methods. For a monostatic SAR system, reciprocity always holds, which means the complex scattering coefficients obey HV = VH. For this reason, HV is taken as the cross-polarized channel in the analysis. Three channels of data are needed to generate an image in the CIELab color space for the SLIC superpixel model. We chose the HH, HV and VV channels, since the co-polarized channels (HH/VV) contain more polarimetric information than the cross-polarized channels (VH/HV) [36]. The HH, HV and VV data were also used to calculate the SLIC superpixels. Sections 2.2 and 2.3 explain the methods used to extract the polarimetric parameters, and both the parameters and the superpixels are set as input to the neural network model. The neural network is composed of an encoder and a decoder, and its output is the segmentation result of oil spill detection.
We designed a CNN-based semantic segmentation model as the classifier, as shown in Figure 2. The dims in the diagram represents the number of polarization parameters; since multiple polarization parameter groups are used in our work, the network is adjusted according to dims. Depthwise separable convolution and dilated convolution are used in several layers in the bottom part, and the subsequent encoding is completed by standard convolution layers. Green blocks represent a skip connection structure similar to residual learning [19], which makes the information from bottom layers accessible to top layers and helps train the network more easily. The feature maps extracted by the encoder are decoded by progressive transposed convolution layers, and skip connections are also applied to absorb more features. The principles and implementation of the network are explained in detail in Section 2.4.
The different polarimetric parameter groups and superpixel segmentation results are combined as input to the neural network for training, and the output is the oil spill detection result. Finally, the Mean Intersection over Union (MIoU) between the segmentation result and the annotation images is calculated to evaluate the accuracy. Figure 2. Structure of the neural network in our segmentation method: blue and green blocks denote the encoder, which consists of multiple convolution layers using depthwise separable, dilated and standard convolution kernels, respectively; purple-red blocks constitute the decoder, which outputs a classification map with the same size as the original image.

Polarimetric Decomposition
The whole process of extracting polarimetric parameters is shown in Figure 3. The boxes are the different polarimetric parameter combinations used for classification. This section explains the calculation of the different polarimetric parameters in the following subsections.



H/A/Alpha Decomposition
The scattering matrix of a fully polarimetric SAR image can be expressed as

$$S = \begin{bmatrix} S_{HH} & S_{HV} \\ S_{VH} & S_{VV} \end{bmatrix} = \begin{bmatrix} |S_{HH}|e^{j\phi_{HH}} & |S_{HV}|e^{j\phi_{HV}} \\ |S_{VH}|e^{j\phi_{VH}} & |S_{VV}|e^{j\phi_{VV}} \end{bmatrix}$$

where $|S_{XX}|$ and $\phi_{XX}$ represent the amplitudes and phases of the complex scattering coefficients; each complex element denotes a polarization component. The two cross-polarized terms are identical in Radarsat-2, i.e., $S_{HV} = S_{VH}$. The polarization covariance matrix $C$ and coherency matrix $T$ contain abundant physical information on the polarization characteristics of ocean objects. The covariance matrix can be derived by

$$C_3 = \left\langle \begin{bmatrix} |S_{HH}|^2 & \sqrt{2}\,S_{HH}S_{HV}^{*} & S_{HH}S_{VV}^{*} \\ \sqrt{2}\,S_{HV}S_{HH}^{*} & 2|S_{HV}|^2 & \sqrt{2}\,S_{HV}S_{VV}^{*} \\ S_{VV}S_{HH}^{*} & \sqrt{2}\,S_{VV}S_{HV}^{*} & |S_{VV}|^2 \end{bmatrix} \right\rangle$$

where $*$ represents the conjugate and $\langle\cdot\rangle$ stands for multilook averaging (we set the window size to 3; the same holds in later equations). Cloude and Pottier outlined a scheme for parameterizing polarimetric scattering problems based on the matrix $T$ in 1997. The coherency matrix is built from the Pauli scattering vector:

$$\vec{k} = \frac{1}{\sqrt{2}}\begin{bmatrix} S_{HH}+S_{VV} \\ S_{HH}-S_{VV} \\ 2S_{HV} \end{bmatrix}, \qquad T_3 = \left\langle \vec{k}\,\vec{k}^{H} \right\rangle$$

where $H$ denotes the conjugate transpose. It can also be expressed in the diagonalized form

$$T_3 = U_3 \begin{bmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{bmatrix} U_3^{H}, \qquad U_3 = \begin{bmatrix} \cos\alpha_1 e^{j\phi_1} & \cos\alpha_2 e^{j\phi_2} & \cos\alpha_3 e^{j\phi_3} \\ \sin\alpha_1\cos\beta_1 e^{j\delta_1} & \sin\alpha_2\cos\beta_2 e^{j\delta_2} & \sin\alpha_3\cos\beta_3 e^{j\delta_3} \\ \sin\alpha_1\sin\beta_1 e^{j\gamma_1} & \sin\alpha_2\sin\beta_2 e^{j\gamma_2} & \sin\alpha_3\sin\beta_3 e^{j\gamma_3} \end{bmatrix}$$

The column vectors $\vec{u}_i$ of $U_3$ are the eigenvectors of $T_3$, corresponding to the eigenvalues $\lambda_1$, $\lambda_2$ and $\lambda_3$. Cloude decomposition regards the scattering behavior of a target as the superposition of three independent scattering mechanisms, and the probability of each eigenvector, representing the weight of each basic scattering mechanism, can be calculated by

$$p_i = \frac{\lambda_i}{\lambda_1+\lambda_2+\lambda_3}$$

The polarimetric entropy describes the randomness of the scattering mechanisms and is defined by

$$H = -\sum_{i=1}^{3} p_i \log_3 p_i$$

The formula of anisotropy is

$$A = \frac{\lambda_2-\lambda_3}{\lambda_2+\lambda_3}$$

and the mean scattering angle is

$$\bar{\alpha} = \sum_{i=1}^{3} p_i \alpha_i$$

where $\alpha_i$ is the scattering angle associated with the $i$-th eigenvector of $T_3$.
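The eigenvalue-based parameters above can be sketched in a few lines of numpy; this is a minimal illustration (not the authors' implementation), assuming `T3` is a multilook-averaged 3 × 3 Hermitian coherency matrix:

```python
import numpy as np

def h_a_alpha(T3):
    """H/A/alpha parameters from a 3x3 coherency matrix T3 (Cloude-Pottier).

    T3 is assumed Hermitian positive semi-definite (multilook-averaged).
    """
    # Eigendecomposition of the Hermitian coherency matrix
    eigvals, eigvecs = np.linalg.eigh(T3)
    # Sort eigenvalues in descending order: lambda1 >= lambda2 >= lambda3
    order = np.argsort(eigvals)[::-1]
    lam = np.clip(eigvals[order].real, 1e-12, None)  # guard against log(0)
    U = eigvecs[:, order]
    # Pseudo-probabilities p_i = lambda_i / sum(lambda)
    p = lam / lam.sum()
    # Polarimetric entropy, log base 3 so that H lies in [0, 1]
    H = float(-np.sum(p * np.log(p) / np.log(3)))
    # Anisotropy from the two minor eigenvalues
    A = float((lam[1] - lam[2]) / (lam[1] + lam[2]))
    # alpha_i = arccos(|first component of eigenvector i|), averaged with p
    alpha_i = np.arccos(np.clip(np.abs(U[0, :]), 0.0, 1.0))
    alpha_mean = float(np.sum(p * alpha_i))
    return H, A, alpha_mean
```

For a fully random target ($T_3 \propto I$) this yields $H = 1$ and $\bar{\alpha} = \pi/3$, while a pure surface scatterer ($\lambda_2 = \lambda_3 = 0$) gives $H \approx 0$ and $\bar{\alpha} \approx 0$, as expected.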

Single-Bounce Eigenvalue Relative Difference
Allain et al. proposed the Single-Bounce Eigenvalue Relative Difference (SERD) based on the Cloude decomposition in 2004. The correlation between co-polarized and cross-polarized channels is almost equal to 0 for sea surface microwave scattering, so the matrix $T_3$ can be simplified as

$$T_3 = \begin{bmatrix} T_{11} & T_{12} & 0 \\ T_{12}^{*} & T_{22} & 0 \\ 0 & 0 & T_{33} \end{bmatrix}$$

and the eigenvalues of this matrix can be calculated by

$$\lambda_{1nos,2nos} = \frac{1}{2}\left( T_{11}+T_{22} \pm \sqrt{(T_{11}-T_{22})^2 + 4|T_{12}|^2} \right), \qquad \lambda_{3nos} = T_{33}$$

The first two eigenvalues are related to the co-polarized backscatter coefficients, and the third one is related to the cross-polarized channel and multiple scattering. The scattering angle $\alpha_i$ is calculated from the eigenvectors corresponding to $\lambda_{1nos}$ and $\lambda_{2nos}$ to distinguish the type of scattering mechanism: an eigenvalue corresponds to single scattering when $\alpha_i \le \pi/4$, and to double scattering when $\alpha_i > \pi/4$. The SERD is defined as

$$\mathrm{SERD} = \frac{\lambda_S - \lambda_{3nos}}{\lambda_S + \lambda_{3nos}}$$

where $\lambda_S = \lambda_{1nos}$ when $\alpha_1 \le \pi/4$ (or $\alpha_2 > \pi/4$), and $\lambda_S = \lambda_{2nos}$ when $\alpha_1 > \pi/4$ (or $\alpha_2 \le \pi/4$). SERD is very sensitive to surface roughness: a large SERD value indicates strong single scattering in the scattering process of the target, while a small value indicates weak single scattering. For the high-entropy scattering area of an oil-covered surface, the scattering is composed of many kinds of scattering mechanisms and single scattering is not dominant; that is, the SERD value over oil film is relatively small, so it can be used for oil spill detection.
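A minimal numpy sketch of the SERD computation under the reflection-symmetry assumption above (an illustration, not the authors' code; the single-bounce eigenvalue is picked as the one with the smaller alpha angle):

```python
import numpy as np

def serd(T3):
    """SERD sketch, assuming reflection symmetry (T13 = T23 = 0).

    The two 'nos' eigenvalues come from the upper-left 2x2 block of T3;
    the single-bounce one (alpha <= pi/4) enters the SERD ratio against
    the cross-pol eigenvalue T33.
    """
    block = T3[:2, :2]
    eigvals, eigvecs = np.linalg.eigh(block)
    lam3 = T3[2, 2].real
    # alpha angle of each eigenvector: arccos of |first component|
    alphas = np.arccos(np.clip(np.abs(eigvecs[0, :]), 0.0, 1.0))
    # single-bounce eigenvalue: the one with the smaller alpha angle
    single = eigvals[np.argmin(alphas)].real
    return (single - lam3) / (single + lam3)
```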

Co-and Cross-Polarized Decomposition
This section introduces two parameters based on the scattering matrix: the co-polarized correlation coefficient and the conformity coefficient. The correlation coefficient can be expressed as

$$\rho_{HH/VV} = \frac{\left| \left\langle S_{HH} S_{VV}^{*} \right\rangle \right|}{\sqrt{ \left\langle |S_{HH}|^2 \right\rangle \left\langle |S_{VV}|^2 \right\rangle }}$$

The conformity coefficient was first introduced into compact polarimetric SAR to estimate soil moisture by Freeman et al. [18]. Extended to quad-polarimetric SAR, it can be expressed as

$$\mu = \frac{2\left( \mathrm{Re}\left\langle S_{HH} S_{VV}^{*} \right\rangle - \left\langle |S_{HV}|^2 \right\rangle \right)}{\left\langle |S_{HH}|^2 \right\rangle + 2\left\langle |S_{HV}|^2 \right\rangle + \left\langle |S_{VV}|^2 \right\rangle}$$
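The two coefficients above can be sketched directly from the channel data; this is a minimal illustration, assuming the multilook average $\langle\cdot\rangle$ is taken as the sample mean over an image patch:

```python
import numpy as np

def correlation_coefficient(shh, svv):
    """Co-polarized correlation magnitude over a patch (sample-mean average)."""
    num = np.abs(np.mean(shh * np.conj(svv)))
    den = np.sqrt(np.mean(np.abs(shh) ** 2) * np.mean(np.abs(svv) ** 2))
    return num / den

def conformity_coefficient(shh, shv, svv):
    """Conformity coefficient extended to quad-pol data (same patch average)."""
    num = 2 * (np.mean((shh * np.conj(svv)).real) - np.mean(np.abs(shv) ** 2))
    den = (np.mean(np.abs(shh) ** 2) + 2 * np.mean(np.abs(shv) ** 2)
           + np.mean(np.abs(svv) ** 2))
    return num / den
```

For a Bragg-like patch with $S_{HH} \approx S_{VV}$ and negligible cross-pol, both coefficients approach 1, while oil-covered areas push them toward 0 or negative values.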

Freeman 3-Component Decomposition
Freeman and Durden [11] proposed a three-component scattering model for polarimetric SAR data in 1998, which includes three simple scattering mechanisms: volume (or canopy) scattering, double-bounce scattering and rough surface scattering. Assuming the three components are uncorrelated, the scattering process of the radar wave on the sea surface can be regarded as their composition, so the model for the total backscatter is

$$\left\langle |S_{HH}|^2 \right\rangle = f_s|\beta|^2 + f_d|\alpha|^2 + f_v, \qquad \left\langle |S_{VV}|^2 \right\rangle = f_s + f_d + f_v$$
$$\left\langle S_{HH}S_{VV}^{*} \right\rangle = f_s\beta + f_d\alpha + \frac{f_v}{3}, \qquad \left\langle |S_{HV}|^2 \right\rangle = \frac{f_v}{3}$$

where $f_s$, $f_d$ and $f_v$ are the contributions of surface, double-bounce and volume scattering to the VV cross-section. Once $f_s$, $f_d$ and $f_v$ are estimated, we can also get the contributions of the three mechanisms to the HH, HV and VH channels. $\alpha$ in the model is defined by

$$\alpha = \frac{R_{gh}R_{th}}{R_{gv}R_{tv}}\, e^{j2(\gamma_h-\gamma_v)}$$

where $R_{th}$ and $R_{tv}$ denote the reflection coefficients of the vertical surface for H and V polarizations, while $R_{gh}$ and $R_{gv}$ are the Fresnel reflection coefficients of the horizontal surface. The propagation factors $e^{j2\gamma_h}$ and $e^{j2\gamma_v}$ make the model more general; $\gamma$ represents any attenuation and phase change of the V- and H-polarized waves as they propagate from the radar to the ground and back again.
The volume scattering contribution can be calculated directly from the cross-polarized term, $f_v = 3\left\langle |S_{HV}|^2 \right\rangle$. We can then estimate the contribution of each scattering mechanism to the span $P$:

$$P = P_s + P_d + P_v, \qquad P_s = f_s\left(1+|\beta|^2\right), \qquad P_d = f_d\left(1+|\alpha|^2\right), \qquad P_v = \frac{8f_v}{3}$$

The scattering powers of the three mechanisms, $P_s$, $P_d$ and $P_v$, are the result of the Freeman decomposition.
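The power relations above can be sketched in a few lines (a minimal illustration only; the estimation of $f_s$, $f_d$, $\alpha$ and $\beta$ from the covariance elements is omitted):

```python
def freeman_powers(fs, fd, fv, alpha, beta):
    """Scattering powers from Freeman-Durden contributions.

    fs, fd, fv: surface, double-bounce and volume contributions to the VV
    cross-section; alpha, beta: the complex model parameters.
    """
    Ps = fs * (1 + abs(beta) ** 2)   # surface power
    Pd = fd * (1 + abs(alpha) ** 2)  # double-bounce power
    Pv = 8 * fv / 3                  # volume power (span contribution)
    return Ps, Pd, Pv
```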

Yamaguchi 4-Component Decomposition
In 2005, Yamaguchi et al. [12] proposed a four-component decomposition based on the Freeman decomposition, which adds the helix scattering power as a fourth term for a more general model; this term is essentially caused by the scattering matrix of helices and is mainly observed in urban areas. In addition, the Yamaguchi decomposition modifies the volume scattering matrix according to the relative backscattering magnitudes of $\left\langle |S_{HH}|^2 \right\rangle$ and $\left\langle |S_{VV}|^2 \right\rangle$. Denoting the magnitude of the helix scattering power by $f_c$, the corresponding magnitude of $\left\langle S_{HV}S_{VV}^{*} \right\rangle$ becomes $f_c/4$, and the helix power can be measured directly as

$$f_c = 2\left| \mathrm{Im}\left\langle S_{HV}^{*}\left(S_{HH}-S_{VV}\right) \right\rangle \right|$$

By comparing the covariance matrix elements we obtain five equations in $\alpha$, $\beta$, $f_s$, $f_d$, $f_v$ and $f_c$. The volume scattering coefficient $f_v$ is calculated from the cross-polarized power after removing the helix contribution, and $\alpha$ and $\beta$ are calculated in the same way as in the Freeman decomposition, so we obtain the contributions of the four mechanisms: $f_s$, $f_d$, $f_v$ and $f_c$. The scattering powers $P_s$, $P_d$, $P_v$ and $P_c$ corresponding to the surface, double-bounce, volume and helix scattering contributions can then be obtained, and they are the results of the Yamaguchi decomposition.
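The helix magnitude $f_c$, the only term measured directly from the data, can be sketched as follows (an illustration assuming sample-mean averaging over a patch; the full four-component solution is omitted):

```python
import numpy as np

def helix_power(shh, shv, svv):
    """Helix scattering magnitude f_c = 2|Im<S_HV*(S_HH - S_VV)>|.

    The sign of the imaginary part distinguishes left from right helix;
    only the magnitude is needed for the power.
    """
    return 2 * np.abs(np.mean(np.conj(shv) * (shh - svv)).imag)
```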

SLIC Superpixel
The superpixel algorithm was first proposed in 2003 by Xiaofeng Ren et al. [32]. Adjacent pixels with the same attributes are grouped into one region (one superpixel), so the whole image can be represented by a certain number of superpixels, which benefits subsequent image processing. SLIC adopts the k-means algorithm to generate superpixels. It limits the search space to a region proportional to the superpixel size, which reduces the number of distance calculations in the optimization and yields a complexity linear in the number of pixels.
The SLIC segmentation result relies only on the number of superpixels $k$, and each superpixel has approximately the same size. The $k$ initial cluster centers are sampled on a regular grid with an interval of $S$ pixels, where $S = \sqrt{N/k}$ and $N$ is the number of pixels. Each pixel $i$ is assigned to the nearest cluster center whose search area overlaps its position; this makes SLIC faster than traditional k-means. The distance measurement $D$ determines the closest cluster center $C_k$ for each pixel $i$. The expected spatial extent of a superpixel is an area of approximate size $S \times S$, and similar pixels are searched in a $2S \times 2S$ area around each superpixel center. SLIC realizes these steps in the labxy space, combining the CIELab color value $[l\ a\ b]^T$ of a pixel with its position $[x\ y]^T$. Since the position range changes with the image size, the color proximity and spatial proximity must be normalized by their maximum distances within a cluster, $N_c$ and $N_s$, before being combined into a single measurement. Then $D$ can be calculated by

$$D = \sqrt{\left(\frac{d_c}{N_c}\right)^2 + \left(\frac{d_s}{N_s}\right)^2}$$

where

$$d_c = \sqrt{(l_j-l_i)^2 + (a_j-a_i)^2 + (b_j-b_i)^2}, \qquad d_s = \sqrt{(x_j-x_i)^2 + (y_j-y_i)^2}$$

The maximum spatial distance within a cluster corresponds to the sampling interval, so $N_s = S$. When $N_c$ is fixed as a constant $m$, the distance can be written as

$$D = \sqrt{d_c^2 + \left(\frac{d_s}{S}\right)^2 m^2}$$

where $d_c$ reduces to the intensity difference for gray-scale images. The constant $m$ balances the relative importance of color and spatial proximity: when $m$ increases, the superpixel result depends more on spatial proximity.
Once each pixel has been associated with the nearest cluster center, an update step adjusts each cluster center to the mean vector of all its pixels. The $L_2$ norm is used to calculate the residual error $E$ between the new and previous cluster center positions, and the assignment and update iterations end when $E$ is less than a set threshold. In our experiments, we transform the HH, HV and VV channel SAR data into the labxy space for the superpixel calculation.
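The combined distance measurement, with $N_s = S$ and $N_c$ fixed as $m$, can be sketched as follows (a minimal illustration of the distance only, not a full SLIC implementation):

```python
import numpy as np

def slic_distance(pixel, center, S, m):
    """Combined SLIC distance D between a pixel and a cluster center.

    pixel/center are 5-vectors [l, a, b, x, y]; S is the grid interval
    (spatial normalizer) and m the fixed color normalizer.
    """
    p, c = np.asarray(pixel, float), np.asarray(center, float)
    dc = np.linalg.norm(p[:3] - c[:3])  # color proximity in CIELab
    ds = np.linalg.norm(p[3:] - c[3:])  # spatial proximity
    return float(np.sqrt(dc ** 2 + (ds / S) ** 2 * m ** 2))
```

In the assignment step, each pixel evaluates this distance only against the cluster centers whose $2S \times 2S$ search window covers it, which is what keeps the overall complexity linear in the number of pixels.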

Semantic Segmentation Algorithm
We constructed a refined segmentation method based on a CNN to perform oil spill detection. The structures used in our network are described in detail in the following subsections.

Convolutional Layer and Dilated Convolution
The CNN has been widely used in image classification and object detection for its good generalization ability. Compared with traditional neural networks, the CNN imitates the human visual nerve and allows automatic feature extraction. The two main processes in training a CNN are forward propagation and backward propagation: forward propagation expresses the transmission of feature information, while backward propagation mainly uses error information to correct the model parameters.
The convolutional layer is the core component of a CNN. In forward propagation, it slides a filter kernel over the input tensor to obtain image features; the number of channels of the kernel equals that of the input tensor. The convolution operation can be expressed as

$$a_k = f\left( \theta_k * x + b_k \right)$$

where $\theta_k$ and $b_k$ are the weights and biases to be trained, and $f(\cdot)$ represents the activation function. Here the Rectified Linear Unit (ReLU) and tanh functions were used:

$$\mathrm{ReLU}(x) = \max(0, x), \qquad \tanh(x) = \frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}$$

Backward propagation depends on the derivative of the output layer (the loss function) and the propagation of errors. The parameter adjustment is optimized by the error function, and the Adam optimizer was adopted in this paper; it adjusts the weights and biases iteratively so that the output error of the network becomes small. The gradient transferred between convolutional layers can be expressed as

$$\delta^{l} = \frac{\partial J}{\partial z^{l}}, \qquad z^{l} = W^{l} a^{l-1} + b^{l}, \qquad a^{l} = \sigma\left(z^{l}\right)$$

where $a^{l}$ denotes the output tensor of layer $l$, $\sigma(\cdot)$ is the activation of the convolution, and $\odot$ below denotes the Hadamard product. $J$ is the loss function between the output tensor and the ground truth; in our case we used the cross entropy, which is described by

$$J = -\sum_i p(x_i)\log q(x_i)$$

where $p(x_i)$ and $q(x_i)$ denote the probabilities of class $x_i$ in the ground truth and the network output, respectively.
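The forward pass of a single convolutional layer described above can be sketched in numpy (a minimal single-channel illustration; like common deep-learning frameworks, the "convolution" is implemented as cross-correlation):

```python
import numpy as np

def conv2d_relu(x, kernel, bias=0.0):
    """Single-channel 'valid' convolution followed by ReLU (forward pass)."""
    H, W = x.shape
    h, w = kernel.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Slide the kernel over the input and accumulate
            out[i, j] = np.sum(x[i:i + h, j:j + w] * kernel) + bias
    return np.maximum(out, 0.0)  # ReLU activation
```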
The recurrence relation between layer $l$ and layer $l-1$ is

$$\delta^{l-1} = \delta^{l} * \mathrm{rot180}\left(W^{l}\right) \odot \sigma'\left(z^{l-1}\right)$$

where $\mathrm{rot180}(\cdot)$ means the convolution kernel is rotated by 180 degrees when the derivative is calculated; with this relation the gradients of all layers can be computed. Assuming the gradient after $t$ iterations is $g_t = \delta^{l}(t)$, the exponential moving average of the gradient is calculated by

$$m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t$$

where $\beta_1$ is the exponential decay rate. The exponential moving average of the squared gradient is

$$v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2$$

$m_t$ and $v_t$ are bias-corrected as

$$\hat{m}_t = \frac{m_t}{1-\beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^t}$$

Then the formula for updating the parameters is

$$\theta_t = \theta_{t-1} - \alpha \frac{\hat{m}_t}{\sqrt{\hat{v}_t}+\epsilon}$$

where $\alpha$ represents the learning rate.
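One Adam update following the moment equations above can be written directly (a sketch with the commonly used default hyperparameters):

```python
import numpy as np

def adam_step(theta, g, m, v, t, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameter theta given gradient g at step t >= 1."""
    m = beta1 * m + (1 - beta1) * g        # moving average of the gradient
    v = beta2 * v + (1 - beta2) * g ** 2   # moving average of the squared gradient
    m_hat = m / (1 - beta1 ** t)           # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```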
In our network, dilated convolution is applied to extract features from the input layer; it injects holes into traditional convolution kernels, which increases the receptive field. The difference between a standard kernel and a dilated kernel is shown in Figure 4. The kernel slides over the image from left to right and top to bottom. As shown in Figure 4a, the red points form a standard kernel. For a dilated kernel (Figure 4b), several injected holes, highlighted as blue or dark blue points, are added; the values at these points are set to 0 and only the values at the red points are calculated. Suppose $k : \Omega_r \to \mathbb{R}$, $\Omega_r = [-r, r]^2$, is a discrete filter of size $(2r+1)^2$; the discrete convolution operator can be defined as

$$(F * k)(\mathbf{p}) = \sum_{\mathbf{s}+\mathbf{t}=\mathbf{p}} F(\mathbf{s})\,k(\mathbf{t})$$

With $l$ a dilation factor, $*_l$ is defined as

$$(F *_l k)(\mathbf{p}) = \sum_{\mathbf{s}+l\mathbf{t}=\mathbf{p}} F(\mathbf{s})\,k(\mathbf{t})$$

This is the calculation formula of dilated convolution.
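The hole-injection view of dilation can be made concrete by expanding the kernel itself (a sketch: a dilated convolution is equivalent to an ordinary convolution with this zero-expanded kernel):

```python
import numpy as np

def dilate_kernel(kernel, l):
    """Insert l-1 zeros ('holes') between kernel taps.

    A (2r+1) x (2r+1) kernel grows to an effective size of
    l*(2r) + 1 per side without adding any new parameters.
    """
    h, w = kernel.shape
    out = np.zeros((l * (h - 1) + 1, l * (w - 1) + 1), dtype=kernel.dtype)
    out[::l, ::l] = kernel  # original taps land on a stride-l grid
    return out
```

A 3 × 3 kernel with dilation factor 2 therefore covers a 5 × 5 receptive field while still using only 9 weights.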

Depthwise Separable Convolution with Dilated Kernel
Suppose the size of the input tensor is $N \times H \times W \times C$ and the layer has $k$ convolution kernels of size $h \times w \times C$; the output of this layer is an $N \times H \times W \times k$ tensor with "same" padding and stride = 1. The whole process needs $h \times w \times C \times k$ parameters and $h \times w \times C \times k \times H \times W$ multiplications.
Depthwise separable convolution decomposes the traditional convolution layer into a depthwise convolution and a pointwise convolution. The depthwise step splits the $N \times H \times W \times C$ input tensor into $C$ groups and carries out a convolution with an $h \times w$ kernel on each group; this collects the spatial features of each channel, i.e., the depthwise features. The resulting $N \times H \times W \times C$ tensor is then operated on by a traditional $1 \times 1 \times k$ pointwise convolution, which extracts features across channels; its output is an $N \times H \times W \times k$ tensor. Together, the depthwise and pointwise steps act as one convolution layer with a much lower amount of computation: the two processes need $(H \times W \times C) \times (h \times w + k)$ multiplications in total.
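The saving can be quantified directly from the two expressions above (a small helper for illustration):

```python
def conv_costs(H, W, C, k, h, w):
    """Multiplication counts for standard vs. depthwise separable convolution
    (stride 1, 'same' padding), following the expressions in the text."""
    standard = h * w * C * k * H * W
    separable = (H * W * C) * (h * w + k)  # depthwise + pointwise
    return standard, separable
```

For a 32 × 32 feature map with C = 64, k = 128 and a 3 × 3 kernel, the separable form needs roughly 8.4 times fewer multiplications than the standard convolution.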
In order to combine the reception field of dilated convolution with the calculated performance of depthwise separable convolution, we adopt the strategy of adding holes into depthwise convolution kernel in several bottom layers of neural network.

Transposed Convolution
Transposed convolution, also known as deconvolution, is often used as a decoder in neural networks. In a semantic segmentation task, transposed convolution upsamples the feature maps extracted by the convolution layers, and the final output is a fine classification map with the same size as the original image. In effect, it transposes the kernel of the ordinary convolution used in the encoder and swaps the input and output. Figure 5 shows a highly condensed feature map extracted by a multilayer network and how it is decoded by a transposed convolution layer: a $2 \times 2$ feature map, upsampled with stride 3 and padded with a border of zeros, is convolved by a $3 \times 3$ kernel, and the output is a $6 \times 6$ tensor when there is no padding in the convolution process.
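The output size follows the common transposed-convolution relation $o = s(i-1) + k - 2p$ (a sketch assuming no output padding), which reproduces the 2 × 2 → 6 × 6 example with stride 3:

```python
def transposed_out_size(i, k, s, p=0):
    """Output size of a transposed convolution: o = s*(i - 1) + k - 2p.

    i: input size, k: kernel size, s: stride, p: padding.
    """
    return s * (i - 1) + k - 2 * p
```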
The detailed parameters of each layer are listed in Table 1. The encoder section contains 10 convolution layers and 2 residual blocks. Convolution layers 1 and 3 adopt depthwise separable convolution, and layer 5 is a dilated convolution layer. The decoder section consists of five deconvolution (transposed convolution) layers, in which layers 1 and 2 are connected with convolution layers 7 and 3, respectively.

Evaluation Method
MIoU is usually used as an accuracy index in semantic segmentation tasks; it calculates the intersection over union between prediction and ground truth, averaged over classes:

$$\mathrm{MIoU} = \frac{1}{k}\sum_{i=1}^{k} \frac{p_{ii}}{\sum_{j} p_{ij} + \sum_{j} p_{ji} - p_{ii}}$$

which is equivalent to

$$\mathrm{MIoU} = \frac{1}{k}\sum_{i=1}^{k} \frac{TP_i}{TP_i + FP_i + FN_i}$$

where TP (true positive) is the number of samples for which the real value and the model prediction are both positive, FN (false negative) means the real value is positive while the prediction is negative, FP (false positive) is defined analogously, and $k$ is the number of classes.
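The metric can be sketched from integer label maps as follows (a minimal illustration; how classes absent from both maps are scored is a convention, here counted as a perfect 1.0):

```python
import numpy as np

def miou(pred, gt, k):
    """Mean Intersection over Union between two integer label maps with k classes."""
    ious = []
    for c in range(k):
        tp = np.sum((pred == c) & (gt == c))
        fp = np.sum((pred == c) & (gt != c))
        fn = np.sum((pred != c) & (gt == c))
        denom = tp + fp + fn
        ious.append(tp / denom if denom else 1.0)  # empty class scored as perfect
    return float(np.mean(ious))
```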

SAR Data and Preprocessing
Three images are used in our experiments. Image 1 is a quad-pol oil spill image obtained by the C-band Radarsat-2 satellite over the North Sea off England in 2011 during the oil-on-water exercise conducted by the Norwegian Clean Seas Association for Operating Companies (NOFO). The whole image contains five classes in total: clean sea, ships, biogenic look-alike film, emulsion and crude oil spill. The biogenic look-alike film was simulated by Radiagreen plant oil, while the emulsion area was composed of Oseberg blend crude oil mixed with 5% IFO380 (Intermediate Fuel Oil). The oil spill area was Balder crude oil, released 9 h before the SAR acquisition [16]. Emulsions are classified as an independent class in this paper since they have a different composition and different polarimetric scattering characteristics in SAR images. Image 2 and Image 3 were acquired by the C-band SIR-C/X-SAR in 1994; the dark spots contained in these images are a biogenic look-alike and an oil spill, respectively. The biogenic look-alike was composed of oleyl alcohol in the experiment [37]. The detailed information of the SAR acquisitions is listed in Table 2. The Single Look Complex (SLC) radar images underwent multi-look processing and were filtered by the refined Lee filter. Figure 6 shows the image extracted from the coherency matrix T before and after filtering. Filtering helped suppress speckle noise and enhance the edges of dark spots, and earlier experiments have shown that the refined Lee filter can help increase oil spill detection accuracy. The filtered images were processed by the different polarimetric decomposition methods according to the steps listed in Section 2.2.
Figure 7 lists the five groups of polarized parameters extracted from Image 1 as an example: H/A/Alpha, H/A/Alpha/SERD, correlation/conformity coefficients, Freeman decomposition and Yamaguchi decomposition; the characteristics of all these parameters are listed in Table 3.
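As a sketch of one of these feature sets, the Cloude-Pottier H/A/alpha parameters follow from the eigen-decomposition of the 3 × 3 coherency matrix T (standard definitions; the per-pixel windowed averaging of T is omitted here):

```python
import numpy as np

def h_a_alpha(T):
    """H/A/alpha from a 3x3 Hermitian coherency matrix T.

    H     = -sum_i p_i * log3(p_i)          (polarimetric entropy)
    A     = (lam2 - lam3) / (lam2 + lam3)   (anisotropy)
    alpha = sum_i p_i * arccos(|e_i[0]|)    (mean alpha angle, radians)
    where lam1 >= lam2 >= lam3 are eigenvalues, p_i = lam_i / sum(lam),
    and e_i are the corresponding unit eigenvectors.
    """
    lam, vec = np.linalg.eigh(T)            # ascending eigenvalues
    lam = lam[::-1]
    vec = vec[:, ::-1]                      # reorder to descending
    lam = np.clip(lam.real, 1e-12, None)    # guard against round-off negatives
    p = lam / lam.sum()
    H = -np.sum(p * np.log(p) / np.log(3.0))
    A = (lam[1] - lam[2]) / (lam[1] + lam[2])
    alpha = np.sum(p * np.arccos(np.clip(np.abs(vec[0, :]), 0.0, 1.0)))
    return H, A, alpha
```

For a fully depolarized target (T proportional to the identity) the entropy is 1, whereas a single dominant scattering mechanism drives it toward 0.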

SLIC Superpixel Segmentation
The HH, HV and VV data were taken as input for SLIC superpixel segmentation. These three channels of SAR data were combined into a new image, which was converted into the CIELab color space. Following the steps of the SLIC superpixel method described in Section 2.3, the superpixel segmentation results of the SAR data are shown in Figure 8. The superpixel numbers of the three images were set to 250, 40 and 40, respectively. These segmentation results serve as another type of input, besides the polarized parameters, for CNN training. It can be seen from Figure 8 that SLIC superpixel segmentation divides the image into several independent areas and can initially locate the dark spots, especially in Image 2 and Image 3.
Polarimetric decomposition and superpixel images are divided into five groups as listed in Figure 2. The three SAR images are divided into five categories pixel by pixel: clean sea background (CS), emulsion (EM), biogenic look-alike (LA), oil spill (OS) and ships (SH). All the images are divided into 48 × 48 small pictures in the experiment. When multiple parameters are input into the CNN, they are stacked along the third axis of the images to form a three-dimensional array. The original SAR images contained only 5 ships; in order to increase the number of samples, especially of ships, we sampled the same target areas multiple times: we extracted images of the target areas from different positions and divided them into 48 × 48 patches, so that several sampling images cover the same area. We randomly selected the training set and test set from the sample images; the numbers of samples are listed in Table 4. The MIoU was calculated on the test set. The samples are trained with the proposed network described in Section 2.4, and the output segmentation results are verified against the ground truth.
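The core of the SLIC step can be sketched as a localized k-means in a joint color-position feature space. The simplified version below works on a single-channel image with a global (rather than windowed) assignment and no connectivity enforcement, so it is an illustration of the clustering idea, not the paper's full CIELab three-channel pipeline:

```python
import numpy as np

def slic_sketch(img, n_segments=16, compactness=0.1, n_iters=5):
    """Minimal SLIC-style superpixel clustering on a 2-D float image.

    Seeds cluster centers on a regular grid with spacing
    S = sqrt(H*W / n_segments), then iterates k-means on
    (intensity, y, x) features, weighting the spatial term by
    (compactness / S)^2 as in SLIC's distance measure.
    """
    h, w = img.shape
    step = int(np.sqrt(h * w / n_segments))
    ys = np.arange(step // 2, h, step)
    xs = np.arange(step // 2, w, step)
    cy, cx = np.meshgrid(ys, xs, indexing="ij")
    centers = np.stack([img[cy, cx].ravel(),
                        cy.ravel().astype(float),
                        cx.ravel().astype(float)], axis=1)
    yy, xx = np.mgrid[0:h, 0:w]
    feats = np.stack([img.ravel(), yy.ravel().astype(float),
                      xx.ravel().astype(float)], axis=1)
    for _ in range(n_iters):
        d_col = (feats[:, None, 0] - centers[None, :, 0]) ** 2
        d_xy = ((feats[:, None, 1:] - centers[None, :, 1:]) ** 2).sum(-1)
        labels = np.argmin(d_col + (compactness / step) ** 2 * d_xy, axis=1)
        for k in range(len(centers)):       # recompute cluster means
            mask = labels == k
            if mask.any():
                centers[k] = feats[mask].mean(axis=0)
    return labels.reshape(h, w)
```

A larger compactness value makes the superpixels more regular and grid-like; a smaller value lets them hug intensity edges, which is the behavior that helps outline dark spots.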

Oil Spill Classification
In order to evaluate the influence of SLIC superpixels on the segmentation results, we carried out comparative experiments on each group of polarized parameters with and without superpixel segmentation. Figure 9 presents the segmentation results of the five groups of polarized parameters on the five dark spot areas of the three images. The oil spill areas are marked as the dark spots, light grey denotes the biogenic look-alikes, and medium grey represents emulsion.
As shown in Figure 9, the dark spot areas can be extracted effectively and classified accurately in each group. The classification result of the oil spill area in Image 3 was the best. Among all the polarized decomposition parameters, the Yamaguchi 4-component parameters performed best, followed by the Freeman 3-component parameters and H/A/SERD/Alpha. H/A/Alpha could also distinguish each category in the images except ships, and the parameter SERD effectively increased the classification accuracy over the plain H/A/Alpha decomposition. The segmentation results of the co-polarized correlation coefficients and conformity coefficients perform poorly in nearly all categories, indicating that they are not optimal polarized parameters for detecting oil spill areas.
Considering all categories, the classification result of clean sea (CS) is the best, followed by the oil spill (OS) areas, which are slightly better than the look-alikes (LA). The classification accuracies of emulsions (EM) and ships (SH) are the lowest. False detections mostly occurred in emulsions: a number of emulsion areas were misclassified as oil spill or look-alikes, especially in the experiments with H/A/Alpha, H/A/SERD/Alpha and the co-polarized correlation/conformity coefficients. Compared with those results, the Freeman 3-component and Yamaguchi 4-component decompositions could distinguish most of these categories successfully. Moreover, the experiments using these two groups of polarized parameters could also detect ships with high reliability, whereas ships were almost always misclassified as oil spill areas in the other groups' results.
Next, we added the SLIC segmentation result derived from the SAR data as another input besides the polarized parameters, fed them together into the neural network, and repeated the above experiments. The output results are presented in Figure 10. The classification results of each category have improved significantly, especially for the emulsion areas. Compared with the segmentation results obtained without the superpixel model, the edges between different classes become more distinct. A numerical comparison was carried out by calculating the MIoU of each polarized parameter group on the test set; the results with and without SLIC superpixels are listed in Table 5. The accuracy of the Yamaguchi and Freeman decompositions is significantly higher than that of the other groups of polarized parameters, and the accuracy of each classification category has also been improved by SLIC superpixels to varying degrees.
For further analysis, Table 6 shows the total MIoU of the different polarimetric decomposition methods, and Table 7 shows the average MIoU of each classification over all experiments; both are calculated from the average values of Table 5. The overall accuracy of the different polarimetric parameters combined with SLIC superpixel segmentation maintained the same trend as in the previous analysis, as illustrated in Tables 5 and 6. Yamaguchi 4-component decomposition achieved the highest MIoU of 90.5%, followed by the Freeman parameters and H/A/SERD/Alpha. Although SLIC superpixels provide only a rough classification of the dark spot areas, they improved the MIoU values of all five polarimetric parameter groups, by 12.3%, 11.3%, 21.2%, 2.5% and 4.0%, respectively. Taking the Yamaguchi parameters as an example, the MIoU of the OS area increased from 94.0% to 96.8%, and increased by 0.8%, 12.3% and 9.2% in the CS, EM and LA areas, respectively. Moreover, the largest increase in MIoU occurred in the EM area, which increased by 21.9% on average over the five groups of polarimetric parameters, as shown in Table 8. The CS and OS areas achieved the highest MIoU values of 95.9% and 94.1% across all experiments with and without SLIC superpixels, while SH was significantly lower than the other classes. It is worth noting that the number of superpixels in SLIC segmentation also affects the final segmentation accuracy. We tested superpixel numbers from 150 to 400 with a step of 50 on Image 1 alone; Figure 11 shows the SLIC segmentation results for the different numbers of superpixels. We carried out these comparison experiments with the Yamaguchi 4-component parameter group, since it achieved the highest MIoU in the previous experiments. Table 10 lists the MIoU for oil spill segmentation under the different superpixel numbers; the highest accuracy, 91.0%, was obtained when the superpixel number was set to 250.
Finally, the classification results of the whole images, without and with SLIC superpixels, obtained with the Yamaguchi parameters are presented in Figure 12. Each category on the sea surface can be distinguished with high accuracy. SLIC superpixels helped further improve the accuracy of each category, especially emulsions; biogenic look-alikes were also better classified, with fewer misclassified patches inside. Emulsions can be well separated from oil spill and biogenic look-alike areas, and the segmentation results of the other categories also improved. The improvement was most obvious in Image 1, while in Image 2 and Image 3 the SLIC superpixels mainly helped improve the accuracy of the CS area.
In order to evaluate the algorithm complexity, we measured the computation time of the superpixel segmentation and of the CNN classification with the different polarized parameters; the results are listed in Table 9. Table 10 shows the memory usage of the different neural network models. Due to the limitation of our experimental conditions, the experiments were carried out on a device without an independent GPU; it should be noted that the processing speed would be tens of times faster on a GPU-equipped device. Moreover, the optimal superpixel number depends on the type and size of the targets, which means that the SLIC superpixel number should be adjusted to the real conditions.

Conclusions
In this paper, we proposed an oil spill detection method combining the SLIC superpixel model and a CNN-based semantic segmentation algorithm. Dilated convolution kernels and depthwise separable convolution kernels were adopted for better computing performance and a larger receptive field. The SLIC superpixel segmentation is provided as an additional input to the CNN model for auxiliary classification.
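The mechanics of these two kernel tricks can be sketched as follows (an illustration of the general techniques, not the paper's network code): dilation inserts zeros between kernel taps to enlarge the spatial coverage at no extra parameter cost, and depthwise separable convolution factorizes a standard convolution into a per-channel convolution plus a 1 × 1 pointwise convolution, cutting the weight count.

```python
import numpy as np

def dilated_kernel(kernel, rate):
    """Expand a k x k kernel by inserting (rate - 1) zeros between taps.

    A single layer's effective span grows to rate*(k-1)+1 pixels while
    the number of learnable weights stays the same (stacking dilated
    layers enlarges the receptive field further).
    """
    k = kernel.shape[0]
    out = np.zeros((rate * (k - 1) + 1,) * 2, dtype=kernel.dtype)
    out[::rate, ::rate] = kernel
    return out

def conv_params(k, c_in, c_out):
    """Weight counts (biases ignored) for a k x k convolution layer:
    standard vs. depthwise separable (depthwise + 1x1 pointwise)."""
    standard = k * k * c_in * c_out
    separable = k * k * c_in + c_in * c_out
    return standard, separable
```

For a 3 × 3 layer mapping 64 to 128 channels, the separable factorization needs 64 * 9 + 64 * 128 weights instead of 9 * 64 * 128, roughly an 8x reduction.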
The experiments were carried out on C-band fully polarimetric SAR data from Radarsat-2 and SIR-C/X-SAR. We extracted several polarized parameters according to different decomposition methods and tested their performance in oil spill classification with the proposed method. The results showed that, for each group of parameters, the network structure can effectively distinguish the oil spill area from other areas. The highest MIoU value was achieved in the Yamaguchi decomposition experiment, followed by H/A/SERD/Alpha and Freeman decomposition.
The introduction of SLIC superpixels greatly improved the recognition accuracy. The MIoU values of every group improved, and the ranking of the polarimetric feature sets remained almost the same as in the experiments without SLIC superpixels. Hence, we suggest that Yamaguchi parameters combined with superpixel segmentation are the most suitable choice for oil spill detection.

Figure 1. Flow chart of our oil spill detection method.

Figure 2. Structure of the neural network in our segmentation method. Blue and green blocks denote the encoder parts, which consist of multiple convolution layers; here we used depthwise separable convolution, dilated convolution and standard convolution as filter kernels, respectively. Purple-red blocks constitute the decoder part, which outputs a classification map of the same size as the original image.

Figure 3. Polarized parameter extraction. We extracted 13 polarized parameters in total. They are divided into five groups according to the position of the boxes in the figure; each group was input into the neural network for classification.

Figure 4. Convolution kernels for (a) the standard kernel, which has a receptive field of 3 × 3, and (b) the dilated kernel with dilation rate = 2, whose receptive field is 7 × 7.

Figure 9. The results on the dark spot areas verified by polarized parameters. In each group, 1 represents the emulsion, 2 the biogenic look-alike and 3 the oil spill area of Image 1; 4 and 5 represent the biogenic look-alike of Image 2 and the oil spill of Image 3, respectively.

Table 1. The detailed parameters of the segmentation networks.

Table 2. Details of Synthetic Aperture Radar (SAR) image acquisition.

Table 3. Characteristics of polarized parameters in experiments.

Table 4. Number of samples of each category.

Table 5. Mean Intersection over Union (MIoU) result on each classification of polarized parameters experiments.
1 Scattering coefficient means the combination of correlation coefficients and conformity coefficients.

Table 6. Total MIoU results of each group of polarimetric parameters.

Table 7. Average MIoU of each classification.

Table 8. Total MIoU results of Yamaguchi parameters combined with different SLIC parameters.