Oil Spill Detection with Multiscale Conditional Adversarial Networks with Small-Data Training

Abstract: We investigate the problem of training an oil spill detection model with small data. Most existing machine-learning-based oil spill detection models rely heavily on large training datasets. However, large amounts of oil spill observation data are difficult to access in practice. To address this limitation, we developed a multiscale conditional adversarial network (MCAN) consisting of a series of adversarial networks at multiple scales. The adversarial network at each scale consists of a generator and a discriminator. The generator aims to produce an oil spill detection map as authentically as possible. The discriminator tries its best to distinguish the generated detection map from the reference data. The training procedure of MCAN commences at the coarsest scale and operates in a coarse-to-fine fashion. The multiscale architecture comprehensively captures both global and local oil spill characteristics, and the adversarial training enhances the model's representational power via the generated data. These properties empower the MCAN with the capability of learning with small oil spill observation data. Empirical evaluations validate that our MCAN, trained with only four oil spill observation images, accurately detects oil spills in new images.


Introduction
Frequent oil spill accidents have caused great harm to marine life and national economies in recent years. Accurately detecting oil spills in remote sensing images plays an important role in environmental protection and emergency responses to marine accidents. Among the many monitoring methods that use remote sensing, synthetic aperture radar (SAR) is an essential tool for observing oil spills, owing to its broad view and all-time, all-weather data acquisition [1][2][3]. Oil spill detection based on SAR images is an indispensable research topic in the field of ocean remote sensing [4][5][6][7].
The principle of oil spill detection lies in the contrast between oil and water in oil spill observation images [8,9]. As oil spills weaken the Bragg scattering and produce dark regions in the observation images, numerous researchers have analyzed the physical characteristics of oil spills. In particular, polarimetric characteristics have been effectively used to enhance the comprehensive observation effect [10,11]. Moreover, other techniques for oil spill detection based on simple image processing have mainly drawn support from energy minimization [12], in which the optimization objective is an energy function.
Mdakane and Kleynhans [5] achieved efficient oil spill detection with an automatic segmentation framework that combines automated threshold-based and region-based methods. Nevertheless, training an oil spill detection model with small amounts of data remains a challenge in the literature. To eliminate the dependence on vast amounts of oil spill observation data, we developed a multiscale conditional adversarial network (MCAN) that achieves this goal with small amounts of training data.
MCAN consists of a series of adversarial networks at multiple scales. Both images of observed oil spills and detection maps are used in coarse-to-fine representations at multiple scales. Each adversarial network of the MCAN at each scale is composed of a generator and a discriminator. The generator captures the observed image's characteristics and produces an oil spill detection map as authentically as possible. The discriminator distinguishes the generated detection map from the reference data. The output of each generator is used as the input of the following finer-scale generator and the current-scale discriminator. The training procedure of each scale is conducted independently in an adversarial fashion.
The three features (i.e., (i) multiscale processing, (ii) coarse-to-fine data flow in a cascade, and (iii) independent adversarial training) enable MCAN to comprehensively capture data characteristics, and they empower it with the capability of learning with small amounts of data. The experimental results validate that MCAN produces accurate detection maps for sophisticated oil spill regions based on only four training data samples. The detection performance of MCAN outperforms those of other methods.
The main contributions of this article are summarized as follows.

1. We propose a novel oil spill detection method based on MCAN, which employs a lightweight network for each generator and discriminator.
2. We implement adversarial training independently at each scale and achieve a coarse-to-fine data flow of oil spill features in a cascade.
3. We carefully select small training data with different characteristics to conduct an experimental evaluation.
The rest of this article is structured as follows. Section 2 describes our MCAN framework and presents the training procedure. Section 3 provides the experimental settings and evaluations. Section 4 discusses the experimental results. Finally, Section 5 presents conclusions about the proposed method and its performance.

Materials and Methods
Adversarial learning with one generator and one discriminator is widely used in semantic segmentation. The generator is trained with a loss function that is automatically learned from the discriminator. The discriminator learns to distinguish the distributions of the real and generated maps, allowing for flexible losses and alleviating the need for manual loss tuning.
Oil spill detection with small-data training is a promising and practically significant problem. It requires a well-designed network and, in the case under study, an efficient training mode in order to learn an effective oil spill detection model. Adversarial learning is well suited to small-data training. This section describes how to detect oil spills via a multiscale conditional adversarial network (MCAN) trained with few samples.

The MCAN Architecture
We denote an oil spill observation image as I_0, the reference data of the oil spill region as S_0, and the corresponding oil spill detection map produced by the MCAN as Ŝ_0. Both S_0 and Ŝ_0 are binary images in which 0 represents the oil spill and 1 represents the ocean surface. We establish multiscale representations for I_0, S_0, and Ŝ_0, starting from the 0-th scale.
The representations I_n and S_n at the n-th scale are obtained by downsampling (by averaging values) the 0-th-scale data by a factor r^n, where r is usually set to 2. We denote by G_n the n-th generator network, which produces Ŝ_n. The size of Ŝ_n is 1/r^n of that of Ŝ_0, and Ŝ_n is considered as the n-th-scale representation of Ŝ_0. Representations across the N + 1 scales, from the N-th scale to the 0-th scale, form the coarse-to-fine multiscale representation set used in MCAN oil spill detection.
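The downsampling step can be sketched as follows. Here, `build_pyramid` is a hypothetical helper (not from the paper's released code) that builds the coarse-to-fine representations by average pooling, assuming PyTorch tensors:

```python
import torch
import torch.nn.functional as F

def build_pyramid(image, num_scales=3, r=2):
    """Build multiscale representations I_0, ..., I_N by average pooling.

    `image` is a (B, C, H, W) tensor; the n-th scale is downsampled by a
    factor r**n relative to the 0-th scale. Returns [I_0, ..., I_N],
    finest first.
    """
    pyramid = [image]
    for _ in range(1, num_scales):
        # Averaging over r x r windows shrinks each spatial dimension by r,
        # so repeated pooling yields the factor r**n overall.
        pyramid.append(F.avg_pool2d(pyramid[-1], kernel_size=r, stride=r))
    return pyramid

# A 256 x 256 image with N = 2 (three scales) yields sizes 256, 128, and 64.
sizes = [p.shape[-1] for p in build_pyramid(torch.zeros(1, 1, 256, 256))]
```

Iterated 2 × 2 average pooling is equivalent to averaging over the full r^n × r^n window at scale n, matching the averaging-based downsampling described above.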
We propose a multiscale conditional adversarial network consisting of a series of generators and discriminators at multiple scales. The generator G_n and the discriminator D_n process representations at the n-th scale. Figure 1 illustrates the architecture of MCAN. The oil spill observation image I_0 and the oil spill detection map Ŝ_0 are the input and output of the overall MCAN framework, respectively. The generator G_n takes both the oil spill observation image I_n at the n-th scale and the generated oil spill detection map Ŝ_{n+1} at the (n + 1)-th scale as input and produces the generated oil spill detection map Ŝ_n as output. The discriminator D_n takes (I_n, S_n) or (I_n, Ŝ_n) as input and separately outputs the corresponding discriminant scores. D_n aims to distinguish between the reference oil spill detection map S_n and the generated oil spill detection map Ŝ_n.

Figure 1. Architecture of the multiscale conditional adversarial network (MCAN): G_n and D_n are the generator and the discriminator at the n-th scale, respectively. An image or detection map at the n-th scale is obtained by downsampling its original representation n times. D_n distinguishes between the reference oil spill detection map S_n and the generated oil spill detection map Ŝ_n. G_n takes both the observed oil spill image I_n at the n-th scale and the generated oil spill detection map Ŝ_{n+1} at the (n + 1)-th scale as input, and it produces the generated oil spill detection map Ŝ_n as output.

Figure 2 shows the architecture of the generator G_n at the n-th scale. The inputs of G_n are the observation image I_n and the detection map Ŝ_{n+1} from the (n + 1)-th scale. By upsampling Ŝ_{n+1} by a factor r, we obtain Ŝ↑_{n+1}, which has the same size as I_n. The convolutional network C_n processes the pair (I_n, Ŝ↑_{n+1}) with five convolutional blocks. Each block consists of three layers: a convolutional layer, a batch normalization (BN) layer, and a LeakyReLU (or Tanh) layer.
The sum of the convolutional network's output and Ŝ↑_{n+1} forms the output of G_n, i.e., the oil spill detection map Ŝ_n at the n-th scale. The operation within G_n is:

Ŝ_n = C_n(I_n, Ŝ↑_{n+1}) + Ŝ↑_{n+1}.    (1)

Figure 2. The generator G_n at the n-th scale. Each solid rectangle consists of a convolutional layer, a batch normalization (BN) layer, and a LeakyReLU (or Tanh) layer.
The sum C_n(I_n, Ŝ↑_{n+1}) + Ŝ↑_{n+1} is a typical residual learning scheme that enhances the representational power of the convolutional network. At the n-th scale, the generator G_n aims to produce the oil spill detection map Ŝ_n as authentically as possible.
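The generator's residual scheme can be sketched as below. This is a minimal illustration rather than the paper's implementation: the channel widths, kernel sizes, and upsampling mode are assumptions (the paper's Table 2 gives the actual settings).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Generator(nn.Module):
    """Sketch of G_n: five conv blocks C_n plus a residual connection."""

    def __init__(self, channels=32):
        super().__init__()
        def block(c_in, c_out, act):
            # Each block: convolution, batch normalization, activation.
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                nn.BatchNorm2d(c_out),
                act,
            )
        self.net = nn.Sequential(
            block(2, channels, nn.LeakyReLU(0.2)),   # input pair (I_n, S↑_{n+1})
            block(channels, channels, nn.LeakyReLU(0.2)),
            block(channels, channels, nn.LeakyReLU(0.2)),
            block(channels, channels, nn.LeakyReLU(0.2)),
            block(channels, 1, nn.Tanh()),           # final block uses Tanh
        )

    def forward(self, img, coarse_map, r=2):
        # Upsample the coarser-scale map so it matches I_n, then apply the
        # residual scheme S_n = C_n(I_n, S↑_{n+1}) + S↑_{n+1}.
        up = F.interpolate(coarse_map, scale_factor=r, mode='bilinear',
                           align_corners=False)
        return self.net(torch.cat([img, up], dim=1)) + up
```

The residual addition means C_n only has to learn a correction to the upsampled coarse map, which is what makes the per-scale networks lightweight.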
At the coarsest scale, the generator G_N takes only the oil spill observation image I_N as input and outputs its oil spill detection map Ŝ_N:

Ŝ_N = G_N(I_N).    (2)

Figure 3 shows the architecture of the discriminator D_n at the n-th scale. The network used in D_n has five convolutional blocks. Each of the first four blocks has a convolutional layer, a batch normalization (BN) layer, and a LeakyReLU layer. The fifth block has only a convolutional layer. The input of D_n is either (I_n, Ŝ_n) or (I_n, S_n). The output of D_n is a discriminant score X_n that reflects the confidence in the detection map:

X_n = D_n(I_n, Ŝ_n) for the generated Ŝ_n as input, or
X_n = D_n(I_n, S_n) for the reference data S_n as input,    (3)

where X_n is the average of the feature map F_n output by D_n's final convolutional layer. Each element of F_n corresponds to a patch of the input image. The final output of D_n integrates all of the elements of F_n and aims to classify whether each patch in the input image is real or generated. D_n penalizes the generated pair (I_n, Ŝ_n) and favors the actual pair (I_n, S_n). At the n-th scale, the discriminator D_n tries to distinguish the generated Ŝ_n from the reference data S_n.
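A minimal sketch of such a patch-based discriminator is given below; the channel widths and strides are assumptions rather than the settings in the paper's Table 3:

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Sketch of D_n: a PatchGAN-style critic whose score X_n is the mean
    of the final feature map F_n."""

    def __init__(self, channels=32):
        super().__init__()
        def block(c_in, c_out):
            # First four blocks: convolution, batch normalization, LeakyReLU.
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1),
                nn.BatchNorm2d(c_out),
                nn.LeakyReLU(0.2),
            )
        self.features = nn.Sequential(
            block(2, channels),                      # input pair (I_n, map)
            block(channels, channels),
            block(channels, channels),
            block(channels, channels),
            # Fifth block: convolution only, producing the feature map F_n.
            nn.Conv2d(channels, 1, kernel_size=3, padding=1),
        )

    def forward(self, img, det_map):
        f = self.features(torch.cat([img, det_map], dim=1))
        # Each element of F_n scores one patch of the input; the
        # discriminant score X_n averages over all patches.
        return f.mean()
```

Because each element of F_n has a limited receptive field, averaging patch scores makes D_n judge local realism everywhere in the image rather than emitting a single global verdict.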

Training MCAN
The training of MCAN is conducted hierarchically from the N-th scale to the 0-th scale. At each scale, the training is performed independently in the same manner, using a Wasserstein GAN gradient penalty (WGAN-GP) loss [39] for stable training. Once the training at the (n + 1)-th scale has concluded, the generated detection map Ŝ_{n+1} is used for the training at the n-th scale.
The training loss for the generator G_n is:

L_{G_n} = −D_n(I_n, Ŝ_n) + λ_1 ‖Ŝ_n − S_n‖_1,    (4)

where λ_1 is a balance parameter. The term −D_n(I_n, Ŝ_n) is the adversarial loss that encourages G_n to generate a detection map Ŝ_n that is as close as possible to the reference data S_n. The term ‖Ŝ_n − S_n‖_1 is the ℓ_1-norm loss, which penalizes the per-pixel distance between the reference data S_n and the generated Ŝ_n. Minimizing (4) trains G_n to generate detection maps as authentically as possible so that they finally fool the discriminator.

The training loss for the discriminator D_n is:

L_{D_n} = D_n(I_n, Ŝ_n) − D_n(I_n, S_n) + λ_2 (‖∇_{S̃_n} D_n(I_n, S̃_n)‖_2 − 1)^2,    (5)

where λ_2 is a balance parameter and S̃_n denotes a random variable sampled uniformly between S_n and Ŝ_n. The term D_n(I_n, Ŝ_n) − D_n(I_n, S_n) is the adversarial loss that strengthens the discrimination power of D_n; it makes D_n try its best to classify Ŝ_n as false and S_n as true. The term λ_2 (‖∇_{S̃_n} D_n(I_n, S̃_n)‖_2 − 1)^2 is the gradient penalty loss, which yields stable gradients that neither vanish nor explode [39]. Minimizing (5) trains D_n to distinguish the generated detection map from the reference data.

The training procedure of the proposed MCAN is described in Algorithm 1. The input consists of the original SAR images and their corresponding reference oil spill detection data. The output is the trained parameter set of MCAN. For example, consider the training sample I_0 and its reference data S_0. If MCAN is set to have three scales, I_0 and S_0 are downsampled to obtain (I_1, S_1) and (I_2, S_2). The training procedure starts with the generator G_2 and the discriminator D_2. Firstly, the output of G_2 is computed as Ŝ_2 = G_2(I_2). Secondly, D_2 takes Ŝ_2 and S_2 separately as input; the parameters of D_2 are updated by Equation (5), and the parameters of G_2 are updated according to Equation (4). Thirdly, Ŝ_2 and I_1 are concatenated as the input of G_1 to obtain the output Ŝ_1 = G_1(I_1, Ŝ_2).
D_1 and G_1 are updated according to Equations (5) and (4), respectively. Finally, Ŝ_1 and I_0 are concatenated as the input of G_0 to obtain the output Ŝ_0 = G_0(I_0, Ŝ_1). D_0 and G_0 are updated by Equations (5) and (4), respectively. At this point, the two images I_0 and S_0 have gone through one training iteration to update the parameters of MCAN. The next training sample pair follows the same procedure as that for I_0 and S_0.

Algorithm 1. Training procedure of MCAN.
for each training pair (I_0, S_0) do
  for n = N, ..., 0 do
    Input the downsampled sample I_n and Ŝ_{n+1} from the (n + 1)-th scale
    Compute Ŝ_n = G_n(I_n, Ŝ_{n+1})  (at the coarsest scale, Ŝ_N = G_N(I_N))
    Train D_n: compute L_{D_n} as in Equation (5) and update the parameters of D_n
    Train G_n: compute L_{G_n} as in Equation (4) and update the parameters of G_n
  end for
end for
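The per-scale update can be sketched as follows. This is an illustrative PyTorch implementation of the WGAN-GP losses in Equations (4) and (5); `gradient_penalty` and `train_step` are hypothetical helper names, and the generator and discriminator are assumed to have the call signatures described above.

```python
import torch

def gradient_penalty(disc, img, real_map, fake_map, lam=0.1):
    # Sample S~_n uniformly between the reference and generated maps, then
    # penalize gradient norms deviating from 1 (the last term of Eq. (5)).
    eps = torch.rand(1, 1, 1, 1)
    interp = (eps * real_map + (1 - eps) * fake_map).requires_grad_(True)
    score = disc(img, interp)
    grad, = torch.autograd.grad(score, interp, create_graph=True)
    return lam * (grad.flatten(1).norm(2, dim=1) - 1).pow(2).mean()

def train_step(gen, disc, g_opt, d_opt, img, ref_map, coarse_map,
               lam1=10.0, lam2=0.1):
    # One adversarial update at a single scale; the full procedure runs this
    # from the coarsest to the finest scale, feeding each generated map to
    # the next finer generator.
    fake = gen(img, coarse_map)

    # Discriminator loss (Eq. (5)): raise the score of the reference pair,
    # lower that of the generated pair, and add the gradient penalty.
    d_loss = (disc(img, fake.detach()) - disc(img, ref_map)
              + gradient_penalty(disc, img, ref_map, fake.detach(), lam2))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator loss (Eq. (4)): adversarial term plus the lambda_1-weighted
    # L1 per-pixel distance to the reference data.
    g_loss = -disc(img, fake) + lam1 * (fake - ref_map).abs().mean()
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return fake.detach()
```

The returned detached map plays the role of Ŝ_{n+1} when training proceeds to the next finer scale.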

Rationale
MCAN's capability of learning with small data stems from three properties.
• Firstly, the multiscale strategy comprehensively captures the characteristics of oil spills. The multiscale representations characterize oil spills from the coarsest representation at the N-th scale, which reflects the global layout, to the finest representation at the 0-th scale, which contains rich local details. They depict oil spills from both the global and local perspectives and exhibit representational diversity even with few samples.
• Secondly, the multiscale learning strategy intrinsically exploits this representational diversity of the small oil spill dataset to hierarchically train multiple generators and discriminators. The cascaded coarse-to-fine data flow enhances the model's representational power because the output of each generator is used as the input of the following finer-scale generator.
• Thirdly, data diversity is further increased via the multiscale adversarial training, in which the data generated at one scale are used in the training at the subsequent finer scale.
Therefore, MCAN comprehensively mines the characteristics of oil spills on a small-data basis, providing an effective oil spill detection strategy in situations of limited observations.

Experimental Settings
We evaluated the performance of the proposed multiscale conditional adversarial network oil spill detection method on actual SAR images. We compared the performance of MCAN with that of three typical detection methods: adaptive thresholding (AT) [45], level set (LS) [46], and a conditional generative adversarial network (CGAN) [42]. This comparison used images of the same size as the training set, and the network architecture of CGAN was set to the same size as the finest scale of MCAN. These three alternative methods are representative of the thresholding, energy minimization, and adversarial learning approaches, respectively. We also compared the detection performance of MCAN with that of two fully convolutional methods, namely FCN [34] and U-net [44], by using images with sizes different from those of the training set.
We obtained all of the oil spill observation images used in the experiments from the NOWPAP database (http://cearac.poi.dvo.ru/en/db/ (accessed on 16 June 2021)). The training set and test set were composed of oil spill image patches from larger satellite SAR images acquired by ERS-1, ERS-2, and Envisat-1. We did not perform any preprocessing on the images; they were used directly in the training procedure. We implemented MCAN with the PyTorch framework on a PC server with an NVIDIA Tesla K80 GPU and 64 GB of memory.
We used the same training set and hyperparameters for CGAN, FCN, U-net, and MCAN to ensure fair comparisons. Figure 4 shows the training set of four oil spill pairs with a size of 256 × 256 pixels. Each pair consists of an original oil spill SAR image (top) and its corresponding reference data (bottom). An expert produced the reference data by analyzing the images pixel by pixel. The top-row images of Figure 4 have different characteristics, and the oil spills have distinct shapes, thus providing diversity in the training. Figure 4a has a low signal-to-noise ratio (SNR), whereas Figure 4b has a high SNR. The oil spill in Figure 4c has an intricate shape with strong interference spots. Figure 4d shows an elongated oil spill.
The test set consisted of another 30 oil spill pairs that were disjoint from the training set: 26 pairs with the same size as the training images and four pairs with different sizes. MCAN was trained for 100 epochs, and the number of iterations at each scale was set to 1. All of the data pairs in the training and test sets were images with values scaled to [0, 1].
We trained each generator G_n and each discriminator D_n with the Adam optimizer using β_1 = 0.5 and β_2 = 0.999. The learning rate for each network was 0.0005, and the minibatch size was 1. The balance parameter of the ℓ_1-norm constraint was λ_1 = 10, and the gradient penalty weight for the WGAN-GP loss was λ_2 = 0.1.
To select the optimal number of scales for the input images, we conducted a preliminary experiment with the same training and test sets as the main experiment, using the F1-score described in Section 3.2 to evaluate performance. Table 1 presents the average performance and computing time for different numbers of scales. When r^N > 4, I_N is too small to provide effective global layouts; in this scenario, the N-th scale does not benefit the training as N increases. In our experiments, we set r = 2 and N = 2. Table 2 describes the parameter settings of the generator's network architecture at each scale, and Table 3 describes those of the discriminator. The kernel size is the parameter of the convolution kernel for each block, the stride is the step size of the convolution kernel's slide, and the padding is the number of zeros used to fill the areas around the image.

Evaluation Criteria
We compared the performance of MCAN and the other methods by using the accuracy, precision, recall, and F1-score of their detection maps. We denote by Ŝ(i) and S(i) the elements of Ŝ_0 and S_0, where i is the pixel index. Let S_TP, S_FP, S_TN, and S_FN denote the numbers of pixels satisfying Ŝ(i) + S(i) = 0 (true positive), Ŝ(i) − S(i) = −1 (false positive), Ŝ(i) + S(i) = 2 (true negative), and Ŝ(i) − S(i) = 1 (false negative), respectively. The evaluation measures are given by:

Accuracy = (S_TP + S_TN) / (S_TP + S_FP + S_TN + S_FN),    (6)

Precision = S_TP / (S_TP + S_FP),    (7)

Recall = S_TP / (S_TP + S_FN),    (8)

F1-score = 2 · Precision · Recall / (Precision + Recall).    (9)

Each of the four evaluation criteria has its own emphasis. Accuracy is a comprehensive measurement for detection maps that simultaneously considers oil spill and ocean surface detection. Precision is the proportion of detected oil spill pixels that coincide with the reference data. Recall is the proportion of reference oil spill pixels that are correctly detected. The F1-score integrates Precision and Recall.
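These criteria can be computed directly from a pair of binary maps. The sketch below follows the paper's convention that 0 marks oil (the positive class) and 1 marks ocean surface; `oil_spill_metrics` is a hypothetical helper name:

```python
import numpy as np

def oil_spill_metrics(pred, ref):
    """Accuracy, precision, recall, and F1 for binary detection maps in
    which 0 marks the oil spill (positive class) and 1 the ocean surface."""
    pred, ref = np.asarray(pred), np.asarray(ref)
    tp = np.sum((pred + ref) == 0)   # both 0: oil correctly detected
    fp = np.sum((pred - ref) == -1)  # detected oil where reference is ocean
    tn = np.sum((pred + ref) == 2)   # both 1: ocean correctly detected
    fn = np.sum((pred - ref) == 1)   # missed oil
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # Guard the denominators so empty classes do not divide by zero.
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    return accuracy, precision, recall, f1
```

For example, a 2 × 2 map that detects one of two reference oil pixels and adds no false alarms yields an accuracy of 0.75, a precision of 1.0, and a recall of 0.5.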

Detection on Actual Oil Spill Images
In this section, we present the detection performance of the six methods. To compare the detection performance of the deep-learning-based methods, we separately trained CGAN, FCN, U-net, and MCAN with the same training samples. Figure 5 shows the detection results for four representative test images with large areas of oil spills. Figure 6 shows the detection results for the other four representative test images with small areas of oil spills. Figure 7 shows the detection results for four test images with different sizes.
Table 4 reports the measures presented in Section 3.2, which we computed to evaluate the performance of AT, LS, CGAN, and MCAN on each actual oil spill image. The best results are highlighted in gray cells. Table 5 presents the average measures over all test images of the same size and comparisons between the four methods; the best results are again highlighted in gray cells.

Table 6 presents the measures on the test images with different sizes and comparisons between the three methods. The sizes of the four oil spill images are 944 × 912, 352 × 404, 400 × 420, and 398 × 398 pixels. The best results are highlighted in gray cells. Figure 8 shows the boxplots of the accuracy, precision, recall, and F1-score for the four detection methods. MCAN had better accuracy, precision, and F1-score than the other three methods. Our proposal was surpassed only in recall by LS, but the difference was negligible (64.0% and 66.5%, respectively). Globally, the two most competitive techniques were LS and MCAN, but our approach was more accurate and/or less variable than LS.

Qualitative Evaluation
As illustrated in Figure 5, the input SAR images contained large oil spills. The oil spill images in Figure 5a-c had intricate oil spill shapes, while the oil spill image in Figure 5d exhibited a simple shape: a long strip. The detection results of AT and the reference data had similar shapes; nevertheless, the salt-and-pepper effect produced by speckle degraded the quality of the detection results. The detection results of LS either missed or over-segmented the oil spill regions. In the comparison of the two deep-learning-based methods, MCAN produced more accurate detection maps that were also visually closer to the reference data. The detection maps generated by MCAN had fewer superfluous areas than those produced by CGAN.
As illustrated in Figure 6, the input SAR images had small oil spills. The oil spill images in Figure 6a,b had a low contrast. The oil spill image in Figure 6d had a blurry boundary. In this scenario, AT, LS, and CGAN performed poorly, while MCAN generated accurate detection maps. The detection results of MCAN were closer to the reference data.
As illustrated in Figure 7, the input SAR images had large areas and different sizes. The oil spill images in Figure 7a,b had small and fragmented oil spill targets. Figure 7c had intricate oil spill shapes, and there was a land interference area on the right side of the image. Figure 7d exhibited a long strip and low contrast. In this scenario, the detection results of MCAN were closer to the reference data. MCAN and U-net performed better than FCN in terms of suppressing the interference of the land area in Figure 7c.
There is, thus, qualitative evidence that MCAN performs better than the other methods. Table 4 presents the performance measures of the four detection methods on eight oil spill images. The detection maps based on MCAN had higher evaluation scores than those based on AT, LS, and CGAN. For each test image, the highest evaluation scores for the four criteria are marked with a gray background; the MCAN-based detection method clearly attained the highest values. Table 5 shows the average performance of the four detection methods on all test images of the same size. MCAN outperformed the three other methods on the whole: the detection maps based on MCAN had the highest average evaluation scores in terms of accuracy, precision, and F1-score. Recall represents the proportion of the reference oil spill pixels covered by the detection map, so the other methods had higher average recall scores because of their over-segmentation. Although their detection results overlapped more with the reference data, they also had more superfluous areas and shape distortions. MCAN had a slightly lower recall score but obtained accurate detection results with fewer superfluous areas. Table 6 shows the performance of the three detection methods on test images with different sizes. When the proportion of the oil spill area in a large image is small, accuracy is not a good indicator, so we used only the other three criteria to measure the performance of the different methods. MCAN clearly had higher evaluation values than FCN and U-net.

Quantitative Evaluation
The data used to draw the boxplots in Figure 8 are from Table 4. MCAN had higher evaluation scores and detection robustness on the whole. There was, thus, also quantitative evidence that MCAN enhanced the detection of oil spills in a variety of scenarios.
The qualitative and quantitative evaluations validated the advantages of the multiscale architecture. This relaxed the training data requirement, which, according to [44], is at least twenty training samples. It was noted that MCAN achieved high oil spill detection accuracy with only four training data pairs. Therefore, MCAN provides an efficient vehicle for addressing oil spill detection with limited training data.

Conclusions
We developed a multiscale conditional adversarial network (MCAN) that is able to adversarially learn an oil spill detection model with a limited amount of training data, which is the most frequent situation in practice. MCAN consists of a series of adversarial networks, each composed of a generator and a discriminator. The generator aims to produce an oil spill detection map as authentically as possible, while the discriminator tries its best to distinguish the generated detection map from the reference data.
MCAN effectively incorporates the oil spill images' multiscale characteristics and benefits from the adversarial strategy in order to enhance its representational power. The trained MCAN is capable of generating reliable detection maps based on only four training data pairs. The experimental results validated that MCAN can accurately detect intricate oil spill regions with minimal training data. We have released our code for public evaluation in order to support reproducibility and replicability in remote sensing research [47].
In addition, some aspects can be improved in future research. For example, the proposed method does not perform well on images with various oil spill lookalikes, with no pollution at all, or with very small percentages of oil spills. These are commonly acknowledged difficulties that require further research.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: