Oil Spill Detection in SAR Images Using Online Extended Variational Learning of Dirichlet Process Mixtures of Gamma Distributions

: In this paper, we propose a Dirichlet process (DP) mixture model of Gamma distributions, which is an extension of the ﬁnite Gamma mixture model to the inﬁnite case. In particular, we propose a novel online nonparametric Bayesian analysis method based on the inﬁnite Gamma mixture model where the determination of the number of clusters is bypassed via an inﬁnite number of mixture components. The proposed model is learned via an online extended variational Bayesian inference approach in a ﬂexible way where the priors of model’s parameters are selected appropriately and the posteriors are approximated effectively in a closed form. The online setting has the advantage to allow data instances to be treated in a sequential manner, which is more attractive than batch learning especially when dealing with massive and streaming data. We demonstrated the performance and merits of the proposed statistical framework with a challenging real-world application namely oil spill detection in synthetic aperture radar (SAR) images.


Introduction
The use of statistical machine learning has proliferated in many fields, especially to solve a broad range of problems ranging from signal processing, speech recognition, to geosciences and remote sensing where strong models are needed to apply statistical methodology. In the case of geosciences and remote sensing, for instance, statistical machine learning techniques have been deployed successfully in a variety of problems and applications in many parts of the earth system and beyond [1]. In particular, images modeling (e.g., SAR images) has received much attention due to its importance and applications in real world nature tasks related to land, climate, disturbance attribution, vegetation dynamics, urbanization, etc.
Among the probabilistic generative models, the so-named finite mixtures have been successfully applied due to their capability to represent large-scale complex probability densities and to offer a principled way for analyzing missing data [2,3]. Mixture models provide, in general, a formal approach to unsupervised learning and allow, in particular, to select the optimal number of clusters for a given dataset. This fact has been largely detailed in the literature (see, for example, [4,5]). This growing interest has led to developing several fascinating and flexible mixture models such as Gaussian-based mixture models (GMM) which have became popular even though they are not the most appropriate for fitting complex non-Gaussian shapes [6,7]. To deal with conventional GMM limitations, many other alternatives, such as Gamma (GaMM) mixtures [8][9][10][11], have shown to perform significantly better than GMM [12] thanks to its compact analytical form which is able to Accidents on offshore oil platforms, refineries, and pipeline can cause serious oil spills. However, these accidents represent only 5% of the total oil pollution worldwide, and 95% are caused by illegal discharges by ships that prefer to dispose, cheaply, of oil residues in their tanks (according to many studies such as the European Space Agency) [24][25][26]. Oil pollution may result from several sources such as industrial discharges, oil production, natural oil seepage, and urban runoff. Natural slicks are of bacterial or biological decomposition or geological origin. Oil spills can devastate naval life as well as harm humans and animals by reducing dramatically air-sea exchanges processes, such as surface evaporation. Oil spills are then of great public, political and scientific concern. Therefore, there is an urgent need to monitor and detect oil spills on ocean so as to facilitate government decision making. The detection of these oil spills is considered an important and challenging problem to effectively conduct countermeasures. An effective approach is the use of satellites which provide radar images of the sea surface (500 × 500 km 2 in a single image). Satellites radar images supply an occasion to monitor coastal waters day and night, regardless of weather conditions allowing an early warning of oil spills. Moreover, satellite detection is well adapted to this kind of problems by producing images of difficult access areas [24]. Among different satellite imagery technologies, active microwave sensors such as synthetic aperture radar (SAR), has been frequently investigated for remote sensing of oil pollution [27]. The synthetic aperture radar emits and receives radio wave in order to acquire a representation of the target scene. Detecting oil spill in SAR images (as shown in Figure 1) is very complex procedure that involves many steps [26].  Taiwan (right image). These images (area: 100 km × 100 km) showing an oil spill [28].
For several decades, extensive works have been provided [27,29,30] to distinguish oil slicks from natural biogenic slicks via analyzing satellite radar images. Most of conventional oil slick (or dark objects) detection procedures are carried out in three steps: (1) a presegmentation of dark spot, (2) the extraction of dark spot feature, and (3) a classification step of these dark spots. Some early and recent review articles summarize different oil slick detection methods [26,28,31]. These reviews state that most methods are based on using statistical patterns to discriminate between oil slicks and look-alikes under varying conditions. They conclude also that the automatic and accurate discrimination between oil spills and look-alikes is a challenging problem and need more investigations in the future. On the other side, a lot of efforts have been devoted to apply classic classifiers and descriptive statistical approaches learned from training data [25,30,[32][33][34]. These works rely on highly trained human operators to asses and verify each region in a given SAR image. In [33], authors proposed a one-class based approach for image classification to detect oil-spill. First of all, a preprocessing step is used to identify related areas to oil spills. A feature selection step to select relevant features is also performed given that the contrast between spill's region and the surrounding regions depends on the type and amount of oil and other environmental factors (i.e., wave height, wind speed, and sea). Finally, a one-class classifier is used to detect oil spills. A geometric level-set based segmentation method of oil spills and illegal oil discharges was developed in [35]. According to this work the regions in SAR images can be classified into pure oil spills or look-alikes on the basis of the following measurements: orientation, area, shape complexity, perimeter, eccentricity, and mean border gradient. In [36], a region-based method was also proposed. It involves both conventional detection theory and image segmentation techniques (such as N-nearest-neighbor) to have more accurate results. In [37], authors developed an adaptive thresholding-based algorithm to classify each slick as oil or look-alike. Here, involved features are derived from shape (slick complexity, width, area, moment), slick surroundings, contrast (slick local contrast, border gradient, smoothness contrast), and slick homogeneity. Their algorithms have been trained on two datasets, namely Radarsat and Envisat Advanced Synthetic Aperture Radar (ASAR) images. Fuzzy classifiers have been also used in [38] to identify all possible oil spills (dark patterns) in SAR images. A set of operations based on the fuzzy theory are used to establish the likeness of each candidate to be an oil spill or not. In the last few years, artificial neural network algorithms have been broadly applied in the context of remote sensing image segmentation and classification. Indeed, authors in [39][40][41][42][43] proposed different neural network-based methods (like CNN and Deep NN) in order to improve oil spill detection and classification. Some other notable interesting CNN-based oil spill detection and classification frameworks include the works in [44,45].
While considerable progress has been made in this field over the past few years, designing more robust tools still needs wide amounts of specialized knowledge and manual work. The goal here is to propose a method based on a nonparametric Bayesian model (infinite model) as well as to learn it using variational inference. Our main contributions are summarized as follow: First, we start by extending the finite Gamma mixture to the infinite case via a nonparametric Dirichlet process prior such that the problem of selecting the suitable number of clusters is solved fashionably. Then, we investigate the developed approach for remote sensing image classification. Indeed, after extracting effective features as in [46], we shall focus on modelling and classifying oil spills and other similar sea surface features using the infinite mixture model. The merits of our approach have been demonstrated using real datasets.

Statistical Model Specification
In this section, we present our developed variational learning approach based on the infinite Gamma mixture model.

Finite Gamma Mixture Model
These feature vectors are supposed to be drawn from a mixture of Gamma distributions with parameter Θ. Let M denotes the number of mixture's components. Y i (i = 1, . . . , N) are independent and identically distributed (iid). The density function of multi-dimensional Gamma distribution is defined as follows: where θ = {α d , β d } is the set of parameters of the distribution such that α d denotes the shape and β d the location parameter. Here, Γ(.) is the Gamma function which is given as: Suppose that the D-dimensional random vector Y i (observed data) is drawn from a finite mixture of Gamma (GaMM) distributions and consisting of M components which is established to model the data with different shapes. The probability density function (pdf) of a GaMM is then given as: where Θ = {θ 1 , θ 2 , . . . , θ M , π 1 , . . . , π M }. The parameters of the j th mixture component is represented by θ j = {α j , β j }. π j is the vector of the mixing weights subject to 0 π j 1, and ∑ M j=1 π j = 1.

Infinite Gamma Mixture Model
The Dirichlet process (DP) is a stochastic process with a positive scaling factor and base distribution used in Bayesian nonparametric models of data, notably in infinite mixture models. The DP is an effective concept for various applications (for more details please refer to [47]). In this section we address the issue of assuming an infinite number of components. In order to solve properly this problem which is important for well describing the observed data without over-or under-fitting, we propose a Dirichlet process mixture of Gamma distributions. In other words, we construct our infinite model by following the principle of Dirichlet process (DP) through stick-breaking representation [48,49]. Thus, the number of components is intended to be infinite. In this case, let's denote G a Dirichlet process distributed with a base distribution H and a concentration parameter ψ. The construction of G ∼ DP(ψ, H) is defined as where δ Ω j represents the Dirac delta measure centred at Ω j . The proportions π j are determined by cutting a unit length stick, regularly, into an infinite number of pieces such that ∑ ∞ j=1 π j = 1 and ψ is a real number. Consequently, the infinite mixture model of Gamma distributions Y is expressed as Subsequently, a latent variable Z i = (Z i1 , Z i2 , . . .) is introduced for observed data Y. These latent membership vectors are used to point out if the vector Y i belongs to component j (Z ij = 1) or not (Z ij = 0). Now, the complete-data likelihood is expressed as According to the stick-breaking construction of DP (see Equation (3)), π j can be expressed as a function of λ j and after replacement, we have the following: The resulting complete-likelihood of the infinite Gamma mixture is finally expressed as (including latent variables):

Batch Variational Bayesian Learning
It is noteworthy that, when dealing with intractable models, variational inference is presented as a powerful deterministic alternative to approximate posteriors and likelihoods. In this section, we propose to develop a variational learning method to approximate inference for the DP, where the truncated stick-breaking construction [50] is applied to derive an approximate posterior and to estimate the model parameters. On the other side, we proceed by determining an approximation Q(Θ) for true posterior p(Θ | Y) such that Θ = {Z, α, β}. After that, we use the well-known KL divergence in order to reduce the difference between Q(Θ) and p(Θ | Y): . From Equation (9), it is possible to deduce that L(Q) ≤ lnp(Y ) and so L(Q) is a lower bound to lnp(Y ). However, it is difficult to solve the true posterior which cannot be directly estimated because of the complexity of calculation. We get around this matter by taking into account a restricted family of Q(Θ) that can be calculated [21]. In particular, the mean field theory [51] is adopted to factorize Q(Θ) into different tractable distributions such that To maximize L(Q), we apply variational methodology with respect to each Q i (Θ i ). Then, the optimal form of Q i (Θ i ) denoted by Q s (Θ s ) is given as where . j =s is the expectation value of Q, with respect to all Q i (Θ i ) excluding that case of j = s. It is noted that we have to take into account the truncation of the stick-breaking representation [49] to take advantage of the bound. Therefore, we take λ M = 1 and π j = 0 when j > M which leads to ∑ M j=1 π j = 1.

Prior Distributions for Parameters
To complete the probabilistic formulation, we have to place proper conjugate priors over the parameters λ, α and β. In particular, the Beta distribution is selected for the parameter λ (referring to Equation (3)) as follow Here, the hyperparameters of the Beta distribution is denoted by ψ = (ψ 1 , ψ 1 , . . . ) [52]. Moreover, it is possible to assign a conjugate Gamma prior to ψ: For α and β, a prior Gamma distribution is imposed for them as suggested in [8] which is reasonable given that α and β are positives and also Gamma density is assumed to be too flexible and simple distribution to be selected as prior.
Following the graphical model in Figure 1, the resulting joint distribution is expressed as

Learning Algorithm
As explained at the beginning, the objective of this work is to approximate the true posterior p(Θ | Y) with a new tractable approximation denoted by Q(Θ). Furthermore, the optimal solution of variational learning is reached while maximizing the lower bound w.r.t Θ = {Z, λ, α, β}. The factorization of Q(Θ) (while taking into account the truncation M) leads to following parametric form which optimal solution is presented in Appendix A: Once the optimal variational factors are in hand, the calculation of the lower bound L(Q) is then straightforward.   Update the variational solution for the factor Q(ψ) using Equation (A6). 10 Update the variational solution for the factor Q(λ) using Equation (A7).

11
Update the variational solution for the factor Q(α) using Equation (A10).

12
Update the variational solution for the factor Q(β) using Equation (A13). 13 until Until convergence is reached 14 Calculate the expected value of λ j using Equation (A9). Then estimate the mixing coefficients according to Equation (3). 15 Return the optimal number of components M opt by eliminating the components with small mixing coefficients close to 0.

Online Variational Bayesian Learning
Early warning and immediate detection of oil spills has many advantages such as immediate response and reducing damage to the environment. The development of realtime monitoring and detection system is of great importance in order to minimize the volume of oil spilled. To address this problem, we propose to develop an online learning approach which is being commonly used in many other areas especially when data points are continuously arriving over time [53]. The online setting is particularly useful for incrementally training the system by feeding instances of data sequentially. It also has the benefit of making the learning process easier and faster than batch mode.
In what follows, we extend the batch variational method (presented in previous section) for unsupervised SAR images classification to an online setting. This process requires updating the model's parameters incrementally without degrading its efficiency and flexibility. To determine the lower bound, we suppose that we have at time t a fixed set of observed data. At time t + 1, a new SAR image Y N+1 comes out and is added to the dataset, hence, the mixtures' parameters have to be updated accordingly. Thus, in online setting, the lower bound at time t is expressed as in [54]: where Ω = {α, β}.
where u * (t) ∆ is the natural gradient of each hyperparameter in the previous equation. ρ t denotes the learning rate [55] expressed by following equation: where ∈ [0.5, 1] and η ≥ 0. This helps to guarantee convergence [55]. Please note that the expectation in the above mentioned equations are obtained with same manner as for the case of batch setting in the previous section and as in [56]. Since the online learning framework can be considered as a stochastic approximation algorithm, the convergence is ensured as prove in [53]. The proposed and developed online variational algorithm is presented in Algorithm 2. Compute the learning rate using Equation (25). 9 Calculate the hyperparameters using Equation (24). 10 Update the variational solutions for all factors Q (t) (α), Q (t) (β) and Q (t) (λ) using Equations (21)-(23).

11
Repeat the variational E-step and V-step until new data is observed 12 until for t = 1 to N

Data Sets
The main objective of this section is to investigate our developed online extended variational learning framework of Dirichlet process mixture of Gamma distributions to detect oil spills in several SAR images. The second objective is to compare the performance of the proposed statistical framework with other methods from the state-of-art. First, it should be noted that one of the challenges is the lack of already common data sets for oil spill detection and this problem has been addressed by many relevant research communities such as [57,58]. Very limited data sets have been proposed in the literature, and therefore, it is too difficult to compare between published results since each method uses different data sets with different settings. In this work, we are essentially concerned with two challenging SAR databases. The first data set is the SAR images containing oil spills collected via the European Space Agency (ESA) database [40] which is composed of 1112 images with 5 different classes: Land, Look-alike, oil-spill, ships, and sea surface. The second one is a labelled SAR dataset taken from Sentinel-1 wave mode (TenGeoP-SARwv) [59] Figures 3 and 4 show examples of images from these two datasets, respectively. For experiments, we randomly select half of the dataset as the training set and the rest for testing. In order to quantify how well SAR images are classified, we report the results in terms of average accuracy metric and false positive rate (FPR).
Modeling and classifying SAR requires powerful statistical models to represent their content (ex. color, texture). In this work we shall focus on the problem of SAR images modeling and classification via extracting local features that describe accurately input images. Indeed, feature extraction step is a part of the dimensionality reduction process that has been broadly studied in the past. It has an important role in many computer vision applications since it helps identifying the most discriminating characteristics, reducing ambiguity and enhancing the performance. However, the presence of speckle noise in synthetic aperture radar (SAR) images, as well as low-resolution between regions (surfaces) and poor contrast, make extracting relevant features too difficult. Thus, if the representative features are well extracted, then we can correctly interpret and classify images. Extracting local features from grey-scale images is a well-studied step in the fields of image processing and computer vision and various comparative measures have been studied for many years. The study of prior techniques is not within the scope of this paper. However, we suggest applying two successful methods of features extraction. The first one is based on imageNet pretrained deep learning model (resnet50) [60]. The flowchart diagram for extracting features using resnet50 is given in Figure 5. For each SAR image in the flowchart, we first apply different image processing operations like adjusting contrast value, thresholding, object edge detection by blurring noise and small objects. After this step, based on the number of detected dark spots, we extract different features including geometrical characteristics and texture of the object. Finally, we store the extracted features for the model evaluation. In the second approach, we extract a number of features based on geometrical characteristics, physical behavior, and those related to oil spill context of the dark formations as described in [61]. After extracting features, we applied principal component analysis (PCA) to reduce dimensionality of extracted datasets features.

Results and Discussion
Next, we apply our online extended variational algorithm (Section 5) over the extracted features. Thus, each image is represented by an infinite Gamma mixture model. We average the results over 30 runs to evaluate and compute the final performance. Tables 1 and 2 show the average classification accuracy and false positive rate (FPR) of our InGaMM-eV model. They are obtained with different classes in both datasets and by using two features extraction methods. Indeed, we considered a first experiment where the goal was to distinguish between oil spills versus the rest and a second one where the goal is to categorize some classes from each data set (4 categories are taken from the first data set and 9 from the second one). The testing data is assumed to arrive sequentially in an online mode.   Figures 6 and 7 present the confusion matrices for SAR images classification computed by the proposed InGaMM-eV using the two features extraction methods, respectively. It is noted that these matrices are used to describe the performance of the proposed model since they record true positives, false positives and false negatives. In fact, each matrix summarizes the prediction results on a classification problem and it offers a clear idea of what the proposed model is working correctly and what kinds of errors it commits. Each entry of index (u, v) represents the number of images in class u that are affected to class v. According to these results, the average classification accuracy is very promising and is equal to 90.57% (error rate of 9%) for the first dataset and 95.16% (error rate of 4%) for the second dataset. Table 1. Results for both dataset with different number of classes using first feature extraction approach (ImageNet pretrained (resnet50) features).

Datasets
No of Class Accuracy (%) FPR  6 and 7 present additional results obtained by changing the way visual features are extracted as well as the number of classes. Indeed, for the case of ESA-SAR dataset, InGaMM-eV provides high average accuracy of 97.96% using imageNet pretrained deep learning model (resnet50), and 89.94% using Dark spots, geometrical, physical char-acteristics features. In both cases, the false positive rate is very low. For Sentinel-1 wave mode SAR dataset, the average accuracy to classify SAR images is 95.16% using resnet50, which is better than the second method for extracting features (only 88.68%). According to these results, we notice that the overall average classification accuracy is very encouraging, taking into account the complexity of treated images. It is noteworthy that, due to low resolution of images in the second dataset (Sentinel-1 wave mode), it was very difficult to extract features using the second feature extraction method (i.e., detecting dark objects). Thus, we have low accuracy than expected for this dataset.  In this experiment, our second goal is also to demonstrate the advantages of using extended variational framework over the maximum likelihood (via EM-algorithm), as well as the merits of infinite mixture model over its finite counterpart. Therefore, we compared the classification results using the following mixture models: InGaMM-eV (our infinite Gamma model using extended variational inference), GaMM-eV (finite Gamma model using extended variational learning), GaMM-EM (finite Gamma mixture model using expectation maximization learning), InGMM-eV (infinite Gaussian model using extended variational learning), and GMM-EM (finite Gaussian mixture model using expectation maximization learning). The average performances of all tested learning approaches, using the two features extraction methods, are depicted in Tables 3 and 4. We can see clearly that the extended variational approach provides better results than the EM. Furthermore, the merits of using a Dirichlet process mixtures of Gamma distributions (i.e. infinite mixture model ) over a finite mixture model is clear by noting that better result was found with the infinite mixtures. In particular, in Table 3, the InGaMM-eV (90.05%) outperforms GaMM-eV (88.33%) in terms of classification accuracy rate for both datasets. On the other side, it is worth mentioning that our approach provides better results than the implemented frameworks based on Gaussian mixtures. We can then deduce that the infinite Gamma model has better modeling and classification capability than the Gaussian when dealing with SAR images analysis. Table 3. Overall oil spill detection rate of different models for 2 datasets using the first feature extraction approach (ImageNet pretrained (resnet50) features) .

Dataset
InGaMM-eV (Our Approach) GaMM-eV GaMM-EM InGMM-eV GMM-EM Next, The proposed learning approach (InGaMM-eV) is compared with some methods from the literature and the comparative study is presented in Table 5. As we can see, the proposed online algorithm performs better than other algorithms. Accordingly, it is important to emphasize the advantage of our developed extended variational formalism for infinite Gamma mixture, which can provide interesting results. It is also important to underline the merit of the online learning process, which is able to maintain high performance of oil spill prediction as well as handling data faster as they arrived. Moreover, it has the capacity to update the model incrementally without the need for retraining. All these results confirm that the proposed infinite Gamma mixture using the extended variational learning mode is a better choice thanks to the flexibility of the infinite Gamma mixture over the finite models. All these benefits make it more appropriate especially for SAR images classification especially in the case or large scale data sets.  [34] Sentinel-1 SAR Dark spot, shape features 87% Method in [63] Sentinel-1 SAR Dark spot features 81% Method in [64] Sentinel-1 SAR Dark spot, shape features 82.61%

Conclusions
In this paper an effective online nonparametric Bayesian analysis method based on Dirichlet process mixture of Gamma distributions (i.e., infinite Gamma mixture model) is developed to deal with the challenging problem of oil spill detection in SAR images. The Gamma distribution is considered because of its flexibility for semi-bounded data modelling. This framework is learned using an extended version of conventional variational inference in a flexible way which has certain advantages such as approximating the posteriors effectively in a closed form, easy assessment of convergence and easy optimization by offering a trade-off between frequentist techniques and MCMC-based ones.
An important property of our approach is that it does not need the specification of the number of mixture components in advance. The proposed online algorithm has also the benefit to allow data instances to be treated in a sequential manner, which is more attractive than batch learning especially when dealing with massive and streaming data. Through the challenging application of oil spill detection in SAR images, we have demonstrated the performance of our statistical framework, which is able to provide very encouraging results in terms of SAR images modeling and classification capabilities. As future work, we plan to integrate a feature selection mechanism into the proposed framework in order to improve more the classification accuracy. It is our hope that many other real-world applications related to image processing and machine learning can be addressed via our developed framework.  Data Availability Statement: Data available in a publicly accessible repository: The first data set: European Space Agency (ESA) database [40]. The second data set: Sentinel-1 wave mode (TenGeoP-SARwv) [59].
The expectation of Z ij is determined as: (2) Optimal solution to Q(ψ) and Q(λ).
From the previous equations, we obtain the following expectations: where Ψ is Digamma function.
From the previous equations, we obtain the following expectations: Optimal solution to Q( β).
From the previous equations, we obtain the following expectations: