Oil Spills or Look-Alikes? Classification Rank of Surface Ocean Slick Signatures in Satellite Data

Abstract: Linear discriminant analysis (LDA) is a mathematically robust multivariate data analysis approach that is sometimes used for surface oil slick signature classification. Our goal is to rank the effectiveness of LDAs to differentiate oil spills from look-alike slicks. We explored multiple combinations of (i) variables (size information, Meteorological-Oceanographic (metoc), geo-location parameters) and (ii) data transformations (non-transformed, cube root, log10). Active and passive satellite-based measurements of RADARSAT, QuikSCAT, AVHRR, SeaWiFS, and MODIS were used. Results from two experiments are reported and discussed: (i) an investigation of 60 combinations of several attributes subjected to the same data transformation and (ii) a survey of 54 other data combinations of three selected variables subjected to different data transformations. In Experiment 1, the best discrimination was reached using ten cube-transformed attributes: ~85% overall accuracy using six pieces of size information, three metoc variables, and one geo-location parameter. In Experiment 2, two combinations of three variables tied as the most effective: ~81% overall accuracy using area (log transformed), length-to-width ratio (log- or cube-transformed), and number of feature parts (non-transformed). After verifying the classification accuracy of 114 algorithms by comparing with expert interpretations, we concluded that applying different data transformations and accounting for metoc and geo-location attributes optimizes the accuracies of binary classifiers (oil spill vs. look-alike slicks) using the simple LDA technique.


Introduction
The sea-surface signature of mineral oil contamination ("oil slicks") can be the result of natural causes seeping out of the sea floor ("oil seeps") or being spilled through human intervention ("oil spills"). Petroleum pollution in both coastal and open-ocean waters is of great ecological concern [1,2]. Oil-related incidents usually draw media attention and public awareness, leading the oil and gas industry to enforce rigorous safety protocols and invest in contingency plans, as well as causing political conflicts, economic issues, ecological problems, and scientific concerns [3,4]. A recent catastrophic oil spillage, unprecedented in the last decades, occurred at the end of 2019 when an unknown source caused a myriad of massive oil slicks along Brazil's shoreline [5].
Remote sensing can help the detection of severe events, including the recent Brazilian case [6], or in the relatively frequent minor oil slicks observed at the ocean surface; satellite here, where we study the classification between oil spills and look-alike slicks. While LDAs were applied to remotely sensed features obtained with the Canadian RADARSAT-2 to classify seeps and spills in Gulf of Mexico waters [41], here LDAs are applied to features retrieved in images of the Canadian RADARSAT-1 to distinguish the presence of mineral oil on the sea-surface from other petroleum-free features off the Brazilian coast [42].
Our overall objective here is to rank algorithms applied to many satellite-derived parameters in various data combinations with simple data transformations, according to their success in oil-slick classification. Two experiments to assess the classification of oil spills from look-alike slicks were designed to fulfill our two objectives, i.e., to rank several combinations of (i) variables and (ii) data transformations using satellite-derived measurements (microwave, infrared, and visible):
• Exclusion or inclusion of specific types of data (Experiment 1); and
• Data transformations applied to the attributes (Experiment 2).
In addition to ranking the algorithms to find the best binary classifiers, our research also seeks to provide improved baseline information for future analyses to discriminate sea-surface features identifiable in SAR imagery. The research reported here introduces five innovations (referred to as "developments"):
1. Implementation of stringent knowledge-driven filters;
2. Use of simple morphological characteristics (or simply "size information");
3. Exploration of several combinations of Meteorological-Oceanographic parameters (collectively referred to as "metoc variables");
4. Assessment of the value of including geo-location parameters ("geo-loc");
5. Application of different data transformations to the attributes in the same analysis.
Following the introduction and statement of objectives given in Section 1, information about the study area and the satellite-based datasets are found in Section 2; the methods are given in Section 3; results are presented in Section 4; important remarks are reported in Section 5 in the discussion of the major findings; and the paper concludes with a summary of our results and some recommendations for future work in Section 6.

Study Region
Our area of interest is the Campos Basin offshore of the southeast coast of Brazil (Figure 1). The relevance of this region to the Brazilian economy is due to its numerous offshore oil and gas exploration and production facilities: in 2020, 38 operational fields represented ~25% of the country's fossil fuel supply with 989,949 barrels of oil equivalent [43]. The Campos Basin has very dynamic meteorological and oceanographic conditions throughout the year: during the austral summer, constant northeasterly winds support upwelling events that drop the surface water temperature and increase the primary biological production, but in the winter months, strong southwesterly winds tend to roughen the sea and primary biological production is reduced [44,45]. These phenomena are not confined to the offshore region between the Cabo de São Tomé and Cabo Frio, near Guanabara Bay, but that is where they are most frequently observed (Figure 1).

Database
A tabular remote sensing dataset, including microwave, infrared, and visible satellite measurements, was exploited here. This dataset was first utilized by Bentz [46], and later explored by Moutinho [47] and Carvalho et al. [42]. An important characteristic of this dataset for our study is the classification of oil spills vs. look-alikes based on expert interpretation. We use these interpretations as the basis for assessing the LDA accuracies.
The original dataset contained 779 individual polygons that were identified in 402 scenes of the Canadian RADARSAT-1 taken between July of 2001 and June of 2003. These 8-bit, HH polarized, C-band SAR images are from different beam modes [48,49]: ScanSAR Narrow (incident angles: 20° to 46°) and Extended Low (incident angles: 10° to 23°). Their data were re-sampled to ground resolutions of 100 m [46]. The borders of all observed features with low-backscatter radar signal, i.e., oil and non-oil, were delimited using a multiple resolution segmentation approach [50]. 358 spills are associated with oil samples from identified exploration or production facilities and ship-spills; confirmed spills but from unknown origins are referred to as orphan-spills. 421 look-alike slicks are sea-surface expressions of five different environmental phenomena: biogenic films, algal blooms, upwelling, low wind conditions, and convective rain cells.
Each polygon was described using 34 main descriptive characteristics divided into six attribute types:

1. Two textural (i.e., contrast and entropy of the pixels within the features);
2. Four related to SAR-signatures (e.g., standard deviation and mean ratios between the pixel values inside and outside of the targets);
3. Nine pieces of size information (e.g., area and perimeter);
5. Twelve geo-loc parameters (e.g., bathymetry (BAT) and distance to coastline (CST) calculated to the feature centroid).
The textural and SAR-signature attributes were calculated from uncalibrated SAR measurements, i.e., digital numbers (DNs [51]). Metoc measurements were retrieved from auxiliary environmental Earth-Observation System (EOS) satellites: WND from SeaWinds scatterometer onboard the Quick Scatterometer (QuikSCAT [52]), SST from AVHRR on the National Oceanic and Atmospheric Administration (NOAA) satellites [53,54], and CHL from SeaWiFS on the OrbView-2 satellite [55] or MODIS on the Terra satellite [56]. Additionally, ancillary WND, SST, and CHL maps, derived from measurements from these sensors, were also utilized by the experts to assist their binary classifications.
All algorithms evaluated here use part of the data records and some of the attributes contained in the "original dataset" [46]. The subset of the database analyzed here is defined after the discussion of our research strategy and data mining.

Methods
A pair of methodological steps was performed: research strategy and data mining exercises (Figure 2). These evolved from prior analyses using LDAs to: (i) differentiate oil spills from oil seeps in RADARSAT-2 images off the Gulf of Mexico coast (Campeche Bay, Mexico) proposed by Carvalho [33] and further developed by Carvalho et al. [38][39][40]; and (ii) distinguish oil spills from look-alike slicks observed in RADARSAT-1 scenes off the coast of Brazil (Campos Basin) [42]. We explored many data combinations: 60 combinations of variables (Experiment 1) and 54 combinations of data transformations (Experiment 2). In practice, each combination was considered as an individual "LDA algorithm". The data combinations in our algorithms are different from those explored in earlier studies, but are similar in number to the combinations in three other papers, i.e., 32 [39] + 61 [40] + 39 [42] = 132. Of the combinations analyzed here (60 + 54 = 114), only nine have been previously investigated, but were modified as discussed below.

Research Strategy
This section has three parts describing the data filtering, the removal or inclusion of data (Experiment 1), and the consideration of various data transformations in the same analysis (Experiment 2).

Data-Filtering Scheme
The first development is that we removed samples based on the likelihood of them being outliers. Because of a common issue faced in data classification problems, i.e., to define a good collection of instances with representative characteristics of each class [57,58], the proposed filtering was based on local, historical, and empirical knowledge. As a result, we designed quality control tests to remove samples that include values of any variables that are unlikely to contribute to the oil spill vs. look-alike classification. The number of instances in the experiments was determined by this filtering.

Data Information: Removal or Inclusion
This section presents the different ways the attributes were combined to verify the consequences of removal or inclusion of data. These actions assisted in the ranking of the different combinations of variables, which is our first objective.
Of the six attribute types in the original dataset, three were not considered: textural, SAR-signature, and scene-related information (Section 2.2). In the original dataset, texture and SAR-signature had not been converted to backscatter coefficients (sigma-, beta-, or gamma-naught [59]) making it impossible to compare time series of images, but instead, they had been registered in uncalibrated DN values, therefore permitting only relative comparisons within individual scenes. Scene parameters (i.e., number of identified features per scene, sum of the areas of all features within each SAR image, etc.) cannot contribute to a classification scheme, as these are functions of the SAR swath width and not of the slicks. We thus utilized variables from the remaining three attribute types: size information, metoc variables, and geo-loc parameters (Section 2.2). Within these attribute types, we explored several data combinations (Figure 3).

Figure 3. Data combinations explored to evaluate the linear discriminant analysis (LDA) algorithms during the data-information experiment fulfilling our first objective, i.e., determine the best combination of variables for linearly discriminating oil spills from look-alike slicks. Color-coded circles represent attribute types. Yellow: size information, i.e., area, compact index ($\mathrm{CMP} = (4 \cdot \pi \cdot \mathrm{area})/\mathrm{perimeter}^2$), aspect ratio (length-to-width ratio: LtoW), perimeter-to-area ratio (PtoA), fractal index ($\mathrm{FRA} = 2 \cdot \ln(\mathrm{perimeter}/4)/\ln(\mathrm{area})$), and number of parts of each feature (NUM). Black: Meteorological-Oceanographic (metoc) variables, i.e., wind speed (WND), sea-surface temperature (SST), and chlorophyll-a concentration (CHL). White: geo-location (geo-loc) parameters, i.e., bathymetry (BAT) and distance to coastline (CST). Colored panels correspond to attribute-type subdivisions: (A) blue ("Size Plus Metoc Set": 9 data combinations); (B) green ("Size Set": 3 data combinations); and (C) gray ("Metoc Set": 8 data combinations). Each of these 20 combinations had all variables subjected to the same data transformation (i.e., non-transformed, cube root, or log10), thus forming 60 combinations. Combinations previously explored in Carvalho et al. [42] are indicated (#). See also Section 3.1.2.

Size Information
The second development here is the independent use of simple size information. Besides the nine geometry, shape, and dimension characteristics, i.e., area, perimeter, shape index ($\mathrm{SHP} = (\mathrm{perimeter}/4) \cdot \mathrm{area}^{1/2}$), compact index ($\mathrm{CMP} = (4 \cdot \pi \cdot \mathrm{area})/\mathrm{perimeter}^2$), asymmetry (ASY = 1 − (ratio between feature's length and width)), aspect ratio (LtoW = length/width), density (DEN), curvature (CUR), and number of parts of each feature (NUM), we also explored two other morphologic variables: perimeter-to-area ratio (PtoA), and fractal index ($\mathrm{FRA} = 2 \cdot \ln(\mathrm{perimeter}/4)/\ln(\mathrm{area})$). However, several of these eleven attributes are correlated: area with perimeter, CMP with SHP and DEN, LtoW with ASY, and PtoA with CUR [38][39][40]42]. The FRA and NUM variables did not correlate with any other attribute. The choice of uncorrelated attributes is given below (Section 3.2.2). Because the five correlated characteristics (i.e., perimeter, SHP, DEN, ASY, and CUR) led to no LDA classification improvements [42], they are not pursued here. Thus, we use the six uncorrelated variables to define the Size Set; in Figure 3 they are represented by yellow circles: area, CMP, LtoW, PtoA, FRA, and NUM.

Metoc Variables
Of the four metoc variables (clouds, WND, SST, and CHL), only cloud cover information was discretely registered as the absence (0) or presence (1) of clouds within the polygons, and is not explored further here due to its binary character. The third development explored three different combinations of metoc variables to quantify their influence (individual and combined) on the algorithm's accuracy. In Figure 3, black circles correspond to the three combinations defining the Metoc Set:
• WND, SST, and CHL;
• WND; and
• SST and CHL.

Geo-Location Parameters
The fourth development is the use of geo-loc parameters. Because most geo-location attributes are site-specific (e.g., distance to petroleum platforms or to underwater pipelines) we only considered two of them:
• bathymetry (BAT); and
• distance to coastline (CST).
In Figure 3, these parameters are shown by white circles. One should note that even though they are considered independently, they are always analyzed together with size information and/or metoc variables.

Data Transformations
The application of data transformations to the attributes prior to using them in the machine learning methods is, in principle, capable of improving algorithm classification accuracy [35]. Carvalho et al. [39] tested the LDA performance with data from eight non-linear transformations, and based on their results, we analyzed the data without any transformation (i.e., "non-transformed set") and with two data transformations:
• cube root; and
• logarithm base 10 (log10).
It should be noted that the FRA variable contains negative values and cannot be subjected to logarithmic transformation.
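The three treatments above can be illustrated with a minimal sketch, assuming NumPy is available. The function name `transform` and the string labels are hypothetical and are not taken from PAST or from the original analysis; the guard on non-positive values mirrors the restriction noted for the FRA variable.

```python
import numpy as np

def transform(values, kind):
    """Apply one of the three treatments: 'none', 'cube' (cube root), or 'log10'."""
    x = np.asarray(values, dtype=float)
    if kind == "none":
        return x.copy()
    if kind == "cube":
        # The cube root is defined for negative values as well.
        return np.cbrt(x)
    if kind == "log10":
        # log10 is undefined for zero or negative values (e.g., negative FRA).
        if np.any(x <= 0):
            raise ValueError("log10 requires strictly positive values")
        return np.log10(x)
    raise ValueError(f"unknown transformation: {kind}")

# Illustrative values only (not from the dataset).
example_area = [1.0, 8.0, 1000.0]
cube_area = transform(example_area, "cube")
log_area = transform(example_area, "log10")
```

Raising an error for non-positive input is one possible design choice; an analysis could equally skip the log10 variant for such attributes.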

Data Combinations
Eleven variables were carried forward in our study: six pieces of size information (Section 3.1.2.1), three metoc variables (Section 3.1.2.2), and two geo-loc parameters (Section 3.1.2.3). These resulted in nine data combinations of the Size Plus Metoc Set subdivision with and without geo-loc (Figure 3A), three Size Set combinations with and without geo-loc (Figure 3B), and eight Metoc Set combinations with and without geo-loc (Figure 3C). The three attribute-type subdivisions when analyzed with or without geo-loc parameters formed 20 different data combinations. Each of these combinations was analyzed three times, with all variables subjected to the same data transformation: non-transformed, cube root, or log10 (Section 3.1.2.4). In the first experiment (denoted as "Data-Information Experiment") we compared the performance of as many as 60 LDAs (20 × 3). This collection of LDAs was implemented to reach our first objective (Experiment 1) and differs from those proposed in the section to follow to attain our second objective (Experiment 2).
Three of the 39 combinations investigated by Carvalho et al. [42], indicated in Figure 3 by the # symbol, are also evaluated here: (i) all-size information plus all-metoc variables; (ii) all-size information; and (iii) all-metoc variables. However, Carvalho et al. [42] did not include any geo-location data, but all variables were also subjected to the same data transformations as those used in this experiment. This resulted in nine combinations (3 × 3) in common with their study, but here, these combinations are treated differently due to two of the five developments: the elimination of some samples and the analysis including geo-loc parameters.

Combined Use of Several Data Transformations in the Same Analysis
The fifth development of this research, in relation to any other published binary classification studies (to our knowledge), is that we verified the influence of applying different data transformations to the attributes in the same analysis, i.e., our second objective. Three selected variables were each subjected to different transformations (Table 1) in two assemblages:
1. "Metoc Assemblage": WND, SST, and CHL; and
2. "Size Assemblage": area, LtoW, and NUM.

Table 1. The 27 possible data combinations of three variables (Var.) each of which are subjected to three data transformations in the same analysis: none, cube root, or log10. Two distinct assemblages were used in the "Data-Transformation Experiment" to address the second objective, i.e., establish the best combination of data transformation for the discrimination of oil spills from look-alike slicks. Baseline combinations with the same transformation are given in the first row. "Metoc Assemblage": wind speed (WND), sea-surface temperature (SST), and chlorophyll-a concentration (CHL). "Size Assemblage": area, aspect ratio (length-to-width ratio: LtoW), and number of parts of each feature (NUM); see also Figure 4 in Carvalho et al. [42]. See also Section 3.

These two assemblages resulted in another series of 54 LDAs (27 × 2) that are used in the second experiment, referred to as the "Data-Transformation Experiment". Regarding the "Assemblage" nomenclature, the reader should not confuse it with the terms using "Set" previously defined in Section 3.1.2: Size Set and Metoc Set.
While the Size Assemblage was chosen based on inspection of the dendrograms identifying uncorrelated variables (see Figure 4 in Carvalho et al. [42]), the Metoc Assemblage verifies if we can exclude the use of SAR data and solely use measurements from environmental EOSs sensors. One should note that even though the Metoc Assemblage has the same metoc variables as those from the first Metoc Set, the attributes of this assemblage are subjected to different transformations instead of the same transformation as in the set.
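The count of 27 combinations per assemblage follows from assigning one of three treatments to each of three variables (3 × 3 × 3). A minimal sketch of this enumeration is given below; the helper `transformation_combinations` and the string labels are hypothetical names used only for illustration, and the ordering produced here is not necessarily the ordering of Table 1.

```python
from itertools import product

TRANSFORMS = ("none", "cube", "log10")

def transformation_combinations(variables):
    """Return every assignment of one transformation to each variable."""
    return [
        dict(zip(variables, combo))
        for combo in product(TRANSFORMS, repeat=len(variables))
    ]

metoc_assemblage = transformation_combinations(("WND", "SST", "CHL"))
size_assemblage = transformation_combinations(("area", "LtoW", "NUM"))

# Baseline combinations: all three variables share the same transformation.
baselines = [c for c in size_assemblage if len(set(c.values())) == 1]

total_ldas = len(metoc_assemblage) + len(size_assemblage)
```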

Data Mining Exercises
This section has three parts describing the selection of attributes, the LDA algorithms, and the evaluation of the algorithm accuracy. An open-access software package was used: Paleontological Statistics (PAST [60]).

Attribute-Selection Approach
Rooted tree dendrograms (Unweighted Pair Group Method with Arithmetic mean: UPGMA [61]) were used to assess the level of correlation among variables. The threshold for uncorrelated attributes using dendrograms is user-defined, and two of the most common approaches have been separately applied here:

• In Experiment 1, an across-dendrogram numeric threshold (phenon line [62]) was used to identify groups of correlated variables from which one attribute is selected per group. This used a fixed Pearson's r correlation coefficient (0.3 > r > −0.3 [63]); and
• In Experiment 2, a visual identification of correlated groups of variables was performed, from which one attribute is manually selected for each group.
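The threshold-based grouping of Experiment 1 can be sketched with SciPy's hierarchical clustering, where `method="average"` corresponds to UPGMA. This is only an illustration under stated assumptions: the distance is taken here as 1 − |r|, so that a phenon line at |r| = 0.3 becomes a cut at distance 0.7; the helper `correlated_groups` and the synthetic data are hypothetical, and the dendrograms built in PAST may use a different similarity convention.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def correlated_groups(data, names, r_threshold=0.3):
    """Group variables whose UPGMA merge height implies |r| above the threshold."""
    corr = np.corrcoef(data, rowvar=False)           # Pearson's r between columns
    dist = np.clip(1.0 - np.abs(corr), 0.0, None)    # assumed distance: 1 - |r|
    dist = (dist + dist.T) / 2.0                     # enforce exact symmetry
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="average")  # UPGMA
    labels = fcluster(tree, t=1.0 - r_threshold, criterion="distance")
    groups = {}
    for name, label in zip(names, labels):
        groups.setdefault(int(label), []).append(name)
    return list(groups.values())

# Synthetic illustration: two strongly correlated columns and one independent column.
rng = np.random.default_rng(0)
first = rng.normal(size=1000)
second = first + 0.1 * rng.normal(size=1000)
third = rng.normal(size=1000)
groups = correlated_groups(
    np.column_stack([first, second, third]), ["area", "perimeter", "NUM"]
)
```

One attribute would then be retained from each returned group, as described for Experiment 1.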

Linear Discriminant Analysis (LDA)
In addition to being used to reduce the dimensionality of data classification analyses, LDAs can be used as a classification technique [64]. In our analyses we explore conventional LDAs, but many other LDA variants exist: global-local LDA [65], probabilistic LDA [66], dual-space LDA [67], null-space LDA [68], penalized LDA [69], among others. While Tharwat et al. [70] and Legendre and Legendre [71] discuss these linear analyses in a wider context, a summary of the main benefits and weaknesses of conventional LDAs is given below:
• Advantages: LDA is a supervised classification method that uses the observed values (attribute magnitudes) of the data (samples) to determine the location of a specific boundary (a linear discriminant axis) between the groups (in our case, oil and look-alikes). The LDA general concept is to use the data according to two criteria: (i) maximization of the distance between the average value of each group; and (ii) minimization of the scatter within each group. The ratio of these two criteria, mean squared differences to sum of the variances, is projected onto a line (the linear discriminant axis), providing the ability to linearly separate the groups of samples. This projected lower-dimensional space inherently preserves the group discriminatory information, if one exists. A covariance matrix is calculated for each group along with a within-group scatter matrix to create what is called a discriminant function [72]. Numerically, this function, which corresponds to the dependent variable ($DF(X)$), is the sum of the product of the independent variables' values ($X_n$) with a calculated independent variables' weight ($W_n$); a constant offset may apply ($C$): $DF(X) = (X_1 W_1 + X_2 W_2 + \ldots + X_n W_n) - C$ [73].
• Disadvantages: LDA outcomes tend to support good classification decisions, but there are limitations. The number of variables must not exceed the number of samples [74]. LDAs are restricted to linearly separable groups. In addition, the variables used should have as small a correlation as possible [75]. This was accomplished through the pre-selection of attributes. Another aspect to consider is that the dataset must include a binary labeling that can be used to assess the LDA performance [76]: the accuracies of our supervised learning method were verified against the baseline of the experts' classifications.
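The discriminant function described above, a weighted sum of the attributes minus a constant offset, can be illustrated with a minimal two-class Fisher LDA in NumPy. This is a sketch under stated assumptions and not the PAST implementation: the weights are taken as the inverse within-group scatter matrix applied to the difference of the group means, the offset C places the boundary midway between the projected means (equal priors assumed), and the function names, class coding (0 = look-alike, 1 = oil spill), and data are hypothetical.

```python
import numpy as np

def fit_lda(X, y):
    """Return weights W and offset C of a two-class linear discriminant."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-group scatter: sum of the scatter matrices of both groups.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    W = np.linalg.solve(Sw, m1 - m0)
    # Offset placing the decision boundary midway between projected means.
    C = W @ (m0 + m1) / 2.0
    return W, C

def discriminant(X, W, C):
    """DF(X) = (X_1 W_1 + ... + X_n W_n) - C for each sample."""
    return np.asarray(X, dtype=float) @ W - C

def predict(X, W, C):
    """Assign class 1 when the discriminant score is positive, else class 0."""
    return (discriminant(X, W, C) > 0).astype(int)

# Synthetic, well-separated groups (illustrative values only).
rng = np.random.default_rng(0)
look_alikes = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2))
spills = rng.normal(loc=[10.0, 10.0], scale=1.0, size=(50, 2))
X = np.vstack([look_alikes, spills])
y = np.array([0] * 50 + [1] * 50)

W, C = fit_lda(X, y)
accuracy = float(np.mean(predict(X, W, C) == y))
```

As in the study, the sketch evaluates the classifier on the same samples used for training; real attributes would rarely be as cleanly separable as this synthetic example.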

Classification-Accuracy Assessment
The outcomes of the LDA algorithms ("predicted classes") were assessed by comparison with the baseline interpretation of experts ("true classes") with all samples used as the training-set. We chose to work with five straightforward evaluators obtained from 2-by-2 confusion matrices [77] (Figure 4: Panel 1). Because the standalone use of the common performance metric, i.e., overall accuracy (ratio of all correct decisions to all possible outcomes), can be misleading, four additional metrics were used: sensitivity, specificity, and positive- and negative-predictive values [78]. Different nomenclatures are found in the literature for these metrics, for instance: "recall" rather than sensitivity, "precision" instead of positive-predictive value, etc. [79]. These four performance metrics play equally important roles alongside the overall accuracy in measuring the success of binary classification algorithms. While sensitivity and specificity indicate the amount of previously known features correctly identified by the LDAs (the predicted classes), the positive- and negative-predictive values report how many of the features predicted by the LDA match the a priori knowledge (the true classes). Figure 4 illustrates the domains of these metrics: diagonal, horizontal, and vertical.
The classification-accuracy assessment using these three 2-by-2 matrix domains (diagonal, horizontal, and vertical) differs from other published investigations exploring oil-slick LDA classifiers, which do not report their accuracies in such a succinct manner as we do here. Some papers ignore the vertical-analysis metrics (e.g., [35]) or even both, horizontal and vertical (e.g., [34,36]).
Algorithms were deemed "void" if an evaluator was below 60%. Another reason to void the algorithms was due to unbalanced classification rates, i.e., algorithms correctly identifying 30% or more of one class than another; see Section 4.1 for the balance sampling percentages of the database analyzed here.
Because of the generation of multiple 2-by-2-tables (60 + 54 = 114), the five performance metrics are given in a compact confusion matrix form. This compact structure is shown in Figure 4
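The five evaluators and the two "void" criteria can be sketched as follows. The function names and the counts are hypothetical, and the unbalanced-rate test is implemented here under one possible reading of the criterion, namely a gap of 30 percentage points or more between sensitivity and specificity; the original analysis may have applied it differently.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Five metrics (in %) from a 2-by-2 confusion matrix.

    tp: spills classified as spills; fn: spills classified as look-alikes;
    fp: look-alikes classified as spills; tn: look-alikes classified as look-alikes.
    """
    total = tp + fn + fp + tn
    return {
        "overall_accuracy": 100.0 * (tp + tn) / total,
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
        "positive_predictive_value": 100.0 * tp / (tp + fp),
        "negative_predictive_value": 100.0 * tn / (tn + fn),
    }

def is_void(metrics, floor=60.0, max_gap=30.0):
    """Flag an algorithm as void: any metric below the floor, or unbalanced rates."""
    below_floor = any(value < floor for value in metrics.values())
    unbalanced = abs(metrics["sensitivity"] - metrics["specificity"]) >= max_gap
    return below_floor or unbalanced

# Made-up counts for 281 spills and 279 look-alikes (not results from the paper).
balanced = confusion_metrics(tp=230, fn=51, fp=59, tn=220)
skewed = confusion_metrics(tp=270, fn=11, fp=150, tn=129)
```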

Results
This section follows the research strategy (Figure 2). Throughout this section we list 15 important "remarks" that are revisited in the discussion section.

Data-Filtering Scheme
In the first part of our research (Figure 2) we indicated the number of instances utilized in the 114 LDA algorithms. The outcomes of the knowledge-driven filters are summarized in Table 2. Ten samples (eight spills and two look-alikes) were identified as having transcription errors, thus removing 1.3% of the original dataset (Table 2). Apart from these, only the WND and SST variables presented unexpected values, and their removal is summarized below:
• WND Filter: The SAR-detection ability to identify sea-surface features relies on reduced radar backscatter from the sea-surface, which is dependent on the local wind field [80]. However, the wind limits (lower and upper) to identify sea-surface features in SAR images are not agreed upon by the remote sensing community [81][82][83]. Weak wind conditions (<3 m/s) may prevent correct classification of features as the ambient water around them is also smooth [81]. Even though some authors have pointed out that oil slicks can be observed in ~10 m/s or higher winds (e.g., [82]), others have found the upper wind limit is ~6 m/s (e.g., [83]). To eliminate unwanted wind influence on our classifiers, samples having wind speed <3 m/s or >6 m/s were not considered. WND filtering removed 199 features (69 spills and 130 look-alikes) that represent 25.5% of the original dataset (Table 2). A primary concern about the WND variable is the ground resolution disparity between the QuikSCAT wind data and the SAR pixel: ~25 km vs. ~100 m. Although we used the wind information already included in the original dataset [46], finer wind measurements could produce different outcomes. The reader is referred to Remark 5 below, where we discuss the WND variable impact on the LDA classification decision.
• SST Filter: The upwelled cold water that usually surfaces in the Campos Basin region comes from the South Atlantic Central Water and has temperatures between 6 °C and 20 °C [84]. However, an analysis of all AVHRR images from the year 2001 in this basin, 176 cloud-free scenes, did not indicate SSTs <11 °C even in the coldest core of the upwelling between Cabo de São Tomé and Cabo Frio [45]. Thus, all samples with SSTs <11 °C were removed prior to the analysis. This SST filtering did not remove any spill samples but eliminated 10 look-alike slicks amounting to 1.3% of the original dataset (Table 2). The ground resolution discrepancy between the AVHRR SSTs and SAR measurements is not as marked as that with the wind, but may also be a matter to bear in mind: ~1 km vs. ~100 m. As this filter only removed 10 look-alikes (Table 2), it is most likely that it did not exert as much influence as the WND filter on the analysis. Even though our choice of 11 °C was based on an earlier analysis, other SST thresholds could influence the LDA outcomes.

Table 2. Summary of the data-filtering scheme showing the number of eliminated records. Wind speed (WND) filter: <3 m/s or >6 m/s. Sea surface temperature (SST) filter: <11 °C. Transcription errors (typo) filter. The statistics of all removed samples, of the original dataset instances [46], and of the analyzed database are also given. See also Section 4.1.
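The WND and SST thresholds described above can be expressed as a simple record filter. This is a minimal sketch: the dictionary keys, the helper name `passes_filters`, and the sample values are hypothetical, and the transcription-error check is omitted because its criteria are not specified here.

```python
def passes_filters(sample, wnd_min=3.0, wnd_max=6.0, sst_min=11.0):
    """Keep a sample only if its wind speed (m/s) and SST (deg C) are plausible."""
    wind_ok = wnd_min <= sample["WND"] <= wnd_max   # removes <3 m/s or >6 m/s
    sst_ok = sample["SST"] >= sst_min               # removes <11 deg C
    return wind_ok and sst_ok

# Made-up records for illustration (not from the dataset).
samples = [
    {"id": 1, "WND": 2.5, "SST": 22.0},   # wind too weak
    {"id": 2, "WND": 3.0, "SST": 21.0},   # kept (boundary value)
    {"id": 3, "WND": 6.0, "SST": 11.0},   # kept (boundary values)
    {"id": 4, "WND": 7.2, "SST": 23.5},   # wind too strong
    {"id": 5, "WND": 4.1, "SST": 10.5},   # SST too cold
]
kept = [s for s in samples if passes_filters(s)]
kept_ids = [s["id"] for s in kept]
```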

These filters removed 21.5% of the oil spills (77) and 33.7% of the look-alike slicks (142) from our analyses (Table 2), resulting in ~28% fewer instances (219) being analyzed in relation to the 779 samples in the original dataset [46]. Consequently, the database analyzed here has 560 records. Since all LDAs were evaluated using the same collection of samples, the discretization resolution of our analyses is 0.18%, i.e., one misclassified feature (1/560).
While the original dataset had a somewhat unbalanced sampling percentage, 46% (358 spills) and 54% (421 look-alikes), the filtered database used here has fortuitously a well-balanced sampling: 50.2% (281 spills) and 49.8% (279 look-alikes); Table 2. This balance increased the chances of reaching good predictability levels among the five performance metrics, thus enabling a more meaningful comparison of the performance of the LDA algorithms.
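The bookkeeping behind these figures can be checked with a few lines of arithmetic. The counts below are those quoted in the text for the three filters; the variable names are only illustrative.

```python
original = {"spill": 358, "look_alike": 421}
removed = {
    "typo": {"spill": 8, "look_alike": 2},
    "WND": {"spill": 69, "look_alike": 130},
    "SST": {"spill": 0, "look_alike": 10},
}

# Total removed per class, summed over the three filters.
removed_per_class = {
    cls: sum(counts[cls] for counts in removed.values()) for cls in original
}
remaining = {cls: original[cls] - removed_per_class[cls] for cls in original}

n_original = sum(original.values())
n_remaining = sum(remaining.values())

removed_fraction = 100.0 * (n_original - n_remaining) / n_original
class_share = {cls: 100.0 * n / n_remaining for cls, n in remaining.items()}
resolution = 100.0 / n_remaining   # effect of one misclassified feature, in %
```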
The data-filtering scheme determined the most effective combination of samples by considering the magnitudes of all selected variables, thus adequately accomplishing its goal of establishing a collection of samples using a conservative approach to reduce the chances of incorrect classification in the two experiments presented below. Other factors influencing the delineation of oil-slick features in the SAR signal include oil type (light or heavy oil), slick age (time in the sea-surface since its release), acquisition geometry (incident angle), among others [85,86]. However, these were not stored as separate attributes in the dataset to allow their implementation as filters.
An inclusive hierarchy based on the classifier's overall accuracies is provided in Table 3A,B: running from 1 to 60. These are assembled into "hierarchy blocks", color-coded as in Figure 3. All combinations are grouped in three major blocks corresponding to the three proposed attribute-type subdivisions with and without one of the two geo-loc parameters: size information plus metoc variables (1 to 29: blue), Size Set (25 to 36: green), and Metoc Set (37 to 60: gray) in Table 3A,B. See Remark 1 below. The averaged values per block are presented in Table 4. Each of these color-coded blocks was also ranked within attribute-type subdivisions. These define the "subdivision ranks" which are given in parentheses in Table 3A,B: 1-27 (Size Plus Metoc Set: blue), 1-9 (Size Set: green), and 1-24 (Metoc Set: gray). Each major block was further divided in "subgroups" (Table 3A,B), based on the characteristics of the variables. The averaged subgroup information is also given in Table 4. See Remark 2 below.

Table 4. Averaged overall accuracies of Experiment 1 (Data Information). Three hierarchy blocks and their respective subgroups (as color-coded in Table 3A,B): size information plus Meteorological-Oceanographic (metoc) variables (blue: 1-29), "Size Set" (green: 25-36), and "Metoc Set" (gray: 37-60), all of which were analyzed with or without at least one geo-location (geo-loc) parameter and were subjected to the same data transformations. Averaged number of correctly classified samples is provided in parentheses. Blocks match the proposed attribute-type subdivisions (Figure 3). + indicates the range of accuracies (and samples) in these blocks. * indicates unbalanced identification rate: algorithms correctly identifying 30% or more oil spills than look-alike slicks. ! indicates void algorithms: at least one performance metric below 60%, i.e., specificity. See also Section 4.2.

Table 3A,B that link combinations with equal overall accuracies. See Remark 3 below.
Even though the hierarchy blocks and subdivision ranks are interchangeably used when we refer to blocks, hierarchies run from 1 through 60, whereas references to ranks match the attribute-type subdivision count given above. A series of findings apparent in Table 3A,B and Table 4 is discussed by subdivision rank below.

Size Plus Metoc Set, with or without Geo-Location (Blue: 1-27)
Within this top hierarchy block, three subgroups are identified. The top-ranked nine combinations are primarily formed by the combinations of the Size Plus Metoc Set. As stated above, the best accuracy is 84.6%. The middle group has eight combinations predominantly based on size plus WND combinations. The lowest subgroup has ten combinations mostly formed by size plus SST and CHL combinations. More details are given in Remark 3 below.

Although the difference between the best and worst classification rate is 3.6% (20 samples; Table 4), there is a demonstrable synergy in combining different attributes: firstly, the six pieces of size information plus the three metoc variables (size + WND, SST, and CHL) out-performed size with only one metoc (size + WND), and secondly, size + WND surpassed size plus the other two metoc (size + SST and CHL). Regarding the use of geo-location parameters, when either of them was included, there was a gain in accuracy. In this hierarchy block, there was no improvement of the data-transformed combinations over the non-transformed set.
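The percentages and sample counts quoted together in this section can be cross-checked with simple arithmetic. The sketch below assumes the 560 analyzed features reported for this dataset; the function names are illustrative and are not part of the original analysis.

```python
TOTAL_SAMPLES = 560  # number of analyzed features reported in the paper

def samples_from_percent(percent, total=TOTAL_SAMPLES):
    """Convert a percentage of the dataset to a whole number of samples."""
    return round(percent / 100.0 * total)

def percent_from_samples(correct, total=TOTAL_SAMPLES):
    """Convert a count of samples to a percentage with one decimal place."""
    return round(100.0 * correct / total, 1)

# Spread between best and worst classifier in the Size Plus Metoc block.
print(samples_from_percent(3.6))    # 20
# Best Size Set combination (six size attributes plus BAT, cube root).
print(samples_from_percent(81.4))   # 456
print(percent_from_samples(456))    # 81.4
```

Because each sample is worth about 0.18% of overall accuracy, differences below that value cannot occur between two classifiers evaluated on this dataset.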

Size Set, with or without Geo-Location (Green: 1-9)
There are two subgroups in the middle hierarchy block. The first has five combinations, all of which were transformed (cube root or log10). The best combination was the six pieces of size information plus BAT, cube-transformed: 81.4% (456 samples correctly classified; Table 3B). The second subgroup has four combinations (ranks 6-9), most of which are non-transformed: size with and without geo-loc. The exception was a cube-transformed combination (rank 8: size without geo-loc) that fell in the second subgroup rather than the first.
While the averaged overall accuracy of the first group was ~81% (453 samples), the second group average was ~80% (446 samples); Table 4. The inclusion of geo-loc parameters improved the classification accuracies. The difference between the most and least accurate classification in this block is 2.1% (12 samples; Table 4), but data-transformed combinations have better outcomes than those without transformation; indeed, this is the basis for the formation of groups in this block.

Metoc Set, with or without Geo-Location (Gray: 1-24)
The lowest hierarchy block has three subgroups. The top subgroup has six combinations using all three metoc variables (with and without one geo-loc) that have been transformed (cube root or log10). The most successful combination in this block has three metoc variables with log-transformed BAT: 74.8% (419 samples). The middle subgroup has nine combinations (ranks 7-15) that include the three non-transformed combinations of three metoc variables (with and without one geo-loc), and the six combinations only using WND plus either of the geo-loc parameters. The lowest subgroup has nine combinations (ranks 16-24) using SST and CHL, with or without geo-loc. However, they were all considered void for the two reasons given in Section 3.2.3: (i) their specificity was below 60% (Table 3B); and (ii) they had unbalanced classification rates.
The averaged overall accuracies of these groups are ~74%, ~73%, and ~65%, respectively, with the number of samples correctly identified per group being 417, 410, and 365 (Table 4). The highest and lowest classification rates differed by 12.1% (68 samples). There was an evident synergy in using all metoc variables together, as they improved the ability of the classifier to discriminate oil spills from look-alike slicks. Likewise, the sole use of WND (with any geo-loc) produced better classifiers than those using the other two metoc variables, i.e., SST and CHL (with or without geo-loc). The use of geo-loc parameters improved the classification accuracy. There was a clear dependence on the use of data transformations in the top and middle groups, with the absence of transformations producing the least accurate classifications.
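The two criteria used to flag algorithms (void and unbalanced) can be illustrated with a short sketch built on the five performance metrics used in this study. Several points are assumptions: oil spill is treated as the positive class, the "30% or more" imbalance is read as a gap of at least 30 percentage points between sensitivity and specificity, and the confusion-matrix counts (including the 280/280 class split) are hypothetical, not taken from Table 3A,B.

```python
def performance_metrics(tp, fn, tn, fp):
    """Five metrics in percent, with oil spill as the positive class."""
    total = tp + fn + tn + fp
    return {
        "overall_accuracy": 100.0 * (tp + tn) / total,
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
        "positive_predictive_value": 100.0 * tp / (tp + fp),
        "negative_predictive_value": 100.0 * tn / (tn + fn),
    }

def is_void(metrics, floor=60.0):
    """Void: at least one performance metric falls below the floor."""
    return any(value < floor for value in metrics.values())

def is_unbalanced(metrics, gap=30.0):
    """Assumed reading: spills identified at a rate >= gap points higher."""
    return metrics["sensitivity"] - metrics["specificity"] >= gap

# Hypothetical counts for a weak classifier (365 of 560 samples correct).
weak = performance_metrics(tp=250, fn=30, tn=115, fp=165)
# Hypothetical counts for a strong classifier (474 of 560 samples correct).
strong = performance_metrics(tp=235, fn=45, tn=239, fp=41)

for name, m in (("weak", weak), ("strong", strong)):
    print(name, round(m["overall_accuracy"], 1), is_void(m), is_unbalanced(m))
```

In this toy case the weak classifier is flagged on both criteria because its specificity is far below 60%, which mirrors the pattern described for the SST and CHL combinations.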

Comparative Classification Accuracy
In this section we compare the results of nine data combinations that were analyzed by Carvalho et al. [42] and are indicated in Figure 3. Table 5 shows the main classification accuracy differences extracted from Table 3A,B and Table 7 in Carvalho et al. [42]; see Remark 4 below. Two differences in percentages (Diff.) are reported in Table 5, comparing (i) our results with those of Carvalho et al. [42] and (ii) the inclusion of geo-loc parameters. These are described below:

Table 5 contains a local ordering of the three data transformations of each attribute-type subdivision. This ordering confirmed that there was no clear consistency to show which data transformation was best; in Table 5, asterisks indicate best accuracies per subdivision. An example of the lack of consistency is seen in the subdivisions of the Size Set that indicated different best transformations in each study: the overall accuracy without any transformation (79.1%) reported by Carvalho et al. [42] surpassed the application of transformations, while here, the most successful transformation without geo-loc was log10 (80.7%), but the best outcome including a geo-loc parameter (BAT) was with the cube-transformed combination (81.4%). See Remark 6 below.
• Including Geo-Location: In nearly all cases, combinations including at least one geo-location parameter had better performance than those without; the exception was the cube-transformed Metoc Set, which remained the same with or without geo-loc: 74.5%. The largest overall accuracy increase when geo-loc parameters were considered was ~2%: the Size Set combination with cube root transformation (from 79.6% to 81.4%) and the Size Plus Metoc Set combination with log10 transformation (from 82.5% to 84.3%). See Remark 7 below. In the combinations including geo-loc, BAT was preferable to CST. In only two of nine cases did CST achieve superior accuracy. Indeed, among the combinations, the best classifier (cube-transformed Size Plus Metoc Set) was improved by ~1% with the use of BAT: from 83.9% to 84.6% (Tables 3 and 5). See Remark 8 below.

Experiment 2: Data Transformation
Fifty-four data combinations were considered in the third part of our research (Figure 2). The analyses of the UPGMA dendrograms showed that the correlations among the variables in these combinations were within the recommended similarity threshold: 0.3 > r > −0.3. Tables 6 and 7 condense the classifications of the two distinct assemblages, the Metoc Assemblage and the Size Assemblage, each of which has 3 variables subjected to 3 transformations (27 LDAs each). See Remark 9 below. These results are presented below.

Table 5. Classification accuracy comparisons between our results (see the # symbol in Figure 3 and Table 3A,B) and those in Carvalho et al. [42] (see their Table 7). Attribute-type subdivision (Section 3.1.2): size information plus Meteorological-Oceanographic (metoc) variables, "Size Set", and "Metoc Set". In both studies, variables have been subjected to the same data transformation (Transf.). Herein, combinations were analyzed with or without at least one geo-location (geo-loc) parameter: bathymetry (BAT) or distance to coastline (CST). Overall accuracies are shown in bold font. A pair of differences in percentages (Diff.) are reported: (i) this study compared to Carvalho et al. [42]; and (ii) present study: with minus without geo-loc. A local order is provided per subdivision. The hierarchy (shown in parentheses) has been taken from Table 3A,B and Table 7 in Carvalho et al. [42]. * indicates the best accuracy within subdivisions. See also Section 4.2.4.
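The similarity threshold quoted for Experiment 2 (0.3 > r > −0.3) amounts to a pairwise screening of the variables. The sketch below is a minimal illustration using Pearson's r on toy columns; it does not reproduce the UPGMA dendrograms, and the column names and values are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def weakly_correlated(columns, limit=0.3):
    """True if every pair of variables satisfies -limit < r < limit."""
    names = list(columns)
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = pearson_r(columns[names[i]], columns[names[j]])
            if abs(r) >= limit:
                return False
    return True

# Toy columns chosen to be mutually uncorrelated (hypothetical values).
independent = {
    "var_a": [1, 2, 3, 4],
    "var_b": [1, -1, -1, 1],
    "var_c": [-1, 3, -3, 1],
}
# Two perfectly correlated columns, as a stand-in for a redundant pair.
redundant = {"var_a": [1, 2, 3, 4], "var_d": [2, 4, 6, 8]}

print(weakly_correlated(independent))  # True
print(weakly_correlated(redundant))    # False
```

A redundant pair such as the second example is the situation that leads to two highly correlated parameters not being used in the same combination.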

Metoc Assemblage (WND, SST, and CHL) with Different Data Transformations
Unlike the 27 combinations of the Size Assemblage (see below: Section 4.3.2), those with variables from the Metoc Assemblage did not form identifiable blocks (Table 6). Additionally, no combination was deemed void in the Metoc Assemblage (Tables 3 and 7). See Remark 10 below.

Table 6. Classification accuracy hierarchy of the 27 algorithms using three variables subjected to different data transformations in the same analysis: Meteorological-Oceanographic data (metoc: "Metoc Assemblage"; wind speed (WND), sea surface temperature (SST), and chlorophyll-a concentration (CHL)). Bold font indicates baseline combinations with the same transformation. For the interpretation of thick table lines see Section 4.3.1. Detailed statistical information is found in Figure 4. See also Tables 1 and 7, and Section 4.3: Experiment 2 (Data-Transformation).

The difference between the minimum (73.4%) and maximum (74.8%) overall accuracy was only 1.4% (8 samples; Table 6). Within this classification range, there were six values shared by several combinations with similar performance, from 73.4% to 74.6%; these are delineated in Table 6 by thick lines. A characteristic of most of them is that they did not correctly identify the same samples; this is apparent in Table 6 in the number of correctly classified oil spills and look-alike slicks. For instance, hierarchies 19, 20, and 21 all identified 414 samples (73.9%), but their classifications per class varied: spills (213, 214, and 218 samples, respectively) and look-alikes (201, 200, and 196, respectively). See Remark 11 below.
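The count of 27 algorithms per assemblage follows from assigning one of three transformations independently to each of three variables (3 × 3 × 3). A minimal sketch of that enumeration is given below; the labels "none", "cube_root", and "log10" are illustrative names, not identifiers from the original analysis.

```python
import itertools

VARIABLES = ("WND", "SST", "CHL")            # Metoc Assemblage
TRANSFORMS = ("none", "cube_root", "log10")  # illustrative labels

def enumerate_combinations(variables=VARIABLES, transforms=TRANSFORMS):
    """Every assignment of one transformation to each variable."""
    return [
        dict(zip(variables, choice))
        for choice in itertools.product(transforms, repeat=len(variables))
    ]

combinations = enumerate_combinations()
# Baselines apply the same transformation to all three variables.
baselines = [c for c in combinations if len(set(c.values())) == 1]

print(len(combinations))  # 27
print(len(baselines))     # 3
```

Running the same enumeration for the Size Assemblage (area, LtoW, NUM) gives another 27 assignments, for the 54 combinations of Experiment 2.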

If we consider the baseline combinations with the three variables subjected to the same transformation (bold font in Tables 1 and 6), the cube root (74.5%) surpassed the log-transformed (74.3%), as well as the non-transformed (73.4%). These are hierarchies 7, 11, and 24 in Table 6. Note that the non-transformed version was worse by ~1% compared to the two with a transformation.
For the Size Assemblage (area, LtoW, and NUM), a remarkable accuracy improvement was observed from worst to best classifiers with different data transformations: 13.9% (78 samples; Table 7). Considering the baseline combinations with the three variables subjected to the same transformation (bold font in Tables 1 and 7), the log-transformed (78.6%) surpassed the cube-transformed (77.3%) and the non-transformed (70.2%; void). These are hierarchies 10, 14, and 20 in Table 7. The non-transformed version was poorer by >7% and was void. See Remark 14 below.
The 27 combinations within the Size Assemblage were divided into three major blocks mostly guided by a specific attribute: area (Table 7). A secondary grouping, apparent in these blocks, is controlled by another variable, NUM; these groups are shown in Table 7 by thick lines. In the blocks guided by the area variable, the application of the log transformation forms the top block, followed by combinations subjected to the cube transformation, and lastly by non-transformed versions. On the other hand, in the groups controlled by the NUM variable, the non-transformed version was more accurate than those with the cube root and log transformations. See Remark 15 below.

Discussion
Other than the oil-slick classification studies described in Carvalho et al. [38-40,42], involving LDA algorithms to discriminate surface ocean slicks detected in RADARSAT measurements, there are few publications in the literature (to our knowledge) classifying satellite-detected features using LDAs in a fashion similar to that reported here. Most papers using LDAs to classify oil slicks differ from our research in that: (i) they were only successful once LDAs were combined with another machine learning technique (e.g., [34]), while we reached successful discriminations solely based on the use of conventional LDAs; (ii) they fail to report essential accuracy metrics (e.g., [35]), thus omitting a full assessment of the algorithm's accuracy; and (iii) they explored marine radar images (e.g., [36]), rather than SAR satellite imagery. A pair of other characteristics set our study apart from these earlier investigations: the pre-selection of specific data (Experiment 1) and the combination of attributes subjected to several data transformations in the same algorithm (Experiment 2). In addition, of the 114 LDA algorithms tested here, only nine have been previously examined, and these were modified here. The remainder of this section discusses the 15 remarks previously introduced in the results section.

Table 7. Classification accuracy hierarchy of the 27 algorithms using three attributes subjected to different data transformations in the same analysis: morphological characteristics ("Size Assemblage": area, aspect ratio (length-to-width ratio: LtoW), and number of parts of each feature (NUM)). The explanation of the hierarchy blocks (1-6, 7-18, and 19-27) is given in the text. * indicates unbalanced identification rate: algorithms correctly identifying 30% or more oil spills than look-alike slicks. ! indicates void algorithms: at least one performance metric below 60%, i.e., specificity. Bold font indicates baseline combinations with the same transformation. For the interpretation of thick table lines see Section 4.3.2. Detailed statistical information is found in Figure 4. See also Tables 1 and 6, and Section 4.3: Experiment 2 (Data-Transformation).

Data-Information Experiment
• Remark 1: Considering the hierarchy blocks, when variables from Size Plus Metoc Set were combined, the algorithms were more accurate than those using variables from one type alone. Additionally, when comparing the sole use of size information, the classification accuracies were superior to those using only the metoc variables. A corresponding hierarchical pattern was also observed among the 61 data combinations reported in Carvalho et al. [40]. The hierarchy block formation was only disrupted by two combinations of the Size Set (hierarchies 25 and 28: green group) that were more accurate than a few combinations of the Size Plus Metoc Set (hierarchies 26, 27, and 29: blue group).
• Remark 2: Regarding the subgroups, it is noteworthy that some data combinations achieve classifications better than others (Table 3A,B). Table 4 shows the top-blue (Size Plus Metoc Set) and middle-green (Size Set) blocks have an average difference of ~1% between each of their groups: ~84% to ~80%. The differences between the middle-green and lowest-gray (Metoc Set) blocks were greater, as were those within the groups in the last block.
• Remark 3: Of the many combinations that had the same overall accuracies (to the number of decimal places indicated), most of them did not correctly identify the same samples; this is seen in  [42], the same did not hold for the Metoc Set subdivision that had its overall accuracies reduced (Table 5). While the largest improvements were ~3% in two log-transformed Size Set combinations: without geo-loc (from 78.0% to 80.7%) and with geo-loc (from 78.0% to 81.3%), the best of all combinations (cube-transformed Size Plus Metoc Set) had its accuracy increased by ~1% by the inclusion of one geo-loc parameter (BAT): from 83.7% to 84.6% (Table 5). These improvements demonstrate the success of the removal of samples that are unlikely to contribute to the classification and the addition of geo-loc attributes.

Comparisons with Earlier Results
• Remark 5: The Metoc Set combinations did not produce high-ranking accuracies in comparison with the earlier results of Carvalho et al. [42] (Table 5). This may be due to many records having been removed based on the WND thresholds: lower (<3 m/s: 105 samples) and upper (>6 m/s: 94 samples), i.e., 25.5% of the original dataset (Table 2), even though the exclusion of these cases was based on physical reasoning.
• Remark 6: There was not a clear pattern to indicate which data transformation was best. The non-transformed set and log10 had only two cases each as the best combination among the nine compared, and the cube-transformed combinations were more accurate in five cases (Table 5).

Geo-Location Inclusion
• Remark 7: Two geo-loc parameters available in the original dataset were studied here, but they were not considered together because they are highly correlated. The inclusion of geo-loc parameters results in improved accuracies (Table 5).
• Remark 8: Combinations using bathymetry (BAT, ranging from 5 m to ~4 km) tended to have improved accuracies compared to those using the distance to coastline (CST, 186 m to ~435 km); Table 5.

Table 6), we notice that subjecting variables to different data transformations in the same analysis slightly improved the accuracies of the LDA algorithms.

Size Assemblage: Area, LtoW, and NUM
• Remark 12: The use of three pieces of size information subjected to different transformations (i.e., the two combinations that tied with 80.9%: area (log10), LtoW (log- or cube-transformed), and NUM (non-transformed); hierarchies 1 and 2 in Table 7) reached an accuracy equivalent to that of the best combination of six pieces of size information log-transformed without geo-loc or metoc (80.7%; hierarchy 31 in Table 3B). Clearly, the combination of various attributes subjected to several data transformations in the same analysis can improve the LDA algorithm accuracy.
• Remark 13: The combinations using non-transformed areas were void (hierarchies 19 to 27 in Table 7). The lack of data transformation may also be negatively influencing other combinations of variables using the non-transformed area, for example, those among the 60 depicted in Figure 3 and presented in Table 3A,B. As such, other variables may also be suffering from using non-transformed areas, and this should be further investigated. See also Remark 15 below.
• Remark 14: The best of the three baseline combinations of three pieces of size information with the same transformation (shown in bold font in Tables 1 and 7) was subjected to log10: 78.6% (hierarchy 10 in Table 7). However, nine other combinations were better, the best being 80.9% (hierarchies 1 and 2 in Table 7). This improvement of 2.3% is another indication that the combined use of attributes subjected to different data transformations improves the LDA classification accuracy.
• Remark 15: Considering the major hierarchy blocks and secondary groups among the 27 combinations that use three pieces of size information with three data transformations (Table 7), one reason is given for this ranking: among the 560 analyzed features, areas have a large range of continuous values (from oil spills with 0.45 km² to look-alikes with 8177.24 km² caused by upwelling events), whereas the NUM variable takes discrete values, ranging from features with only 1 part up to look-alike slicks with 24 different parts caused by biogenic films.
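The contrast drawn in Remark 15 can be made concrete by comparing how wide each variable's interval is before and after transformation. The sketch below uses only the extreme values quoted in the remark; interpreting the ranking through these spans is an illustration, not a result of the study.

```python
import math

AREA_KM2 = (0.45, 8177.24)  # smallest and largest areas quoted in Remark 15
NUM_PARTS = (1, 24)         # smallest and largest part counts quoted there

TRANSFORMS = {
    "none": lambda v: float(v),
    "cube_root": lambda v: v ** (1.0 / 3.0),  # inputs here are positive
    "log10": math.log10,
}

def transformed_span(bounds, kind):
    """Width of the interval covered by a variable after a transformation."""
    low, high = bounds
    fn = TRANSFORMS[kind]
    return fn(high) - fn(low)

for kind in TRANSFORMS:
    area_span = transformed_span(AREA_KM2, kind)
    num_span = transformed_span(NUM_PARTS, kind)
    print(f"{kind:>9}: area span = {area_span:.2f}, NUM span = {num_span:.2f}")
```

Untransformed, the area spans more than four orders of magnitude while NUM spans 23 units; after log10 the area interval shrinks to roughly 4.3 units, which is comparable in scale to the other attributes.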

Summary and Conclusions
We report on successful differentiation of oil spills from look-alike slicks using simple, linear discriminant analyses (LDAs) of satellite-based information (RADARSAT-1, QuikSCAT, AVHRR, SeaWiFS, and MODIS) from the Campos Basin, Brazil (Figure 1). A series of effective classification algorithms was produced based on the combination of characteristics of three attribute types: (i) morphological characteristics (size information: area, compact index (CMP), aspect ratio (length-to-width ratio: LtoW), perimeter-to-area ratio (PtoA), fractal index (FRA), and number of feature parts (NUM)); (ii) Meteorological-Oceanographic (metoc) variables (wind speed (WND), sea-surface temperature (SST), and chlorophyll-a concentration (CHL)); and (iii) geo-location (geo-loc) parameters (bathymetry (BAT) and distance to coastline (CST)). Two data transformations were considered in addition to non-transformed: cube root and log10. The quantitative accuracy of 114 LDA algorithms was evaluated and ranked with five performance metrics: overall accuracy, sensitivity, specificity, and positive- and negative-predictive values (Figure 4). This study was built upon the ability to distinguish sea-surface features in SAR images using LDAs (oil spills vs. look-alike slicks [42], as well as oil spills vs. oil seeps [38-40]) and included developments beyond past research [33-37]. Our two objectives have been achieved through two separate experiments (Figure 2):

Objective 1
The "Data-Information Experiment" sought the most effective combination of variables among 60 combinations (Figure 3). Three proposed attribute-type subdivisions were hierarchized in major blocks: "Size Plus Metoc Set", "Size Set", and "Metoc Set" (Table 3A,B). These were considered with or without at least one geo-loc parameter and all variables were subjected to the same data transformations. The best accuracies were reached with all variables from each subdivision. Each block was further stratified in subgroups related to the variable's characteristics (Table 4). Bathymetry (BAT) was generally better than distance to coastline (CST). The main developments used here, sample removal (data filter) and inclusion of geo-loc information, improved classification accuracy (Table 5). The main results regarding the LDA accuracies (
• if only Meteorological data and geo-loc are used, the best accuracy is 73.8% (hierarchy 43; cube-transformed); and
• if only Oceanographic data are accounted for (with or without geo-loc), the results are considered void (hierarchies 52-60).

Objective 2
The "Data-Transformation Experiment" sought the most effective combination of data transformations to improve accuracy. This experiment is a development over published binary classification papers, as here we combined variables undergoing different data transformations in the same analysis. Two distinct assemblages of 27 data combinations, each with three variables, were tested with three data transformations (Table 1). In the first assemblage ("Metoc Assemblage": WND, SST, and CHL; Table 6), there was no noteworthy classification improvement, as revealed by the small range of 1.4% (8 samples) from its best (74.8%) to worst (73.4%) overall accuracies. In contrast, the second assemblage ("Size Assemblage": area, LtoW, and NUM; Table 7) showed accuracy improvements from different transformations: the best (80.9%) and worst (67.0%) accuracies had a remarkable difference of 13.9% (78 samples). Two combinations subjected to three transformations tied as the most effective LDA, with 80.9% (453 samples): area (log10), LtoW (log- or cube-transformed), and NUM (non-transformed). These two best combinations of three variables vs. three transformations were superior to the best baseline combination with the same transformation applied to all variables, which reached 78.6% (440 samples): area (log10), LtoW (log10), and NUM (log10); Tables 1 and 7. Moreover, they achieved an outcome comparable to that of the best combination using the six pieces of size information (without metoc or geo-loc), all subjected to the log10 transformation (80.7%; 452 samples). The framework of combining different data transformations in the same classification algorithm simplifies and optimizes the LDA classification, as fewer attributes were used to reach the same result.
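The idea of applying a different transformation to each variable before a single LDA can be sketched as follows. This is a minimal two-class Fisher discriminant written from scratch, assuming equal priors and a midpoint threshold; it is not the PAST implementation used in the study, and the eight samples are hypothetical values chosen only to make the example run.

```python
import math

def transform(value, kind):
    """Apply one of the three data transformations named in the text."""
    if kind == "none":
        return float(value)
    if kind == "cube_root":
        return math.copysign(abs(value) ** (1.0 / 3.0), value)
    if kind == "log10":
        return math.log10(value)  # assumes strictly positive values
    raise ValueError(f"unknown transformation: {kind}")

def prepare(rows, kinds):
    """Transform each column of each sample with its own transformation."""
    return [[transform(v, k) for v, k in zip(row, kinds)] for row in rows]

def mean_vector(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def scatter_matrix(rows, mean):
    """Sum of outer products of deviations from the class mean."""
    d = len(mean)
    s = [[0.0] * d for _ in range(d)]
    for row in rows:
        dev = [x - m for x, m in zip(row, mean)]
        for a in range(d):
            for b in range(d):
                s[a][b] += dev[a] * dev[b]
    return s

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def fit_lda(look_alikes, oil_spills):
    """Return Fisher weights and the midpoint threshold between class means."""
    m0, m1 = mean_vector(look_alikes), mean_vector(oil_spills)
    s0, s1 = scatter_matrix(look_alikes, m0), scatter_matrix(oil_spills, m1)
    pooled = [[a + b for a, b in zip(r0, r1)] for r0, r1 in zip(s0, s1)]
    weights = solve(pooled, [b - a for a, b in zip(m0, m1)])
    threshold = sum(w * (a + b) / 2.0 for w, a, b in zip(weights, m0, m1))
    return weights, threshold

def predict(weights, threshold, row):
    """Return 1 (oil spill) or 0 (look-alike) for one transformed sample."""
    score = sum(w * x for w, x in zip(weights, row))
    return 1 if score > threshold else 0

# Hypothetical samples: (area in km2, LtoW, NUM). Not taken from the dataset.
LOOK_ALIKES = [(850.0, 2.1, 6), (1200.0, 3.4, 9), (400.0, 1.8, 4), (2300.0, 2.9, 12)]
OIL_SPILLS = [(1.2, 6.5, 1), (3.5, 9.0, 2), (0.8, 7.2, 1), (5.0, 12.0, 3)]
KINDS = ("log10", "cube_root", "none")  # area, LtoW, NUM

class0 = prepare(LOOK_ALIKES, KINDS)
class1 = prepare(OIL_SPILLS, KINDS)
weights, threshold = fit_lda(class0, class1)
predictions = [predict(weights, threshold, r) for r in class0 + class1]
print(predictions)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Changing `KINDS` to another of the 27 assignments and refitting is all that is needed to compare transformation combinations on the same samples, which is the essence of the Data-Transformation Experiment.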

Future Work Recommendations
Future work could apply other linear and non-linear methods (e.g., decision tree, random forest, support vector machine, artificial neural network) to guide the development of improved classifiers. A continuation of this research could include a larger collection of variables being subjected to different data transformations in the same classification algorithm, as it would be interesting to investigate whether the behavior observed in the Data-Transformation Experiment also occurs with other attributes, i.e., testing different data transformations on a greater number of variables. For instance, what would happen if in the best Size Set combination that accounts for six variables (without metoc or geo-loc), all of which were subjected to log10 (i.e., 80.7%; 452 samples; hierarchy 31 in Table 3A  (CSA) and the National Aeronautics and Space Administration (NASA) for data from their Earth observation satellites, the developers of the open-access PAleontological STatistics (PAST) software, and the reviewers for their comments that led to an improved paper.

Conflicts of Interest:
The authors declare no conflict of interest.