A Spectral Feature Based Convolutional Neural Network for Classification of Sea Surface Oil Spill

Spectral characteristics play an important role in the classification of oil films, but the presence of too many bands can lead to information redundancy and reduced classification accuracy. In this study, a classification model that combines spectral-index-based band selection (SIs) and a one-dimensional convolutional neural network (1D CNN) was proposed to realize automatic oil film classification using hyperspectral remote sensing images. Additionally, for comparison, minimum Redundancy Maximum Relevance (mRMR) was tested for reducing the number of bands. The support vector machine (SVM), random forest (RF), and Hu's convolutional neural network (CNN) were trained and tested. The results show that the accuracy of the 1D CNN models surpassed that of the other machine learning algorithms, such as SVM and RF. The SIs+1D CNN model could produce a relatively accurate oil film distribution map in less time than the other models.


Introduction
The ocean is an important part of the Earth's surface, accounting for approximately 71% of the total surface area, and serves as an indispensable component of the Earth's ecosystem. In recent years, sudden oil spill accidents have become more frequent with increasing maritime traffic. These accidents include oil pipeline ruptures, oil and gas leakages, vessel collisions, illegal dumping, and blowouts, and they cause serious damage to the marine environment and ecological resources [1][2][3]. To manage oil spill detection and post-disaster cleanup, planners require instantaneous information regarding the location, type, distribution, and thickness of an oil slick [4,5]. Compared with traditional direct detection methods that require human control, satellite remote sensing technology enables large-area monitoring of the spread, thickness, and type of an oil spill. These data compensate for the shortcomings of traditional direct surveillance methods and can guide surveillance aircraft and ships to conduct real-time monitoring of the most important parts of a spill. Hence, remote sensing technology has become an essential tool for detecting oil spills [1,6,7].
Hyperspectral remote sensing images are advantageous for oil spill detection because they comprise continuous and abundant spectral information. The intrinsic data structure of such images can be regarded as a three-dimensional tensor representing the height, width, and spectral dimension of the image. Within this tensor, the two-dimensional spatial architecture of the hyperspectral data comprises the first (the height of the spectral image) and second (the width of the spectral image) dimensions. Today, the methods commonly used for extracting oil spill information from hyperspectral remote sensing images include the maximum likelihood classifier, classification and regression tree (CART), support vector machine (SVM), and random forest (RF). However, these classification methods have many drawbacks, such as the Hughes effect (curse of dimensionality), the need for large memory, cumbersome tuning with large-scale training samples, and limitations in dealing with multi-modal inputs [8].
Convolutional neural networks (CNNs) are inspired by biological neural networks and are applied to visual image processing and speech recognition. CNNs can effectively extract spatial information and share weights among nodes to reduce the number of parameters [9][10][11]. Makantasis, Liang, and Vetrivel [12][13][14] extracted spatial features from the first several principal-component bands of original hyperspectral data using a 2D CNN model. However, these studies mainly used local spatial information while training the 2D CNN model and did not use any spectral information. In some studies [13,15], although the CNN model was more accurate overall than other classifiers, it tended to misclassify smaller targets. A 3D CNN model can acquire local signal changes in the spatial and spectral dimensions simultaneously in a feature cube; furthermore, it can utilize the rich spectral information included in hyperspectral images [16,17]. Chen [18] proposed deep feature extraction based on spectral, spatial, and combined spatial-spectral information using the CNN framework; moreover, a feature extraction model for hyperspectral imagery that uses 3D CNNs was proposed. Li [19] proposed a classification method that uses 3D CNNs to extract deep features effectively from a combination of spectral and spatial information. This method does not rely on any pre- or post-processing steps. However, the computation time for high-dimensional data increases significantly when both spectral and spatial information are taken into account [16,17,19].
The 1D CNN model focuses on the rich spectral information available in hyperspectral images, which can reduce calculation time while deeply mining spectral features [16]. Fisher, Zhang, and Ghamisi [20][21][22] first proposed 1D CNN algorithms based solely on spectral information. These algorithms use the pure spectral characteristics of pixels in the original image as an input vector, which in theory can be realized quickly and simply. Hu [23] used the spectral information in single-season hyperspectral images as input vectors to construct a 1D CNN model with a local convolution filter that obtains the local spectral features of spectral vectors for land-cover classification. The overall accuracy (90% to 93%) of the 1D CNN was superior to SVM classification by 1% to 3%. Guidici [11] proposed a 1D CNN architecture based on single-season and three-season hyperspectral images of the San Francisco Bay in California and compared the CNN classifier against RF and SVM classifiers. The 1D CNN was 0.4% more accurate than the SVM and 7.7% more accurate than the RF when using three-season data. Therefore, 1D CNNs can offer some of the advantages of a 3D CNN without incurring the prohibitive computational cost.
The 2D CNN model must convolve each two-dimensional input in the network, and each input comprises a set of learnable kernels. The increase in training parameters may cause over-fitting, leading to a reduction in the generalization ability of the algorithm. In addition, the 2D CNN model considers only the information between adjacent pixels of a given pixel in the image and does not utilize the unique spectral-dimension information of the hyperspectral image [16,23,24]. The 3D CNN model extracts three-dimensional features from the image and uses both spatial and spectral information, which may improve the classification accuracy of hyperspectral images. However, it significantly increases the computing time and reduces operating efficiency [16,19,25]. In recent years, applications of CNN models to oil-slick monitoring have improved owing to the increased abundance of remote sensing data from satellites. Guo [10] proposed a multi-feature fusion CNN oil spill classification method using polarized synthetic aperture radar (SAR) data. In addition to identifying dark spots, the method could effectively classify unstructured features. Nieto-Hidalgo [26] proposed a system for detecting ships and oil spills with a two-stage architecture composed of three pairs of CNNs. However, CNN models for oil-slick identification currently rely mainly on SAR images. SAR and multi-spectral remote sensing data are widely used in oil spill response. However, SAR data do not allow clear discrimination between oil slicks and false positives, for example biogenic slicks, because they capture no spectral features. Furthermore, the use of SAR is problematic when the sea is flat due to the absence of wind. In this case, fully polarimetric SAR, where available, can be useful to discriminate between the dielectric constant of oil and that of (salt) water; otherwise, the response in a single-polarization channel depends mainly on the roughness of the sea surface, i.e., the presence and size of waves.
Multi-spectral data usually do not permit the detection of oil pollution because of their broad bands compared with the narrow bands of hyperspectral data. Thin absorption features are captured by some of the hyperspectral bands but are spread out and cannot be detected in the corresponding multispectral bands. For this reason, a few hyperspectral bands containing the absorption features of interest are preferable to many multispectral bands. More suitable hyperspectral data will become available in the future (EnMAP and PRISMA), which could be used to develop better solutions for oil film classification. However, few studies have used hyperspectral data for oil film extraction with machine learning methods. Using band selection, we can extract the most valuable spectral features and reduce the amount of computation. The main purpose of this work is to propose a simple and efficient oil film classification model, thereby aiding oil spill emergency response and clean-up work.

Data Sets
The Deepwater Horizon oil spill was among the worst in history [27,28]. To monitor progress in the cleanup of this spill, the US government obtained a significant amount of spaceborne and airborne remote sensing data from a range of sensors, such as MODIS, MERIS, Landsat-TM/ETM, the airborne visible/infrared imaging spectrometer (AVIRIS), and Envisat-ASAR, which were analyzed in several studies [29][30][31].
This paper focuses on the AVIRIS data, which were recorded in 224 bands over the 400 to 2500 nm wavelength range [32]. The flight names of the data are f100517t01p00r10 and f100517t01p00r11; they have a spatial resolution of ~7.6 m and a spectral resolution of 10 nm. The data were recorded on 17 May 2010 (Figure 1).

Overall Procedure
The overall workflow of this study is shown in Figure 2. First, three machine learning classifiers were implemented, which will be explained in detail later. Next, the training and test samples were prepared and read into each classifier separately. A series of parameters was used to train the model on the training samples. Three-fold cross-validation was applied to the training samples [35], which helped select the best combination of parameters. Each of the selected methods was then tested on the test samples to calculate its accuracy. Finally, we adopted the best model of each classifier to predict the label for each pixel and produce the oil slick distribution maps. We also compared the accuracy of the proposed method with the method in [23]. For the oil film distribution mapping, SIs and minimum redundancy maximum relevance (mRMR) were used to select the prominent features of each pixel in the image of region T1 (Figure 1). The selected bands were then loaded into the aforementioned classifiers.
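The cross-validated parameter search described above can be sketched as follows; this is a minimal illustration with synthetic data (the real inputs are per-pixel spectra and class labels), and the grid shown covers only one of the RF parameters tested:

```python
# Sketch of the 3-fold cross-validated parameter search used to select the
# best parameter combination for each classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.random((300, 55))        # 300 training pixels, 55 selected bands
y = rng.integers(0, 5, 300)      # 5 classes: water, sheen, thin/medium/thick film

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [10, 100]},
    cv=3,                        # 3-fold cross-validation, as in the text
)
search.fit(X, y)
best_model = search.best_estimator_  # later used to map the whole image
```

The same pattern applies to SVM (searching over gamma and C) by swapping in a different estimator and parameter grid.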

Data Pre-Processing
Radiometric calibration and atmospheric correction, which eliminate systematic errors introduced by the system [36] and the atmosphere, respectively, were applied to the raw hyperspectral data. While radiometric calibration is performed by the data provider, atmospheric correction is generally performed by the user. NASA/JPL had already processed the data to remove geometric errors introduced by aircraft motion and radiation errors caused by the instruments. Atmospheric correction was required to yield surface reflectance values. The original values in the images were scaled radiance. Bands 1 to 110 have a scale factor of 300, bands 111 to 160 have a scale factor of 600, and bands 161 to 224 have a scale factor of 1200. Each band was divided by the corresponding scale factor to obtain a 16-bit integer radiance value in units of µW/(cm²·sr·nm). Following this, atmospheric correction was performed using the Fast Line-of-sight Atmospheric Analysis of Hypercubes (FLAASH) module in the ENVI software. In the FLAASH module, the atmospheric model was tropical, and the aerosol model was maritime. After the atmospheric correction, nearby pixels, which may contain oil films of similar thickness, have nearly identical spectra [37].
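The per-band rescaling step above can be sketched as follows, assuming a (bands, rows, cols) array of scaled AVIRIS radiance values (the toy cube is for illustration only):

```python
# Divide each AVIRIS band by its scale factor: bands 1-110 by 300,
# bands 111-160 by 600, and bands 161-224 by 1200.
import numpy as np

def rescale_radiance(scaled):
    scaled = scaled.astype(np.float64)
    factors = np.empty(224)
    factors[0:110] = 300.0      # bands 1-110
    factors[110:160] = 600.0    # bands 111-160
    factors[160:224] = 1200.0   # bands 161-224
    # Broadcast the per-band factor over the spatial dimensions;
    # result is radiance in uW/(cm^2*sr*nm).
    return scaled / factors[:, None, None]

cube = np.full((224, 2, 2), 600.0)  # toy cube of scaled radiance values
radiance = rescale_radiance(cube)
```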

Feature Selection
Pal and Foody [38] showed that classification accuracy is related to the dimension of the input features. Accuracy declines significantly with the addition of features, particularly if a small training sample is used. However, Li et al. [39] found that the use of more bands improved classification accuracy. In this paper, we used images with and without band selection as inputs for the classifiers. The band selection methods we tested are based on the spectral index (SI) and mRMR measures.
(1) SI-based band selection
Zhao et al. [34] evaluated the usefulness of SIs for identifying oil films of different thicknesses. They found that the spectral indices of hydrocarbons have greater potential to detect oil slicks imaged with continuous true color, whereas for sheens and seawater, seawater indices are more suitable. Among these indices, the Hydrocarbon Index (HI), Fluorescence Index (FI), and Rotation-Absorption Index (RAI) have been used to detect and characterize oil films of varying thicknesses [40,41] (Table 2). Other researchers [2,42,43] analyzed the spectral features of oil films with different thicknesses or area ratios and proposed useful spectral bands in the ranges of 507 to 670 nm, 756 to 771 nm, and 1627 to 1746 nm.

For the present work, we selected bands in the ranges of 490 to 885 nm and 1627 to 1746 nm.
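Selecting bands by wavelength range can be sketched as follows; the wavelength array here is a simplified stand-in for the actual AVIRIS band-center table:

```python
# Select the indices of bands whose center wavelengths fall within the
# SI-derived ranges (490-885 nm and 1627-1746 nm).
import numpy as np

wavelengths = np.linspace(400, 2500, 224)   # approximate AVIRIS coverage
mask = ((wavelengths >= 490) & (wavelengths <= 885)) | \
       ((wavelengths >= 1627) & (wavelengths <= 1746))
selected = np.flatnonzero(mask)             # indices of retained bands
```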
(2) mRMR-based band selection
mRMR is an efficient feature selection algorithm proposed by Peng et al. [44]. It penalizes a feature's relevance by its redundancy in the presence of the other selected features. The relevance D(S, C) of a feature set and the redundancy R(S) of all features in the set are defined as follows:

D(S, C) = (1/|S|) Σ_{fi∈S} I(fi; C),

R(S) = (1/|S|²) Σ_{fi,fj∈S} I(fi; fj),

where S is the feature set, C is the class, fi is an individual feature, and I is the mutual information between feature fi and class C, or between features fi and fj. The mRMR feature set is obtained by simultaneously combining max D(S, C) and min R(S) into a single criterion function. Ding [45] defined two criteria for selecting the features: the Mutual Information Difference (MID = D − R) and the Mutual Information Quotient (MIQ = D/R).
We used the MIQ criterion, and 15 features were selected.
In hyperspectral data, the class label is an integer, but the band values are continuous. To use the algorithm, the features are usually quantized, discretizing the continuous bands into bins [46]. In this paper, the quantization boundaries were set as µ ± 2σ, where µ and σ represent the estimated mean and standard deviation of each feature, respectively.
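The quantization and greedy MIQ selection can be sketched as follows. This is a minimal illustration on synthetic data: each band is discretized into three bins at µ ± 2σ, and features are then added one at a time by maximizing the mutual-information quotient (relevance divided by mean redundancy with the already-selected features):

```python
# Sketch of mRMR-MIQ band selection on quantized features.
import numpy as np
from sklearn.metrics import mutual_info_score

def quantize(band):
    """Discretize a band into 3 bins with boundaries at mu +/- 2*sigma."""
    mu, sigma = band.mean(), band.std()
    return np.digitize(band, [mu - 2 * sigma, mu + 2 * sigma])

def mrmr_miq(X, y, k):
    Xq = np.column_stack([quantize(X[:, j]) for j in range(X.shape[1])])
    relevance = [mutual_info_score(Xq[:, j], y) for j in range(X.shape[1])]
    selected = [int(np.argmax(relevance))]   # start with the most relevant band
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            redundancy = np.mean([mutual_info_score(Xq[:, j], Xq[:, s])
                                  for s in selected])
            score = relevance[j] / (redundancy + 1e-12)  # MIQ criterion
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected

rng = np.random.default_rng(1)
X = rng.random((200, 20))          # 200 toy pixels, 20 bands
y = rng.integers(0, 3, 200)        # 3 toy classes
bands = mrmr_miq(X, y, 5)          # pick 5 bands (15 in the paper)
```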

Classifiers
(1) RF
Developed by Breiman in 2001, RF is one of the most commonly used machine learning ensembles. It trains a series of decision trees on randomized draws of the training data. The entire forest of decision trees is then used as a composite classifier. Once the forest is created, data can be applied to the classifier, and a prediction can be obtained from each tree [47].
A few parameters should be adjusted to make the RF classifier easier to use [48]. We used the RF classifier included in the Scikit-learn library [49]. The number of trees in the forest, the maximum number of features, and the minimum number of samples required at a leaf node should be adjusted. The features are the reflectance bands of the hyperspectral imagery, and the maximum number of features is the number considered when looking for the best split. The numbers of trees tested in this study were 10, 100, 500, and 1000, among which the model with 100 trees achieved relatively high accuracy in a short time. The maximum number of features was set to the square root of the number of features, and the minimum number of samples at each leaf was 1.
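The RF configuration described above corresponds to the following Scikit-learn sketch (toy spectra stand in for the real training samples):

```python
# RF with the parameters chosen in the text: 100 trees, sqrt(n_features)
# candidate features per split, and a minimum of one sample per leaf.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_train = rng.random((200, 55))      # toy per-pixel spectra
y_train = rng.integers(0, 5, 200)    # toy class labels

rf = RandomForestClassifier(
    n_estimators=100,        # best of (10, 100, 500, 1000) for accuracy/time
    max_features="sqrt",     # square root of the number of bands
    min_samples_leaf=1,
    random_state=0,
)
rf.fit(X_train, y_train)
```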
(2) SVM
SVM is a supervised learning model that uses learning algorithms to analyze and classify data. It is used widely owing to its effectiveness with small datasets [38]. In the SVM classifier, a hyperplane is defined using the training data to perform classification [49,50]. For 2D data, the hyperplane is the line that separates the two categories most effectively. For high-dimensional classification, the data are mapped to a higher-dimensional space, and the separator between categories becomes a plane. Hyper-parameters dictate how the data are mapped onto the higher dimension. In this paper, the SVC model in the Scikit-learn library was used. A support vector classifier with a Radial Basis Function (RBF) kernel, which utilizes a one-vs.-one classification scheme, was adopted. The kernel coefficient gamma for the RBF was tested over (10⁻⁶, 0.01, 0.1, 0.2, 0.5). The penalty parameter C of the error term balances the misclassification of the training samples against the simplicity of the decision surface: a small C value makes a smooth decision surface, while a large C value gives the model greater freedom to select more samples as support vectors. We tested a series of C values: 1, 20, 70, 100, 200, 700, and 1000. The most suitable values were found to be a kernel coefficient of 0.01 and a C of 700.
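The selected SVM configuration corresponds to the following Scikit-learn sketch (toy spectra stand in for the real training samples):

```python
# SVC with an RBF kernel and the hyper-parameters chosen in the text:
# gamma = 0.01 and C = 700, using the one-vs.-one multiclass scheme.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.random((200, 55))      # toy per-pixel spectra
y_train = rng.integers(0, 5, 200)    # toy class labels

svm = SVC(kernel="rbf", gamma=0.01, C=700,
          decision_function_shape="ovo")  # one-vs.-one classification
svm.fit(X_train, y_train)
```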
(3) Convolutional neural network
CNNs are feedforward neural networks whose artificial neurons respond to a surrounding area of the covered part of the input. CNNs comprise neurons with learnable weights and biases. Each neuron receives a line of inputs and performs dot-product calculations. The output is the score for each category; the category with the highest score is used as the result. Typically, CNNs comprise four layer types: input, convolutional, pooling, and fully connected (FC). The output of the FC layer is the input to a simple linear classifier that generates the required classification result.
We used TensorFlow, a popular deep learning tool [51], to build our CNN. The architecture of the network comprises an input layer, a convolutional layer, a pooling layer, an FC neural network, and an FC Softmax layer that serves as the classifier. The flow diagram of our CNN process is shown in Figure 3.
The first layer of the CNN is the input layer, which contains the samples. To uniformly distribute the samples, pre-processing procedures such as normalization and dimensionality reduction can be performed. For this work, one-dimensional arrays, namely the spectra of all bands and of the bands selected by mRMR and the SIs, are used as the inputs.
The convolutional layer is the main part of the CNN, and its parameters comprise a set of learnable filters [52]. Each filter is small but extends through the full depth of the input array. The outputs produced by convolving these filters across the input arrays are then fed into an activation function. The outputs of the convolutional layers are calculated using Equation (5):

O(x, y) = f( Σ_{m=1}^{M} Σ_{n=1}^{N} w(m, n) · I(x + m − 1, y + n − 1) + b ),   (5)

where f is the activation function, M and N are the width and height of the filter, respectively, O(x, y) and w(x, y) are the value and weight at the xth row and yth column, respectively, I is the input array, and b is the bias.
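A minimal NumPy sketch of Equation (5) for the 1D case used in this work (filter height 1), with ReLU as the activation function:

```python
# Valid 1D convolution of a spectrum with a small filter, followed by ReLU,
# as in Equation (5) with N = 1.
import numpy as np

def conv1d_relu(signal, weights, bias):
    m = len(weights)
    out = np.array([np.dot(signal[x:x + m], weights) + bias
                    for x in range(len(signal) - m + 1)])
    return np.maximum(out, 0.0)   # ReLU: negative responses become zero

spectrum = np.array([0.2, 0.5, 0.1, 0.4, 0.3])   # toy reflectance spectrum
feature_map = conv1d_relu(spectrum,
                          weights=np.array([1.0, -1.0, 1.0]),
                          bias=0.0)
```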
In particular, two convolutional layers were utilized for this work; the width and height of the filters were set to 3 and 1, respectively, and x was held constant (x = 1). We utilized the rectified linear unit (ReLU) as the activation function. ReLU is an element-wise operation that replaces all negative values in the feature map with zero. The purpose of ReLU is to introduce nonlinearity into the proposed convolutional net, because most of the real-world data to be learned by the net are non-linear.
The pooling layer can be regarded as a spectral subsampling or down-sampling of the convolutional features. It reduces the dimensionality of each feature but retains the most important information. A max-pooling operation, which has been shown to work well in practice [11], was used in this paper. Through this operation, we defined a spectral neighborhood and took the largest element from the rectified feature map within that window. For this work, the size of the window was 1 × 3.
The output of the pooling layer was then provided to the feature classification network, which comprised an FC neural network and an FC Softmax layer. The FC neural network contained a hidden layer with 128 nodes. A 50% dropout level was adopted to randomly ignore nodes, which helps prevent overfitting.
The output of the hidden layer was connected to the final classifier: a Softmax layer. This layer produces a vector whose length equals the number of classes; each value represents the probability that a sample belongs to a certain class. The Argmax function was adopted to find the location of the largest probability and finally provide a one-hot classification.
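The architecture described above can be sketched with TensorFlow's Keras API. Layer sizes follow the text (two 1 × 3 convolutional layers with ReLU, 1 × 3 max pooling, a 128-node hidden layer with 50% dropout, and a Softmax output); the number of filters per convolutional layer is an assumption, as the text does not state it:

```python
# Sketch of the proposed 1D CNN (filter count per conv layer is assumed).
import numpy as np
import tensorflow as tf

n_bands, n_classes = 55, 5          # SI-selected bands, five surface classes

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_bands, 1)),
    tf.keras.layers.Conv1D(20, kernel_size=3, activation="relu"),  # assumed 20 filters
    tf.keras.layers.Conv1D(20, kernel_size=3, activation="relu"),
    tf.keras.layers.MaxPooling1D(pool_size=3),   # 1 x 3 max pooling
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),                # 50% dropout against overfitting
    tf.keras.layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

spectra = np.random.rand(4, n_bands, 1).astype("float32")  # toy spectra
probs = model.predict(spectra, verbose=0)
labels = probs.argmax(axis=1)       # Argmax gives the predicted class
```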

Feature Selection
The retention of important spectral features is essential. Hyperspectral remote sensing images provide nearly continuous spectra associated with surface features. However, leaving too many spectral bands in the image leads to redundancy between the bands, which increases computational complexity without improving accuracy. Therefore, hyperspectral remote sensing data are usually dimensionally reduced; here, band selection was used for dimension reduction. Using the SI-based method, 55 bands were selected, and 15 bands remained after mRMR-based band selection. The computational complexity was reduced dramatically, while the main spectral features were preserved. Images reduced with the mRMR indicator were classified less accurately than the unreduced and SI-reduced images, which implies that mRMR band selection eliminated some useful features. The images reduced using the SIs were lower dimensional but lost no useful information [34], indicating that the thickness-related features of the images were part of the selected spectral information.
Band selection based on the SIs and mRMR was applied to the original images, and the resulting bands were compared. Here, we refer to the original image as the all-bands image and to the images after band selection as the SIs-selected-bands image and the mRMR-selected-bands image.
The dimensions of the data were reduced significantly through band selection (Figure 4). The selected bands are mainly concentrated at wavelengths less than 970 nm or greater than 1600 nm. Using the mRMR method, 15 bands were selected for subsequent training and classification; their wavelengths are mainly concentrated in the ranges of 405 to 948 nm and 2357 to 2457 nm. According to the SIs, 55 bands in the ranges of 490 to 880 nm and 1632 to 1742 nm were selected. The separability between regions of interest (ROIs) is an important criterion for measuring the suitability of the training samples. In the present study, the Jeffries-Matusita Distance (JMD) between the ROIs was calculated to evaluate the pairwise separation of the selected ROIs for each band selection method.
As shown in Table 3, the JMD values of the selected training samples from the all-bands and band-selected images are between 1.89 and 2, which indicates that the selected samples were separable in all three cases [53]. The separation of the all-bands image was better than that of the band-selected images, and the separability of the training samples was lowest on the image after band selection through mRMR. The spectra of each class after band selection (Figure 5) were less continuous than the original spectra of the all-bands image (Figure 4). On the other hand, the spectral differences among the classes were much more obvious than in the original spectra, which could lead to higher classification accuracy.
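The JMD between two classes can be sketched as follows, assuming Gaussian class distributions; JMD = 2(1 − exp(−B)), where B is the Bhattacharyya distance, so values near 2 indicate good separability:

```python
# Jeffries-Matusita distance between two ROIs (Gaussian assumption).
import numpy as np

def jmd(x1, x2):
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    c1 = np.cov(x1, rowvar=False)
    c2 = np.cov(x2, rowvar=False)
    c = (c1 + c2) / 2.0
    d = m1 - m2
    # Bhattacharyya distance between the two class distributions.
    b = d @ np.linalg.inv(c) @ d / 8.0 + 0.5 * np.log(
        np.linalg.det(c) / np.sqrt(np.linalg.det(c1) * np.linalg.det(c2)))
    return 2.0 * (1.0 - np.exp(-b))

rng = np.random.default_rng(0)
water = rng.normal(0.0, 0.1, (100, 3))   # toy spectra for two ROIs
oil = rng.normal(1.0, 0.1, (100, 3))
separability = jmd(water, oil)           # close to 2: well separated
```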

Accuracy Comparisons Among the Models
The sample distribution, overall accuracy (OA), producer's accuracy (PA), and Kappa coefficient of the proposed and compared methods are listed in Table 4. The accuracy was calculated based on the 42,676 test samples. By comparing the OAs of all the methods, we found that the methods with SIs-selected bands produced more accurate classification results than those with all bands. Specifically, the proposed SIs+1D CNN had the best performance, while All bands+RF performed the worst. We also calculated the accuracy of the method using Hu's CNN [23]. The results of Hu's CNN were more accurate than those of SVM and RF, but less accurate than those of the proposed 1D CNN. This may be caused by its shallower architecture.
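The reported metrics can be computed as in the following sketch, using toy reference and predicted labels; the per-class PA is the diagonal of the confusion matrix divided by the per-class reference totals:

```python
# Overall accuracy (OA), producer's accuracy (PA), and Kappa coefficient.
import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score, confusion_matrix

reference = np.array([0, 0, 1, 1, 2, 2, 2, 1])   # toy test labels
predicted = np.array([0, 0, 1, 2, 2, 2, 2, 1])   # toy classifier output

oa = accuracy_score(reference, predicted)
kappa = cohen_kappa_score(reference, predicted)
cm = confusion_matrix(reference, predicted)
pa = cm.diagonal() / cm.sum(axis=1)   # correct pixels / reference totals per class
```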
We also calculated and listed the PA of each class for the different methods.


Running Time Comparisons Among the Models
The running time of each model is listed in Table 5. All models were tested in the same running environment (Intel Core i5 CPU @ 2.64 GHz, single core). The sample training time for RF and SVM refers to the time needed to fit the model to the training samples; the training time for the proposed 1D CNN and Hu's CNN is the training time needed to reach the highest OA. The sample validation time is the time used to predict labels for the test samples. The prediction time is the average time used to produce the oil film distribution map for the hyperspectral image of region T1, including the time used for image input, output, and classification.

Case Studies
We selected one representative scene to compare the mapping results of each classifier. Figure 6 shows the AVIRIS true color image (color composite R: 638 nm, G: 550 nm, B: 462 nm) and the results obtained from All bands+1D CNN, SIs+1D CNN, mRMR+1D CNN, All bands+RF, SIs+RF, All bands+SVM, SIs+SVM, All bands+Hu, and SIs+Hu for the scene. As can be seen in the images, most of the sea surface was clear water. In the area covered by oil film, the film was mainly sheen and thin film. Thick oil films were scattered around the study area and are labelled by the ellipses on the images; the medium oil films were distributed around the thick films. By comparing the oil film distribution maps with the true color image, we found that SIs+1D CNN, All bands+1D CNN, and SIs+SVM could identify most of the thick oil film. The SIs+1D CNN model could precisely extract medium and thick oil film, while the other models consistently underestimated these two classes. The sheen oil film was mistakenly classified as water when All bands+1D CNN, mRMR+1D CNN, RF, or Hu's CNN was used as the classifier. The SIs+1D CNN could classify the sheen accurately (marked by the red rectangle). Generally, the results obtained using SIs+1D CNN were substantially better than those from the other models.


Conclusions and Future Work
In this paper, we proposed a band-selection-based 1D CNN method to perform oil film identification and thickness classification on AVIRIS hyperspectral images. We selected bands carrying the main spectral features using spectral indices and mRMR. All models were trained and tested on the original all bands, the SIs-selected bands, and the mRMR-selected bands separately. A cross-validation procedure was performed to find the best parameter combination for each method. All the selected models were evaluated on the test samples to calculate their OA, PA, and Kappa coefficients, which demonstrated that the proposed SIs+1D CNN improved on the rest of the compared models in OA, and in PA for water, sheen, and thick oil film. Computation times were also compared, and our proposed method required relatively little prediction time. Moreover, all the mentioned methods were applied to an image to map the oil film distribution, which showed that our proposed method performed well in the identification and classification of oil films.
In addition to the spectral information, the spatial context should also be considered during oil film classification [37]. Normally, the thickness of an oil film on the sea surface changes gradually from sheen to thick. To improve classification performance, an intuitive idea is to design models that use both the spectral and spatial dimensions, incorporating the spatial context into the 1D classifiers. Spatial information could provide additional discriminant information related to the effect of adjacent pixels, which could lead to more accurate classification maps [21]. Thus, 3D CNNs should be considered in future studies.
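As a minimal sketch of how spatial context could be fed to such a spectral-spatial classifier, the NumPy code below extracts a square neighborhood (all bands) around every pixel of a hyperspectral cube; the cube size and patch size are arbitrary illustrative values, not parameters from this study:

```python
import numpy as np

def extract_patches(cube, patch_size=5):
    """Extract a patch_size x patch_size spatial neighborhood (all bands)
    around every pixel of an H x W x B hyperspectral cube.
    Edge pixels are handled with reflective padding.

    Returns an array of shape (H*W, patch_size, patch_size, B),
    a typical input layout for a 3D CNN classifier.
    """
    h, w, b = cube.shape
    r = patch_size // 2
    padded = np.pad(cube, ((r, r), (r, r), (0, 0)), mode="reflect")
    patches = np.empty((h * w, patch_size, patch_size, b), dtype=cube.dtype)
    for i in range(h):
        for j in range(w):
            patches[i * w + j] = padded[i:i + patch_size, j:j + patch_size, :]
    return patches

# Toy cube: 10 x 12 pixels, 20 bands
cube = np.random.rand(10, 12, 20).astype(np.float32)
patches = extract_patches(cube, patch_size=5)
```

Each patch keeps the full spectral dimension, so a 3D CNN can convolve jointly over the two spatial axes and the band axis; the center element of each patch is the pixel being classified.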

Figure 1.
Figure 1. AVIRIS flight lines covered on 17 May 2010. The models were trained and tested using the image of T0 and applied to the image of region T1.

Figure 2.
Figure 2. The overall flowchart for all methods described in this study.

Figure 3.
Figure 3. Flow chart of the proposed convolutional neural network (CNN).

Figure 4.
Figure 4. Band selection results. The bands covered by grey areas were selected by mRMR. The blue lines indicate bands selected according to the SIs.

Figure 5.
Figure 5. Spectra with band numbers of the images after band selection. Left: SIs-based selected bands; right: mRMR-based selected bands.

Funding:
This work was supported by the National Natural Science Foundation of China (Grant No. 51509030), the Natural Science Foundation of Liaoning Province (Grant No. 20180550362), the Dalian Innovation Support Foundation (Grant No. 2017RQ065), the Fundamental Research Funds for the Central Universities (Grant No. 3132014302), and the China Scholarship Council.

Table 1.
Oil film description and thickness according to the Bonn Agreement and in this study.

Table 2.
Spectral indices (SIs) referenced in this study.

Table 3.
Pair separation between each training sample in terms of the Jeffries-Matusita distance (JMD).

Table 5.
Running time comparison (averaged over 20 repeated experiments). The All bands+RF model took the longest training, validation, and prediction time, while mRMR+1D CNN took the shortest time for training, validation, and prediction. Generally, the methods using Hu's CNN needed less time than those using our CNN, mainly because Hu's CNN has fewer layers and therefore takes less time to compute and fit. Overall, the SIs+1D CNN model required relatively little prediction time, which is a very important factor for large-scale oil spill extraction.