Oil Film Classification Using Deep Learning-Based Hyperspectral Remote Sensing Technology

Marine oil spills seriously impact the marine environment and transportation. When oil spill accidents occur, oil spill distribution information, in particular, the relative thickness of the oil film, is vital for emergency decision-making and cleaning. Hyperspectral remote sensing technology is an effective means to extract oil spill information. In this study, the concept of deep learning is introduced to the classification of oil film thickness based on hyperspectral remote sensing technology. According to the spatial and spectral characteristics, the stacked autoencoder network model based on the support vector machine is improved, enhancing the algorithm’s classification accuracy in validating data sets. A method for classifying oil film thickness using the convolutional neural network is designed and implemented to solve the problem of space homogeneity and heterogeneity. Through numerous experiments and analyses, the potential of the two proposed deep learning methods for accurately classifying hyperspectral oil spill data is verified.


Introduction
The majority of oil is transported by ships, thus greatly increasing the risk of oil spills. In 2010, the Deepwater Horizon oil rig exploded in the Gulf of Mexico, leaking a large amount of crude oil into the deep sea. Once an oil spill accident occurs, the distribution and thickness information of the oil spill must be determined in real time to dispose of the oil spill. Remote sensing technology is widely used in oil spill monitoring and research because of its advantages in large-area imaging. Compared with radar, laser, and multispectral images, hyperspectral remote sensing images have the following advantages: a wide monitoring range, continuous and high-dimensional object spectrum information, and an anti-interference capability. They also play an important role in environmental monitoring. The spatial resolution of hyperspectral images has improved with the development of sensor technology, so these images can provide multidimensional characteristics for target recognition in environmental monitoring and enable the classification of oil film thickness.
Hyperspectral images result from combining spectral and spatial information processing [1]. The deep learning method enables the analysis of large datasets and can extract the inherent laws and characteristics of images [2]. Using the deep learning method to process hyperspectral images has also become a trend [3], and scholars have achieved fruitful research results in this area. Hyungtae et al. used a multiscale convolution method to combine the spatial and spectral information of hyperspectral images and introduced residual learning to construct a convolutional neural network (CNN) of nine layers. Their experiments confirm that the classification accuracy of this CNN model is higher than that of other models (e.g., LeNet-5, D-DBN, and RBF-SVM (support vector machine)) [4]. Shi Cheng [5] and Chen Yushi [6] used CNNs to extract the characteristics of hyperspectral images; the former used wavelet analysis to improve the structure of the CNN, thus improving the accuracy. Gustavo and Lorenzo presented a framework of kernel-based methods for hyperspectral image classification, whose main advantage is the ability to directly estimate the conditional posterior probabilities of classes [7]. Some scholars proposed a framework for multiple feature learning, which provided state-of-the-art classification results without significantly increasing computational complexity [8]. However, deep learning techniques are rarely used in the classification of the relative thickness of hyperspectral oil films in current research. Therefore, this study explores how machine- and deep-learning methods can be used in the recognition of ship oil spills and in the classification of oil film thickness in hyperspectral data.
Typical deep neural network architectures include deep scattering convolutional networks [9], deep belief networks (DBNs) [10], deep Boltzmann machines (DBMs) [11], stacked autoencoders (SAEs) [12], and stacked denoising autoencoders (SDAEs) [13]. The layer-wise training models have several alternatives, such as restricted Boltzmann machines (RBMs) [14], pooling units [15], AEs, variational autoencoders (VAEs), adversarial autoencoders [16,17], and convolutional neural networks (CNNs) [18]. In this paper, we adopt two of the above deep learning models, a neural network based on stacked autoencoders (SAEs) and a CNN, as the deep architectures for hyperspectral data classification. This work improves the stacked autoencoder network, which is based on the support vector machine (SVM), in accordance with the spatial and spectral characteristics, and establishes a CNN model suitable for oil film recognition. A detailed comparison of the classification results is conducted against the SVM and back propagation (BP) neural network algorithms.

SVM
SVM is a discriminant model proposed by Vladimir N. Vapnik, the father of statistical learning theory, in 1995. Because it works with a small number of training samples, converges quickly, and is insensitive to high-dimensional data, SVM has been applied in a wide range of research scenarios. At present, the nonlinear SVM model is mostly used. The kernel method transforms a given feature space into a new feature space in which the original samples are linearly separable. Common kernel functions include the polynomial, Gaussian, and Sigmoid kernel functions.
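The three kernel families named above can be written down in a few lines. The following is a minimal sketch in plain Python, not the implementation used in this work; the parameter names `gamma`, `coef0`, and `degree` and their default values are assumptions chosen for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def polynomial_kernel(u, v, gamma=1.0, coef0=1.0, degree=3):
    """Polynomial kernel: (gamma * <u, v> + coef0) ** degree."""
    return (gamma * dot(u, v) + coef0) ** degree


def gaussian_kernel(u, v, gamma=1.0):
    """Gaussian (RBF) kernel: exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(u, v, gamma=1.0, coef0=0.0):
    """Sigmoid kernel: tanh(gamma * <u, v> + coef0)."""
    return math.tanh(gamma * dot(u, v) + coef0)


# Two short placeholder vectors standing in for pixel spectra.
x = [0.2, 0.5, 0.1]
z = [0.3, 0.4, 0.0]
print(polynomial_kernel(x, z), gaussian_kernel(x, z), sigmoid_kernel(x, z))
```

Each function returns a similarity value that replaces the plain inner product in the linear SVM, which is what lets the classifier separate samples that are not linearly separable in the original feature space.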

BP Neural Network
The back propagation (BP) neural network, which was proposed by Werbos in his doctoral thesis, is a neural network that learns through the BP algorithm. The BP algorithm corrects the weights of each layer by computing the loss through forward propagation and sending the error back through back propagation. The basic structure of the BP neural network includes the input, hidden, and output layers, as depicted in Figure 1, which labels the attributes of the input sample and the weights from the hidden layer to the output layer. The BP algorithm optimizes the weights of the various layers by using the gradient descent strategy, with minimizing the mean square error as the goal.
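The forward and backward passes described above can be illustrated with a very small network. The sketch below is not the network used in the experiments: it assumes one hidden layer with a sigmoid activation, a single linear output, and the squared error of one sample, and all weights and inputs are made-up values.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def forward(x, W1, W2):
    """Forward pass: hidden activations h and scalar output y_hat."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    y_hat = sum(w * hj for w, hj in zip(W2, h))
    return h, y_hat


def loss(x, y, W1, W2):
    """Squared error for one sample (the per-sample term of the mean square error)."""
    _, y_hat = forward(x, W1, W2)
    return 0.5 * (y_hat - y) ** 2


def backward(x, y, W1, W2):
    """Gradients of the loss with respect to W1 and W2 via the chain rule."""
    h, y_hat = forward(x, W1, W2)
    err = y_hat - y                       # derivative of the loss w.r.t. y_hat
    grad_W2 = [err * hj for hj in h]      # output-layer gradient
    # Error sent back to each hidden unit through the sigmoid derivative h * (1 - h).
    delta = [err * w * hj * (1.0 - hj) for w, hj in zip(W2, h)]
    grad_W1 = [[d * xi for xi in x] for d in delta]
    return grad_W1, grad_W2


def gradient_step(x, y, W1, W2, lr=0.1):
    """One gradient-descent update; returns new weight lists."""
    g1, g2 = backward(x, y, W1, W2)
    new_W1 = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W1, g1)]
    new_W2 = [w - lr * g for w, g in zip(W2, g2)]
    return new_W1, new_W2


# Hypothetical sample and initial weights (2 inputs, 3 hidden units, 1 output).
x, y = [0.5, -1.0], 1.0
W1 = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]]
W2 = [0.3, -0.1, 0.2]
before = loss(x, y, W1, W2)
W1_new, W2_new = gradient_step(x, y, W1, W2)
after = loss(x, y, W1_new, W2_new)
print(before, after)
```

Repeating `gradient_step` over all training samples is the essence of BP training; deeper networks apply the same chain-rule step once per hidden layer.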

Neural Network based on Stacked Autoencoders (SAEs)
An autoencoder (AE) is a neural network structure with a single hidden layer that can transform sample characteristics into characteristics that are easy to classify. The method also reduces data dimensionality [19,20]. Autoencoder learning can be divided into two steps: encoding and decoding. Encoding converts the input characteristics into another representation, whereas decoding takes the characteristics obtained through the encoding process as the input and reconstructs them into the original input; the parameters are adjusted by comparing the original input with the reconstruction. An AE network consists of an input layer, a hidden layer, and an output layer.
An SAE consists of multiple AE networks stacked into a deep network structure. The SAE transforms low-level features into high-level features, which helps resolve linearly inseparable problems. Given that hyperspectral remote sensing images usually have many wave bands, the SAE network can, in theory, effectively extract image characteristics and facilitate their classification. The SAE considered here has an input layer, three hidden layers, and an output layer. The input of each of the last two hidden layers is the encoding result of the AE of the previous layer: the features after the first encoding, y, are used as the input of the second encoding; the features after the second encoding, z, are used as the input of the third encoding; and so on.

The SAE encoding process is combined with a classifier to establish a neural network based on SAEs, as illustrated in Figure 2. Logistic or Softmax regression is frequently used as the classifier.
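The chain of encodings followed by a Softmax classifier can be sketched as below. This is only an illustration of the data flow under stated assumptions: the weights are random and untrained, the hidden sizes 128, 64, and 32 are hypothetical, and the sigmoid activation is an assumption; only the 224 input bands and the five output classes come from the experiments described in this paper.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for one layer (untrained placeholder)."""
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return W, b


def encode(x, layer):
    """One AE encoding step: sigmoid(W x + b)."""
    W, b = layer
    return [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + bj)))
            for row, bj in zip(W, b)]


def softmax(scores):
    """Convert raw scores into class probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
sizes = [224, 128, 64, 32]            # 224 bands in; hidden sizes are hypothetical
encoders = [make_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]
classifier = make_layer(sizes[-1], 5, rng)   # five oil film classes

x = [rng.random() for _ in range(224)]  # stand-in for one pixel spectrum
y = encode(x, encoders[0])              # first encoding
z = encode(y, encoders[1])              # second encoding takes y as input
t = encode(z, encoders[2])              # third encoding takes z as input
W_c, b_c = classifier
probs = softmax([sum(w * ti for w, ti in zip(row, t)) + bj
                 for row, bj in zip(W_c, b_c)])
print(len(y), len(z), len(t), len(probs))
```

In a trained SAE each encoder would first be fitted as an autoencoder on the output of the previous one, and the classifier would then be trained on the final code.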


CNN
CNN was developed from the concept of the receptive field, which first appeared in the 1960s. CNNs adopt the local-connection approach, whereas traditional neural networks use the full-connection approach.
A CNN usually consists of the following five layers: the input, convolutional, pooling, fully connected, and classification layers. The input layer receives the sample, which can be preprocessed there (e.g., by normalization) to make the sample distribution uniform. The fully connected layer is the same as in an ordinary neural network, and its result is used as the input to the classification layer. The classification layer mainly calculates the probability that a sample belongs to a certain type. The convolutional and pooling layers are the important structures in the CNN.
This work uses TensorFlow as the development framework for deep learning. In establishing the CNN model, two classical convolutional neural network models, AlexNet [21] and VGGNet [22], are used as references. Figure 3 shows the structures of the AlexNet and VGGNet models: the left side shows the AlexNet network structure, and the right side shows the VGGNet-16 network structure. AlexNet adopts five convolutional layers, three pooling layers, and three fully connected layers, together with mechanisms such as ReLU, dropout, and local response normalization (LRN). The ReLU activation function is used to address the gradient dispersion problem that the Sigmoid function suffers from in deep networks. The dropout mechanism randomly ignores some neurons to avoid overfitting. The LRN mechanism makes values with a larger response relatively larger so as to increase the generalization ability. VGGNet is composed only of 3 × 3 convolution kernels and 2 × 2 pooling kernels, which reduces the number of parameters that must be trained for the convolution layers. As shown in the figure, there is no LRN in VGGNet.
The convolutional neural network structure established in this work is shown in Figure 4. The dropout and ReLU mechanisms are adopted in the network structure.

Experimental Data Description
The experimental data were derived from airborne data acquired during the oil spill accident in the Gulf of Mexico. An airborne visible/infrared imaging spectrometer with a wavelength range of 380-2500 nm was used. It records 224 wave bands with a spectral resolution of 10 nm and a spatial resolution of 3.3 m. The data obtained on 9 July 2010 (partly cloudy weather) were used in the experiment. The images come from https://gulfoilspill.jpl.nasa.gov/cgi-bin/search.pl, and Figure 5 illustrates part of the data. A total of 1,500 samples were selected as the training sample set, and 315 samples were used as the validation samples to test the effect of the trained model.
Figure 6 depicts the spectral curve of each type and indicates that each curve has distinct characteristics. The horizontal axis represents the wavelength, and the vertical axis represents the gray value of the pixel points. The wavelength range of visible light lies between the green and red lines. The samples were classified into five categories according to the literature [23]: 0, 1, 2, 3, and 4 represent seawater, very thin oil film, thin oil film, thick oil film, and very thick oil film, respectively. To evaluate the performance of the models, 315 samples from the remote sensing images (63 samples in each category, including samples affected by solar illumination) were randomly selected as validation samples. To ensure a uniform distribution of the samples, the training and test data are standardized. The overall accuracy (OA) and the Kappa coefficient [24] are used to evaluate the models.
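Both evaluation measures can be computed from a confusion matrix. The sketch below uses the standard definitions of OA and Cohen's Kappa; the confusion matrix itself is hypothetical (rows are true classes with 63 samples each, columns are predicted classes) and does not reproduce any result of this paper.

```python
def overall_accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def kappa(cm):
    """Cohen's Kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_o = overall_accuracy(cm)
    row_sums = [sum(cm[i]) for i in range(n)]
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    # Agreement expected by chance from the row and column marginals.
    p_e = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical 5-class confusion matrix, 63 validation samples per true class.
cm = [
    [55, 5, 2, 1, 0],
    [6, 40, 12, 4, 1],
    [2, 10, 38, 10, 3],
    [1, 4, 9, 41, 8],
    [0, 1, 3, 9, 50],
]
oa = overall_accuracy(cm)
k = kappa(cm)
print(oa, k)
```

When every true class has the same number of samples, as in a validation set of five classes with 63 samples each, the chance agreement is exactly 0.2, so Kappa reduces to (OA - 0.2) / 0.8.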


Oil Film Recognition Model Based on SVMs
In this work, SVMs with the RBF, Poly, and Sigmoid kernel functions were used for sample training. The SVMs were implemented with the functions in Sklearn using default parameters. To evaluate the performance of the models, the OA of the three kernel functions was calculated; the results were 68%, 57%, and 63%, respectively, and the corresponding Kappa coefficients were 0.611, 0.46, and 0.544. Because both the OA and the Kappa coefficient of the SVM with the RBF kernel were larger than those of the SVMs with the other two kernel functions, the RBF kernel function was used in the oil film recognition experiments.

Oil Film Recognition Model Based on the BP Neural Network
In the BP neural network, the currently popular ReLU activation function is adopted. The ReLU activation function can mitigate the vanishing gradient problem and can alleviate the over-fitting problem. The 10-fold cross-validation method and the OA were used on the training data to evaluate neural network models with different numbers of hidden layers. Repeated tests on the same data revealed that, once the number of hidden layers reached nine, the OA no longer varied significantly and was repeatedly close to its highest level. Given that only shallow neural networks were studied through experiments, and considering the test results, the structure with nine hidden layers was used for oil film recognition with the BP neural network.
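The 10-fold cross-validation procedure can be sketched as an index split. The code below is an illustration only: the function names and the shuffling seed are assumptions, and the model training itself is left out; the sample count of 1,500 matches the training set described in this paper.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_validation_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, held_out_indices) for each of the k rounds."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, held_out in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, held_out


splits = list(train_validation_splits(1500, k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

For each candidate number of hidden layers, a model would be trained on the nine training folds and scored by OA on the held-out fold, and the ten OA values would be averaged to compare the candidates.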

Improved Oil Film Recognition Model Based on the SAE Network
This part compares SAE network models of two structures, which differ in the number of nodes at the input, hidden, and output layers: the number of nodes at the hidden layers of the SAE was reduced either by an approximately equal ratio or by an equal difference. Twelve different SAE network models were constructed. Table 1 lists the OA of the two kinds of SAE network models in the training and validation datasets and the Kappa coefficient in the validation datasets. Table 2 lists the experimental results of AE_SVM_1H (the SVM equal-difference SAE network structure with one hidden layer) and AE_SVM_3H (the SVM equal-difference SAE network structure with three hidden layers) in the classification of thick oil film. Finally, this work used AE_SVM_1H, which has good generalization, high precision, and a high Kappa coefficient, to identify the oil film. AE_SVM_3H was also improved by combining the spatial and spectral characteristics. The loss function of the SAE is the squared error.
In the SAE network oil film experiment, Figure 7a presents the composite image of the raw data. Figure 7b shows the experimental result of the AE_SVM_3H model, in which red spots appear due to over-fitting. To remove the red spots, spatial characteristics were added to the model. Figure 8 illustrates the transformed model. In the original SAE model, the input dimension of the sample was reduced from 224 to 58, and the SVM was directly used as the classifier. In the improved model, after the sample dimension was reduced to 58, the encoded values at the upper, lower, left, and right neighbors of each sample were merged with the sample into a vector with a dimension of 290 (5 × 58), as indicated by the yellow arrow in Figure 8. The SVM classifier was then applied, allowing the model to add spatial characteristics before the classifier.
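The merging step that produces the 290-dimensional vector can be sketched as follows. This is an illustration under assumptions, not the authors' code: the function name `merge_with_neighbors`, the concatenation order (center, up, down, left, right), and the border handling (reusing the nearest valid pixel) are all hypothetical choices, since the text does not specify them.

```python
def merge_with_neighbors(encoded, row, col):
    """Concatenate the code of pixel (row, col) with its four neighbors.

    `encoded` is a 2-D grid (list of rows) of equal-length feature vectors.
    Neighbors outside the image are replaced by the nearest valid pixel
    (an assumption; border handling is not described in the text).
    """
    n_rows, n_cols = len(encoded), len(encoded[0])

    def at(r, c):
        r = min(max(r, 0), n_rows - 1)
        c = min(max(c, 0), n_cols - 1)
        return encoded[r][c]

    merged = []
    # Order: center, upper, lower, left, right (an illustrative choice).
    for r, c in [(row, col), (row - 1, col), (row + 1, col),
                 (row, col - 1), (row, col + 1)]:
        merged.extend(at(r, c))
    return merged


# Toy 3 x 4 image whose pixels carry 58-dimensional codes (placeholder values).
code_dim = 58
image = [[[float(r * 10 + c)] * code_dim for c in range(4)] for r in range(3)]
vector = merge_with_neighbors(image, 1, 2)
print(len(vector))
```

The resulting vector is what would be passed to the SVM classifier in place of the 58-dimensional code, so that each decision also sees the spectra-derived features of the adjacent pixels.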


Oil Film Recognition Model Based on the CNN Model
In this section, two CNN models are designed. The number of parameters of each node is shown in Table 3, where Input is the input layer, Conv is the convolutional layer, Pool is the pooling layer, FC is the fully connected layer, NA is none, and the numbers in the table represent the dimensions of the input/output layer. In the CNN-1 model, two convolutional layers, two pooling layers, and one fully connected layer were used. In the CNN-2 model, four convolutional layers, two pooling layers, and one fully connected layer were used. The receiver operating characteristic (ROC) curve and the area under the curve (AUC) were used in the evaluation of the models.
The AUC values of the CNN model in the training and validation datasets (the maximum value in the training set, AUC_t_max = 0.96, and the maximum value in the validation set, AUC_y_max = 0.74) were larger than those (AUC_t = 0.90 and AUC_y = 0.56) of the SAE model that combined both spectral and spatial information processing (see Figure 9). According to Table 4, the OA and Kappa coefficients of the CNN model are larger than those of the SAE model that combines both spectral and spatial information processing.
The measured results illustrated in Figure 10 reveal that each model works equally well in the classification, as they can all represent the information of the original image well. In Figure 11, as the images are more complex, the classification effect of each model is different, and the four models, (b), (c), (d), and (e), do not identify targets such as seawater (see the circle in Figure 10). These models only extracted part of the thick oil film information. The results of (f) and (g) indicate that the CNN model works well in extracting information, which is especially true for the model represented by (g). In Figure 12, as the image is affected by illumination, its spectral information changes, resulting in completely different classification results from the various models. The results of the CNN-1 model and the CNN-2 model match the "thick oil film" information well, and the fitting effect of the CNN-2 model is better than that of the CNN-1 model. Figure 13 presents the actual testing results of CNN-2 on a wide range of data, indicating that the model is equally applicable to a wide range of data.
The following conclusions can be drawn from the analysis above. In the classification of oil film thickness, the CNN model fits the image information well. The CNN model can be combined with the spatial characteristic information of the oil film for classification. For example, the results of Figure 11 show that, because the thick oil film has distinct spatial characteristics, the thick oil film information can finally be extracted. Although the OA and Kappa coefficients of the deep CNN models are relatively small in the verification datasets, the actual test results indicate that the deep CNN models fit the information of the thick oil film well. Thus, the deep CNN model has a strong generalization ability.


Conclusions
Hyperspectral image recognition technology has gradually evolved with the development of sensors and machine learning technology. Spectral and spatial information processing is combined with deep learning. The classifiers built in this deep learning-based framework provide a competitive performance in oil film thickness recognition. This work proposed and implemented an oil film information extraction method that combined spatial information and the SAE neural network, thereby improving the classification accuracy of the traditional SAE neural network. A method for extracting information on an oil spill on the sea surface using the CNN was designed and implemented. The CNN

Figure 2. A neural network structure based on stacked autoencoders.

Figure 4 presents four modules. Part A is the data input and the first convolutional layer. Part B contains y convolution and pooling layers, in which each convolution layer contains x convolution operations. The operation window for convolution is 3 × 3, and the operation window for pooling is 2 × 2. Part C is a fully connected layer, and part D is the output result. The ReLU activation function is used for the convolution and fully connected operations in the model. The dropout mechanism is used for both the pooling and fully connected operations. In training the neural network model, this study uses cross entropy, which is commonly applied in neural networks, as the loss function. The Adadelta algorithm is employed to optimize the model.
ISPRS Int. J. Geo-Inf. 2019, 8, x FOR PEER REVIEW
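The 3 × 3 convolution, ReLU activation, and 2 × 2 pooling operations can be illustrated on a single-channel toy image. The sketch below is written in plain Python rather than in the framework used for the experiments, and it assumes stride 1 without padding for the convolution and non-overlapping windows for the pooling; the image and kernel values are placeholders.

```python
def conv2d_valid(image, kernel):
    """Slide a k x k kernel over a 2-D image with stride 1 and no padding.

    As in most deep learning frameworks, the kernel is not flipped.
    """
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(k) for b in range(k))
             for j in range(out_w)] for i in range(out_h)]


def relu(feature_map):
    """Element-wise max(0, v)."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Non-overlapping 2 x 2 max pooling; a trailing odd row or column is dropped."""
    h = len(feature_map) // 2
    w = len(feature_map[0]) // 2
    return [[max(feature_map[2 * i][2 * j], feature_map[2 * i][2 * j + 1],
                 feature_map[2 * i + 1][2 * j], feature_map[2 * i + 1][2 * j + 1])
             for j in range(w)] for i in range(h)]


# 6 x 6 placeholder image and a hypothetical 3 x 3 kernel.
image = [[float((i * 6 + j) % 5) for j in range(6)] for i in range(6)]
kernel = [[0.0, 1.0, 0.0], [1.0, -4.0, 1.0], [0.0, 1.0, 0.0]]
conv_out = conv2d_valid(image, kernel)
pooled = max_pool_2x2(relu(conv_out))
print(len(conv_out), len(conv_out[0]), len(pooled), len(pooled[0]))
```

Under these assumptions each 3 × 3 convolution shrinks the map by two pixels per side and each 2 × 2 pooling halves it, which is how the layer dimensions in a table such as Table 3 would follow from the input size.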

Figure 4. The convolutional neural network (CNN) model used in the experiment.

Figure 5. Remote sensing images used to select samples.


Figure 7. A comparison of the results before and after the transformation of the model structure of AE_SVM_3H.

Figure 8. The changed structure of the AE_SVM_3H model.


Figure 7c depicts the experimental results after the model change. The comparison of Figure 7b,c indicates that (c) maintains the thick oil film portion of (b) while removing the over-fitting information (the red spots). The classification accuracy of the validation set increased from 68% to 73%.
Figure 9a shows the ROCs and the AUC values of CNN-1, CNN-2, and the SAE model that combines both spectral and spatial information processing in the training data, where AUC_SAE = 0.90, AUC_CNN-1 = 0.95, and AUC_CNN-2 = 0.96. The horizontal axis represents the false positive rate, and the vertical axis represents the true positive rate. Figure 9b presents the ROCs and the AUC values of CNN-1, CNN-2, and the SAE model that combines both spectral and spatial information processing in the validation data, where AUC_SAE = 0.56, AUC_CNN-1 = 0.74, and AUC_CNN-2 = 0.72.
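An ROC curve and its AUC can be computed from class scores as sketched below. The text does not state how the five-class problem was reduced to ROC curves, so this sketch assumes a binary (one class against the rest) setting; the labels and scores are made-up values, and the function names are hypothetical.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points.

    `labels` holds 1 for the positive class and 0 otherwise. The points are
    obtained by lowering the decision threshold over the sorted scores, and
    samples with tied scores are processed together.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        current = scores[order[i]]
        while i < len(order) and scores[order[i]] == current:
            if labels[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical scores for six samples, three of them positive.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
area = auc(curve)
print(curve, area)
```

An AUC of 0.5 corresponds to scores that carry no information about the class, and 1.0 to perfect separation, which gives a scale for reading the training and validation values reported for the three models.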

Figure 9. The ROC curves and the area under the curve (AUC) values of the models on the training dataset (a) and the validation dataset (b).


Figures 10-12 compare the actual test results. Figures 10 (rare types of thick oil films) and 11 (many types of thick oil films) show normal undisturbed areas, and Figure 12 shows the solar area (incident point of the sun). In each figure, (a) is the original image, (b) is the result of the SVM model with the RBF kernel, (c) is the result of the nine-layer BP neural network model, (d) is the result of the SVM SAE equal-difference network model with one hidden layer, (e) is the result of the SAE model that combines spectral and spatial information processing, (f) is the result of the CNN-1 model, and (g) is the result of the CNN-2 model.

Figure 10. The actual test results at the areas of rare types of thick oil films.

Figure 11. The actual test results at the areas of many types of thick oil films.

Figure 12. The actual test results at the solar area.

Figure 13. The actual test results of CNN-2: (a) the original data and (b) CNN-2.


Table 1. The overall accuracy (OA) and Kappa coefficient of the equal-ratio and equal-difference stacked autoencoder (SAE) networks.

Table 2. The results of AE_SVM_1H (the SVM equal-difference SAE network structure with one hidden layer) and AE_SVM_3H (the SVM equal-difference SAE network structure with three hidden layers) in the classification of thick oil film.

Table 3. Number of parameter nodes in the experiment. Note: Input is the input layer, Conv is the convolutional layer, Pool is the pooling layer, FC is the fully connected layer, NA is none, and the numbers in the table represent the dimensions of the input/output layer.

Table 4 lists the OA and Kappa coefficients of the three models in the validation dataset.

Table 4. The test results of the three models on the validation dataset.