Quantification of Hydrocarbon Abundance in Soils Using Deep Learning with Dropout and Hyperspectral Data

Terrestrial hydrocarbon spills have the potential to cause significant soil degradation across large areas. Identification and remedial measures taken at an early stage are therefore important. Reflectance spectroscopy is a rapid remote sensing method that has proven capable of characterizing hydrocarbon-contaminated soils. In this paper, we develop a deep learning approach to estimate the amount of Hydrocarbon (HC) mixed with different soil samples using a three-term backpropagation algorithm with dropout. Dropout was used to avoid overfitting and reduce computational complexity. A HySpex SWIR-384 camera measured the reflectance of the samples, which were obtained by mixing and homogenizing four different soil types with four different HC substances, respectively. The datasets were fed into the proposed deep learning neural network to quantify the amount of HCs in each dataset. Individual validation on all the datasets shows excellent prediction of the HC content, with an average mean square error of ~2.2 × 10⁻⁴. Results with remotely sensed data captured by an airborne system further validate the approach. This demonstrates that a deep learning approach coupled with hyperspectral imaging techniques can be used for rapid identification and estimation of HCs in soils, which could be useful in estimating the quantity of HC spills at an early stage.


Introduction
Hydrocarbons refer to chemical substances formed exclusively from carbon and hydrogen. Naturally occurring hydrocarbon (HC) substances occur in different forms, depending on the length of the carbon chain: solid, liquid, and gas [1]. Liquid HCs found in nature consist of a complex mixture of molecules of various weights; in addition, nitrogen, sulfur, and oxygen exist in small quantities [2].
While the economic significance of HCs is attributed to their primary use as fuel and their versatile applications in downstream industries, they can have detrimental environmental consequences [1,3]. Oil exploration, production, and processing represent potential sources of environmental exposure to HCs: accidental terrestrial spillage alters the physical and chemical properties of soils. HCs may therefore be environmentally harmful, causing toxicity and limiting soil quality [4].
Knowledge about the concentration and nature of a spill is important in order to track its propagation in the environment, assess its risk, and propose remediation strategies [5,6]. To effectively protect communities affected by a spill, fast and accurate determination of the area impacted is essential. Spectral Unmixing (SU) assumes a mixing model describing how the materials in a scene combine to form the mixed spectrum as measured by the sensor [27]. Given the mixing model, SU then estimates the inverse of the formation process to infer the quantities of interest, specifically the endmembers and abundances, from the collected spectra [28-30]. This can be achieved through a radiative transfer model that accurately describes the light scattering by the materials in the scene observed by a sensor [27,31]. The two main approaches to spectral unmixing are linear and nonlinear models [21,22,25,26,28].
Different methods utilizing both linear and nonlinear models have been demonstrated in the literature for the analysis of different hydrocarbon types. In the works by the authors of [13,15], Principal Component Analysis (PCA) and Partial Least Squares (PLS) regression were used: PCA to differentiate the types and densities of HCs in soils, and PLS to predict the concentration of oils and fuels in soil samples. The authors of [18] used the Spectral Angle Mapper (SAM) to classify oil spills in an image and used signature matching to distinguish oils from other features. However, most of these methods adopt a linear model and a smoothing threshold function for feature extraction. Other approaches, such as kernel-based transformations [32] and manifold learning algorithms [33], are based on nonlinear models.
In the work by the authors of [34], we showed experimentally that HC abundance in soils is estimated with higher accuracy when nonlinear unmixing models are applied. Nevertheless, spectral unmixing, and specifically the abundance estimation of HCs such as gasoline, can be challenging [20] and may require more advanced techniques such as deep learning. A deep learning network is a powerful technique for solving nonlinear problems: it can be fast and accurate, and it does not rely on prior assumptions to estimate the abundances in a given dataset. However, to the best of our knowledge, there is no study that uses spectral data and deep learning methods to detect and estimate the percentage of HCs in soils. While the value and application of these two techniques have been presented in independent research activities, they have not yet been combined. Therefore, in this paper, a deep learning approach is developed to estimate the amount of HC contamination in soil samples using SWIR imaging spectroscopy. The remainder of the paper is organized as follows. Section 2 describes the data acquisition process, including the materials used, the sample preparation, and the hyperspectral sensor. Section 3 discusses the methodology, including the parameters used in training the network, the architecture of the deep learning approach, and the validation method. Results are presented in Section 4 and discussed in Section 5. Finally, conclusions are drawn in Section 6.

Materials
The hyperspectral imaging sensor used for this experiment covers the Shortwave Infrared (SWIR) range (930-2500 nm), which has been found suitable for the detection of HCs [5,6,13,35,36], mineral identification and mapping [36], rock mapping [37], and mapping of mafic and ultramafic units in the Cape Smith Belt [38].
The soils and HC types selected here have been used extensively in the literature for assessments of HC contamination in different soil types [8,13,17,18,39]. Four HC types were used, namely diesel, biodiesel, ethanol, and petroleum; these are the HCs most commonly used in the literature. Soil types include typical mixtures of clay (<0.002 mm in diameter), silt (0.002-0.05 mm in diameter), and sand (0.05-1 mm in diameter). In particular, we used mixtures with grain sizes ranging from medium to coarse, as follows: Clay, Clay Loam, Sandy Clay Loam, and Sandy Loam [40].

Sample Preparation
The preparation of the samples consisted of the following steps.

•	Each soil type was air-dried, and therefore all samples contained similar levels of moisture.
•	Fifty grams of a soil sample type was added to a Petri dish (12 cm in diameter).
•	The sample was scanned with a HySpex SWIR-384 camera under constant illumination.
•	Initially, 2 mL of the HC was added to the soil using a syringe (for clay and clay-loam); this was subsequently changed to 5 mL of the HC for the other soil types.
•	A disposable plastic spoon was used to homogenize the mixture and to flatten its surface in order to obtain even surfaces, except for some soil samples containing clay, which tends to be sticky and difficult to flatten (e.g., Figure 1b).
•	The sample was scanned again under the same illumination.
•	A further 5 mL of HC was added to the mixture.
•	The disposable spoon was used to homogenize the mixture, and another scan was taken.
•	The procedure was repeated in increments of 5 mL of HC until the mixture was saturated and formed a shallow local pool (see Figure 1). The procedure was repeated for all soil samples contaminated with each of the different hydrocarbon types.
A calibration panel was used as a white reference, and the acquired images were calibrated from radiance to reflectance using the HYSPEX REF software, which normalizes the images to an area of known reflectance. A total of 15 combinations (see Table 1) were produced, with four mixtures each for the clay-loam, sandy-clay-loam, and sandy-loam soil types, while clay had three mixtures. The complete data set used here consisted of 96 spectral images.

Hyperspectral Imaging
The spectral data were obtained using a HySpex SWIR-384 line-scan hyperspectral camera equipped with a Mercury Cadmium Telluride (MCT) detector array. For this experiment, a table-top laboratory set-up with a translation stage, a SWIR light source, and close-up lenses was used to scan the samples and build hyperspectral data cubes (see Figure 2). The camera captured a full SWIR spectrum, with a spectral sampling interval of 5.45 nm between 930 and 2500 nm, along a line of 384 pixels over 288 bands with a radiometric resolution of 16 bits [41]. The 384 columns of the detector array formed one line of the hyperspectral image along the x-axis. The hyperspectral image was then built line by line in the so-called "pushbroom" scanning mode, in which the platform holding the sample was translated along the y-axis at constant speed (see Figure 3). The scanning speed was automatically controlled by the data acquisition unit based on the selected lens option. The resulting images had a spatial resolution of 0.22 mm/pixel. Radiometric calibration was performed using the vendor's software package; a more detailed specification of the system is given in Table 2. The resultant reflectance spectra were used to estimate the percentages of the HCs using the abundances calculated from the different mixture types, as shown in Table 3.
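As a rough illustration of the pushbroom acquisition and white-reference calibration described above, the cube assembly and normalization can be sketched in Python. The HYSPEX REF internals are proprietary, so the array shapes and values below are synthetic stand-ins only:

```python
import numpy as np

# Simulated stand-in for one pushbroom acquisition: each step of the
# translation stage yields one line of 384 pixels x 288 bands.
rng = np.random.default_rng(0)
n_lines, n_pixels, n_bands = 100, 384, 288

lines = [rng.uniform(100.0, 200.0, size=(n_pixels, n_bands)) for _ in range(n_lines)]

# Stack the scanned lines along the y-axis to build the hyperspectral cube.
radiance_cube = np.stack(lines, axis=0)          # shape: (y, x, bands)

# Flat-field style conversion to reflectance: divide by the mean spectrum
# of a white reference panel of known (near-100%) reflectance.
white_panel = rng.uniform(240.0, 250.0, size=(n_pixels, n_bands))
white_ref = white_panel.mean(axis=0)             # one value per band
reflectance_cube = radiance_cube / white_ref     # values roughly in [0, 1]
```

The division broadcasts the per-band white-reference spectrum across every pixel, which is the usual single-panel normalization; vendor software typically also applies dark-frame subtraction, omitted here.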

Workflow
Spectral information was obtained from the controlled dataset and used with ground truth abundances to evaluate the performance of the proposed deep learning model for estimating the abundance of HCs in each dataset. The workflow of the study is as follows.

•	Obtaining the dataset via a controlled experiment by mixing and homogenizing different Hydrocarbon (HC) types with soil samples and scanning them with a HySpex Shortwave Infrared (SWIR) 384 camera.
•	Applying the Deep Learning (DL) model trained using a three-term backpropagation algorithm with dropout for the abundance estimation of the HCs.
•	Structuring the DL model with different dropout ratios to determine the most efficient DL setting.
•	Testing and validating the performance of the proposed method for abundance estimation of the different HCs using the same network structure and hyperparameters.
•	Comparing the accuracy and performance of the DL model with a hybrid spectral unmixing method [21] and with DL models trained using a standard backpropagation algorithm with and without dropout (to demonstrate the generalization ability conferred by dropout).
The description and experimental results of this workflow are organized in the following sections. Further explanation and discussion regarding abundance estimation of the HCs by the DL model as well as the other methods can be found in the data acquisition, results, and discussion sections.

Deep Learning
Deep learning has been shown to outperform other machine learning and neural network techniques. Deep learning can be categorized as a subfield of machine learning that learns high-level abstractions in data by utilizing hierarchical architectures [42]. It can also be described as the final product of machine learning, where the learning rule becomes the algorithm that generates the model from the training data. It typically involves models that hierarchically learn features of the input data using Artificial Neural Networks (ANNs), usually with more than three layers [43]. The main advantage of deep learning is that these layers of features are not designed by an operator; they are learned from the input data using learning procedures. A deep neural network can simply be described as a network of sufficient complexity to interpret raw data without human-derived explanatory variables [44,45]. Deep learning models provide excellent results with the ability to extract strong features, but can in turn suffer from vanishing gradients, overfitting, and high computational load [46]. These problems can be mitigated by employing dropout, three-term backpropagation, and the Rectified Linear Unit (ReLU) activation function, which is known to transmit error better than other functions.
There are many types of deep learning architectures whose applications have proven to yield excellent results; the most common are the Deep Belief Network (DBN), Convolutional Neural Network (CNN), Deep Convolutional Generative Adversarial Network (DCGAN), and Recurrent Neural Network (RNN) [47,48]. The application of deep learning techniques to hyperspectral data is relatively recent. For instance, in the work by the authors of [49], deep belief networks and a novel texture enhancement algorithm were investigated for their suitability and practical application to hyperspectral image classification. The authors of [50] utilized high-resolution remote sensing imagery and deep learning techniques to extract buildings in urban districts using guided filters. In the work by the authors of [51], a 3D full convolutional neural network model was used to improve the spatial-spectral resolution of hyperspectral images by learning end-to-end a full mapping between low and high spatial resolution hyperspectral images at high accuracy. Transfer learning with a deep convolutional neural network was reported in the work by the authors of [52]; in this research, a large amount of unlabeled SAR scene data was transferred to SAR target recognition tasks with feedback of the reconstruction loss to the classification pathway. Others, such as the authors of [53,54], used deep learning approaches to classify hyperspectral images. Most of the aforementioned methods used the standard backpropagation algorithm to train the network, which is characterized by low convergence rates, especially when used to train a network with more than one hidden layer. Thus, in this paper, the main aim of using the three-term backpropagation algorithm with dropout to train the network is to increase the convergence rate and the ability to generalize to unseen data with good prediction accuracy compared to existing methods.

Dropout
Dropout allows neurons to randomly drop out of the network during training, while other neurons step in and handle the representation required to make predictions for the missing neurons [55]. This simply means removing neurons from the network along with all their incoming and outgoing connections. Applying dropout to a deep neural network yields a thinned network consisting of all the units that survive dropout [56], as shown in Figure 4. The effect of dropout is that the network becomes less sensitive to the specific weights of individual neurons, which in turn results in a network that generalizes better and is less likely to overfit the training data. Since deep neural networks consist of multiple nonlinear hidden layers, they are expressive models that can learn complex relationships between the input and output nodes, which often results in overfitting. In this paper, dropout is applied both on the hidden layers and on the visible layer: dropout on the hidden layers affects the hidden neurons in the body of the deep network, including between the last hidden layer and the output layer, while dropout on the visible layer is applied between the input and the first hidden layer.
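The thinned-network idea can be sketched with the common "inverted dropout" formulation, a standard variant (not necessarily the exact scheme used here), in which surviving activations are rescaled during training so that no adjustment is needed at test time:

```python
import numpy as np

def dropout(activations, ratio, rng, train=True):
    """Inverted dropout: zero each unit with probability `ratio` during
    training and rescale the survivors; the network is unchanged at test time."""
    if not train or ratio == 0.0:
        return activations
    keep = rng.random(activations.shape) >= ratio   # survival mask
    return activations * keep / (1.0 - ratio)

rng = np.random.default_rng(42)
h = np.ones((4, 30))                                  # hidden layer of 30 units, batch of 4
h_train = dropout(h, ratio=0.2, rng=rng)              # ~20% zeroed, survivors scaled to 1.25
h_test = dropout(h, ratio=0.2, rng=rng, train=False)  # identity at test time
```

Each training pass samples a different mask, so every minibatch effectively trains a different thinned sub-network, which is what discourages co-adaptation of weights.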

Backpropagation
Backpropagation is used to train multilayer architectures by minimizing the cost function of the model. It adjusts the free parameters, i.e., the weights (ω) and biases, in order to attain the desired network output. Traditionally, a learning rate and a momentum factor are used to control the weight adjustments and damp oscillations. Backpropagation is a popular training algorithm in many applications; however, its main limitation is slow convergence, especially when used to train a deep neural network with multiple hidden layers. The three-term backpropagation algorithm with dropout therefore tends to improve the accuracy of the trained model.

Three-Term Backpropagation
The backpropagation algorithm has been modified by different researchers to improve its efficiency and convergence rate. One such method is the three-term backpropagation algorithm proposed by the authors of [57], shown in Algorithm 1. This algorithm adds an extra term, called the Proportional Factor (PF), to the standard backpropagation algorithm. The PF speeds up the weight adjustment process by increasing the convergence rate and decreasing learning stalls while maintaining the simplicity and efficiency of the standard backpropagation algorithm [58].
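One common formulation of the three-term update, Δw(t) = −α·∂E/∂w + β·Δw(t−1) + γ·e(t), can be sketched on a toy one-weight problem. This is a Python illustration rather than the authors' MATLAB implementation, and the exact form of the PF term in [57] may differ:

```python
def three_term_step(w, grad, prev_delta, error, alpha=0.01, beta=0.5, gamma=0.1):
    """One weight update combining learning rate, momentum, and proportional
    factor: delta = -alpha*grad + beta*prev_delta + gamma*error."""
    delta = -alpha * grad + beta * prev_delta + gamma * error
    return w + delta, delta

# Toy fit: drive a single weight w so that w*x matches the target t.
x, t = 1.0, 2.0
w, delta = 0.0, 0.0
for _ in range(200):
    e = t - w * x                 # output error (the PF term is proportional to it)
    grad = -e * x                 # dE/dw for E = 0.5*e**2
    w, delta = three_term_step(w, grad, delta, e)
print(round(w, 3))                # → 2.0
```

With the paper's values α = 0.01, β = 0.5, γ = 0.1, the error-proportional term dominates the plain gradient step early on, which is the mechanism claimed to reduce learning stalls.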

A stability analysis of the three-term backpropagation algorithm was presented in the work by the authors of [58] to test the convergence rate and stability of the algorithm. This training algorithm has proven effective, training networks with good prediction accuracy and a high convergence rate [58,59].
A deep learning model with dropout can be trained using stochastic gradient descent, similarly to a standard neural network; the only difference is the random dropping of units in the network's hidden layers. Different methods have been used to improve the standard gradient descent algorithm, such as momentum, annealed learning rates, and L2 weight decay [55]. Here, the effectiveness of the dropout-trained model using the three-term backpropagation algorithm is demonstrated. The three-term backpropagation algorithm speeds up the weight adjustment compared to the conventional backpropagation algorithm, while dropout has proven successful for computer vision tasks, as it helps to avoid overfitting and improves generalization [60,61].

Hyperparameters
A deep learning model requires the tuning of various hyperparameters in order to improve the results, and their best values largely depend on the dataset and on the other hyperparameters. The backpropagation algorithm involves two parameters for updating the weights during training: the learning rate (α) and the momentum factor (β).
The initial learning rate α is one of the most important hyperparameters; too small a learning rate makes the network learn slowly, while too large a learning rate can lead to oscillation, preventing the error from falling below a certain value.
The momentum factor β is believed to make the learning procedure more stable and to accelerate convergence in shallow regions of the error function, although in practice this does not always happen [62].
The extra term introduced by the three-term backpropagation algorithm, the proportional factor (γ), speeds up the weight adjustment process by increasing the convergence rate and decreasing learning stalls.
The best choice of these parameters depends on the problem and often requires a trial-and-error process before a suitable choice is found [63]. Having run the experiments a number of times on this basis, optimum parameter values were found that trained the network and produced good results.

Architecture of the Deep Learning Model
The deep learning model was designed using the 288 bands as input to the network. Each pixel is taken as an independent input to the network. In this research study, we do not consider the spatial information. The network has four hidden layers each containing 30 nodes and one output corresponding to the abundance of hydrocarbon. The network was trained using the ground truth abundances for the different mixtures, as detailed in Table 4.
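A minimal sketch of this 288-30-30-30-30-1 architecture's forward pass is given below (in Python; weight initialization is arbitrary and training is omitted, and a sigmoid is used on all layers as described later in the paper):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
layer_sizes = [288, 30, 30, 30, 30, 1]   # 288 bands in, 4 hidden layers of 30, 1 abundance out
weights = [rng.normal(0.0, 0.1, (m, n)) for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def predict(pixels):
    """Forward pass: each pixel's 288-band spectrum -> estimated HC abundance."""
    a = pixels
    for W, b in zip(weights, biases):
        a = sigmoid(a @ W + b)
    return a

spectra = rng.uniform(0.0, 1.0, (5, 288))  # five reflectance spectra (synthetic)
abundances = predict(spectra)              # shape (5, 1), values in (0, 1)
```

Each pixel is treated as an independent 288-dimensional input, matching the paper's choice to ignore spatial information; the sigmoid output conveniently bounds the predicted abundance to (0, 1).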
The data was randomly divided into 3 categories, namely: training, validation, and test sets. The training set is used to fit the parameters of the deep learning model, the test set (unseen data) is used to investigate the predictive power of the model while the validation set is used to avoid overfitting using the cross-validation algorithm.
The cross-validation algorithm avoids overfitting because the training sample is independent of the validation sample [64]. The size of the data sets depended on the soils' absorption level during the experiment (i.e., on when a local shallow pool was formed). Only image pixels corresponding to data from inside the Petri dish were considered, and for each scanned image, 1000 pixels were randomly selected. Thus, the data sets ranged from 5000 pixels × 288 bands (where five mixture types were available) to 10,000 pixels × 288 bands (for samples with ten possible mixtures). The size of the data sets and the number of mixtures used for the experiments (see Table 1) are summarized in Table 3. Subsets of the hyperspectral data were fed into the network as follows: 80% of the data were randomly selected for training the network, 10% were used to test the network, and 10% were used for cross-validation.
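The pixel sampling and 80/10/10 split can be sketched as follows, with synthetic data standing in for the real cubes and the pixel counts as described above:

```python
import numpy as np

rng = np.random.default_rng(7)

# Stand-in for one scanned mixture image: only pixels inside the Petri dish
# are kept, and 1000 of them are sampled at random per image.
dish_pixels = rng.uniform(0.0, 1.0, (4000, 288))
sampled = dish_pixels[rng.choice(len(dish_pixels), size=1000, replace=False)]

# Random 80/10/10 split into training, test, and cross-validation sets.
idx = rng.permutation(len(sampled))
n_train, n_test = int(0.8 * len(sampled)), int(0.1 * len(sampled))
train = sampled[idx[:n_train]]
test = sampled[idx[n_train:n_train + n_test]]
val = sampled[idx[n_train + n_test:]]
```

Shuffling the indices once and slicing guarantees the three subsets are disjoint, which is what makes the validation error a fair proxy for performance on unseen data.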
We compared two neural network architectures, one with and one without dropout, to demonstrate dropout's ability to improve the generalization capabilities of the neural network.
The network used a sigmoid activation function which was applied to the hidden and output nodes.
The deep learning abundance estimation experiments were conducted to obtain optimum hyperparameters in order to achieve maximum accuracy in estimating the amount of HCs in each soil mixture type. The ground truth abundances known from the sample preparation were used as targets to train the network for the abundance estimation. These ground truth abundances were calculated for the HC type in each data set, as detailed in Table 4, and depend on the density of each HC. All the experiments were conducted with the learning rate α set to 0.01, the momentum factor β set to 0.5, and the proportional factor γ set to 0.1, which allowed convergence of the objective function at a high rate. The algorithm was run iteratively for 20 epochs.
Moreover, in order to find the optimum level of dropout, the models were trained using the three-term backpropagation algorithm with different ranges of dropout (10-50%).

Results
In this section, we present the results obtained from the deep learning model for the abundance estimation of the different HCs. Results are presented to demonstrate the effectiveness of dropout in terms of the model's generalization capabilities. We also show the accuracy of the proposed method compared to the hybrid spectral unmixing method and to DL models trained with conventional backpropagation with and without dropout, respectively. Results with both laboratory and remotely sensed data are presented. The algorithms were implemented in MATLAB 2018b, and the experiments were carried out on an LG desktop with an Intel(R) Core(TM)2 Duo CPU at 3.00 GHz and 8.00 GB of RAM.

Experiment with Laboratory Data
The reflectance spectra of different soil samples with a 15% hydrocarbon concentration are shown in Figure 5, with specific absorption features at around 1700 nm and 2300 nm, respectively. The ground truth abundances in Table 4 were used to estimate the amount of hydrocarbon used in the experiment. The abundances were calculated based on the density of the different hydrocarbon types. The aim is to quantify the percentage or amount of HC in each pixel using the deep learning and hybrid spectral unmixing methods (via abundance estimation). This was calculated based on the saturation level of the different hydrocarbons, as shown in Table 1.
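As an illustration of how a weight-based abundance can be derived from the added HC volume and its density (the exact ground-truth formula behind Table 4 is not reproduced here, and the diesel density below is a typical literature value, not the authors' measurement):

```python
def hc_mass_fraction(volume_ml, density_g_per_ml, soil_mass_g=50.0):
    """Weight fraction of hydrocarbon after adding `volume_ml` of an HC with
    the given density to a soil sample of `soil_mass_g` grams."""
    hc_mass = volume_ml * density_g_per_ml
    return hc_mass / (hc_mass + soil_mass_g)

diesel_density = 0.84                            # g/mL, approximate literature value
frac = hc_mass_fraction(10.0, diesel_density)    # 10 mL of diesel in 50 g of soil
print(f"{100 * frac:.1f}%")                      # → 14.4%
```

Because the same added volume of a denser HC contributes more mass, abundances differ by HC type even at identical volume increments, which is why Table 4 is density-dependent.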
To demonstrate the ability of the proposed deep learning model to generalize on unseen data, Table 5 displays the results obtained from the test sets with and without dropout, respectively. The experimental process was repeated with dropout ratios on the hidden layers of 10%, 20%, 30%, 40%, and 50%. Results demonstrate both the training and validation accuracy of the network. Tables 6-9 show the mean square error of the proposed method for the different dropout ratios. In all cases, the error is 10 times lower for a dropout ratio of 40% than for 50%, and drops significantly again at a 20% ratio. However, when the ratio is further reduced to 10%, the error increases. The 20% dropout ratio was therefore adopted in the rest of the experiments.

Results of the experiments are shown in Figures 6 and 7 for individual soil types contaminated with different HCs to confirm the accuracy of the method. Figure 6 shows the mean square error during training and demonstrates the network's ability to converge rapidly within a low number of epochs. The plots in Figure 7 show the model's estimated output and target output for four different combinations. It is observed that the DL model correctly quantifies all the different HC abundances with low error. From the results obtained, it is noted that the proposed method was able to generalize on unseen testing and validation data with high prediction accuracy. We observed a similar trend on all the datasets used for the experiment, indicating a low error rate and a high convergence rate.
To demonstrate the effectiveness of the proposed method, a deep learning model trained with a conventional backpropagation algorithm was similarly used to quantify the HC abundances, first without dropout and then with 20% dropout. For a fair comparison, the same network structure was used, including the number of layers, the number of nodes per layer, the range of initial values, and the learning rate. Another comparison was conducted with the hybrid spectral unmixing method, which switches between linear and nonlinear methods [21]. Hybrid spectral unmixing uses a neural network to determine the most appropriate method among a set of linear and nonlinear unmixing methods for each pixel in the scene. Specifically, we used Vertex Component Analysis (VCA) [65], the Fully Constrained Least Squares method (FCLS) [66], the Generalized Bilinear Mixing Model (GBM) [67], and the Polynomial Post-Nonlinear Mixing Model (PPNMM) [68]. This means that the hybrid switch method selects the best of these four methods for each pixel.
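For intuition, the linear mixing model that FCLS-type methods invert can be sketched with two synthetic endmembers and a noise-free mixture. This is not a reimplementation of VCA, FCLS, GBM, or PPNMM; it only shows the model the hybrid switch chooses between:

```python
import numpy as np

rng = np.random.default_rng(1)
soil = rng.uniform(0.4, 0.6, 288)        # endmember 1: synthetic soil spectrum
hc = rng.uniform(0.1, 0.3, 288)          # endmember 2: synthetic hydrocarbon spectrum
a_true = 0.15                            # 15% HC abundance
mixed = (1 - a_true) * soil + a_true * hc  # linear (convex) mixing model

# Least-squares abundance estimate along the soil->HC direction,
# clipped to the physically meaningful range [0, 1].
d = hc - soil
a_hat = np.clip(np.dot(mixed - soil, d) / np.dot(d, d), 0.0, 1.0)
```

With noise-free data the projection recovers the abundance exactly; the nonlinear models (GBM, PPNMM) add interaction terms to this convex combination, which is why the switch can pick a better fit per pixel.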
From the results obtained, it may be observed that our proposed method outperforms the hybrid switch method and the conventionally trained networks, providing the closest estimates to the ground truth values, as demonstrated in Tables 10 and 11.

Soil Continuity Experiments
In this research, four different soil mixtures were created and HC was added in discrete steps. However, in real-life situations, both the HC and soil levels of given samples are continuous rather than discrete. Therefore, in order to simulate a more realistic scenario, several strategies were explored. The first strategy was to create a generic model with all soils combined, as opposed to the separate models for each soil used in the previous experiments. It is noted that the soils were prepared and mixed manually and contained grains of different sizes (e.g., clay and sand mixtures), so by feeding the DL network with all types of soils, differences in soil composition would appear from pixel to pixel. DL models were thus created including all four soil mixtures (clay, clay-loam, sandy loam, and sandy clay-loam) rather than one per soil. Using the same deep learning architecture, 80% of the resulting data was used to train the model, 10% was used as a test sample, and the remaining 10% was used for cross-validation. This was conducted to validate the network's ability to estimate the amount of HC regardless of the soil type. The results were in the same range as for the individual HCs. Table 11 summarizes the results obtained for biodiesel. It is noted that the training MSE was in the same range as for the individual models, although the number of epochs required increased to 178. Average MSEs for the individual models are shown in brackets for comparison purposes. The generalization MSE is higher than in the individual models; however, this data is more complex than that of the individual models, as it contains four different soil types. In the work of the authors of [16], an individual model per soil type was recommended, as the generic models degraded the responses. We believe that with further tuning of the hyperparameters, improved results could be achieved; however, this is out of the scope of this paper.
In order to simulate a more realistic scenario, and following a similar approach to that presented in the work by the authors of [16], noise was added to the data to simulate continuous spectral values instead of discrete ones and also to evaluate the noise rejection of the models. Here, the datasets were corrupted with random Gaussian noise with a signal-to-noise ratio (SNR) ranging from 10 to 40 dB. The model shows similar performance even at low SNR, demonstrating good noise rejection and accurate prediction over the different SNR ranges; for SNR lower than 20 dB, the generalization error deteriorates slightly. Initially, we added noise to the test data but not to the training data, to simulate training with discrete samples and testing with continuous samples. The performance deteriorates for SNR = 10 dB, while for higher SNR it is very stable, showing good adaptation of the model to continuous spectra.
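Noise at a prescribed SNR can be generated by scaling the Gaussian noise power to the average signal power, as sketched below with synthetic spectra:

```python
import numpy as np

def add_noise(spectra, snr_db, rng):
    """Corrupt spectra with white Gaussian noise at a target SNR (in dB),
    with noise power set relative to the average signal power."""
    signal_power = np.mean(spectra ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), spectra.shape)
    return spectra + noise

rng = np.random.default_rng(0)
clean = rng.uniform(0.2, 0.8, (1000, 288))   # synthetic reflectance spectra
for snr in (10, 20, 30, 40):                 # the SNR range used in the paper
    noisy = add_noise(clean, snr, rng)
```

Because SNR is defined on a log scale, each 10 dB step reduces the noise power by a factor of ten, so the 10 dB case is by far the harshest test of the model's noise rejection.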

Experiment with Remote Sensed Data
Remotely sensed data captured by an airborne system, adjusted to work under stationary conditions in the field, were used to validate our proposed method. This dataset contains soils contaminated with different levels of hydrocarbon (between 0 and 10 wt% in steps of 1 wt%) acquired at three different locations (Hamra, Kokhav, and Evrona) with a Hyper-Cam LW instrument. Each pixel's response is captured in 88 spectral bands over the spectral range of 8 to 12 µm with a spectral resolution of 0.25 cm⁻¹. The experimental protocol, data capture, and preprocessing of these datasets are fully described in the work by the authors of [16].
Each of the 3 datasets was used to independently train a DL network. For each network, 75% of the samples were randomly selected for training and 25% were used for testing. These proportions were selected to match the work by the authors of [16] in order to provide a fair comparison with their results. A dropout ratio of 20% was used, and all other hyperparameters were left as in our previous configuration. Results are presented as the MSE for each dataset in Table 12. Our results surpass, in terms of prediction accuracy, those presented in the literature for these datasets, and they show good generalization capabilities. Results on all 3 datasets show that our proposed DL method achieved acceptable results with consistent MSE values, as shown in Figure 8c, for both training and generalization.

Discussion
In this study, controlled hyperspectral datasets were used to assess the capability of the deep learning model to predict and quantify the amount of HC spills on different soil types. The deep learning model was trained using a three-term backpropagation algorithm with the dropout technique. The model designed for this experiment utilizes a sigmoid activation function and a dropout ratio of 20% in all the hidden layers of the architecture in order to avoid overfitting. Another advantage of utilizing dropout is its ability to improve generalization.
The main aim of the three-term backpropagation algorithm was to reduce the number of training epochs while maintaining the system's stability during training. Our proposed method was able to estimate the amount of HCs in each dataset with high accuracy using a low number of epochs: the network achieved an average mean square error of 2.2 × 10⁻⁴ within an average of 18 epochs, as shown in Tables 5-8 and Figure 6.
Dropout plays an important role in the architecture of the proposed deep learning model, improving the performance of the model and avoiding overfitting on the training data. This is evident from Table 5, where the results show the ability of the model to generalize on unseen data with good accuracy.
From the results obtained, it may be observed that hydrocarbons can be estimated even at low abundance levels, as shown in Tables 10 and 11. These tables summarize the abundance estimates of the quantity of HCs in the different mixture types using our proposed method, the hybrid spectral unmixing method, and the conventionally trained NN with and without dropout. For instance, for the first mixture in Table 10 (biodiesel mixed with clay), all methods provide very close estimates for the pure clay sample (reference 0% HC). The second row presents the results for the same mixture with 8% biodiesel and 92% clay. Our proposed method estimated 8.4% biodiesel and the hybrid switch method 9.4%, while the neural network trained with a standard backpropagation algorithm estimated 9.8% with dropout and 9.9% without. Similar results were obtained for all the different samples, as shown in Tables 10 and 11. In addition, soil properties such as grain size and texture can lead to variation in the absorption level, and thus to differences in the detection of the different hydrocarbon types. The hybrid switch method was also able to estimate the amount of hydrocarbon spills with reasonable accuracy, unlike the conventionally trained neural network, which has low accuracy compared to the proposed method.
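Using the figures quoted above for the 8% biodiesel-clay mixture, the per-method absolute errors can be computed directly; the method labels are shorthand, not the paper's exact names:

```python
reference = 8.0  # wt% biodiesel in the clay mixture (Table 10)
estimates = {
    "proposed DL": 8.4,
    "hybrid switch": 9.4,
    "standard BP + dropout": 9.8,
    "standard BP": 9.9,
}
# Absolute deviation of each method's estimate from the reference abundance
errors = {method: abs(est - reference) for method, est in estimates.items()}
best = min(errors, key=errors.get)  # method with the smallest error
```

This makes the ranking in the text explicit: the proposed method is closest at 0.4 percentage points, the hybrid switch method next, and the conventionally trained networks furthest.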
In the work by the authors of [69], it was revealed that deep learning requires large amounts of training data to achieve optimal performance and good generalization with minimum error. However, despite the size of the datasets used, the training process is faster with our proposed method, which is attributed to the use of the three-term backpropagation algorithm for training and the use of cross-validation with dropout. The authors of [55] reported that large neural networks trained in the standard way tend to overfit on small datasets. To see whether dropout can improve this condition, we ran the experiment on all the datasets and varied the dropout ratio, as shown in Tables 6-9. From the results obtained, the error rate is relatively low when the dropout ratio is between 10% and 30%, with a slight increase when the ratio is set above 50%. Therefore, it can be concluded that a dropout ratio between 10% and 30% provides acceptable prediction estimates and could be used with any dataset for estimation analysis.
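The dropout-ratio sweep described above amounts to training and evaluating the model once per ratio. In the sketch below, the evaluator is a hypothetical stand-in that merely mimics the reported trend (low error between 10% and 30%, rising above 50%); it does not reproduce the paper's measurements:

```python
def sweep_dropout(evaluate, ratios=(0.1, 0.2, 0.3, 0.5, 0.7)):
    """Return test MSE per dropout ratio using the supplied evaluator."""
    return {r: evaluate(r) for r in ratios}

def mock_evaluate(ratio):
    """Hypothetical stand-in for train-then-test; not real measurements."""
    base = 2.2e-4  # average MSE reported for the controlled datasets
    return base * (1.0 + 4.0 * max(0.0, ratio - 0.3))

results = sweep_dropout(mock_evaluate)
```

In practice, `evaluate` would retrain the network at the given ratio and return the test-set MSE, with the ratio selected from the flat low-error region.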
The proposed deep learning method was further validated on field datasets. In particular, the Hamra soils produced better results, with a lower MSE of 0.48 × 10−4, compared to the Kokhav and Evrona soils, as shown in Figure 8c. This shows a trend similar to what was obtained in the work by the authors of [16]. The deep learning model did not estimate abundances between 1 and 3 wt% well, which could be attributed to the fact that the data were obtained within the Longwave Infrared Region (LWIR), which has a different spectral range from the laboratory-controlled data used in this research. Nevertheless, our proposed deep learning model performed better on all three datasets in terms of MSE accuracy.
Recent research in deep learning suggests that large datasets are required for training on remote sensing data, which is a major drawback. This was not the case for our proposed model, because the three-term backpropagation algorithm allows it to train faster, converging in a minimum number of epochs. The training process was also relatively fast compared to standard networks, where parameter updates are noisy in architectures with dropout. Learning takes an average of 28.16 seconds with our proposed deep learning model, compared to an average of 300.68 seconds for the conventionally trained neural networks. However, the data processing-to-end-product time of our proposed method, approximately 35.45 seconds, is longer than that of the hybrid spectral unmixing method, which averages 24.64 seconds. This could be attributed to the larger number of parameters and the training required by the deep learning method compared to the spectral unmixing method.
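Wall-clock figures like those above can be gathered with a simple timing helper; this is generic scaffolding for such measurements, not the authors' benchmarking code:

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed wall-clock seconds)."""
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - t0

# Example: timing a placeholder workload in place of a training run
result, elapsed = timed(sum, range(1_000_000))
```

Averaging `elapsed` over repeated runs of each method's full pipeline yields per-method timings comparable to the averages reported above.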
However, we anticipate that including a wider range of data, such as field HC leaks, will require larger datasets and longer training to account for the variability of the data. Nevertheless, the proposed methodology has been shown to significantly decrease both the learning time and the amount of sample data required to achieve accurate generalization.

Conclusions
In this paper, we developed a deep learning approach to accurately estimate hydrocarbon spills on different soil samples measured using imaging spectroscopy. The deep learning model was trained using a three-term backpropagation algorithm with dropout. The aim was to improve the accuracy of the model, avoid overfitting, and converge faster.
Standard backpropagation algorithms build up co-adaptations among hidden units that work well on the training data but prevent the network from generalizing to unseen data. Dropout breaks these co-adaptations, improving the network's performance and thus enabling it to generalize. The choice of dropout ratio for any neural network depends on the type of dataset and the application.
The three-term backpropagation algorithm improves the network's ability to train faster and to overcome local minima when compared to conventional backpropagation algorithms. The effectiveness of the deep learning model was verified by testing it on datasets containing different soil samples mixed with different hydrocarbon types to estimate the amount of hydrocarbon spills in each dataset. The datasets were acquired using a Hyspex 384 SWIR camera under laboratory conditions. Many studies have shown the ability to detect HC using spectroscopy in the SWIR region. The results of the experiments consistently show that the proposed method provides high prediction accuracy with low error, even for HC amounts as low as 6.8%. Therefore, it can be concluded that the three-term backpropagation algorithm with dropout significantly improves the model's performance. The deep learning model was further applied to three datasets acquired with an airborne LWIR camera in field conditions, which demonstrated the effectiveness of the proposed method and its applicability in real-world scenarios.
Satellite and airborne hyperspectral data with ground truth are expensive and thus difficult to obtain; but with the emergence of new lightweight sensors mounted on Unmanned Aerial Vehicles (UAVs), the potential applications of this research are considerable. It is noted that data acquired in field conditions can be affected by several limitations, such as variable illumination, atmospheric conditions, and sensor sampling distance, which could reduce the accuracy obtained with such datasets. However, in the work by the authors of [70], the correlation between datasets obtained under laboratory and outdoor conditions was demonstrated. Thus, a neural network could be trained with laboratory data and validated using remote UAV or airborne data. The information provided in this research study can be used as a guide to understand the potential and limitations of a hyperspectral sensor for HC abundance estimation.
Future work will develop networks able to classify and identify different types of HC, also incorporating spatial information using convolutional neural networks.