Hybrid Spectral Unmixing: Using Artificial Neural Networks for Linear/Non-Linear Switching

Spectral unmixing is a key process in identifying the spectral signatures of materials and quantifying their spatial distribution over an image. The linear model is expected to provide acceptable results when two assumptions are satisfied: (1) the mixing process occurs at the macroscopic level and (2) photons interact with a single material before reaching the sensor. However, these assumptions do not always hold, and more complex nonlinear models are then required. This study proposes a new hybrid method for switching between linear and nonlinear spectral unmixing of hyperspectral data based on artificial neural networks. The neural networks were trained with parameters computed within a window around the pixel under consideration. These parameters represent the diversity of the neighboring pixels and are based on the Spectral Angular Distance, the covariance and a nonlinearity parameter. The endmembers were extracted using Vertex Component Analysis, while the abundances were estimated using the method identified by the neural networks (Vertex Component Analysis, Fully Constrained Least Squares, Polynomial Post Nonlinear Mixing Model or Generalized Bilinear Model). Results show that the hybrid method performs better than each of the individual techniques, with high overall accuracy and a significantly lower abundance estimation error. Experiments on both synthetic datasets and real hyperspectral images demonstrated that the proposed hybrid switch method is more efficient at solving the spectral unmixing of hyperspectral images than the individual algorithms.


Introduction
Spectral Unmixing (SU) is the process of identifying the spectral signatures of materials, often referred to as endmembers, and estimating their relative abundances in the measured spectra. Spectral unmixing is used in a wide range of applications including crop/vegetation classification, disaster monitoring, surveillance, planetary exploration, the food industry, fire and chemical spread detection and wild animal tracking [1]. Endmembers play an important role in exploring the spectral information of a hyperspectral image [2,3]. The extraction of endmembers, that is, obtaining the pure signatures of the different features present in an image, is the first and most crucial step in any image analysis [1,4,5]. SU often requires the definition of the mixing model underlying the observations. A mixing model describes how the endmembers are combined to form the mixed spectrum measured by the sensor [6]. Given the mixing model, SU then inverts the formation process to infer the quantities of interest, specifically the endmembers and their abundances, from the collected spectra [7,8]. This could be achieved through a radiative transfer model, which accurately describes the scattering of light by the materials in the scene observed by a sensor [6].
The most common approach to spectral unmixing is linear spectral unmixing [6,7], which assumes that each photon reaching the sensor has interacted with only one material [7]. Promising results have been recorded with linear spectral unmixing methods, as reported by Keshava and Mustard [1]; commonly used linear mixture models include Adaptive Spectral Mixture Analysis (ALSMA) [9], Subspace Matching Pursuit (SMP) [10] and Orthogonal Matching Pursuit (OMP) [11]. Li et al. [12] proposed a robust collaborative sparse regression method to spectrally unmix hyperspectral data based on a robust linear mixture model, which takes the nonlinearity in the image into account and treats it as mere outliers. Thouvenin et al. [13] proposed a linear mixing model which explicitly accounts for spatial and spectral endmember variability. Foody and Cox [14] used a linear mixture model and a regression-based fuzzy membership function to estimate land cover composition, while in [15] the VCA algorithm is shown to unmix hyperspectral data with lower computational complexity than other conventional methods. Nonlinear mixing models cope with nonlinear interactions, capturing effects that are commonly present in an image [7].
The linear spectral unmixing method generally provides poor accuracy when the light undergoes multiple interactions between distinct endmembers, or intimate interaction, before reaching the sensor [16,17]. In this case, the linear mixture model can be advantageously replaced with nonlinear methods [18,19], which provide an alternative approach to SU. When interactions occur at a microscopic level, the materials are said to be intimately mixed. A model proposed by Hapke [6] describes the interactions undergone by light when it comes into contact with a surface composed of particles. Such models involve meaningful and interpretable quantities that have physical significance; however, they require a nonlinear formulation which is complex and complicates the derivation of unmixing strategies [7]. These methods account for the intimate mixture of materials in a scene [1,8]. Different nonlinear mixing models exist: some, such as bilinear models, are motivated by physical arguments, while others exploit a more flexible nonlinear mathematical model to improve the performance of the unmixing method [7]. Nonlinear models can be grouped into several classes, such as intimate mixture models [1], bilinear models [20], physics-based nonlinear mixing models [20] and polynomial post nonlinear mixing models [21]. Nascimento and Dias [22] solve the nonlinear unmixing problem with an intimate mixture model. This method first converts the observed reflectance into albedo using a look-up table; a linear algorithm then estimates the endmember albedos and mass fractions for each sample. Chen et al. [18] formulated a new kernel-based paradigm that relies on the assumption that the mixing mechanism can be described by a linear mixture of endmember spectra, with additive nonlinear fluctuations defined in a reproducing kernel Hilbert space. Hapke [6] derived an analytical model that expresses the measured reflectance as a function of parameters intrinsic to the mixtures, including the mass fraction, density, size and single-scattering albedo. The main limitation is that these models depend on parameters inherent to the experiment: they require full information on the geometric position of the sensor with respect to the observed samples, which makes the inversion process more challenging to implement, especially when the spectral signatures of the endmembers are unknown [1].
Another effect that has been considered to a great extent is endmember variability during spectral unmixing due to atmospheric and temporal conditions. Machine learning methods have worked well to account for spectral variability. The combination of spectral information and spatial context may improve the accuracy of hyperspectral unmixing and classification [23]. Techniques such as morphological filters [24], Markov Random Fields (MRF) [23,25,26,27], Support Vector Machines (SVM) [28] and Self-Organizing Maps (SOM) [29], among others, have been proposed to incorporate spatial information. MRF, in particular, are a very powerful tool for describing the neighborhood dependence between image pixels. They integrate spatial correlation information into the posterior probability distribution of the spectral features [25] and, under the Bayesian inference framework, have proven to provide accurate results in the classification and unmixing of hyperspectral data [23]. SVM have shown excellent performance, with high classification accuracies, when applied to datasets with a limited number of training samples [30]. Artificial Neural Networks (ANN) are mathematical models that were initially developed to mimic the complex pattern of neuron interconnections in the human brain [31,32]. Feed-forward neural network models have been extensively studied in fault detection and diagnosis of mechanical systems. Moreover, ANN have been successfully applied for many years with excellent performance in pattern recognition [33], and in particular for spectral data [34,35]. SOM is one of the most widely used unsupervised neural network algorithms and has been successfully applied to hyperspectral image classification [29,36,37] and data visualization [38]. Alternative approaches include rule-based fuzzy logic [39,40,41] and Markovian jump systems [42,43], which could be combined with ANN for switching decision making.
Deep learning involves models that hierarchically learn features of the input data using Artificial Neural Networks (ANN), typically with more than three layers [44]. Deep learning has been used extensively in the literature for a range of applications, such as vehicle detection [45,46], avalanche search and rescue operations with Unmanned Aerial Vehicles (UAV), and change detection [47,48]. In this scheme, high-level features are learned from low-level ones, and the derived features can be used for pattern recognition and classification [49]. Neural network pattern recognition is often used to classify input data into a set of target categories by training a network and evaluating its performance using a confusion matrix. Neural networks have been applied in remote sensing and hyperspectral unmixing because of their ability to recognize complex patterns in high-dimensional images [50]. Neural network based unmixing of hyperspectral imagery has produced excellent results [51]. Lyu et al. [48] have demonstrated neural networks to be a good tool for unmixing using both linear and nonlinear methods simultaneously [52]. In [46], artificial neural networks were used to detect and count cars in UAV images. Wu and Prasad [53] used a recurrent neural network to model the dependencies between different spectral bands and learn more discriminative features for hyperspectral data classification. Li et al. [35] reported the use of a 3D convolutional neural network to extract combined spectral-spatial features from a hyperspectral image. Kumar et al. [51] used a linear mixture model to unmix hyperspectral data and then neural networks to predict the fraction of the data that accounts for the nonlinear mixture; they used ground truth data and the abundances estimated by the linear method to train the network. Giorgio and Frate [50] used neural networks to unmix hyperspectral data and estimate endmembers and their abundances. Atkinson and Lewis [54] applied neural networks to decompose hyperspectral data and compared their results with a linear unmixing model and a fuzzy c-means classifier; the neural networks outperformed the conventional linear unmixing method.
Little work on combining the linear and nonlinear approaches has been presented in the literature, in particular on selecting the more appropriate of the two for a given pixel. In this paper, we note that nonlinear methods are better suited to scenes with multiple interactions and complex mixtures of features, commonly composed of multi-layered materials, whereas the linear model is appropriate for images in which each pixel has a single cover type of material. The objective of this paper is to propose a new hybrid methodology for switching between linear and nonlinear spectral unmixing methods using artificial neural networks based on deep learning strategies. The paper is organized as follows. Section 2 describes our methodology. Experimental results are presented in Section 3 and discussed in Section 4, and conclusions are drawn in Section 5.

Research Design
In this study, two linear and two nonlinear spectral unmixing methods were adopted to unmix hyperspectral data. Whether a mixed pixel is better explained by a linear or a nonlinear process is still an unresolved problem in spectral analysis. Researchers have identified temporal, spectral and spatial variability, which may be due, for instance, to variable illumination, environmental, atmospheric and temporal conditions in the scene, as the main source of error in spectral unmixing [55]. The endmember variability problem has thus been studied in depth, while the effects of multiple scattering and the resulting nonlinear mixing have been neglected [56]. Nonlinearities may occur when photons interact with different materials before reaching the sensor. In that sense, studies suggest that linear mixing is associated with mixtures in which the pixel components appear in spatially segregated patterns (a checkerboard scene) [55]. The structure of the canopy and the spatial distribution of the plant area are also known to play an important role in nonlinearity [56]. This paper proposes a novel approach to decide whether a mixed pixel is better explained by a linear or a nonlinear model. We use an ANN to learn and decide the nonlinearity of a pixel based on simple spatial and spectral features. The methods chosen are state-of-the-art methods that have been used extensively as references in the literature: Vertex Component Analysis (VCA) [15] and the Fully Constrained Least Squares method (FCLS) [57] for the linear models, and the Polynomial Post Nonlinear Mixing Model (PPNMM) [21] and the Generalized Bilinear Model (GBM) [58] for the nonlinear models. The methods were hybridized using Artificial Neural Networks (ANN) to switch between the linear and nonlinear models.
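The per-pixel switch described above can be illustrated with a minimal sketch. It assumes that abundance estimates from one linear and one nonlinear method are already available, together with a boolean decision per pixel; the function name `hybrid_abundances` and the toy values are hypothetical and only illustrate the selection step, not the authors' implementation.

```python
import numpy as np

def hybrid_abundances(A_linear, A_nonlinear, use_nonlinear):
    """Pick, per pixel, the abundance vector of the selected model.

    A_linear, A_nonlinear: (n_pixels, L) abundance estimates.
    use_nonlinear: (n_pixels,) boolean switch decisions (hypothetical input,
    e.g. a thresholded network output).
    """
    use_nonlinear = np.asarray(use_nonlinear, dtype=bool)
    # Broadcast the per-pixel decision over the L abundance columns.
    return np.where(use_nonlinear[:, None], A_nonlinear, A_linear)

# Toy example with 3 pixels and 2 endmembers (made-up numbers).
A_lin = np.array([[0.7, 0.3], [0.5, 0.5], [0.2, 0.8]])
A_nl = np.array([[0.6, 0.4], [0.4, 0.6], [0.1, 0.9]])
switch = np.array([False, True, False])
A_hybrid = hybrid_abundances(A_lin, A_nl, switch)
```

Only the second pixel takes its abundances from the nonlinear estimate here; the other two keep the linear one.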

Vertex Component Analysis (VCA)
This algorithm is based on the geometry of convex sets and exploits the fact that endmembers occupy the vertices of a simplex [59]. The VCA algorithm assumes the presence of spectrally pure pixels in the dataset and iteratively projects the data onto a direction orthogonal to the subspace spanned by the endmembers already determined (Weeks [4]). The new endmember signature corresponds to the extreme of the projection. The algorithm iterates until all endmembers are exhausted (Bioucas et al. [60]).

Fully Constrained Least Square Method (FCLS)
The FCLS algorithm is derived from an unconstrained least squares based orthogonal subspace projection (Heinz [57]); in this method, negative abundance values are set to 0 and the abundance fractions of the remaining material signatures are normalized to sum to 1. FCLS uses a simplex method to produce a set of feasible solutions for the spectral unmixing of material signatures, discarding negative abundance values and normalizing the abundances of the remaining material signatures to unity [57].
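The clip-and-normalize step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the full FCLS algorithm of [57]: it solves an unconstrained least squares problem, sets negative abundances to zero and rescales the rest to sum to one. The function name and the random endmember matrix are hypothetical.

```python
import numpy as np

def clip_and_normalize_unmix(M, y):
    """Simplified abundance estimate for one pixel.

    M: (R, L) endmember matrix (one spectrum per column).
    y: (R,) observed pixel spectrum.
    """
    # Unconstrained least squares solution of M a = y.
    a, *_ = np.linalg.lstsq(M, y, rcond=None)
    # Positivity: negative abundances are set to zero.
    a = np.clip(a, 0.0, None)
    total = a.sum()
    if total == 0.0:
        # Degenerate case: fall back to uniform abundances.
        return np.full(M.shape[1], 1.0 / M.shape[1])
    # Sum-to-one: rescale the remaining abundances.
    return a / total

rng = np.random.default_rng(0)
M = rng.uniform(0.05, 0.95, size=(50, 3))   # placeholder endmembers
a_true = np.array([0.6, 0.3, 0.1])
y = M @ a_true                              # noise-free linear mixture
a_hat = clip_and_normalize_unmix(M, y)
```

On a noise-free linear mixture the estimate recovers the true abundances; with noise or nonlinear mixing, the clipping and rescaling are what keep the result physically interpretable.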

Polynomial Post Nonlinear Mixing Model (PPNMM)
This model assumes that the reflectances of an image are nonlinear functions of pure spectral components contaminated by additive noise; the nonlinear functions are approximated by polynomials, leading to a polynomial post nonlinear mixing model (Altmann et al. [21]).
The model involves linear and quadratic functions of the abundances. In this case, the R-spectrum $Y = [y_1, \ldots, y_R]^T$ of a mixed pixel is defined as a nonlinear transformation $g$ of a linear mixture of $L$ spectra $m_l$ contaminated by additive noise $n$:
$$Y = g\left(\sum_{l=1}^{L} a_l m_l\right) + n$$
where $m_l$ is the spectrum of the $l$th material in the scene, $a_l$ is its corresponding proportion, $L$ is the number of endmembers contained in the image and $g$ is an appropriate nonlinear function.
Another motivation for the PPNMM is the Weierstrass approximation theorem, which states that every continuous function defined on a closed interval can be uniformly approximated by a polynomial with any desired precision [21].
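As an illustration of the model above, the following sketch generates one PPNMM pixel. It assumes the second-order polynomial form commonly used for this model, g(s) = s + b (s ⊙ s), where s is the linear mixture and b is a scalar nonlinearity coefficient; the function name, the noise handling and the random endmembers are hypothetical choices for the example.

```python
import numpy as np

def ppnmm_pixel(M, a, b, noise_std=0.0, rng=None):
    """Generate one pixel under an assumed second-order PPNMM.

    M: (R, L) endmember matrix, a: (L,) abundances, b: scalar nonlinearity.
    """
    s = M @ a                      # linear mixture of the L spectra
    y = s + b * s * s              # g(s) = s + b * (s ⊙ s), term by term
    if noise_std > 0:
        rng = np.random.default_rng() if rng is None else rng
        y = y + rng.normal(0.0, noise_std, size=y.shape)  # additive noise n
    return y

rng = np.random.default_rng(1)
M = rng.uniform(0.05, 0.95, size=(224, 3))   # placeholder endmembers
a = np.array([0.5, 0.3, 0.2])
y_linear = ppnmm_pixel(M, a, b=0.0)          # b = 0 reduces to the linear model
y_nonlinear = ppnmm_pixel(M, a, b=0.3)
```

Setting b = 0 recovers the linear mixing model, which is why a single coefficient can serve as an indicator of how nonlinear a pixel is.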

Generalized Bilinear Model (GBM)
The GBM introduces a second term that accounts for multiple photon interactions [20]. This model proposes that the spectrum of a mixed pixel, $Y$, can be derived as follows:
$$Y = \sum_{i=1}^{L} a_i m_i + \sum_{i=1}^{L-1} \sum_{j=i+1}^{L} \gamma_{i,j}\, a_i a_j\, m_i \odot m_j + n$$
where $m_i \odot m_j$ is the Hadamard (term-by-term) product of the $i$th and $j$th spectra, $m_i$ is the spectrum of endmember $i$, $a_i$ is the corresponding abundance and $n$ is additive noise. The first term describes the linear mixture model and the double sum models the nonlinear effect.
$\gamma = [\gamma_{1,2}, \ldots, \gamma_{L-1,L}]^T$ is a real parameter vector, with $\gamma_{i,j} \in (0, 1)$, that quantifies the interaction between different spectral components. This parameter makes the model more flexible (Halimi et al. [61]). The model also adopts the positivity and sum-to-one constraints.
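A small sketch of the bilinear model, assuming the usual form in which a linear term is followed by a double sum over endmember pairs i < j weighted by γ_{i,j} a_i a_j. Storing the interaction coefficients in an (L, L) array whose upper triangle is read is a convenience of this example, and the function name is hypothetical.

```python
import numpy as np

def gbm_pixel(M, a, gamma, noise_std=0.0, rng=None):
    """Generate one pixel under the generalized bilinear model.

    M: (R, L) endmembers, a: (L,) abundances,
    gamma: (L, L) array; only entries with i < j are used.
    """
    L = M.shape[1]
    y = M @ a                                    # linear mixture term
    for i in range(L - 1):
        for j in range(i + 1, L):
            # Bilinear interaction: gamma_ij * a_i * a_j * (m_i ⊙ m_j)
            y = y + gamma[i, j] * a[i] * a[j] * (M[:, i] * M[:, j])
    if noise_std > 0:
        rng = np.random.default_rng() if rng is None else rng
        y = y + rng.normal(0.0, noise_std, size=y.shape)
    return y

rng = np.random.default_rng(2)
M = rng.uniform(0.05, 0.95, size=(224, 3))       # placeholder endmembers
a = np.array([0.4, 0.4, 0.2])
gamma = np.full((3, 3), 0.5)                     # interaction coefficients in (0, 1)
y_gbm = gbm_pixel(M, a, gamma)
y_lin = gbm_pixel(M, a, np.zeros((3, 3)))        # gamma = 0 gives the linear model
```

With all γ_{i,j} set to zero the double sum vanishes and the pixel is a purely linear mixture.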

Vicinity Parameters
The objective of this study is to switch between linear and nonlinear methods depending on the mixture type of the neighboring pixels. The linear model is expected to provide acceptable results when two assumptions are satisfied [1]: the mixing process occurs at the macroscopic level, and photons interact with a single material before reaching the sensor (a checkerboard scene). Generally, this cannot be known a priori and might change in different parts of a given scene. The most profound source of error in spectral mixture analysis, however, lies in the inability to account for sufficient temporal and spatial spectral variability [55]. The endmember variability problem is often attributed to spatial and temporal changes, thereby neglecting the effects of multiple scattering and the resulting nonlinear mixing [62]. In fact, it is more likely that the position, extent and number of stable spectral zones depend on the spatial, spectral and temporal complexity and composition of the endmembers present in the scene [55]. It would therefore be very interesting to design new models and nonlinear unmixing procedures that are capable of simultaneously exploiting the spatial correlation between abundances and nonlinearities. Here, we propose a methodology to automatically switch between linear and nonlinear spectral unmixing, based on deep learning neural network strategies, to provide more accurate results. A number of parameters related to the characteristics of each pixel's neighborhood are used. We assume that neighboring pixels in a checkerboard type of scene have more spectrally and spatially coherent spectra than those in a nonlinear scene. The following values represent the diversity of the neighborhood of the pixel under consideration: the minimum and maximum Spectral Angular Distance (SAD), the covariance and a nonlinearity parameter. To compute these parameters, we defined a window W of size n × n around the examined pixel.

Spectral Angular Distance (SAD)
The Spectral Angular Distance (SAD) describes the angular distance between two vectors; it is estimated by computing the cosine of the angle between the actual and the estimated endmembers [63].
The SAD between two spectra $U = (U_1, \ldots, U_R)^T$ and $V = (V_1, \ldots, V_R)^T$ is defined as
$$\mathrm{SAD}(U, V) = \arccos\left(\frac{U^T V}{\|U\| \, \|V\|}\right)$$
where $R$ is the number of bands and $\|U\|$ and $\|V\|$ are the moduli of the vectors. Here, we compute the SAD between all pixels within the window W and use the minimum and maximum values.
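The minimum and maximum SAD of a window can be sketched as below. The example assumes that "all pixels within the window" means every unordered pair of pixels in W, which is one possible reading of the text; the function names and the random window are hypothetical.

```python
import numpy as np
from itertools import combinations

def sad(u, v):
    """Spectral angular distance (in radians) between two spectra."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip to guard against rounding slightly outside [-1, 1].
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def window_sad_min_max(window):
    """Min and max SAD over all pixel pairs of an (n, n, R) window."""
    pixels = window.reshape(-1, window.shape[-1])
    distances = [sad(u, v) for u, v in combinations(pixels, 2)]
    return min(distances), max(distances)

rng = np.random.default_rng(3)
window = rng.uniform(0.1, 1.0, size=(3, 3, 224))   # placeholder 3 x 3 window
sad_min, sad_max = window_sad_min_max(window)
```

A homogeneous window gives values close to zero for both features, while a window that straddles different materials gives a large maximum.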

Covariance Matrix
The covariance matrix provides a way of fusing multiple correlated spectra. The variance of each spectrum is given by the diagonal values of the covariance matrix, while the off-diagonal values represent the correlation [64]. The covariance matrix is defined by the following equation:
$$C_{ij} = E\left[(X_i - \mu_i)(X_j - \mu_j)\right]$$
where $\mu_i$ is the mean vector of all pixels in band $i$ and $X_i$ is the vector containing all pixel values in band $i$ within window W.
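A band-wise covariance matrix over a window can be computed as in the sketch below, following the definition in terms of the per-band vectors X_i. The text does not state how the nine covariance inputs c_1, ..., c_9 are derived from this matrix, so that reduction is deliberately left out; the function name is hypothetical, and the sample normalization (N - 1) is NumPy's default rather than something specified in the paper.

```python
import numpy as np

def window_band_covariance(window):
    """Band-by-band covariance of the pixels in an (n, n, R) window.

    Returns an (R, R) matrix: variances on the diagonal,
    covariances between bands off the diagonal.
    """
    pixels = window.reshape(-1, window.shape[-1])   # (n*n, R) observations
    # rowvar=False: each column (band) is a variable, each row a pixel.
    return np.cov(pixels, rowvar=False)

rng = np.random.default_rng(4)
window = rng.uniform(0.1, 1.0, size=(3, 3, 5))      # small placeholder window
C = window_band_covariance(window)
```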

Nonlinearity Parameter
The nonlinearity parameter b, computed over a window, quantifies the level of nonlinearity in a pixel. It is defined in terms of ⊙, the Hadamard (term-by-term) product operation, where a_i and a_j are the abundance reflectance spectra of endmembers i and j and L is the number of endmembers.

Learning
An Artificial Neural Network was used to predict the best method when switching between linear and nonlinear spectral unmixing.
The data for the Artificial Neural Networks were divided into 3 categories, namely: training, validation and testing sets.

1. The training set is used to fit the parameters of the classifier.
2. The validation set is used to minimize over-fitting (i.e., to verify the accuracy obtained on the training data) over data on which the networks have not been trained.
3. The testing set is used to test the final solution in order to confirm the actual predictive power of the network [65].
The networks were trained with scaled conjugate gradient back propagation because it has proven to be efficient and to produce accurate results [66,67,68]. The back propagation procedure applies the chain rule for derivatives [69]: the gradient of the objective with respect to the input of a module is computed backwards from the output module [69,70]. This method was chosen for its performance in updating the weight and bias values according to the scaled conjugate gradient. Training stops when any of the following conditions is met: the maximum number of epochs is reached, the maximum amount of time is exceeded, the performance is minimized to the goal, or the validation performance has increased more than the maximum allowed number of times [65]. We expect that the linear model will perform better if the neighboring pixels are similar; on the other hand, when a pixel undergoes multiple interactions, we expect higher diversity among the pixels.
The neural networks have 3 layers, namely the input, hidden and output layers. When using a 3 × 3 window, the input layer has 12 nodes corresponding to the vicinity parameters described in Section 2.2 (min. SAD, max. SAD, c_1, c_2, c_3, ..., c_9, b); the hidden layer has 10 nodes and the output layer has 1 node. The output layer provides the decision between the linear and nonlinear unmixing models.
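The 12-10-1 architecture can be sketched as a plain forward pass. The activation functions (tanh in the hidden layer, a sigmoid at the output), the random untrained weights and the mapping of outputs at or above 0.5 to the nonlinear model are assumptions of this example; the text specifies only the layer sizes and the 0.5 threshold used for switching.

```python
import numpy as np

def init_network(n_in=12, n_hidden=10, n_out=1, seed=0):
    """Random, untrained weights for a 12-10-1 network (illustrative only)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, size=(n_hidden, n_in)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, size=(n_out, n_hidden)),
        "b2": np.zeros(n_out),
    }

def forward(params, x):
    """Forward pass: 12 vicinity features -> 10 hidden units -> 1 output."""
    h = np.tanh(params["W1"] @ x + params["b1"])     # assumed hidden activation
    z = params["W2"] @ h + params["b2"]
    return 1.0 / (1.0 + np.exp(-z))                  # assumed sigmoid output

def choose_model(params, x, threshold=0.5):
    """Threshold the output; >= threshold is read as 'nonlinear' (assumption)."""
    return "nonlinear" if forward(params, x)[0] >= threshold else "linear"

params = init_network()
# Feature order follows the text: SAD min, SAD max, c_1..c_9, b.
features = np.zeros(12)
decision = choose_model(params, features)
```

In practice the weights would come from training on labeled pixels; the sketch only shows how a 12-value feature vector is turned into a binary switch decision.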

Simulated Data
A simulated dataset of images of size 36 × 36 pixels and 224 channels was generated, with abundances drawn from a Dirichlet distribution with 21 endmembers. The spectral signatures of the endmembers are mineral reflectances with 224 bands from the ENVI spectral library [15]. Additionally, a nonlinearity coefficient ranging over [0, 1] was added; these parameters were tuned for different numbers of endmembers ranging from 3 to 9. The images were corrupted with random Gaussian noise at Signal to Noise Ratios (SNR) of 10 dB, 30 dB and 50 dB. Figure 1 shows the spectral reflectance of the endmembers of the simulated data.
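The data generation described above can be approximated with the following sketch. It draws Dirichlet abundances for a 36 × 36 image, mixes them linearly and adds Gaussian noise at a target SNR. The endmember spectra are random placeholders rather than ENVI library minerals, the uniform Dirichlet concentration is an assumption, and the SNR is taken to be the ratio of mean signal power to noise power, which the text does not define explicitly.

```python
import numpy as np

def add_noise(Y, snr_db, rng):
    """Add white Gaussian noise so that signal/noise power matches snr_db."""
    signal_power = np.mean(Y ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return Y + rng.normal(0.0, np.sqrt(noise_power), size=Y.shape)

rng = np.random.default_rng(5)
n_pixels, n_bands, n_endmembers = 36 * 36, 224, 5

# Placeholder endmember spectra (the paper uses ENVI library minerals).
M = rng.uniform(0.05, 0.95, size=(n_bands, n_endmembers))

# Dirichlet abundances: non-negative and summing to one for every pixel.
A = rng.dirichlet(np.ones(n_endmembers), size=n_pixels)

Y_clean = A @ M.T                       # (n_pixels, n_bands) linear mixtures
Y_noisy = add_noise(Y_clean, snr_db=30, rng=rng)
image = Y_noisy.reshape(36, 36, n_bands)
```

The same pattern extends to nonlinear pixels by replacing the linear mixture with a nonlinear mixing function before the noise is added.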

Samson Data
The Samson data is a hyperspectral dataset owned by Oregon State University and provided by WeoGeo [71]; SAMSON is a push-broom visible to near-infrared sensor. The pixel responses are captured in 156 bands in the spectral range 401 nm-889 nm, with a resolution of up to 3.13 nm. The data has 952 scan lines with 952 pixels in each line. For this experiment, a subset of the image covering 95 × 95 pixels was used, which comprises three endmembers: soil, tree and water. Figure 2 shows the spectral reflectance of the endmembers of the Samson data.

Jasper Ridge
Jasper Ridge is a hyperspectral data cube recorded by AVIRIS over the standard scene of Jasper Ridge, a biological reserve in California. The dataset consists of 512 × 614 pixels recorded in 224 channels ranging from 380 nm to 2500 nm, with a spectral resolution of 9.46 nm. In this experiment, a subset of 100 × 100 pixels of the original image was used, and 198 bands were retained after removing the bands affected by atmospheric effects and dense water vapor. There are four main endmembers in this image: road, soil, water and tree [71]. Both real datasets and the corresponding abundance ground truth are available at [71] and are used as benchmarks to test classification and unmixing algorithms. Figure 3 shows the spectral reflectance of the endmembers of the Jasper Ridge data.

Experiments with Synthetic Data
This experiment was carried out using the synthetic dataset described in Section 3.1.1, which allows a priori control of the data. Here, the VCA, FCLS, PPNMM and GBM methods were used to unmix spectra of mineral mixtures. We compared the accuracy of the individual methods with that of the proposed hybrid methods, which switch between linear and nonlinear spectral unmixing based on the diversity of the neighboring pixels. The algorithms were coded according to [15,21,57,58]. The hybrid methods switched between VCA-PPNMM, VCA-GBM, FCLS-PPNMM and FCLS-GBM, respectively. VCA was used to estimate the endmembers contained in the dataset, while the four individual methods and the hybrid methods were used to estimate the fractional abundances. The experiment was conducted on the simulated dataset with 3, 5, 7 and 9 endmembers and with Signal to Noise Ratios of 10 dB, 30 dB and 50 dB. We ran Monte Carlo simulations based on 100 generated images for each experiment.
The switching was predicted using Artificial Neural Networks (ANN). We randomly split the samples into training, validation and test sets: 70% of the samples were used to train the network, 15% were used as a validation set to tune the hyperparameters of the neural networks and the remaining 15% were used to test the accuracy of the networks.
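A random 70/15/15 split of this kind can be sketched as follows. The function name, the fixed seed and the rounding of the set sizes are assumptions of the example; the 10,000 samples correspond to a 100 × 100 pixel image.

```python
import numpy as np

def split_indices(n_samples, train=0.70, val=0.15, seed=0):
    """Randomly split sample indices into training, validation and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)          # shuffle once, then slice
    n_train = int(round(train * n_samples))
    n_val = int(round(val * n_samples))
    train_idx = idx[:n_train]
    val_idx = idx[n_train:n_train + n_val]
    test_idx = idx[n_train + n_val:]          # whatever remains is the test set
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = split_indices(100 * 100)
```

Because the three index sets are slices of one permutation, they are disjoint and together cover every sample exactly once.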
In the first experiment, a 3 × 3 window was used around the pixel of interest. A vector containing 12 values, i.e., SAD min, SAD max, the covariance matrix (9 values) and the nonlinearity parameter, was computed for each pixel as input to train the ANN. The network had 12 input nodes, 10 hidden nodes and 1 output node, whose output (0 or 1) corresponds to either a linear or a nonlinear approach; a threshold of 0.5 was set for the switching. The ANN was used to choose between a linear and a nonlinear approach for each pixel. The overall accuracy and the abundance estimation error of the methods were computed and are summarized in Table 1. The results show that, in switching between linear and nonlinear spectral unmixing, the VCA-PPNMM hybrid method achieved the best overall accuracy, 98.8% as estimated from the confusion matrix of the neural networks, followed by FCLS-PPNMM with an overall accuracy of 95.6%; VCA-GBM and FCLS-GBM have overall accuracies of 92% and 92.4%, respectively. Examples of the generated data and of the error in abundance estimation are shown in Figures 4 and 5. Here we have chosen to display a linear method (VCA) and a nonlinear method (GBM) for comparison purposes, at two different signal to noise ratios (SNR = 10 dB and 50 dB, respectively).
A second experiment was conducted with the 3 × 3 window, in which each of the parameters used in creating the input training data was excluded one at a time in order to assess the importance of each parameter in the vector. The experiment was also repeated for SNR values of 10 dB, 30 dB and 50 dB with 3, 5, 7 and 9 endmembers. We expect higher error values when a parameter is removed from the vector than in Table 1, where all the parameters are used. The results (Table 2) show that all parameters play an important role in the vector and in the hybrid switch methods. SAD max proves to be the most important parameter: excluding it gave the highest error value for all SNR values. For comparison purposes, the size of the window was increased to 4 × 4 and the experiment was repeated to evaluate the accuracy of the parameters. The results show an increase in the error value when each of the parameters is excluded from the ANN input data.
In order to assess the accuracy of the methods, we also trained a network with the raw data (i.e., 224 bands) as input instead of the vicinity parameters. Here the input layer has 224 nodes, while the rest of the parameters remain the same. Table 3 summarizes the results of 100 Monte Carlo simulations. The results are of the same order of magnitude as those in Table 1. Figure 3 displays the abundance estimation error of the 4 methods with SNR = 10 dB. Comparing the experiments with the 3 × 3 window and with the raw data, the results are similar for Signal to Noise Ratios of 10 dB and 50 dB, whereas the results with the 3 × 3 window were better at a Signal to Noise Ratio of 30 dB. Therefore, it can be concluded that the ANN does not require the whole raw data and that the reduced set of chosen parameters provides good results. Figure 5 shows the abundance results with simulated data (SNR = 50 dB). The first row shows the ground truth abundances in grayscale, where a white pixel means an abundance equal to one for that class and a black pixel means no abundance for that class. The other rows display the error in abundance estimation for each class and each method, also coded in grayscale, where the brighter the pixel, the higher the error.

Experiment with Real Data
To evaluate the accuracy of the methods involved, the raw data as well as the vicinity parameters computed in a 3 × 3 window and a 4 × 4 window were used to train the neural network. In the first experiment, the Jasper Ridge data was used. The samples for each experiment were selected randomly: 70% of the samples were used for training (7000 samples) and 15% each for validation and testing (1500 samples for validation and 1500 samples for testing) of the neural networks. In a second experiment, the number of training samples was reduced, with 30% used for training (3000 samples) and 35% each for validation and testing (3500 samples each). Finally, the experiment was repeated with 1000 and 300 training samples, respectively. Figure 6 shows the ground truth abundances and the abundances estimated by a linear method (VCA), a nonlinear method (PPNMM) and the corresponding hybrid methods on the Jasper Ridge data.
The next experiment used the Samson data, where the training, validation and testing samples were randomly selected at 70%, 15% and 15%, respectively, resulting in 6317 samples for training and 1353 samples each for validation and testing. The number of training samples was then reduced to 30%, with 35% for validation and 35% for testing, which is equivalent to 2707 samples for training and 3158 samples each for validation and testing of the neural networks. Finally, the experiment was repeated with 1000 and 300 training samples, respectively. Figure 7 shows the ground truth abundances and the abundances estimated by a linear method (VCA), a nonlinear method (PPNMM) and the hybrid methods on the Samson data.
The experiment was repeated with a 3 × 3 window and a 4 × 4 window in order to compare and evaluate the accuracy of the hybrid methods with respect to the size of the data used to train the networks. The results on both datasets show that the proposed methods achieved the best results in all scenarios. The results of the experiments, based on the abundance estimation error, are summarized in Tables 4 and 5. The overall accuracy of the network, the abundance estimation error, and the training, validation and testing abundance errors of the networks were used to evaluate the performance of the methods investigated in this paper. Experiments with the raw dataset and with the 3 × 3 and 4 × 4 windows produce similar overall accuracy in all cases. This indicates that the hybrid methods for switching between linear and nonlinear spectral unmixing are more effective than the individual methods, and that ANN pattern recognition remains effective even when fewer samples are used to train the network. Of the four hybrid switch methods, VCA-PPNMM outperforms the others with an overall accuracy of 96%; FCLS-PPNMM has an overall accuracy of 94.5%, while VCA-GBM and FCLS-GBM both have overall accuracies of 92.8%. VCA-PPNMM also has the lowest abundance estimation error and produced the lowest abundance error in the training, validation and testing of the neural networks. Moreover, the proposed hybrid switch methods obtained similar results with the 3 × 3 and 4 × 4 windows when fewer samples were used to train the networks. This shows that the proposed hybrid method does not require all the raw data for training the networks and can be used effectively to switch between linear and nonlinear spectral unmixing of hyperspectral data. In terms of computational time, the individual methods are 40% more time consuming than the hybrid method, making them computationally expensive in terms of simulation. Tables 6 and 7

Results
Nonlinearity occurs when photons interact with different materials before reaching the sensor. We assumed here that linear mixing could be associated with mixtures in which the pixel components appear in spatially segregated patterns. More specifically, areas that are spatially correlated are more likely to be explained by the linear model. In this paper, we first used controlled simulated data. Each image consisted of a series of regions, and each region had the same type of ground cover with added noise. Figure 5 shows the results for the simulated data with 5 classes and SNR = 50 dB. Although the average error is of the same order of magnitude for both the linear and nonlinear approaches, the distribution of the error differs. The linear model (VCA) detects well the low abundances of classes and the pixels that do not contain a particular class (shown in black in the ground truth figures). Its errors are related to quantification rather than to detecting the wrong class. This might be due to the algorithm performing worse when the spectral variability within the classes is high. The non-linear method (GBM), on the other hand, returns an error that is more uniform and less related to the spatial pattern of the data or to spectral variability, as displayed in Figures 4 and 5.

The assumptions of the proposed approach are further supported by the real data sets. In particular, the Jasper Ridge data set includes different classes such as water, soil and road. Figure 6 shows the abundance estimation for the different methods. VCA has been reported to underperform on this data set [27]. However, it identifies the road class very well, whereas the non-linear methods failed to detect this class. On the other hand, the linear methods failed to classify correctly the water class, which is a more spectrally variable medium. Thus, it seems that noise and endmember spectral variability make the non-linear models outperform the linear ones, while spatially structured areas are well defined by the linear model. The vicinity parameters used in this paper address both the spatial and the spectral diversity of the data. The test reported in Table 2 showed that all parameters played an important role in the decision making process. Moreover, Figures 6 and 7 also support that the chosen features are suitable and that the switching is appropriate, achieving improved results.
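As an illustration of how one vicinity parameter might capture spectral diversity around a pixel, the sketch below computes the mean spectral angular distance between a pixel and its neighbors in an odd-sized window. It is a minimal example under stated assumptions and not the exact feature definition used in the experiments: the function names, the nested-list image layout `image[row][col][band]` and the use of a plain mean are hypothetical, and all spectra are assumed to be non-zero.

```python
import math


def spectral_angle(a, b):
    """Spectral angular distance (in radians) between two non-zero spectra."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cos_angle = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cos_angle)


def mean_window_sad(image, row, col, half=1):
    """Mean SAD between pixel (row, col) and its neighbors.

    The window has side 2 * half + 1 (half=1 gives a 3 x 3 window).
    Neighbors that fall outside the image are skipped.
    """
    center = image[row][col]
    angles = []
    for r in range(row - half, row + half + 1):
        for c in range(col - half, col + half + 1):
            if (r, c) == (row, col):
                continue  # do not compare the pixel with itself
            if 0 <= r < len(image) and 0 <= c < len(image[0]):
                angles.append(spectral_angle(center, image[r][c]))
    return sum(angles) / len(angles)
```

A spatially homogeneous neighborhood gives a value close to zero, while a neighborhood containing spectrally different materials gives a larger value. An even-sized window such as 4 × 4 has no central pixel and would need a different indexing convention than the one assumed here.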

Advantages and Limitations
The proposed method provides a switch between unmixing methods for given spectral images. It can not only provide more accurate results, as shown in the experimental section, but also reduce computational costs by selecting the most appropriate approach. This study has demonstrated the capabilities of the proposed methodology based on certain parameters. However, the supervised ANN relies on ground truth data for training, which is not always available. Future work will expand to unsupervised approaches such as self-organizing maps, which have been successfully used on spectral data for classification and anomaly detection tasks [33,72]. Although we used spatial and spectral features within windows for learning and thus for making the decision, the switching was made at the individual pixel level. Future work will therefore base the decision on groups of pixels or areas using, for instance, Markov random fields.
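The pixel-level switching described above can be sketched as follows. This is an illustrative example only: `use_nonlinear` stands for a hypothetical per-pixel output of the trained classifier, the two estimate lists stand for abundances already produced by a linear and a nonlinear model, and the root mean square error is one common way to score abundance estimates, assumed here because this section does not give the error formula.

```python
import math


def hybrid_abundances(use_nonlinear, linear_est, nonlinear_est):
    """Select, pixel by pixel, the abundance vector of the chosen model.

    use_nonlinear: list of booleans, one per pixel (hypothetical classifier output).
    linear_est, nonlinear_est: per-pixel abundance vectors from the two models.
    """
    if not (len(use_nonlinear) == len(linear_est) == len(nonlinear_est)):
        raise ValueError("all inputs must cover the same pixels")
    return [
        nonlin if flag else lin
        for flag, lin, nonlin in zip(use_nonlinear, linear_est, nonlinear_est)
    ]


def rmse(estimated, reference):
    """Root mean square error over all pixels and endmembers."""
    squared = [
        (e - r) ** 2
        for est_px, ref_px in zip(estimated, reference)
        for e, r in zip(est_px, ref_px)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Toy data: three pixels, two endmembers (values are made up for illustration).
truth = [[0.6, 0.4], [0.2, 0.8], [0.5, 0.5]]
linear = [[0.6, 0.4], [0.4, 0.6], [0.5, 0.5]]
nonlinear = [[0.5, 0.5], [0.2, 0.8], [0.4, 0.6]]
flags = [False, True, False]  # only the second pixel is treated as nonlinear

hybrid = hybrid_abundances(flags, linear, nonlinear)
```

In this toy case the switched estimate has a lower error than either model alone because the flags pick the better model for every pixel. In practice only the selected model would need to be run for each pixel, which is where a saving in computation could come from.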

Conclusions
In this paper, a new hybrid switch method for switching between linear and nonlinear spectral unmixing of hyperspectral data based on deep learning neural networks is proposed. The endmembers were extracted using VCA, while the abundances were estimated using individual and hybrid methods. The ANN was trained with a set of parameters extracted from the diversity of the neighboring pixels of the images, computed within a 3 × 3 and a 4 × 4 window. These parameters are the spectral angular distance, the covariance and a nonlinearity parameter. Experiments were conducted with Signal to Noise Ratios (SNR) of 10 dB, 30 dB and 50 dB and with different numbers of endmembers: 3, 5, 7 and 9. We have noted that the hybrid methods are more suitable than the individual techniques, with high overall accuracy and an abundance estimation error that is significantly lower than that obtained with the individual methods. In particular, VCA-PPNMM proved to be the best, with about 98% accuracy in all the experiments conducted. The experiments with the Jasper Ridge and Samson data confirmed the effectiveness of the approach. The method was applied to these two real datasets with the ANN trained using 70%, 30%, 10% and 0.3% of the samples. Experimentation with the real data and with the 3 × 3 and 4 × 4 window vectors showed the effectiveness of the hybrid switch methods. The results show that the size of the dataset used for training the network and the vector size do not affect the accuracy of the hybrid methods in switching between linear and nonlinear spectral unmixing, which means that the network can be trained with fewer samples without loss of prediction accuracy. An area to consider for future research is the application of the Markovian Jump method for switching between linear and nonlinear spectral unmixing.

Figure 4. Abundance estimation errors with simulated data with 5 endmembers (SNR = 10 dB). The first row shows the ground truth abundances for the 5 classes. The following rows show, from the top, the error in abundances as estimated by the hybrid, VCA and GBM methods, respectively.

Figure 5. Abundance estimation errors with simulated data with 5 endmembers (SNR = 50 dB). The first row shows the ground truth abundances for the 5 classes. The following rows show, from the top, the error in abundances as estimated by the hybrid, VCA and GBM methods, respectively.

Figure 6. Abundance estimates of the endmembers of the Jasper Ridge data showing, from left, the groundtruth, linear (VCA), nonlinear (PPNMM) and hybrid methods. From top: water, tree, soil and road.
Tables 6 and 7 summarize the results of the experiments, showing the accuracy of the neural networks in training, validation and testing.

Figure 7. Abundance estimates of the endmembers of the Samson data showing the groundtruth, linear (VCA), nonlinear (PPNMM) and hybrid methods. From top: water, rock and tree.

Table 4.

Table 1. Abundance estimation error (3 × 3 window) of the individual and hybrid methods between linear and nonlinear spectral unmixing with different signal to noise ratios and endmembers. The best results are shown in bold.

Table 2. Abundance estimation error (3 × 3 window) of the individual and hybrid methods between linear and nonlinear spectral unmixing with different signal to noise ratios and 3 endmembers, where each of the parameters is removed one at a time. The best results are shown in bold.

Table 3. Abundance estimation error with the individual and hybrid methods of the raw hyperspectral data between linear and nonlinear spectral unmixing with different signal to noise ratios and different endmembers. The best results are shown in bold.

Table 5. Average abundance estimation error of the hybrid methods with different numbers of training samples (6317 to 300) and different window size vectors on the Samson data, as compared with the abundance estimation error of the individual methods, which are: PPNMM = 0.1455, GBM = 0.1588, VCA = 0.1254 and FCLS = 0.1577. The best results are shown in bold.

Table 6. Abundance estimation error on Jasper Ridge data showing training, validation and testing accuracy on the individual and hybrid methods with different training samples and different window size vectors. The best results are shown in bold.

Table 7. Abundance estimation error on Samson data showing training, validation and testing accuracy on the individual and hybrid methods with different training samples and different window size vectors. The best results are shown in bold.