Morphological Principal Component Analysis for Hyperspectral Image Analysis



Introduction
Hyperspectral images allow us to reconstruct the spectral profiles of imaged objects through the acquisition of several tens or hundreds of narrow spectral bands. Conventionally, in many applications hyperspectral images are reduced in dimension before any processing. Many hyperspectral image reduction methods are linear and do not account for the multiple sources of nonlinearity present in this kind of image [1]. Nonlinear reduction techniques are nowadays widely used for data reduction, and some of them have been applied to hyperspectral images [2]. Nevertheless, most of these techniques present some disadvantages [3] in comparison to the canonical linear principal component analysis (PCA). That is the rationale behind our choice of PCA as a starting point. In particular, one major drawback of those nonlinear techniques is that they are computationally too complex in comparison to PCA. Hence, most of the time they cannot be applied to real full images. Another common disadvantage of classical linear and nonlinear dimensionality reduction techniques is that they consider a hyperspectral image as a set of vectors. They are appropriate when the data do not present useful spatial information, and therefore they are not really adapted to images.
As mentioned above, dimensionality reduction of hyperspectral images is usually considered as a preprocessing step for supervised pixel classification, as well as for other hyperspectral image tasks such as unmixing, target detection, etc. Hence, incorporating spatial information into the classification task is often addressed by means of spatial/spectral dimensionality reduction.
The contribution of our approach can be summarized as follows. We propose to add structural/spatial information to the estimation of the covariance matrix used to carry out the PCA. This is done by means of morphological image representations, which involves a nonlinear embedding of the original hyperspectral image into a morphological image feature space.
Many previous works have considered how to introduce structural/spatial information into hyperspectral image dimensionality reduction. We can divide these techniques into different families. The first family of techniques is close to our paradigm since it is based on mathematical morphology [4,5,6,7,8]. Other approaches are founded on Markov random field image representations, such as [9,10]. Another family of techniques uses kernel methods, where kernels for spatial information and kernels for spectral information are combined together [11,12,13,14]. Techniques based on tensor representations of hyperspectral images [15,16] have also been successfully considered. Finally, wavelet representations and features extracted from the image have also been used to add spatial information [17].
The rest of the paper is organized as follows. Section 2 provides a reminder of the mathematical morphology multi-scale representation tools used in our approach. Mathematical morphology is a well-known nonlinear image processing methodology based on the application of complete lattice theory to spatial structures. Section 3 introduces in detail our approach, named morphological principal component analysis (MPCA). In order to justify our framework, a summary of the classical theory underlying standard PCA is provided, as well as the notion of Pearson image correlation. Then, the four variants of MPCA are discussed, including an analysis of the meaning of their corresponding covariance matrices. The application of MPCA to hyperspectral image dimensionality reduction is considered in Section 4. That involves an assessment of the different variants according to different criteria. For some of the criteria, new techniques to evaluate the quality of a dimensionality reduction technique in image processing are introduced. Techniques arising from manifold learning are also considered. Finally, Section 5 closes the paper with the conclusions.
We note that this paper is an extended and improved version of the conference contribution [18].

Basics on morphological image representation
The goal of this section is to provide a short background on the morphological operators and transforms used in the sequel. The notation used in the rest of this paper is also stated.

Notation
Let E be a subset of the discrete space Z^2, which represents the support space of a 2D image, and let F ⊆ R^D be the set of pixel values in dimension D. Hence, it is assumed in our case that the value of a pixel x ∈ E is represented by a vector v ∈ F of dimension D, where the discrete space E has a size of n_1 × n_2 pixels. This vector v represents the spectrum at position x. Additionally, we will write higher-order tensors with calligraphic upper-case letters (I, S, ...). The order of a tensor I ∈ R^{n_1 × n_2 × ... × n_J} is J. Moreover, if I ∈ R^{n_1 × n_2 × n_3}, then for all i ∈ [1, n_3], I_{:,:,i} represents the matrix of size n_1 × n_2 whose third index is equal to i. In our case we can thus associate a tensor F ∈ R^{n_1 × n_2 × D} to the hyperspectral image.

Nonlinear scale-spaces and morphological decomposition
Let f : E → Z be a grey-scale image. Area openings γ^a_{s_l}(f) (resp. area closings ϕ^a_{s_l}(f)) are morphological filters that remove from the image f the bright (resp. dark) connected components having a surface area smaller than the parameter s_l ∈ N [19]:

γ^a_{s_l}(f) = ⋁ { γ_B(f) : B connected, Area(B) = s_l },
ϕ^a_{s_l}(f) = ⋀ { ϕ_B(f) : B connected, Area(B) = s_l },

where γ_B(f) and ϕ_B(f) represent respectively the morphological flat opening and closing according to the structuring element B [20]. We note that these connected filters can be implemented as binary filters on the stack decomposition of f into upper level sets. Figure 1 illustrates how area opening and area closing modify a simple image f.

Figure 1: Illustration of an area opening γ^a_{s_l} and an area closing ϕ^a_{s_l} of image f, with s_l = 7 pixels.
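As a concrete illustration, the following is a minimal pure-NumPy sketch of a grayscale area opening implemented, as noted above, by filtering the connected components of the upper level sets. The function names and the 4-connectivity choice are our own; an efficient implementation would use a max-tree algorithm instead of this brute-force stack decomposition.

```python
import numpy as np

def _label(mask):
    # 4-connected component labelling of a boolean mask (plain flood fill)
    lab = np.zeros(mask.shape, dtype=int)
    n = 0
    for seed in zip(*np.nonzero(mask)):
        if lab[seed]:
            continue
        n += 1
        lab[seed] = n
        stack = [seed]
        while stack:
            i, j = stack.pop()
            for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if (0 <= ni < mask.shape[0] and 0 <= nj < mask.shape[1]
                        and mask[ni, nj] and not lab[ni, nj]):
                    lab[ni, nj] = n
                    stack.append((ni, nj))
    return lab

def area_opening(f, s):
    # gamma^a_s(f): remove bright connected components of area < s,
    # level by level on the upper level sets of f
    out = np.full(f.shape, f.min(), dtype=f.dtype)
    for h in np.unique(f)[1:]:          # ascending grey levels above the minimum
        lab = _label(f >= h)
        sizes = np.bincount(lab.ravel())
        keep = (lab > 0) & (sizes[lab] >= s)
        out[keep] = h                   # level sets are nested, so this is a max
    return out
```

The dual area closing ϕ^a_s(f) can be obtained by complementation, e.g. as `-area_opening(-f, s)`.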
Area opening and area closing are quite good at simplifying images without deforming the contours of the objects. In addition, they can be used to produce a multi-scale decomposition of an image. The notion of morphological decomposition is related to the granulometry axiomatics [20]. Let us consider {γ^a_{s_l}}, 1 ≤ l ≤ S, and {ϕ^a_{s_l}}, 1 ≤ l ≤ S, two indexed families of area openings and area closings respectively. Typically, the index l is associated to scale, or more precisely to surface area. Namely, we have on the one hand the granulometric ordering

γ^a_{s_S}(f) ≤ ... ≤ γ^a_{s_1}(f) ≤ f ≤ ϕ^a_{s_1}(f) ≤ ... ≤ ϕ^a_{s_S}(f).

On the other hand, with the convention γ^a_{s_0}(f) = ϕ^a_{s_0}(f) = f, we can rewrite the decomposition [15]:

f = (1/2) [γ^a_{s_S}(f) + ϕ^a_{s_S}(f)] + (1/2) Σ_{l=1}^{S} [ (γ^a_{s_{l-1}}(f) − γ^a_{s_l}(f)) − (ϕ^a_{s_l}(f) − ϕ^a_{s_{l-1}}(f)) ].

Therefore we have an additive decomposition of the initial image f into S scales, together with the average of the largest-area opening and closing. We remark that the residue (γ^a_{s_{l-1}}(f) − γ^a_{s_l}(f)) represents bright details between levels s_l and s_{l-1}. Similarly, (ϕ^a_{s_l}(f) − ϕ^a_{s_{l-1}}(f)) stands for dark details between levels s_l and s_{l-1}. At this point, some issues must be taken into account. First, after decomposing an image into S scales, we have to deal with an image representation of higher dimensionality. Second, the decomposition may not be optimal since it depends on the discretization into S scales, i.e., the size of each scale. In order to illustrate that issue, we have represented in Figure 2(a) a channel of the Pavia hyperspectral image and in Figure 2(b) its morphological decomposition by area openings with an over-estimated number of scales. As may be noticed from Figure 2(b), the choice of the scales is fundamental in order to avoid a redundant decomposition.
In order to deal with the problem of scale discretization, we propose to use the pattern spectrum, which provides information about the size distribution of the image components. Another technique to find the optimal discretization can be found in [21].

Pattern Spectrum
The notion of pattern spectrum (PS) [22] corresponds to the probability density function (pdf) underlying a granulometric decomposition by morphological openings and closings [23,20]. The area-based PS of f at size s_l, l ≥ 1, is given by

PS^a(f, l) = ( Mes[γ^a_{s_{l-1}}(f)] − Mes[γ^a_{s_l}(f)] ) / Mes[f],

and similarly for closings at negative indices −l, where Mes represents here the integral of the image. Two images having the same pattern spectrum have the same morphological distribution according to the chosen family of openings/closings. Since our goal is to have a non-redundant multi-scale representation with the same morphological distribution as the original image, by sampling the PS and choosing the scales of the distribution that keep it as similar as possible to the image PS, we can expect to find the appropriate discretization of scales. However, as can be seen in Figure 3, the PS is not a smooth function, and consequently, sampling it with a limited number of scales would not lead to a good result.
Based on the analogy between the PS and a probability density function, we can compute its corresponding cumulative pattern spectrum (CPS) for both sides l ≥ 0 and l ≤ 0. Naturally, this function is smoother than the PS. In order to select the appropriate scales, the CPS for openings and closings are sampled, where the number of samples is fixed and equal to S, under the constraint that the sampled function must be as similar as possible to the original one.
An example of such a sampling is given in Figure 3, where the approximation of the CPS is depicted in red and the CPS of the original image in blue. It is well known in probability theory that two random variables having the same cumulative distribution function have the same probability distribution. Based on this property, we can expect that the discretization from the CPS approximates the original PS of the image, and consequently that the selected scales properly represent the size distribution of the image.
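This sampling step can be sketched in a few lines. The function name and the uniform-quantile strategy are our own assumptions; the text only requires the sampled CPS to stay as close as possible to the original CPS.

```python
import numpy as np

def sample_scales(ps, S):
    # Pick S scale indices so that the sampled CPS tracks the full CPS:
    # read off the scale reached at S evenly spaced quantiles of the
    # cumulative pattern spectrum.
    cps = np.cumsum(ps) / np.sum(ps)
    targets = np.arange(1, S + 1) / S
    return np.searchsorted(cps, targets)
```

Because the CPS is monotone, evenly spaced quantiles automatically place more scales where the size distribution has most of its mass, which is exactly the non-redundancy goal stated above.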

Gray-scale distance function
Let X be the closed set associated to a binary image. The distance function corresponding to the set X gives at each point x ∈ X a positive number that depends on the position of x with respect to X, and is given by [24]:

dist(x; X) = min { d(x, y) : y ∈ X^c },

where d(x, y) is the Euclidean distance between points x and y, and where X^c is the complement of the set X. This well-known transform is very useful in image processing [24]. The distance function of binary images can be extended to a gray-scale image f by considering its representation into upper level sets {X_h(f)}_{a ≤ h ≤ b}, where X_h(f) = {x ∈ E : f(x) ≥ h}, with a = min{f(x), x ∈ E} and b = max{f(x), x ∈ E}. Then, the so-called gray-scale distance transform of f is defined as [25]:

dist(f)(x) = Σ_{h=a}^{b} dist(x; X_h(f)).

That is, the gray-scale distance transform of f is equal to the sum of the distance functions of its upper level sets.

Figure 3: On the left, the pattern spectrum (PS) by area openings of a gray-scale image using 100 scales. On the right, in blue, its corresponding cumulative pattern spectrum (CPS); in red, its approximation with S = 8 scales.
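The two formulas above can be sketched directly in NumPy. This is a brute-force version for small images, with function names of our own; a practical implementation would use a fast Euclidean distance transform.

```python
import numpy as np

def binary_dist(mask):
    # dist(x; X): Euclidean distance from each x in X to the complement
    # X^c, computed by brute force (assumes the complement is non-empty)
    out = np.zeros(mask.shape)
    comp = np.argwhere(~mask)
    for p in np.argwhere(mask):
        out[tuple(p)] = np.sqrt(((comp - p) ** 2).sum(axis=1)).min()
    return out

def gray_dist(f):
    # Gray-scale distance transform: sum of the distance functions of the
    # upper level sets X_h(f), for every level h above the minimum
    return sum(binary_dist(f >= h)
               for h in range(int(f.min()) + 1, int(f.max()) + 1))
```

Note how a pixel's value in `gray_dist` grows both with its intensity (it belongs to more level sets) and with its distance to the borders of the structures it belongs to, which is the intensity-plus-shape behavior exploited later by the distance function MPCA.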

Morphological Principal Component Analysis (MPCA)
We introduce in this section the notion of MPCA and its variants. Before that, a mathematical background on PCA and on the covariance/correlation matrix is provided in order to state the rationale behind MPCA.

Reminder on classical PCA
Principal Component Analysis (PCA), also known as Karhunen-Loève transform, Hotelling transform, SVD transform, etc., is without any doubt the most useful technique for data dimensionality reduction.
Let us start with a set of vectors {v_i ∈ R^D, 1 ≤ i ≤ n}, where n represents the number of vectors; in our case it corresponds to the number of image pixels, i.e., n = n_1 n_2, since n_1 and n_2 are the two spatial dimensions. The goal of PCA is to reduce the dimension of this vector space by means of a projection onto the principal component space, namely {v'_i ∈ R^d, 1 ≤ i ≤ n}, where d ≪ D. In our case, the dataset F ∈ M_{n,D}(R) represents the hyperspectral image F, where each column F_k ∈ R^n, 1 ≤ k ≤ D, corresponds to a vectorized spectral band. PCA should find a projective space such that the mean squared distance between the original vectors and their projections is as small as possible. As we show next, this is equivalent to finding the projection that maximizes the variance.
Let w_j ∈ R^D, 1 ≤ j ≤ D, denote the principal components. The aim of PCA is to find the set of vectors {w_j, 1 ≤ j ≤ D} such that

arg min_{w_j} Σ_{i=1}^{n} || v_i − (v_i · w_j) w_j ||^2.

Developing the distance, we have

|| v_i − (v_i · w_j) w_j ||^2 = || v_i ||^2 − 2 (v_i · w_j)^2 + (v_i · w_j)^2 || w_j ||^2;

then, by adding the constraint that || w_j ||^2 = 1, substituting above, and keeping only the terms that depend on w_j, we obtain the following new objective function:

arg max_{w_j} Σ_{i=1}^{n} (v_i · w_j)^2.

If we consider that the dataset F has been column-centered, which means that Σ_{i=1}^{n} v_i = 0, then (1/n) Σ_{i=1}^{n} (v_i · w_j)^2 = var(F w_j). Thus we can see that PCA aims at finding the principal components that maximize the variance. The problem can be rewritten in matrix form: V = (1/n) F^T F is the covariance matrix of F. Hence the problem to be optimized is written as

arg max_{w_j} w_j^T V w_j, subject to || w_j ||^2 = 1.

Thanks to the Lagrange multiplier theorem, we can rewrite this objective function as

L(w_j, λ) = w_j^T V w_j − λ (w_j^T w_j − 1),

where λ ∈ R. Since one should maximize this function, we differentiate it and set the derivative equal to zero, i.e., ∂L/∂w_j (w_j, λ) = 2 V w_j − 2 λ w_j = 0.
Finally, we obtain the solution V w_j = λ w_j. Thus, the principal component w_j that satisfies the objective function is an eigenvector of the covariance matrix V, and the one maximizing L(w_j, λ) is the one with the largest eigenvalue. Then we can obtain all the w_j by simply computing the SVD of V. There are different approaches to choose the reduced dimension d, that is, the number of principal components to be kept. The underlying assumption is the following: if the intrinsic dimension of the data is d, then the remaining D − d eigenvalues, corresponding to the eigenvectors that are discarded, should be significantly small. This principle is expressed using Prop = Σ_{j=1}^{d} λ_j / Σ_{j=1}^{D} λ_j, which is equal to the proportion of the original variance that is kept. Typically, in all our examples we fix Prop = 0.9.
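The whole procedure can be condensed into a few lines of NumPy (a sketch with names of our own; since V is symmetric, a symmetric eigendecomposition plays the role of the SVD mentioned above):

```python
import numpy as np

def pca(F, prop=0.9):
    # PCA by eigendecomposition of the covariance matrix V; keep the
    # smallest d such that sum_{j<=d} lambda_j / sum_j lambda_j >= prop
    Fc = F - F.mean(axis=0)                 # column-centre the data
    V = Fc.T @ Fc / F.shape[0]              # covariance matrix (D x D)
    lam, W = np.linalg.eigh(V)              # ascending eigenvalues
    lam, W = lam[::-1], W[:, ::-1]          # sort descending
    d = int(np.searchsorted(np.cumsum(lam) / lam.sum(), prop)) + 1
    return Fc @ W[:, :d], lam, d
```

For data concentrated along one direction, the rule Prop = 0.9 keeps a single component, as expected.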

Covariance matrix and Pearson correlation matrix
The covariance between two channels (or spectral bands) of a hyperspectral image F is computed as

Covar(F_{:,:,k}, F_{:,:,k'}) = E[ (F_{:,:,k} − E(F_{:,:,k})) (F_{:,:,k'} − E(F_{:,:,k'})) ],

where E(·) denotes the mean over the image support. The covariance is very meaningful; however, it is not a similarity measure [26], in the sense of a metric, since it is not range limited. In order to fulfill this requirement, a solution consists in normalizing the covariance, which leads to the notion of Pearson correlation:

Corr(F_{:,:,k}, F_{:,:,k'}) = Covar(F_{:,:,k}, F_{:,:,k'}) / (σ_k σ_{k'}),

where σ_k and σ_{k'} are the standard deviations of the two bands. The correlation coefficient varies between +1 and −1, such that Corr(F_{:,:,k}, F_{:,:,k'}) = 1 implies that F_{:,:,k} and F_{:,:,k'} perfectly coincide. It has been proved that the best fitting case corresponds to an affine relation between the bands [27]. Therefore, we can see that the correlation is a linear coefficient between F_{i,j,k} and F_{i,j,k'}. This means that Pearson correlation is a similarity criterion which depends on the intensities of the images and their linear relations.
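The affine best-fit property is easy to check numerically with a small helper of our own:

```python
import numpy as np

def corr(a, b):
    # Pearson correlation between two flattened image channels
    a, b = np.ravel(a).astype(float), np.ravel(b).astype(float)
    ac, bc = a - a.mean(), b - b.mean()
    return (ac @ bc) / np.sqrt((ac @ ac) * (bc @ bc))
```

For any a > 0 and any b, `corr(x, a * x + b)` equals 1 up to rounding, which is exactly the linear-relation best case mentioned above; a negative slope gives −1.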

MPCA and its variants
The fundamental idea of Morphological Principal Component Analysis (MPCA) consists in replacing the covariance matrix V of PCA, which represents the statistical interaction of the spectral bands, by a covariance matrix V_Morpho computed from a morphological representation of the bands. Therefore, mathematical morphology is fully integrated into the dimensionality reduction problem, which is solved by a standard SVD computation on V_Morpho, i.e., V_Morpho w_j = λ_j w_j.

Scale-space Decomposition MPCA
In the first variant, we simply use the area-based nonlinear scale-space discussed in the previous section. The grey-scale image of each spectral band F_{:,:,k} is decomposed into residues of area openings and area closings according to the discretization into S scales for each operator, i.e., r_l(F_{:,:,k}) = γ^a_{s_{l-1}}(F_{:,:,k}) − γ^a_{s_l}(F_{:,:,k}) and r_{−l}(F_{:,:,k}) = ϕ^a_{s_l}(F_{:,:,k}) − ϕ^a_{s_{l-1}}(F_{:,:,k}), 1 ≤ l ≤ S. Thus we have increased the dimensionality of the initial dataset from a tensor (n_1, n_2, D) to a tensor (n_1, n_2, D, 2S + 1). As discussed in [15], this tensor can be reduced using higher-order SVD techniques. We propose here to simply compute a covariance matrix as the sum of the covariance matrices from the various scales. More precisely, we introduce V_Morpho-1 ∈ M_{D,D}(R) with

V_Morpho-1 = Σ_l V^l,

where l ranges over the 2S + 1 components of the decomposition, and where the covariance matrix at each scale l is obtained as

V^l_{k,k'} = Covar( r_l(F_{:,:,k}), r_l(F_{:,:,k'}) ).

We note that this involves an assumption of independence of the various scales. We remark also that this technique is different from the classical approaches of differential profiles such as [5], where the morphological decomposition is applied after computing the spectral PCA (i.e., morphology plays a role for spatial/spectral classification but not for spatial/spectral dimensionality reduction, as in our case).
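Assuming the residue images have already been computed and flattened, the summed covariance can be sketched as follows (names are ours):

```python
import numpy as np

def morpho1_cov(residues):
    # residues: array of shape (L, n, D) -- L scale residues, each an
    # n-pixel x D-band matrix. The scales are assumed independent, so
    # their covariance matrices simply add up.
    D = residues.shape[2]
    V = np.zeros((D, D))
    for r in residues:
        rc = r - r.mean(axis=0)
        V += rc.T @ rc / r.shape[0]
    return V
```

Each term is a positive semi-definite D × D matrix, so the sum is as well, and the eigendecomposition step of PCA applies unchanged.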

Pattern Spectrum MPCA
In the second variant, we consider a very compact representation of the morphological information associated to the area-based nonlinear scale-space of each spectral band. It simply involves considering the area-based PS of each spectral band as the variable used to find statistical redundancy in the data. In other words, the corresponding covariance matrix is given by

(V_Morpho-2)_{k,k'} = Covar( PS^a(F_{:,:,k}, ·), PS^a(F_{:,:,k'}, ·) ),

with 1 ≤ k, k' ≤ D, and where PS^a(F_{:,:,k}, l), −S ≤ l ≤ S, is the area-based pattern spectrum obtained by area openings and area closings. We note that the pattern spectrum can be seen as a kind of pdf of the image structures. Consequently, the MPCA associated to it explores the intrinsic dimensionality of sets of distributions instead of sets of vectors. To illustrate the information carried by the PS, we have provided in Figure 9 the pattern spectra computed from three different bands of a hyperspectral image. In order to better understand the interest of V_Morpho-2, we propose an analysis based on its Pearson correlation counterpart. Once the correlation of the PS distributions is calculated, we have a linear coefficient between PS^a(F_{:,:,k}, l) and PS^a(F_{:,:,k'}, l). However, since the PS is the result of nonlinear operations, the underlying extracted features are naturally nonlinear.
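With the per-band pattern spectra stacked in a matrix, V_Morpho-2 reduces to a covariance computed along the scale axis (a sketch; names are ours):

```python
import numpy as np

def morpho2_cov(PS):
    # PS: (D, M) matrix -- row k is the area pattern spectrum of band k
    # over M = 2S+1 scale bins; entry (k, k') is the covariance between
    # the two bands' size distributions.
    c = PS - PS.mean(axis=1, keepdims=True)
    return c @ c.T / PS.shape[1]
```

Normalizing by the diagonal gives the PS correlation: two bands with identical size distributions reach correlation 1 regardless of where the objects lie in the image.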
Let us consider the two binary images of Figure 7(a), which represent two objects having exactly the same size. If the correlations are calculated, the morphological distributions being the same, the PS correlation is maximal. In a certain way, we observe that this transform builds size invariants from the images, and consequently it is robust to some groups of transforms and deformations. For instance, it is invariant to rotation and to translation.
Classical PCA on the spectral bands and the MPCA based on the PS can be compared through their corresponding correlation matrices computed from a hyperspectral image, plotted in Figure 10.

Distance Function MPCA
Classical PCA for hyperspectral images is based on exploring the covariances between spectral intensities, and the previous MPCA variants replace these with covariances computed on a morphological scale-space representation of the images. An alternative consists in transforming each spectral band from an intensity-based map to a metric-based map, where the value at each pixel is associated to both the initial intensity and the spatial relationships between the image structures. This objective can be achieved using the Molchanov gray-scale distance function of each spectral band, dist(F_{:,:,k}). The new covariance matrix V_Morpho-3 ∈ M_{D,D}(R) is now defined as

(V_Morpho-3)_{k,k'} = Covar( dist(F_{:,:,k}), dist(F_{:,:,k'}) ),

with 1 ≤ k, k' ≤ D. Figure 9 depicts the corresponding gray-scale distance functions of three spectral bands of a hyperspectral image. We note that this function carries simultaneously both intensity and shape information from the image.
Using the level-set representation of the gray-scale distance function, this covariance can be expanded as

Covar( dist(F_{:,:,k}), dist(F_{:,:,k'}) ) = Σ_h Σ_{h'} Covar( dist(X_h(F_{:,:,k})), dist(X_{h'}(F_{:,:,k'})) ),

where X_h(F_{:,:,k}) denotes an upper level set at threshold h. The central term is the covariance between two binary distance functions, and can be developed as an inner product of centered distance functions, where <·,·>_{L^2} denotes the L^2 inner product. Using the classical relationship

<a, b>_{L^2} = (1/2) ( ||a||^2_{L^2} + ||b||^2_{L^2} − ||a − b||^2_{L^2} ),

we finally obtain an expression in which the term || dist(X_h(F_{:,:,k})) − dist(X_{h'}(F_{:,:,k'})) ||_{L^2} can be identified as a Baddeley distance [28] used in shape analysis. This distance is somehow equivalent to the more classical Hausdorff distance between the upper level set at h of spectral band k and the upper level set at h' of spectral band k'. Thus, the underlying similarity from this covariance compares the shape of the spectral channels, and extracts a richer description than the Pearson correlation of the spectral channels themselves. We note that the use of the Hausdorff distance between upper level sets of hyperspectral bands was previously proposed in [29].
Finally, to illustrate qualitatively the behavior of the distance function correlation, let us consider this time the three binary images depicted in Figure 7(b), where image 2 and image 3 represent the same object placed at different locations in the image. One finds that this similarity criterion, related to the use of the distance function, is more discriminative with respect to the relative position of the objects in the image than the classical Pearson correlation.
From Figure 10(c), one can now compare the correlation matrix obtained using the gray-scale distance function. We note that this matrix also provides a better discrimination of the clusters of bands than the Pearson correlation matrix used in standard PCA.

Spatial/Spectral MPCA
As we have discussed, V_Morpho-2 represents a compact morphological representation of the image; however, the spectral intensity information is also important for dimensionality reduction. To come up with a last variant of MPCA, we build another covariance matrix V_Morpho-4 that represents the spectral and the spatial information, without increasing the dimensionality, as the sum of two covariance matrices:

V_Morpho-4 = β V_Morpho-2 + (1 − β) V,

with β ∈ [0, 1], where obviously V_{k,k'} = Covar(F_{:,:,k}, F_{:,:,k'}) and β stands for a regularization term that balances the spatial against the spectral information. This kind of linear combination of covariance matrices is similar to the one used in the combination of kernels, where kernels providing different information sources are combined to obtain a new kernel which integrates the various contributions [12].

MPCA applied to hyperspectral images

Criteria to evaluate PCA vs. MPCA

We can now use PCA and the four variants of MPCA to achieve dimensionality reduction (DR) of hyperspectral images. In order to evaluate their interest for such a purpose, it is necessary to establish quantitative criteria to be assessed. These criteria will evaluate both locally and globally the effectiveness of our dimensionality reduction.
Criterion 1 (C1) The reconstructed hyperspectral image F using the first d principal components should be a regularized version of F in order to be more spatially sparse.
Criterion 2 (C2) The reconstructed hyperspectral image F using the first d principal components should preserve local homogeneity and be coherent with the original hyperspectral image F.
Criterion 3 (C3) The manifold of variables (i.e., intrinsic geometry) from the reconstructed hyperspectral image F should be as similar as possible to the manifold from original hyperspectral image F.

Criterion 4 (C4)
The number of bands d of the reduced hyperspectral image should be reduced as much as possible.It means that a spectrally sparse image is obtained.
Criterion 5 (C5) The reconstructed hyperspectral image F using the first d principal components should preserve the global similarity with the original hyperspectral image F. In other words, it should be a good noise-free approximation.
Criterion 6 (C6) Separability of spectral classes should be improved in the dimensionality reduced space.That involves in particular a better pixel classification.
These criteria are used to analyse the effectiveness of the dimensionality reduction methods by studying locally and globally their ability to remove redundancy and to preserve the full richness of the spectral and spatial information.

In order to assess C1, we compute the watershed transform [24] on each channel F_k of the hyperspectral image. The watershed transform is a morphological image segmentation approach which, in a certain way, can be seen as an unsupervised classification technique. The advantage of using the watershed is that it allows us to cluster the image according to local homogeneity; thus, an image with fewer details will have fewer spatial classes than an image with many insignificant details. Then, the number of clusters N_k of F_k is considered as an estimation of the image complexity. To evaluate the complexity of the reconstructed hyperspectral image, the number of spatial classes Ñ_k is counted after computing a watershed on each reconstructed band. Finally, the mean number of spatial classes is taken, i.e.,

Error_sparse-spatially = D^{-1} Σ_{k=1}^{D} Ñ_k.

Assessment of C2, which involves image homogeneity, is based on a partition of the image into homogeneous regions. Let us first recall the definition of an α-flat zone [30], used for such a purpose. Given a distance d : R^D × R^D → R, two pixels (f(x), f(y)) ∈ (R^D)^2 of a vector-valued image f belong to the same α-flat zone of f if and only if there is a path (p_0, ..., p_n) ∈ E^n such that p_0 = x, p_n = y and, for all i, d(f(p_{i-1}), f(p_i)) ≤ α. Computing the α-flat zones for a given value of α therefore produces a spatial partition of the image into classes such that, in each connected class, the image values are linked by paths of locally bounded variation. Working on the d eigenvectors, the image partition π_α associated to the α-flat zones quantizes spatially and spectrally a hyperspectral image; see the example given in Figure 11. The goal of simultaneous spatial and spectral quantization of a hyperspectral image has been studied in [31], where we have studied in detail the dependency on the distance. Moreover, we have shown that in high-dimensional spaces quantization results are generally not good. For the case considered here, we propose to use the Euclidean distance on the space reduced by PCA or MPCA. The choice of α is done in order to guarantee a similar number C of α-flat zones for all the compared approaches. We can expect that, by fixing the number of zones in the partition, the difference between one partition and another depends exclusively on the homogeneity of the image. Now, using the partition π_α, the spectral mean value of the pixels of the original image F in each spatial zone is computed. This quantization produces a simplified hyperspectral image, denoted F_{π_α}. Finally, we assess how far the pixels of the original image in each α-flat zone are from their mean, which involves computing the following error:

Error_Homg = Σ_{Z ∈ π_α} Σ_{x ∈ Z} || F(x) − F_{π_α}(x) ||^2.

This criterion can consequently be seen as a way to measure the trustworthiness of the DR technique, since it quantifies whether the homogeneous partition of the reduced hyperspectral image corresponds to the homogeneous zones of the original image.
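For a scalar image, the α-flat zone partition can be sketched with a flood fill. The 4-connectivity and the function names are our own choices; the vector-valued case only changes the distance d between pixel values.

```python
import numpy as np

def alpha_flat_zones(f, alpha):
    # Partition f into alpha-flat zones: p and q share a zone iff some
    # 4-connected path joins them with |f(p_i) - f(p_{i+1})| <= alpha.
    lab = np.zeros(f.shape, dtype=int)
    n = 0
    for seed in np.ndindex(*f.shape):
        if lab[seed]:
            continue
        n += 1
        lab[seed] = n
        stack = [seed]
        while stack:
            i, j = stack.pop()
            for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if (0 <= ni < f.shape[0] and 0 <= nj < f.shape[1]
                        and not lab[ni, nj]
                        and abs(float(f[ni, nj]) - float(f[i, j])) <= alpha):
                    lab[ni, nj] = n
                    stack.append((ni, nj))
    return lab, n
```

In practice α would be tuned, as described above, until the number of zones n matches the target C shared by all the compared approaches.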
C3 has been evaluated by means of two manifold learning criteria called the K-intrusion and the K-extrusion [32]. They are based on other criteria called continuity and trustworthiness [33]. These criteria reveal the behavior of a DR technique in terms of its ability to preserve the structure of the data manifold. We have first sampled randomly 10,000 spectra from our hyperspectral images, where each spectrum is a vector of dimension D. Then we have modelled the manifold by a graph where each node is a vector and each edge is weighted by the pairwise distance; we used the Euclidean distance. For the rest of the paragraph, we denote by x_i a point from the original manifold, ν^K_i its neighbourhood of size K, x̃_i the same point on the manifold after DR, and ν̃^K_i its corresponding neighbourhood of size K. A neighbourhood of size K at a point x_i is composed of the K closest points to x_i according to the metric used. More precisely, the goal of the K-extrusion is to measure how the points that were in the K-neighbourhood of x_i are preserved in the K-neighbourhood of x̃_i after DR. The K-intrusion evaluates whether the points in the K-neighbourhood of x̃_i on the DR manifold were in the K-neighbourhood of x_i. These measures are defined in terms of r(i, j), the rank of the point x_j in the ordering according to the distance from x_i; of r̃(i, j), the rank of x̃_j in the ordering according to the distance from x̃_i; and of a term G_K that normalizes the measure to lie between zero and one. For a better understanding of these formulae, see [34]. An important point is the dependence of these parameters on the size of the neighbourhood. From the K-intrusion and the K-extrusion, two parameters Q(K) and B(K) are computed [34]. The interest of Q(K) is that it estimates on average the quality of a DR technique, whereas B(K) reveals whether its behavior is more intrusive or extrusive.
In order to assess C4, as classically done, the fraction of explained covariance is fixed and the number of principal components needed to reach it is counted. The rationale is that a good DR technique should reduce the number of dimensions and extract a limited number of features that explain most of the image content. However, since this criterion is linked to a sparsity criterion, we would like to add a distortion criterion, C5.
The evaluation of C5 is founded on computing a pattern spectrum of both the original hyperspectral image and the DR image. An important point is that the pattern spectrum is computed by openings on the hyperspectral image viewed as a 3D image. Under this assumption, the 3D openings decompose simultaneously the spatial/spectral objects of the image, and the corresponding PS curves represent the distribution of both the spatial and the spectral objects. Two hyperspectral images are similar if they have the same spectral/spatial size distribution. As discussed in Section 2, we prefer to use the cumulative PS in order to obtain a smoother curve. Note that the reconstruction error between the two images cannot normally account for both spatial and spectral distortions; nevertheless, we will also assess the SNR of the reconstruction error as an additional parameter.
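As a hypothetical illustration of how two cumulative pattern spectra could be compared numerically — the text does not commit to a specific discrepancy measure — one can take the largest gap between the two normalized cumulative curves, in the spirit of a Kolmogorov-Smirnov statistic:

```python
import numpy as np

def cps_gap(ps_a, ps_b):
    # Largest vertical gap between the two cumulative pattern spectra
    # (each PS is first normalized so that its CPS ends at 1).
    cps_a = np.cumsum(ps_a) / np.sum(ps_a)
    cps_b = np.cumsum(ps_b) / np.sum(ps_b)
    return float(np.abs(cps_a - cps_b).max())
```

A gap of zero means the two images have identical (normalized) spectral/spatial size distributions; larger gaps indicate a stronger distortion of the size distribution by the DR step.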
Finally, C6 is related to the supervised pixel classification of the hyperspectral image. We have considered the least-squares SVM algorithm [35] as the learning technique, with a linear kernel or an RBF kernel, where the RBF kernel is initialized for each DR technique using cross-validation. For each supervised classification run, we used for the AVIRIS Indian Pines image 5% of the available data as a training set and the remaining 95% for validation. For the ROSIS Pavia University image, we used a subset of 50 spectra (about 1% of the available data) per class as a training set and the remaining spectra for validation.

Evaluation of algorithms
The studied DR techniques are listed and compared with respect to three mathematical and computational properties in Tab. 1. These properties were also considered in the excellent comparative review [3]. For comparison, we have also included in the table Kernel-PCA (KPCA), which is a powerful generalization of PCA that allows integrating morphological and spatial features into DR.
The first property is the number of free parameters to be chosen. The interest of having free parameters is that they provide more flexibility to the technique, whereas the related inconvenience is the difficulty of properly tuning them. We notice that KPCA provides good flexibility thanks to the choice of any possible kernel that fits the data geometry. The simplest algorithms are PCA and the distance function MPCA; then comes the scale-space decomposition MPCA, and finally the pattern spectrum MPCA. The second property analyzed is the computational complexity, and the third one is the memory requirements. From a computational viewpoint, the most demanding step in PCA is the SVD, which can be done in O(D^3); PCA is the technique with the smallest computational needs. On the contrary, the computational requirement of KPCA is O(n^3); since n ≫ D, this kind of algorithm seems infeasible on standard hyperspectral images. That is the reason why most hyperspectral KPCA techniques use tricks to be able to deal with the number of spectra [13,36,37]. All these techniques lead to a spatial distortion, which is unavoidable due to the need for a sampling procedure aiming at reducing the number of spectra.
Between the complexities of PCA and KPCA, we have our proposed MPCA algorithms. Regarding MPCA, the computationally demanding step is the computation of the morphological representation used in the corresponding covariance matrix. The complexity estimation has been carried out each time in the worst case; efficient morphological algorithms can improve this part. The distance function MPCA is the least demanding, then the scale-space decomposition MPCA, and finally the pattern spectrum MPCA. Regarding memory needs, for PCA, the pattern spectrum MPCA and the distance function MPCA, the step requiring the most memory is the storage of the covariance matrix, which is only O(D^2). The Spatial/Spectral MPCA needs to store two covariance matrices, so its memory need is O(2D^2); similarly, the scale-space MPCA needs to store 2S + 1 covariance matrices, so its required memory is O((2S + 1)D^2). Note that KPCA uses a Gram matrix of size n × n.

Table 1: Comparison of the properties of dimensionality reduction algorithms for hyperspectral images. Columns: Technique, Parameter, Computational, Memory.

Evaluation on hyperspectral images
We have applied classical PCA and the different variants of MPCA to the Pavia hyperspectral image. Figure 12 shows the first three eigenimages, visualized as an RGB false-color composite. We note that the pattern spectrum MPCA requires d = 5 to represent 92% of the variance, whereas the other approaches only require d = 3. An interesting aspect observed in the projection of the 103 spectral channels of the Pavia hyperspectral image onto the first two eigenvectors is how PCA and the scale-space decomposition MPCA cluster the bands linearly, see Figures 13(a) and 13(b). Bands close in the projection are also near in the spectral domain, whereas the pattern spectrum MPCA, Figure 13(c), and the distance function MPCA, Figure 13(d), tend to cluster spectral bands which are not necessarily spectrally contiguous. This can be explained thanks to Figure 10, where the MPCA correlation matrices differ from the classical PCA one.
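The two quantities used above can be sketched on synthetic data: the smallest dimension d retaining a target fraction of the variance, and the coordinates of each of the D spectral channels in the first two eigenvectors (as plotted in Figure 13). Sizes and the variance threshold are illustrative.

```python
import numpy as np

# Synthetic spectra: n pixels x D bands (placeholder sizes).
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 103))
Xc = X - X.mean(axis=0)

C = np.cov(Xc, rowvar=False)                 # D x D covariance
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]            # sort by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Smallest d whose cumulative explained variance reaches 92%.
ratio = np.cumsum(eigvals) / eigvals.sum()
d = int(np.searchsorted(ratio, 0.92) + 1)

# Coordinates of the 103 channels in the first two eigenvectors:
# row i gives the position of band i in the 2D projection.
band_proj = eigvecs[:, :2]
```

On real data, plotting the rows of `band_proj` reproduces the kind of band-clustering diagram discussed for Figures 13(a)-(d).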
It can be noticed that in classical manifold learning techniques, the goal is to decrease the dimension of the data while preserving some properties of the data manifold. We work here on the manifold of the channels. This manifold is easier to use, but finding a good β in V^{Morpho-4}_β that would maintain some property of the manifold is not always easy, since we had to deal with a double optimization problem, i.e., over β and d.
From a quantitative viewpoint, one can see in Table 2 that globally MPCA produces a more homogeneous regularization of the image than classical PCA, especially the distance function MPCA and the spatial/spectral MPCA with an appropriate β = 0.2, which give the lowest values of Error_Homg. We noted that Error_sparse follows a different ranking. A good method is one achieving a good trade-off between both criteria, since one wants a DR technique to be trustworthy, which is evaluated thanks to Error_Homg; but if the signal is too noisy, one may prefer a sparser representation. According to these criteria, the distance function MPCA and the pattern spectrum MPCA seem to give the best results. We can also note that if we use manifold learning parameters for criterion C3, see Figure 14, the pattern spectrum MPCA gives the best results.
With respect to criterion C5, we have computed the 3D pattern spectrum distribution of the Pavia hyperspectral image and of the different images reduced to d components, see Figure 15. From this result, we can see that both PCA and the scale-space decomposition MPCA follow the hyperspectral image very well, since their spatial and spectral cumulative distributions are similar. However, if one would like to denoise the hyperspectral image by means of a DR technique, these results are not always positive. If we compare the spatial and spectral cumulative distribution of the distance function MPCA with that of the hyperspectral image, we notice that for small 3D sizes (i.e., small spatial/spectral variations), which can be considered as noise, there are differences between these two distributions. But as the size increases, the distribution of the distance function MPCA tends to that of the hyperspectral image. So it seems that the distance function MPCA simplifies the spectral/spatial noise, viewed as the small 3D objects, but keeps the objects of interest.
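A pattern spectrum of the kind used for this criterion can be sketched on a 2D gray-scale image: it records the volume removed between openings of increasing size, and its normalized cumulative sum gives the CPS of Figure 3. The paper uses area openings; this minimal sketch uses openings by growing square structuring elements instead, with an illustrative number of scales.

```python
import numpy as np
from scipy import ndimage

# Synthetic gray-scale image (placeholder for a real band or component).
rng = np.random.default_rng(2)
img = rng.integers(0, 256, size=(64, 64)).astype(float)

# Granulometry: openings by squares of increasing size (S = 8 scales).
scales = range(1, 9)
volumes = [img.sum()]
for s in scales:
    opened = ndimage.grey_opening(img, size=(2 * s + 1, 2 * s + 1))
    volumes.append(opened.sum())

# Pattern spectrum PS(s): volume removed between consecutive openings;
# non-negative because openings by nested squares are decreasing.
ps = -np.diff(volumes)

# Cumulative pattern spectrum (CPS), normalized to end at 1.
cps = np.cumsum(ps) / ps.sum()
```

Comparing such cumulative distributions between the original image and a reduced image is exactly the kind of small-size/large-size analysis made for the distance function MPCA above.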
Finally, Table 3 summarizes the results of supervised classification on the Pavia and Indian Pines hyperspectral images, respectively. We note that the results for the Pavia image are quite similar in all cases, even if MPCA seems slightly better than PCA. Therefore, we have chosen to focus on the Indian Pines image, which is a more challenging supervised classification benchmark, see also Figures 16 and 17. We note that MPCA improves the results, especially the scale-space decomposition MPCA. To evaluate the classification results, we first fixed the dimension d of the reduced image.
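The evaluation protocol above can be sketched as follows on synthetic reduced features. The paper uses a least squares SVM, for which scikit-learn has no implementation, so a standard SVC stands in here; the class layout, sizes, and train/test split are all placeholders.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, cohen_kappa_score

# Synthetic "reduced image": 800 pixels in d = 3 components, 4 classes
# with well-separated class means (illustrative only).
rng = np.random.default_rng(3)
d, n_classes = 3, 4
y = rng.integers(0, n_classes, size=800)
X = rng.normal(scale=0.3, size=(800, d)) + y[:, None]

Xtr, Xte, ytr, yte = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y)

# Train one classifier per kernel and report OA and the kappa statistic,
# as in Figures 16-18.
scores = {}
for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel).fit(Xtr, ytr)
    pred = clf.predict(Xte)
    scores[kernel] = (accuracy_score(yte, pred),      # OA
                      cohen_kappa_score(yte, pred))   # kappa
```

Running the same protocol on the PCA- and MPCA-reduced images, for a fixed d, yields the comparison reported in Table 3.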

Conclusions
We have introduced in this paper the notion of MPCA, which comes in four different variants. We have also conducted a study to evaluate the performance of each technique, using both classical and new criteria; in particular, we proposed new criteria based on mathematical morphology image representations. MPCA makes it possible to deal with a spatial/spectral representation of the image based on mathematical morphology tools for SVD-based dimensionality reduction. It is important to note that it only involves changing the covariance matrix used, in order to add spatial information to this matrix. Then a single SVD yields the eigenvectors onto which the spectral bands are projected.
We have shown that the best results are obtained when we combine spatial and spectral information.
Let us point out that we did not compare our MPCA techniques to KPCA since, as explained above, the size of the Gram matrix used in KPCA is n × n, where n is the number of pixels, and it is impossible to manipulate such a matrix for our test images. In the state of the art, some works randomly select a subset of pixels, and consequently the results depend on the selected pixels or the sampling technique.

Figure 3: On the left, the pattern spectrum (PS) by area openings of a gray-scale image using 100 scales. On the right, in blue, its corresponding cumulative pattern spectrum (CPS); in red, its approximation with S = 8 scales.
Figure 7: (a) Example of a pair of binary images for the pattern spectrum correlation discussion. (b) Example of a triplet of binary images for the distance function correlation discussion.

Figure 11: (a) A 3-variate image (first three eigenimages after PCA on the Pavia hyperspectral image) and (b) its corresponding α-flat zone partition into 84931 spatial classes using the Euclidean distance.
The assessment of the performance of PCA and MPCA has been carried out on three hyperspectral images. The first image was acquired over the city of Pavia (Italy) and represents the university campus. The dimensions of the image are 610 × 340 pixels, with D = 103 spectral bands, and its geometrical resolution is 1.3 m. We also used a second hyperspectral image, which represents the University of Houston campus and the neighbouring urban area at a spatial resolution of 2.5 m, and whose dimensions are 349 × 1905 pixels with D = 144 spectral bands [38]. The third image, acquired over the Indian Pines test site in north-western Indiana, is composed of two-thirds agriculture and one-third forest. The dimensions of this image are 145 × 145 pixels with D = 224 spectral bands, and its geometrical resolution is 3.7 m.
(a) for the Pavia hyperspectral image, (b) for the Houston hyperspectral image, (d) for the Indian Pines hyperspectral image. The values have been normalized to the worst case, which is set to 100.

Figure 15: (a) 3D pattern spectrum distribution of the Pavia hyperspectral image and of the different images reduced to d components. (b) Corresponding 3D cumulative pattern spectrum distributions.

Figure 16: Results of supervised classification using least squares SVM with a linear kernel on the Indian Pines hyperspectral image. Note that OA is the overall accuracy.

Figure 17: Results of supervised classification using least squares SVM with an RBF kernel on the Indian Pines hyperspectral image. Note that OA is the overall accuracy.

Figure 18: Kappa statistic for the least squares SVM with an RBF kernel and different numbers of dimensions on the Indian Pines hyperspectral image.

Table 3: Comparison of hyperspectral supervised classification on PCA and MPCA spaces using the least squares SVM algorithm and different kernels: