Multiscale Adjacent Superpixel-Based Extended Multi-Attribute Profiles Embedded Multiple Kernel Learning Method for Hyperspectral Classification

Abstract: In this paper, superpixel features and extended multi-attribute profiles (EMAPs) are embedded in a multiple kernel learning framework to simultaneously exploit the local and multiscale information in both spatial and spectral dimensions for hyperspectral image (HSI) classification. First, the original HSI is reduced to three principal components in the spectral domain using principal component analysis (PCA). Then, a fast and efficient segmentation algorithm named simple linear iterative clustering is utilized to segment the principal components into a certain number of superpixels. By setting different numbers of superpixels, a set of multiscale homogeneous regional features is extracted. Based on those extracted superpixels and their first-order adjacent superpixels, EMAPs with multimodal features are extracted and embedded into the multiple kernel framework to generate different spatial and spectral kernels. Finally, a PCA-based kernel learning algorithm is used to learn an optimal kernel that contains multiscale and multimodal information. The experimental results on two well-known datasets validate the effectiveness and efficiency of the proposed method compared with several state-of-the-art HSI classifiers.


Introduction
At present, hyperspectral images (HSIs) are attracting increasing attention. With the fast iteration of hyperspectral sensors, researchers can easily collect a large amount of HSI data having high spatial resolution and multiple bands that form high-dimensional features, such as complex and fine geometrical structures [1,2]. These characteristics encourage the wide use of HSIs for various thematic applications, such as military object detection, precision agriculture [3], biomedical technology, and geological and terrain exploration [4,5]. As one of the basic methods for the above applications, HSI classification plays an important role and has made certain developments in the past few decades [6].
Many classic machine learning methods can be directly applied to the classification of HSIs, such as naive Bayes, decision trees, K-nearest neighbor (KNN), wavelet analysis, support vector machines (SVMs), random forest (RF), regression trees, ensemble advancement, and linear regression [7][8][9]. However, these methods either treat the HSI as a combination of several hundreds of gray images and extract the corresponding features for classification or use only spectral features for classification, thus producing unsatisfactory results [6].
Recently, classification methods for HSIs based on sparse representation have attracted the attention of researchers [10]. Many well-performing sparse representation classification (SRC) methods have been developed. SRC assumes that each spectrum can be sparsely represented by spectra belonging to the same class and then obtains a good approximation of the original data through a corresponding algorithm [11,12]. Therefore, more and more scholars have used the sparse representation technique to conduct HSI classification. Because the traditional joint KNN algorithm unreasonably assigns the same weight to every pixel in a region, Tu et al. proposed a weighted joint nearest neighbor sparse representation algorithm [13]. Later, a self-paced joint sparse representation (SPJSR) model was proposed. The least-squares loss used in the classical joint sparse representation model was changed to a weighted least-squares loss, and a self-paced learning strategy was employed to automatically determine the weights of the adjacent pixels [14]. Because different scales of a region in an HSI contain complementary and correlated knowledge, Fang et al. proposed an adaptive sparse representation method using a multiscale strategy [15]. To make full use of the spatial correlation of the HSI and improve the classification accuracy, Dundar et al. proposed a multiscale superpixel and guided filter-based spatial-spectral HSI classification method [16]. However, when the sample size is small, SRC ignores the collaborative representation of other categories of samples, resulting in an incomplete dictionary for each category of samples, which produces a larger residual in the results. To alleviate this problem, the collaborative representation classification (CRC) method was proposed. Compared to SRC, the $\ell_2$ norm used by CRC not only has discriminability like that of SRC, but also has lower computational complexity. Using the collaborative representation, Jia et al. proposed a multiscale superpixel-based classification method [17]. To further improve the accuracy of HSI classification, Yang et al. proposed a joint collaborative representation method using a multiscale strategy and a locally adaptive dictionary [18]. Considering the correlation between different classes of HSIs, with the assistance of Tikhonov regularization, a discriminative kernel collaborative representation method was proposed by Ma et al. [19] that utilized kernel collaborative representation for HSI classification. More recently, low-rank representation (LRR) has been studied by scholars in the field of HSI classification. To fully exploit the local geometric structure in the data, Wang et al. proposed a novel LRR model with regularized locality and structure [20]. Meanwhile, a new self-supervised low-rank representation algorithm was proposed by Wang et al. for further improvement of HSI classification [21]. Moreover, Ding et al. proposed a sparse low-rank representation method that relied on key connectivity for HSI classification [22]. This study combined low-rank representation and sparse representation while retaining the connectivity of key representations within classes. To decrease the impact of spectral variation on subsequent spectral analyses, Mei et al. studied a coherent spatial and spectral low-rank representation, which effectively suppressed the class spectral variations [23].
As one of the hottest feature extraction techniques, deep learning has made excellent progress in computer vision and image processing applications [24][25][26][27]. Currently, deep learning is also becoming more popular in the field of HSI classification [28]. HSI data consist of multi-dimensional spectral cubes that contain much useful information, so the intrinsic features of the image are easily extracted by deep learning techniques [29]. For instance, the currently popular convolutional neural network (CNN) model has produced excellent HSI classification results [30,31]. In addition, some improved HSI classification methods have been proposed based on deep learning. For instance, by combining active learning and deep neural networks, Liu et al. proposed an effective framework for HSI classification, where the deep belief network (DBN) was employed to extract the deep features hidden in the spectral dimension [32]. Then, a learning classifier was used to refine those training samples to boost their quality. Zhong et al. further enhanced the DBN model and proposed a diversified DBN model [33]. In their design, the pre-training and tuning procedures of the DBN were regularized to achieve better classification accuracy. Moreover, 1D CNNs [34,35] and 1D generative adversarial networks [36,37] were also employed to describe the spectral features of the HSI. However, deep learning methods also have some problems; for example, they usually need many training samples, and the extracted features are not always interpretable. Therefore, the trend of subsequent research is to combine traditional feature extraction methods with deep learning methods to obtain more accurate classification results.
Another, more powerful classification technique for the HSI is the kernel method, especially the composite kernel (CK) method. In the actual classification process, samples in the original space are often not linearly separable [38]. Therefore, to solve the problem of linear inseparability, the kernel method is used to map the samples to a higher dimensional feature space so that the samples are linearly separable [39]. Certainly, the performance of the kernel method depends largely on the kernel selection. For example, the Gaussian kernel and the polynomial kernel are common kernels, and they are often not flexible enough to reflect the comprehensive features of the data [40]. Moreover, with increasing requirements of classification accuracy, a single kernel with a specific function cannot deliver a satisfactory result [41]. To solve this problem, the CK was proposed. It combines two or more different features, such as the global and local kernel, or the local kernel and spectral kernel, into a kernel composition framework for HSI classification [42]. Sun et al. proposed a CK classification method using spatial-spectral information and abundance information in the HSI [43]. For intrinsic image decomposition of the HSI, Jin et al. put forward a new optimization algorithm, and the CK learning method was then utilized to combine reflectance with the shading component [44]. Furthermore, Chen et al. proposed a spatial-spectral composite feature broad learning system classification method [45]. This method inherits the advantages of a broad learning system and is well-suited to multi-class tasks. As the most widely used classifier, the SVM can also bring excellent classification accuracy to the HSI [46]. Huang et al. proposed an SVM-based method for HSI classification [47]. In this work, weighted mean reconstruction and CKs were combined to explore the spatial-spectral information in the HSI. With the continuous development of superpixel segmentation technology, Duan et al. further improved edge-preserving features by considering the inter- and intra-spectral properties of superpixels and formed one CK for the spectral and edge-preserving features [48]. Because the HSI contains many spectral bands, mapping the high-dimensional data to achieve improved classification speed has been of great concern in recent years. To address this problem, Tajiri et al. proposed a fast patch-free global learning kernel method based on a CK method [49]. Compared with the original single-kernel method, the CK function has the following obvious advantages: (1) it maps the data into a complex nonlinear space to extract more useful information and make the data separable; (2) it provides the flexibility to include multiple and multimodal features.
Different from a CK in which only one kernel function is constructed to contain both spatial and spectral information, the spatial-spectral kernel (SSK) constructs two clusters in kernel space, thus capturing the hidden manifold in the HSI [50]. For example, the spatial-spectral weighted kernel embedded manifold distribution alignment method constructs a complex kernel with different weights for the spatial kernel and the spectral kernel [51]. The spatial-spectral multiple-kernel learning method utilizes extended morphological profiles (EMPs) as spatial features and the original spectra as spectral features. In this way, multiscale spatial and spectral kernel methods are formed [52]. In addition, the joint classification methods for HSIs based on spatial-spectral kernels and multi-feature fusion are especially suitable for a limited number of training samples [53]. Generally, the CK method and the SSK method adopt square windows or superpixel technology to extract spatial information. However, both methods may misclassify the pixels at the class boundary. To alleviate this problem, several methods for selecting adaptive neighborhood pixels to construct the spatial-spectral kernel have been proposed, which further improved classification performance [54,55]. For those CK or SSK methods, determining the weights of the base kernels is another difficult and urgent challenge. Therefore, many scholars have proposed multiple kernel learning methods, where the core idea is to obtain a linear optimal combination of those base kernels using an optimization algorithm [56][57][58].
In this study, following the line of the multiple kernel learning framework, we propose a novel multiscale, adjacent superpixel-based embedded multiple kernel learning method with the extended multi-attribute profile (MASEMAP-MKL) for HSI classification. The proposed method makes full use of superpixels and the EMAP to exploit multiscale and multimodal spatial and spectral features for the generation of multiple kernels. For the spatial information, both the superpixel and its first-order neighborhood superpixels are utilized to extract geometric features at different scales and combine the EMAP features to construct different base kernels. Finally, a principal component analysis (PCA)-based multiple kernel learning method is employed to determine the optimal weights of the base kernels. The main contributions of the proposed MASEMAP-MKL method are summarized as follows.
• The superpixel segmentation is used to extract geometric structure information in the HSI, and multiscale spatial information is simultaneously extracted according to the number of superpixels. In addition, the spectral feature of each pixel is replaced by the average of all the spectra in its superpixel, which is used to construct a superpixel-based mean spectral kernel.
• The EMAP features, together with the multiscale superpixels and the adjacent superpixels obtained above, are used to construct the superpixel morphological kernel and the adjacent superpixel morphological kernel. At this stage, multiscale features and multimodal features are fused together to construct three different kernels for classification.
• The multiple kernel learning technique is used to obtain the optimal kernel for HSI classification, which is a linear combination of all the above kernels.
• An experimental evaluation with two well-known datasets illustrates the computational efficiency and quantitative superiority of the proposed MASEMAP-MKL method in terms of all classification accuracies.

Kernelized Support Vector Machine
As a two-class classification model, the basic principle of the SVM is to classify data by solving a convex quadratic programming problem in the feature space. The kernelized SVM introduces a kernel function into the SVM, which avoids computing the complicated nonlinear mapping explicitly and directly calculates the inner product in the feature space. Specifically, suppose we are given a set of labeled samples in an HSI, i.e., $\{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$, where $x_i \in \mathbb{R}^{L \times 1}$ is the $i$-th labeled spectrum, $L$ is the number of bands, $y_i \in \{-1, 1\}$ for $i \in \{1, 2, \dots, N\}$, and $N$ is the number of all labeled samples in the scene. The classification function of the kernelized SVM is formulated as:
$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{N} \alpha_i y_i K(x_i, x) + b\right) \tag{1}$$
where $\alpha_i$ denotes the Lagrangian dual parameter, $K(x_i, x)$ denotes a kernel function, and $b$ denotes the bias parameter. Then, we obtain the objective function as:
$$\max_{\alpha} \; \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \quad \text{s.t.} \quad 0 \le \alpha_i \le \beta, \;\; \sum_{i=1}^{N} \alpha_i y_i = 0 \tag{2}$$
where $\beta$ is a parameter that controls the weight between the two goals of the objective function, i.e., finding the hyperplane with the largest margin and guaranteeing the smallest deviation of the data points. Usually, the value of $\beta$ is determined manually and in advance. Under the above two constraint conditions, the most suitable values of $\alpha$ and $b$ are obtained to get an optimal classifier. The commonly used kernel functions are as follows. The first one is the Gaussian kernel function:
$$K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right) \tag{3}$$
This kernel function maps the original space to an infinite-dimensional space. We can flexibly control the kernel width, and hence the complexity of the mapping, by adjusting the value of parameter $\sigma$.
The second kernel function is the polynomial kernel function:
$$K(x_i, x_j) = \left(x_i^{T} x_j + C\right)^{d} \tag{4}$$
where $C$ denotes an offset parameter and $d$ is an integer. We can change the dimension of the mapping by setting the value of $d$. When the value of $d$ is one, the kernel function degrades to a linear kernel function. In this situation, the classifier degrades to the original linear SVM, which can only handle linearly separable data.
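As a minimal sketch of the two kernel functions described above, the following Python snippet evaluates them on a few toy vectors and builds the corresponding kernel (Gram) matrices. It assumes the common parameterization $\exp(-\|x - z\|^2 / (2\sigma^2))$ for the Gaussian kernel; the function names and the toy spectra are illustrative only and do not come from the original implementation.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def gaussian_kernel(x: Vector, z: Vector, sigma: float) -> float:
    """Gaussian kernel exp(-||x - z||^2 / (2 * sigma^2)) (assumed parameterization)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def polynomial_kernel(x: Vector, z: Vector, c: float, d: int) -> float:
    """Polynomial kernel (x . z + c)^d; with d = 1 it reduces to a linear kernel."""
    dot = sum(a * b for a, b in zip(x, z))
    return (dot + c) ** d


def kernel_matrix(samples: Sequence[Vector],
                  kernel: Callable[[Vector, Vector], float]) -> List[List[float]]:
    """Gram matrix with entries K[i][j] = kernel(samples[i], samples[j])."""
    return [[kernel(xi, xj) for xj in samples] for xi in samples]


# Toy "spectra" with three bands each (made-up values, not from any dataset).
spectra = [[0.1, 0.4, 0.3], [0.2, 0.5, 0.1], [0.9, 0.8, 0.7]]
K_gauss = kernel_matrix(spectra, lambda a, b: gaussian_kernel(a, b, sigma=0.5))
K_linear = kernel_matrix(spectra, lambda a, b: polynomial_kernel(a, b, c=0.0, d=1))
```

A precomputed matrix of this kind is what a kernelized SVM solver would consume in place of the raw spectra.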

Superpixel Segmentation
A superpixel is a sub-region of the image that is local, consistent, and able to maintain certain local structural characteristics of the image. Superpixel segmentation is the process of aggregating pixels into superpixels. Compared with a pixel, the basic unit of traditional processing methods, a superpixel is not only more conducive to the extraction of local features and the expression of structural information, but can also greatly reduce the computational complexity of subsequent processing. Achanta et al. proposed simple linear iterative clustering (SLIC), which is based on the relationship between color similarity and spatial distance [59]. First, $J$ initial cluster centers are uniformly initialized in the image, and every pixel is labeled with its nearest cluster center. The normalized distance based on the color and spatial location features is:
$$D_{i,j} = \sqrt{\left(\frac{\|\mathbf{c}_i - \mathbf{c}_j\|_2}{n_c}\right)^2 + \left(\frac{\|\mathbf{s}_i - \mathbf{s}_j\|_2}{n_s}\right)^2} \tag{5}$$
In the formula, the vector $\mathbf{c}$ represents the 3D color feature vector in the CIELAB color space, and the vector $\mathbf{s}$ represents the two-dimensional spatial position coordinates. The subscript $j = 1, 2, \dots, J$ is the label of the cluster center. The subscript $i$ is the label of a pixel in the $2s \times 2s$ neighborhood of the cluster center $j$, where $s = \sqrt{N/J}$ and $N$ is the total number of image pixels. $n_c$ and $n_s$ are normalization constants for the color and spatial distances, respectively. After the initial clustering, the cluster center $\phi_j$ is updated iteratively according to the mean values of the color and spatial features of all the pixels in the corresponding image block $G_j$:
$$\phi_j = \frac{1}{n_j} \sum_{i \in G_j} \begin{bmatrix} \mathbf{c}_i \\ \mathbf{s}_i \end{bmatrix} \tag{6}$$
where $n_j$ is the number of pixels in the image block $G_j$. The clustering and updating steps are iterated until the termination conditions are met. Finally, a neighbor merging strategy is used to eliminate isolated small superpixels, which ensures the compactness of the results.
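The two core SLIC operations described above, the normalized color-plus-position distance and the cluster-center update, can be sketched as follows. This is an illustrative sketch rather than the reference SLIC implementation: it assumes the standard form in which the color and spatial distances are each divided by their normalization constant and combined under a square root, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def _euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def slic_distance(c_i: Sequence[float], s_i: Sequence[float],
                  c_j: Sequence[float], s_j: Sequence[float],
                  n_c: float, n_s: float) -> float:
    """Normalized distance between pixel i and cluster center j.

    c_* are color features, s_* are (row, col) positions, and n_c / n_s are
    the normalization constants for the color and spatial terms.
    """
    d_color = _euclidean(c_i, c_j) / n_c
    d_space = _euclidean(s_i, s_j) / n_s
    return math.sqrt(d_color ** 2 + d_space ** 2)


def update_center(colors: Sequence[Sequence[float]],
                  positions: Sequence[Sequence[float]]) -> List[float]:
    """New cluster center: mean of the stacked [color, position] features
    of all pixels currently assigned to the cluster."""
    stacked = [list(c) + list(s) for c, s in zip(colors, positions)]
    n = len(stacked)
    return [sum(column) / n for column in zip(*stacked)]
```

In a full SLIC loop, each pixel would be compared only against the centers whose $2s \times 2s$ window contains it, and the assignment and update steps would alternate until the centers stop moving.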

EMAP
Based on mathematical morphology, Mauro et al. [60] proposed a method using the EMAP and independent component analysis. Compared with PCA, this method is more suitable for modeling the different information sources in the scene, and the classification accuracy obtained is higher. Later, Stijn et al. [61] proposed computing extended attribute profiles (EAPs) on features derived from supervised feature extraction methods. Song et al. [62] proposed a new image data classification strategy-decision fusion method that combines a complete classifier with extended multi-attribute morphological profiles to optimize the classification results.
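The attribute profile (AP) is an extension of the morphological profile (MP), which is obtained by processing a scalar grayscale image $f$ (such as each band image of an HSI or one principal component (PC) of an HSI), according to a criterion $t$, with $n$ morphological attribute thickening ($\phi^{t}$) and $n$ attribute thinning ($\gamma^{t}$) operators:
$$\mathrm{AP}(f) = \left\{\phi^{t_n}(f), \phi^{t_{n-1}}(f), \dots, \phi^{t_1}(f), f, \gamma^{t_1}(f), \dots, \gamma^{t_{n-1}}(f), \gamma^{t_n}(f)\right\} \tag{7}$$
Analogous to the definition of the extended MPs (EMPs), EAPs are generated by concatenating many APs. Each AP is computed on one of the $q$ PCs extracted by the PCA of an HSI:
$$\mathrm{EAP} = \left\{\mathrm{AP}(\mathrm{PC}_1), \mathrm{AP}(\mathrm{PC}_2), \dots, \mathrm{AP}(\mathrm{PC}_q)\right\} \tag{8}$$
An EMAP is composed of $m$ different EAPs based on different attributes $(a_1, a_2, \dots, a_m)$:
$$\mathrm{EMAP} = \left\{\mathrm{EAP}_{a_1}, \mathrm{EAP}_{a_2}, \dots, \mathrm{EAP}_{a_m}\right\} \tag{9}$$
where $\mathrm{EAP}_{a_i}$ is the EAP calculated on each PC $\{\mathrm{PC}_1, \mathrm{PC}_2, \dots, \mathrm{PC}_q\}$ of the HSI with attribute $a_i$. The value of $q$ is usually set to be smaller than three. Three attributes, i.e., area, inertia, and the standard deviation, are used in the proposed method to extract different spatial information from the HSI.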

CK
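The CK usually takes the spectral mean or the variance of the neighborhood pixels as the spatial-spectral characteristics and then forms the CK through one of the following kernel combination methods. Let $x_i^{spe}$ and $x_i^{spa}$ denote the spectral and spatial features of the $i$-th sample, and let $K_{spe}$ and $K_{spa}$ denote the corresponding spectral and spatial kernels.
(1) Stacked characteristic kernel: In this design, both the spectral and spatial features are directly stacked together as sample features:
$$K(x_i, x_j) = K\left(\left[x_i^{spa}; x_i^{spe}\right], \left[x_j^{spa}; x_j^{spe}\right]\right) \tag{10}$$
(2) Direct addition kernel: The spatial feature after nonlinear mapping is juxtaposed with the spectral feature as the feature of a high-dimensional space:
$$K(x_i, x_j) = K_{spa}\left(x_i^{spa}, x_j^{spa}\right) + K_{spe}\left(x_i^{spe}, x_j^{spe}\right) \tag{11}$$
(3) Weighted summation kernel: Different weights are assigned to the spatial and spectral features, where $\mu$ is the weight parameter that balances the spatial and spectral kernels. The weighted summation kernel can be constructed as follows:
$$K(x_i, x_j) = \mu K_{spa}\left(x_i^{spa}, x_j^{spa}\right) + (1 - \mu) K_{spe}\left(x_i^{spe}, x_j^{spe}\right) \tag{12}$$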
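To make the weighted summation kernel concrete, the sketch below combines a precomputed spatial kernel matrix and a precomputed spectral kernel matrix entry by entry. It assumes that $\mu$ weights the spatial kernel and $1 - \mu$ weights the spectral kernel, which is one common convention; the function name and the toy matrices are hypothetical.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def weighted_sum_kernel(K_spatial: Matrix, K_spectral: Matrix, mu: float) -> List[List[float]]:
    """Composite kernel mu * K_spatial + (1 - mu) * K_spectral.

    Both inputs must be kernel matrices over the same samples in the same
    order. ASSUMPTION: mu is applied to the spatial kernel.
    """
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    return [
        [mu * a + (1.0 - mu) * b for a, b in zip(row_spa, row_spe)]
        for row_spa, row_spe in zip(K_spatial, K_spectral)
    ]


# Toy 2 x 2 kernel matrices (made-up values).
K_spa = [[1.0, 0.0], [0.0, 1.0]]
K_spe = [[1.0, 0.5], [0.5, 1.0]]
K_ck = weighted_sum_kernel(K_spa, K_spe, mu=0.25)
```

Because a non-negative weighted sum of valid kernels is itself a valid kernel, the result can be passed directly to a kernelized SVM; setting both weights to one instead gives the direct addition kernel.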

The Proposed MASEMAP-MKL Method
In this section, we propose a novel multiscale adjacent superpixel-based EMAP embedded multiple kernel learning method named MASEMAP-MKL for HSI classification.
The flowchart of the proposed MASEMAP-MKL is shown in Figure 1. Two steps are performed as follows.

Adjacent Superpixel-Based EMAP Generation
The first step is the generation of the adjacent superpixel-based EMAP. Three different image features, i.e., the superpixel-based mean spectral feature, the superpixel-based morphological feature, and the adjacent superpixel-based morphological feature, are extracted.
(1) Superpixel-based mean spectral feature: After superpixel segmentation, the PC images are divided as $sp = \{sp_1, sp_2, \dots, sp_J\}$, where $J$ is the number of superpixels. We employ the local mean operator to obtain the superpixel-based mean spectral feature of a given spectrum in $sp_i$, which is formulated as:
$$x_i^{sp\text{-}mean} = \frac{1}{n_i} \sum_{x_k \in sp_i} x_k \tag{13}$$
where $n_i$ is the number of pixels in the $i$-th superpixel. Therefore, after implementing the local mean operator for all superpixels, the obtained average features constitute the superpixel-based mean spectral feature of the HSI.
(2) Superpixel-based morphological feature: Applying the EMAP feature extraction operation to the segmented multiscale superpixels produces the superpixel morphological feature:
$$x_i^{sp_i} = \frac{1}{n_i} \sum_{x_k \in sp_i} x_k^{EMAP} \tag{14}$$
where $x_k^{EMAP}$ represents the EMAP feature vector of the $k$-th pixel in $sp_i$. The superpixel morphological feature inherits the advantages of superpixel segmentation and morphological features at the same time and achieves a more thorough description of spatial information during classification.
(3) Adjacent superpixel-based morphological feature: Based on the superpixel morphological feature, we further employ the adjacent superpixel strategy to obtain adjacent superpixel morphological features; thus, the fusion of multiscale features and multimodal features is realized for classification. The adjacent superpixel set is defined as:
$$x_i^{asp} = \left\{x_i^{sp_1}, x_i^{sp_2}, \dots, x_i^{sp_r}\right\} \tag{15}$$
where $r$ is the number of adjacent superpixels with respect to the central superpixel $x_i^{sp_i}$.
Because the mean pixels are the representative feature of the superpixel, after calculating the mean pixels of the superpixels in $x_i^{asp}$, the weighted adjacent superpixel morphological feature can be obtained by calculating the weighted mean pixel of the central and adjacent superpixels:
$$x_i^{ASP\text{-}EMAP} = \frac{\sum_{j=1}^{r} \omega_{i,j} \, x_i^{sp_j}}{\sum_{j=1}^{r} \omega_{i,j}} \tag{16}$$
where $\omega_{i,j}$ is the weight of the adjacent superpixel $x_i^{sp_j}$ with respect to the central superpixel $x_i^{sp_i}$, which is obtained from the distance between them. Unlike most studies that use the Euclidean distance to measure the distance between pixels, we use the spectral angle distance (SAD) to further explore the spectral correlation of the HSI, which is denoted as:
$$\mathrm{SAD}\left(x_i^{sp_i}, x_i^{sp_j}\right) = \arccos\left(\frac{\left(x_i^{sp_i}\right)^{T} x_i^{sp_j}}{\left\|x_i^{sp_i}\right\|_2 \left\|x_i^{sp_j}\right\|_2}\right) \tag{18}$$
The above three different features achieve a more comprehensive description of the local and multiscale spatial-spectral information, and the richer features bring higher classification accuracy.
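The feature construction described in this subsection can be sketched as three small routines: a per-superpixel mean, the SAD between two mean vectors, and a weighted mean over a set of adjacent superpixels. This is only an illustration under stated assumptions. In particular, the weight $\exp(-\mathrm{SAD}/h)$ with bandwidth `h` is a hypothetical choice made here so the example runs; it is not taken from the original weight formula, and all function names are illustrative.

```python
import math
from typing import Dict, Hashable, List, Sequence

Vector = Sequence[float]


def superpixel_means(features: Sequence[Vector],
                     labels: Sequence[Hashable]) -> Dict[Hashable, List[float]]:
    """Mean feature vector per superpixel; labels[k] is the superpixel of pixel k."""
    sums: Dict[Hashable, List[float]] = {}
    counts: Dict[Hashable, int] = {}
    for vec, lab in zip(features, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for band, value in enumerate(vec):
            acc[band] += value
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def sad(x: Vector, y: Vector) -> float:
    """Spectral angle distance: the angle (in radians) between two vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("SAD is undefined for zero vectors")
    cosine = max(-1.0, min(1.0, dot / (norm_x * norm_y)))  # guard against rounding
    return math.acos(cosine)


def weighted_adjacent_feature(center: Vector, neighbours: Sequence[Vector],
                              h: float = 1.0) -> List[float]:
    """Weighted mean of the superpixel mean vectors in `neighbours`.

    ASSUMPTION: weights are exp(-SAD(center, neighbour) / h); this is an
    illustrative choice, not the weight formula of the original method.
    """
    weights = [math.exp(-sad(center, nb) / h) for nb in neighbours]
    total = sum(weights)
    return [
        sum(w * nb[band] for w, nb in zip(weights, neighbours)) / total
        for band in range(len(center))
    ]
```

With this weighting, neighbours whose mean vector points in nearly the same direction as the central superpixel contribute most, while spectrally dissimilar neighbours are down-weighted.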

Multiscale Kernel Generation
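For each pixel, we can obtain three feature kernels from the features in Equations (13), (14), and (16), that is, the superpixel-based mean spectral feature kernel $K^{sp\text{-}mean}$, the superpixel-based mean EMAP feature kernel $K^{sp\text{-}EMAP}$, and the adjacent superpixel-based weighted EMAP feature kernel $K^{ASP\text{-}EMAP}$. Then, by setting different numbers of superpixels, we can obtain the corresponding kernels at each scale. That is,
$$\left\{K_1^{sp\text{-}mean}, K_2^{sp\text{-}mean}, \dots, K_{m_1}^{sp\text{-}mean}\right\} \tag{19}$$
$$\left\{K_1^{sp\text{-}EMAP}, K_2^{sp\text{-}EMAP}, \dots, K_{m_2}^{sp\text{-}EMAP}\right\} \tag{20}$$
$$\left\{K_1^{ASP\text{-}EMAP}, K_2^{ASP\text{-}EMAP}, \dots, K_{m_3}^{ASP\text{-}EMAP}\right\} \tag{21}$$
Here, $m_1$, $m_2$, and $m_3$ are three parameters related to the scales for the superpixels of the corresponding features. For simplicity, we let $m = m_1 = m_2 = m_3$ represent the number of scales for the superpixels of all three features.
So far, we have the kernels of different scales corresponding to the three features. To fuse all the above kernels, we use a linear combination to obtain an optimal kernel for HSI classification:
$$K^{MASEMAP} = \sum_{i=1}^{m} \left(w_i^{1} K_i^{sp\text{-}mean} + w_i^{2} K_i^{sp\text{-}EMAP} + w_i^{3} K_i^{ASP\text{-}EMAP}\right) \tag{22}$$
where $w_i^{1}$, $w_i^{2}$, and $w_i^{3}$ are the weights that control the ratio of the three kinds of kernels.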

Multiple Kernel Learning Based on PCA
From Equation (22), we know that the optimal kernel is completely determined by the weights of the base kernels. Because $K^{MASEMAP}$ is the linear combination of the base kernels, we can employ the multiple kernel learning method based on PCA to solve it, and we can directly use the first principal component of the PCA of the matrix composed of all base kernels as the optimal kernel, that is,
$$K^{MASEMAP} = \mathrm{PCA}_1\left(\left[K_1^{sp\text{-}mean}, \dots, K_m^{sp\text{-}mean}, K_1^{sp\text{-}EMAP}, \dots, K_m^{sp\text{-}EMAP}, K_1^{ASP\text{-}EMAP}, \dots, K_m^{ASP\text{-}EMAP}\right]\right) \tag{23}$$
where $\mathrm{PCA}_1$ denotes the first component of the PCA. The details of calculating the weights based on the PCA technique can be summarized as follows.
(1) Construct the three kinds of kernel matrices at scale $i$ using the training samples, i.e., $\{x_1, x_2, \dots, x_{N_t}\}$, where $N_t$ is the number of all training samples. The kernel matrices $K_i^{sp\text{-}mean}$, $K_i^{sp\text{-}EMAP}$, and $K_i^{ASP\text{-}EMAP}$ for the three kinds of features are calculated by evaluating the kernel function on the corresponding features of every pair of training samples.
(2) Vectorize all $3m$ kernel matrices and stack them as the columns of a matrix $D$:
$$D = \left[\mathrm{vec}\left(K_1^{sp\text{-}mean}\right), \dots, \mathrm{vec}\left(K_m^{sp\text{-}mean}\right), \mathrm{vec}\left(K_1^{sp\text{-}EMAP}\right), \dots, \mathrm{vec}\left(K_m^{ASP\text{-}EMAP}\right)\right] \in \mathbb{R}^{N_t^2 \times 3m} \tag{24}$$
where $\mathrm{vec}(\cdot)$ is the vectorization operator, which converts a matrix into a vector by column.
(3) Calculate the singular value decomposition of the covariance matrix of $D$, that is, $\frac{1}{3m} D^{T} D \in \mathbb{R}^{3m \times 3m}$. The weights in Equation (22) are then given by the entries $u_{i1}$, $i = 1, \dots, 3m$, of the eigenvector corresponding to the largest eigenvalue of the covariance matrix of $D$.
After obtaining the multiscale adjacent superpixel-based optimal kernel, we can directly employ the SVM for HSI classification. The entire process of the proposed MASEMAP-MKL method is summarized in Algorithm 1.
Algorithm 1
7: Integrate the above multiscale kernels to obtain the optimal kernel $K^{MASEMAP}$ via (22) or (23);
8: Output: Generate the color map of classification by SVM.
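The PCA-based weighting can be sketched in a few lines of pure Python. The sketch vectorizes each base kernel into one column, forms the small matrix $\frac{1}{p} D^{T} D$ for $p$ base kernels, and finds its leading eigenvector by power iteration, which stands in here for a full eigendecomposition or SVD. It assumes that the eigenvector entries are used directly as weights, without mean-centering or renormalization, and that the kernels have non-negative entries (as Gaussian kernels do), so the leading eigenvector can be taken with non-negative entries. The function names are illustrative.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def vec(K: Matrix) -> List[float]:
    """Column-major vectorization of a square matrix."""
    n = len(K)
    return [K[r][c] for c in range(n) for r in range(n)]


def leading_eigenvector(C: Matrix, iters: int = 500, tol: float = 1e-12) -> List[float]:
    """Unit-norm eigenvector of the largest eigenvalue of a symmetric,
    entrywise non-negative matrix, found by power iteration."""
    n = len(C)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    return v


def pca_kernel_weights(kernels: Sequence[Matrix]) -> List[float]:
    """Weights of the base kernels: leading eigenvector of (1/p) * D^T D,
    where the columns of D are the vectorized base kernels."""
    columns = [vec(K) for K in kernels]
    p = len(columns)
    C = [[sum(a * b for a, b in zip(columns[i], columns[j])) / p for j in range(p)]
         for i in range(p)]
    return leading_eigenvector(C)


def combine_kernels(kernels: Sequence[Matrix], weights: Sequence[float]) -> List[List[float]]:
    """Linear combination sum_i weights[i] * kernels[i]."""
    n = len(kernels[0])
    return [[sum(w * K[r][c] for w, K in zip(weights, kernels)) for c in range(n)]
            for r in range(n)]
```

Only a $p \times p$ eigenproblem has to be solved, with $p = 3m$ base kernels, so the cost of learning the weights is dominated by forming $D^{T} D$ rather than by any iterative optimization over the training samples.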

Results
In this section, two well-known HSI datasets, i.e., Indian Pines and University of Pavia, are utilized to validate the effectiveness of the proposed method. Both datasets were normalized to the range of [0, 1] before the experiment. All experiments were run on MATLAB 2018b and an Intel Core i7-9700 CPU with 32GB RAM.

Indian Pines
The Indian Pines (IP) dataset was acquired by the AVIRIS sensor and consists of 16 different types of land cover. It contains 220 continuous bands that range from 0.4 to 2.4 µm. The image size is 145 × 145 pixels with a spatial resolution of 20 m/pixel. Several bands were highly corrupted by mixed noise; thus, we chose a subset of 145 × 145 × 200 for the experiments. In the experiment, 2.7% of the samples per class were randomly selected for training, and the rest were kept for testing. The details of the IP dataset are listed in Table 1.

University of Pavia
The University of Pavia (UP) dataset was acquired by the ROSIS sensor and consists of nine different types of land cover. It contains 115 continuous bands in the range of 0.43 to 0.86 µm. The image size is 610 × 340 pixels with a spatial resolution of 1.3 m/pixel. Several bands were highly corrupted by mixed noise; thus, we chose a subset of 610 × 340 × 103 for the experiments.

Comparison Methods and Evaluation Indexes
To verify the classification performance of the proposed MASEMAP-MKL, multiple kernel-based spatial-spectral classification methods were employed for comparison, i.e., the SVM and CK method (SVM-CK) [63], the superpixel-based multiple kernel (SpMK) [64] method, the adaptive nonlocal SSK (ANSSK) [54], the EMAP-based method [60], a novel invariant attribute profile (AIP) method [65], the region-based multiple kernel (RMK) [66] method, the adjacent superpixel-based multiscale SSK (ASMSSK) [52] method, and the low-rank component-induced SSK (LRCISSSK) [55] method. As a competitor, the SVM classifier only considers spectral information; the SVM-CK, SpATV, SSK, and ANSSK methods take advantage of spatial-spectral information for HSI classification and deliver much smoother classification results, and the SCMK and RMK methods combine spectral and multiscale spatial information for HSI classification to achieve outstanding classification results. In addition, the overall accuracy (OA), average accuracy (AA), Kappa coefficient, and corresponding standard deviations (std) were employed to quantitatively assess the classification accuracy.
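For reference, the three evaluation indexes can be computed from a confusion matrix as in the following sketch. It uses the usual definitions (OA as the fraction of correctly classified test samples, AA as the mean of the per-class accuracies, and Cohen's Kappa as agreement corrected for chance) and assumes that every class has at least one test sample; the function name and the toy matrix are illustrative.

```python
from typing import Sequence, Tuple


def accuracy_metrics(confusion: Sequence[Sequence[int]]) -> Tuple[float, float, float]:
    """Return (OA, AA, Kappa) for a square confusion matrix.

    confusion[i][j] is the number of test samples of true class i that were
    predicted as class j. Every class is assumed to have at least one sample.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    diagonal = [confusion[i][i] for i in range(n_classes)]
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(confusion[i][j] for i in range(n_classes)) for j in range(n_classes)]

    oa = sum(diagonal) / total                                         # overall accuracy
    aa = sum(d / r for d, r in zip(diagonal, row_sums)) / n_classes    # average accuracy
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (oa - expected) / (1.0 - expected)                         # chance-corrected
    return oa, aa, kappa


# Toy two-class confusion matrix (made-up counts).
oa, aa, kappa = accuracy_metrics([[8, 2], [1, 9]])
```

Repeating this over several random training/testing splits and taking the mean and standard deviation of each index gives the kind of summary reported in the accuracy tables.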

Classification Results
The classification results of all methods on the IP dataset are shown in Figure 2. Clearly, with the limited training samples, the proposed MASEMAP-MKL best maintains the object boundaries and the detailed information at the image edges. Because of their insufficient description of spatial information, the methods based on SVM and CK give the worst classification maps. The SpMK method and the non-local similarity-based SSK method preserve object boundaries well only when they have enough training samples. Serious over-smoothing occurs in the classification map given by the EMAP, and the AIP method fails to accurately classify the marginal part of the categories. With the limited number of training samples, their ability to maintain image details was inferior to that of the MASEMAP-MKL. Using adjacent superpixel segments and low-rank induced components, ASMSSK and LRCISSSK further improve the ability to maintain image edges. Nevertheless, a comparison with the ground-truth reveals that the proposed MASEMAP-MKL gives the classification map closest to the ground-truth.
Compared with the latest methods, the proposed MASEMAP-MKL achieved state-of-the-art classification results on the IP dataset, whose ground-truth distribution is relatively simple. Similar conclusions can be drawn from the classification results on the UP dataset, which has a more complex object distribution. The classification maps are shown in Figure 3. The proposed method achieves the most accurate classification of the pixels located at the object boundaries, with the finest image texture. Another clear conclusion is that the advantages of the proposed MASEMAP-MKL are further highlighted for more intricate images. These results support the ability of our design to fully utilize spatial-spectral information.

Classification Accuracy
The classification accuracy values of all methods on the IP and UP datasets are shown in Tables 2 and 3, respectively. The optimal values are highlighted in bold. Ten Monte Carlo runs were executed to obtain the average value. In almost all categories, the proposed MASEMAP-MKL achieved the highest accuracy. On the IP dataset, several categories achieved 100% classification accuracy. Meanwhile, the optimal OA, AA, and Kappa values were also obtained. On the UP dataset, which contains more complicated ground objects, many competitors failed to maintain the balance between smoothness and fineness: the details of the image were lost, thus leading to inferior classification accuracy. The proposed method still achieved the optimal OA, AA, and Kappa values, which indicates that MASEMAP-MKL is more competitive on high spatial resolution images. Furthermore, the proposed MASEMAP-MKL also achieved the lowest std values for both the OA and AA indicators, i.e., the most stable results. The results also show that the SSK-based methods achieved higher classification accuracy than other competitors. The multiscale features-based method ASMSSK and the low-rank property-based method LRCISSSK achieved, on average, the second-best results after the proposed MASEMAP-MKL classifier. Thus, the advantages of fusing the multiscale features and multimodal features are reflected. In summary, the results of the above quantitative analysis demonstrate the effectiveness of the proposed method.

Parameter Analysis
The width of the kernel σ plays a key role in classification. Figure 4 shows the classification accuracy of the proposed MASEMAP-MKL under different settings of −log(σ) values on both the IP and UP datasets. On the IP dataset, when the −log(σ) value was selected in the interval [5,9], the proposed MASEMAP-MKL method achieved the highest classification accuracy. On the UP dataset, the value of −log(σ) was selected in the interval [3,6]. Therefore, we recommend choosing the kernel width parameter −log(σ) according to the dataset itself.

Execution Efficiency
The execution efficiency of an algorithm determines its practicality in real scenarios. Table 4 lists the average execution time, in seconds, of the proposed MASEMAP-MKL and three SSK-based methods. Each method was executed ten times to obtain the average value. For the SSK-based methods, the main computational cost was composed of two parts. The first part was the search for similar regions, and for this, ANSSK consumed the most time. This is because a non-local algorithm requires tedious block-matching and aggregation operations. With the assistance of the highly efficient superpixel division, both ASMSSK and MASEMAP-MKL achieved competitive time consumption for the search. The second part was the kernel computation. Because the HSI contained many pixels, both SpMK and ANSSK consumed an extremely high computing time. Meanwhile, the superpixels replaced the original pixels in our design, so the number of superpixels was much lower than the number of original pixels; therefore, both ASMSSK and MASEMAP-MKL achieved a low kernel calculation cost. As the number of pixels increased, this advantage of the proposed method became more and more obvious. Although the time consumed by the proposed MASEMAP-MKL was slightly higher than that of the ASMSSK method, MASEMAP-MKL also achieved higher classification accuracy. The conclusion drawn is that our method achieves a better trade-off between efficiency and classification accuracy.

Conclusions and Future Work
To improve the accuracy and efficiency of HSI classification, a novel multiscale adjacent superpixel-based extended multi-attribute profile embedded multiple kernel learning method was proposed in this study. In summary, multiscale and multimodal spatial-spectral features are described by superpixel segmentation and the EMAP to construct different kernel functions. Meanwhile, we employed a PCA-based multiple kernel learning method to determine the weights of the different kernels. With this design, multiscale and multimodal features were fused, which makes full use of the ample spatial-spectral features for HSI classification. Extensive experiments on two datasets showed that the proposed method was both effective and efficient.
For future work, the fusion of deep features, multiscale features, and multimodal features will be considered to further improve the accuracy of HSI classification.