Article

Explainable Two-Layer Mode Machine Learning Method for Hyperspectral Image Classification

Wenjia Chen, Junwei Cheng, Song Yang and Li Sun
1 College of Information Science and Engineering, Shandong Agricultural University, Tai’an 271018, China
2 Postdoctoral Research Station for Agricultural Resource Utilization, Shandong Agricultural University, Tai’an 271018, China
3 College of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(11), 5859; https://doi.org/10.3390/app15115859
Submission received: 16 April 2025 / Revised: 12 May 2025 / Accepted: 20 May 2025 / Published: 23 May 2025
(This article belongs to the Special Issue Application of Machine Learning in Land Use and Land Cover)

Abstract

Explainable machine learning methods built on a specific mathematical model provide insight into how the model works. We propose a new two-layer mode architecture for hyperspectral image (HSI) classification. In the front-end learning layer, superpixel segmentation and a mathematical model are combined to achieve band selection, which yields a lower-dimensional re-expression of the data. The mathematical model uses the $\ell_{2,1}$ norm and a graph regularization term, which induces sparsity, improves robustness to outliers and noise, and enhances the explainability of the data re-expression. In the back-end layer, we employ the support vector machine or the K-nearest neighbor algorithm to classify the low-dimensional data. Finally, the two-layer mode classification method is applied to three real HSI datasets. Numerical results show that the overall classification accuracy of our method is improved.

1. Introduction

Hyperspectral images (HSIs) are obtained with an imaging spectrometer that records hundreds of continuous spectral bands, and they integrate spectral and image information. They have multi-band and high-spectral-resolution characteristics [1]. Hyperspectral technology has become an effective tool for remote sensing applications, in which multidimensional information is recognized and learned from hyperspectral images to achieve the corresponding research goals [2]. HSI classification has been used in agriculture for crop identification and health monitoring [3], in environmental protection for land cover mapping [4], and in water quality assessment [5]. In practical applications, the multi-band characteristics of hyperspectral remote sensing data enable a more comprehensive description of the reflection characteristics of ground objects and provide a strong ability to identify ground object attributes [6]. However, the large amount of redundant information in the data can lead to high computational complexity, which is not conducive to subsequent image analysis and weakens the classifier’s generalization ability [7]. The curse of dimensionality and the Hughes phenomenon also affect classification performance [8].
In recent years, HSI classification research has made significant progress [9]. Traditional machine learning methods, such as support vector machine (SVM) [10] and random forest [11], effectively handle high-dimensional spectral data by leveraging the unique spectral signatures of materials. However, these methods often struggle with capturing complex spatial–spectral relationships in HSI data, which led to the exploration of more advanced techniques.
With the advent of deep learning, convolutional neural networks (CNNs) emerged as a powerful tool for HSI classification. CNNs can automatically extract features from spectral and spatial dimensions of HSI data, significantly improving classification accuracy [12]. For example, 3D-CNNs have been particularly effective in capturing spatial–spectral information by applying convolutional operations across both dimensions. Also, hybrid models that combine CNNs with other architectures, such as recurrent neural networks (RNNs) or graph neural networks (GNNs) [9], have been developed to enhance performance. Due to their ability to handle non-Euclidean data and capture complex relationships between pixels, GNNs perform well.
The decision-making process of deep learning models remains a black box, making it challenging to understand how and why certain classifications are made [13]. Explainable artificial intelligence (XAI) methods [14,15] address this issue by providing interpretability to complex models.
Developing XAI methods that integrate deep learning and mathematical models can help analyze a model’s decision-making basis and further optimize its performance. HSI classification methods can be divided into two categories: end-to-end approaches, which use, for example, deep learning algorithms [16,17,18,19,20,21] to train models directly from the input data to the classification results; and two-layer mode approaches [22,23,24,25,26,27], which complete the data representation through front-end learning and then use classifiers to determine the classification model in back-end learning.
A new two-layer mode machine learning method for HSI classification is proposed in this paper, as shown in Figure 1. The classification method contains two layers: the front-end learning layer and the back-end learning layer. The former achieves HSI re-expression by combining superpixel segmentation technology and a regularized subspace clustering method for band selection. The latter uses SVM or K-nearest neighbor (KNN) classification algorithms to classify the re-expressed HSI data. Finally, the evaluation metrics from back-end learning help adjust the number of bands in the front-end learning layer.
To sum up, our main contributions are as follows. First, a two-layer mode machine learning method for HSI classification is proposed, combining the data re-expression and classification layers. Second, entropy-rate superpixel segmentation is used to reduce the impact of homogeneity and heterogeneity caused by different geographical locations. Third, an explainable mathematical model employs the $\ell_{2,1}$ norm and a graph regularization term, which maintain the robustness and the inherent local geometric structure of the HSI data. Fourth, numerical comparison on three HSI datasets (TeaFarm, Salinas, and Indian Pines) with existing HSI classification methods indicates that the proposed method works well.

2. Band Selection with Sparse Subspace Clustering

This section summarizes the classical Sparse Subspace Clustering (SSC) method [28,29] for hyperspectral data analysis. Given the HSI dataset $H \in \mathbb{R}^{m \times n \times b}$, where $m$, $n$, and $b$ denote the width, height, and number of bands, respectively, we reshape the original HSI cube $H$ into matrix form $X = (x_1, x_2, \dots, x_b) \in \mathbb{R}^{mn \times b}$. The SSC method assumes that all bands (i.e., band vectors) of $X$ lie in a union of low-dimensional subspaces, so that every band vector can be represented as a linear or affine combination of several other bands. Accordingly, each band vector $x_i \in \mathbb{R}^{mn}$, $i = 1, 2, \dots, b$, can be represented as
$$x_i = z_{1i} x_1 + z_{2i} x_2 + \cdots + z_{bi} x_b, \qquad (1)$$
where $z_{ij}$ in (1) is the $(i,j)$-th element of the matrix $Z \in \mathbb{R}^{b \times b}$. It should be noted that $x_i$ cannot be represented by itself, which implies $z_{ii} = 0$ for $i = 1, 2, \dots, b$.
We combine b band vectors into a matrix format so that (1) can be organized as
$$X = XZ, \quad \mathrm{diag}(Z) = 0, \qquad (2)$$
where Z is the coefficient matrix of all band vectors and d i a g ( Z ) denotes the diagonal vector of matrix Z . By minimizing the following optimization Problem (3), the solution of (2) can be found [29].
$$\hat{Z} = \arg\min_{Z} \|Z\|_q \quad \mathrm{s.t.} \quad X = XZ, \; \mathrm{diag}(Z) = 0, \qquad (3)$$
where $\|Z\|_q$ represents the $\ell_q$ norm of $Z$, defined as $\|Z\|_q = \left( \sum_{i=1}^{b} \sum_{j=1}^{b} |z_{ij}|^q \right)^{1/q}$. For $q = 0$, the $\ell_0$ norm counts the number of nonzero elements of $Z$, which helps maintain the sparsity of $Z$. The $\ell_1$ norm is usually considered as a relaxation of the $\ell_0$ norm.
Then, the sparse matrix $\hat{Z}$ is used to construct the similarity matrix $W = (\hat{Z} + \hat{Z}^T)/2$. With $W$, the spectral clustering method [30] clusters all band vectors into their underlying subspaces.
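For concreteness, the sketch below (not the authors' code) illustrates this clustering step: it builds a band-to-band similarity from a given coefficient matrix Z, groups the bands with scikit-learn's spectral clustering, and, as an assumption made only for this example, keeps the band closest to each cluster's mean spectrum as the representative band. Absolute values are taken so the affinity is non-negative.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_bands(Z, X, n_clusters):
    """Group the b bands (columns of X) using a self-representation matrix Z of shape (b, b)."""
    W = (np.abs(Z) + np.abs(Z).T) / 2.0            # symmetric, non-negative band similarity
    labels = SpectralClustering(
        n_clusters=n_clusters, affinity="precomputed", random_state=0
    ).fit_predict(W)
    selected = []
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]             # bands that fall in cluster c
        centroid = X[:, idx].mean(axis=1)
        # representative band: the member closest to the cluster's mean spectrum (our choice)
        selected.append(idx[np.argmin(np.linalg.norm(X[:, idx] - centroid[:, None], axis=0))])
    return sorted(selected)
```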
We convert the band selection problem into the general form
$$\min_{Z} \|X - XZ\|_q^2 + \lambda J(Z), \qquad (4)$$
where $J(\cdot)$ represents the regularization term and $\lambda > 0$ is a regularization parameter. The regularization term $J(Z)$ can depend on certain properties of the data, such as geometrical information and physical meaning [22].

3. HSI Classification Method in Explainable Two-Layer Mode

This section introduces a new paradigm for HSI classification: explainable two-layer mode. We divide the classification algorithm into two layers: front-end learning and back-end learning [31]. In the front-end learning layer, the band selection is realized by combining superpixel segmentation and SSC, resulting in a low-dimensional HSI data re-expression. In the back-end learning layer, support vector machine (SVM) or K-nearest neighbor (KNN) is employed to classify low-dimensional data. Band selection and classification tasks are combined through the front-end and back-end integration strategy.

3.1. Front-End Learning Layer for Data Re-Expression

The front-end learning layer completes the band selection task, which gives the HSI data a low-dimensional re-expression. Our method combines the superpixel segmentation strategy, the $\ell_{2,1}$ norm, and graph regularization within the SSC framework to fully exploit the HSI information.

3.1.1. Segment HSI with Superpixel Segmentation

When interpreting HSIs, the same land cover type is affected by geographical location, growth conditions, and other factors, so it may present different spectral curves in different regions, i.e., the same object with different spectra. Conversely, limited sensor resolution may cause a single spectral curve to contain the spectral information of multiple land cover types simultaneously.
Entropy-rate superpixel segmentation is used to reduce the impact of these issues. It constructs a graph for the HSI and defines the objective function for superpixel segmentation with two components: the entropy rate of a random walk on the graph and a balance term. The entropy rate favors compact and homogeneous clusters, while the balance term encourages clusters of similar size. The superpixel segmentation provides spatial support for region-based features [32,33].
Given the HSI cube $H \in \mathbb{R}^{m \times n \times b}$, where $m$ and $n$ represent the spatial width and height, respectively, and $b$ denotes the number of bands, superpixel segmentation cuts the image into $N$ parts, where each piece is called a superpixel. We divide $H$ into $N$ non-overlapping parts and obtain $N$ index sets $S_i$, $i = 1, 2, \dots, N$, where $S_i$ is the index set of the $i$-th superpixel. Each superpixel is a group of adjacent pixels that share similar characteristics.
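The ERS segmentation itself follows [32]. A minimal sketch of the subsequent averaging step, assuming a precomputed superpixel label map, is shown below.

```python
import numpy as np

def superpixel_means(H, labels):
    """Average the spectrum of each superpixel.

    H:      HSI cube of shape (m, n, b).
    labels: integer superpixel map of shape (m, n) with values 0..N-1,
            e.g. produced by an ERS implementation (assumed given here).
    Returns F of shape (N, b), one mean spectrum per superpixel.
    """
    m, n, b = H.shape
    X = H.reshape(m * n, b)                  # flatten the spatial dimensions
    flat = labels.reshape(m * n)
    N = flat.max() + 1
    counts = np.bincount(flat, minlength=N)
    F = np.zeros((N, b))
    for band in range(b):
        F[:, band] = np.bincount(flat, weights=X[:, band], minlength=N) / counts
    return F
```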

3.1.2. Explainable Minimization Problem for SSC

We compute the average spectral value in each band over the pixels of each of the $N$ superpixels, so the original matrix $X \in \mathbb{R}^{mn \times b}$ is reduced to $F \in \mathbb{R}^{N \times b}$. Then, the minimization problem for SSC can be formulated as follows:
$$\min_{Z} \|F - FZ\|_F^2 + \lambda \|Z\|_1, \qquad (5)$$
where $\|Z\|_1 = \max_{1 \le j \le b} \sum_{i=1}^{b} |z_{ij}|$ and $J(Z)$ in (4) is set to $\|Z\|_1$, which helps maintain a sparse coefficient matrix $Z$.
As described in (1), each band vector is written as a linear combination of the other band vectors. In the re-expression procedure, it is reasonable to preserve the inherent local geometric structure of the data, i.e., to design a model that reflects the relationship between the data $f_1, f_2, \dots, f_N$, where $F = (f_1^T, f_2^T, \dots, f_N^T)^T$, and the re-expressed data $\tilde{f}_1, \tilde{f}_2, \dots, \tilde{f}_N$, where $FZ = (\tilde{f}_1^T, \tilde{f}_2^T, \dots, \tilde{f}_N^T)^T$. The assumption is that if $f_i$ and $f_j$ are closely related, their corresponding re-expressed data $\tilde{f}_i$ and $\tilde{f}_j$ should also be close. Accordingly, we let $S = (s_{ij}) \in \mathbb{R}^{N \times N}$ be a matrix in which $s_{ij}$ quantifies the similarity between the data points $f_i$ and $f_j$. The problem of learning the local structure can be formulated as
$$\begin{aligned}
\min \; \frac{1}{2} \sum_{i,j=1}^{N} \|\tilde{f}_i - \tilde{f}_j\|_2^2 \, s_{ij}
&= \frac{1}{2} \sum_{i,j=1}^{N} (\tilde{f}_i - \tilde{f}_j)(\tilde{f}_i - \tilde{f}_j)^T s_{ij} \\
&= \frac{1}{2} \sum_{i,j=1}^{N} \left( \tilde{f}_i \tilde{f}_i^T + \tilde{f}_j \tilde{f}_j^T - \tilde{f}_j \tilde{f}_i^T - \tilde{f}_i \tilde{f}_j^T \right) s_{ij} \\
&= \sum_{i,j=1}^{N} \left( \tilde{f}_i \tilde{f}_i^T - \tilde{f}_i \tilde{f}_j^T \right) s_{ij}
= \sum_{i=1}^{N} \tilde{f}_i d_{ii} \tilde{f}_i^T - \sum_{i,j=1}^{N} \tilde{f}_i s_{ij} \tilde{f}_j^T \\
&= \mathrm{tr}\left( Z^T F^T D F Z \right) - \mathrm{tr}\left( Z^T F^T S F Z \right) = \mathrm{tr}\left( Z^T F^T L F Z \right),
\end{aligned}$$
where $L = D - S$ is the graph Laplacian related to the data, $S = (s_{ij}) \in \mathbb{R}^{N \times N}$, and $D$ is a diagonal matrix with diagonal elements $d_{ii} = \sum_{j=1}^{N} s_{ij}$ for $i = 1, 2, \dots, N$. The similarity is computed with a Gaussian kernel,
$$s_{ij} = \exp\left( -\|\tilde{f}_i - \tilde{f}_j\|^2 / (2\sigma^2) \right), \quad i, j = 1, 2, \dots, N,$$
where $\sigma > 0$ is the kernel width.
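As an illustration (with one assumption of our own: when no kernel width is supplied, the median pairwise distance is used as σ), the similarity matrix S, the degree matrix D, and the Laplacian L = D − S can be built as follows.

```python
import numpy as np
from scipy.spatial.distance import cdist

def graph_laplacian(F_tilde, sigma=None):
    """Gaussian similarity S, degree matrix D and Laplacian L = D - S for data of shape (N, b)."""
    dist = cdist(F_tilde, F_tilde)                 # pairwise Euclidean distances
    if sigma is None:
        sigma = np.median(dist[dist > 0])          # heuristic kernel width (our assumption)
    S = np.exp(-dist ** 2 / (2.0 * sigma ** 2))
    D = np.diag(S.sum(axis=1))
    L = D - S
    return L, S
```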
The explainable minimization problem for SSC is defined as follows:
$$\min_{Z} \|F - FZ\|_{2,1}^2 + \lambda_1 \, \mathrm{tr}\left( Z^T F^T L F Z \right) + \lambda_2 \|Z\|_{2,1}^2, \qquad (6)$$
where $\|\cdot\|_{2,1}$ denotes the $\ell_{2,1}$ norm, $\|Z\|_{2,1} = \sum_{j=1}^{b} \left( \sum_{i=1}^{b} z_{ij}^2 \right)^{1/2}$.
The gradient of the objective function in (6) is
$$\nabla f(Z) = F^T P_1 F Z - F^T P_1 F + \lambda_1 F^T L F Z + \lambda_2 P_2 Z. \qquad (7)$$
We let $\nabla f(Z) = 0$; therefore, we have
$$Z = \left( F^T P_1 F + \lambda_1 F^T L F + \lambda_2 P_2 \right)^{-1} F^T P_1 F, \qquad (8)$$
where $P_1$ and $P_2$ are diagonal matrices whose diagonal elements are
$$(P_1)_{ii} = 1 \Big/ \sqrt{\sum_{j=1}^{b} \left[ (F - FZ)_{ij} \right]^2}, \qquad (P_2)_{ii} = 1 \Big/ \sqrt{\sum_{j=1}^{b} Z_{ij}^2}. \qquad (9)$$
We set $W = (\hat{Z} + \hat{Z}^T)/2$, where $\hat{Z}$ is the estimated solution of (6); then, we cluster all band vectors into $k$ underlying subspaces with the spectral clustering method. Finally, we obtain the selected band set $B_k$; after band selection, the input data to the second layer are $F_{B_k} \in \mathbb{R}^{N \times k}$.
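A minimal sketch of one possible way to iterate updates (8) and (9) is given below; the random initialization mirrors Algorithm 1, and the small constant added to the denominators is our own numerical safeguard, not part of the model.

```python
import numpy as np

def solve_band_model(F, L, lam1, lam2, max_iter=100, tol=1e-4, eps=1e-8):
    """Iterate updates (8) and (9) for model (6). F has shape (N, b), L has shape (N, N)."""
    N, b = F.shape
    Z = 0.01 * np.random.randn(b, b)               # random initialization, as in Algorithm 1
    FtLF = F.T @ L @ F
    for _ in range(max_iter):
        R = F - F @ Z                              # residual F - FZ
        P1 = np.diag(1.0 / (np.sqrt((R ** 2).sum(axis=1)) + eps))   # (9), N x N weights
        P2 = np.diag(1.0 / (np.sqrt((Z ** 2).sum(axis=1)) + eps))   # (9), b x b weights
        FtP1F = F.T @ P1 @ F
        grad = FtP1F @ Z - FtP1F + lam1 * FtLF @ Z + lam2 * P2 @ Z  # gradient (7)
        if np.linalg.norm(grad) < tol:             # stopping rule from Algorithm 1, Step 3
            break
        Z = np.linalg.solve(FtP1F + lam1 * FtLF + lam2 * P2, FtP1F) # closed-form update (8)
    return Z
```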

3.2. Back-End Learning Layer for HSI Classification

Front-end learning obtains the HSI data re-expression in a lower dimension, while back-end learning carries out the classification procedure.
KNN or SVM is used to complete the classification task in back-end learning. Given a new sample point to be classified, KNN searches for the nearest training samples in the training set and predicts the new sample’s category from its neighbors’ categories. In our numerical tests, the distance between samples is the Euclidean distance. The basic idea of SVM is to find hyperplanes that maximize the margin between different categories. In the numerical tests, we call MATLAB’s KNN and SVM classifier functions to complete the classification.
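The paper uses MATLAB's classifier functions; a comparable sketch with scikit-learn (our substitution, not the authors' implementation), using k = 5 neighbors and an RBF-kernel SVM with γ = 0.5 as in Section 4.3, is:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def backend_classify(X_train, y_train, X_test, classifier="svm"):
    """Back-end layer: classify band-selected features with KNN (k = 5) or an RBF SVM."""
    if classifier == "knn":
        model = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
    else:
        model = SVC(kernel="rbf", gamma=0.5)       # kernel parameter from Section 4.3
    model.fit(X_train, y_train)
    return model.predict(X_test)
```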

3.3. Two-Layer Mode HSI Classification Method

The two-layer mode HSI classification method proposed in this paper fuses the front end and the back end through the optimal indicator. The number of selected bands K is varied from 5 to 50 in increments of 5, the indicator is calculated from the classification results, and the optimal indicator finally determines the number of selected bands. Figure 2 shows the two-layer mode of our method.
Now, we are ready to give Algorithm 1, which contains our band selection and classification method for HSI data.
Algorithm 1: Classification Algorithm under Two-Layer Mode (CALM)
Input: HSI data $H \in \mathbb{R}^{m \times n \times b}$, the number of selected bands $K$, the number of segmented regions $N$, parameters $\lambda_1$ and $\lambda_2$, $Z_0 = \mathrm{randn}(b, b)$, $k_0$, and tolerance $\varepsilon$.
Output: The index set $Z_K$ of the selected bands.
Step 1: Segment $H$ into $N$ regions with the index sets $S_1, S_2, \dots, S_N$ via ERS.
Step 2: Compute $F$ with the index sets $S_1, S_2, \dots, S_N$.
Step 3: Compute $\nabla f(Z)$ with (7); if $\|\nabla f(Z)\| < \varepsilon$, go to Step 6; otherwise, go to Step 4.
Step 4: Update $Z$ with (8).
Step 5: Update $S$ (and hence $L$) with the Gaussian similarity $s_{ij}$; go to Step 3.
Step 6: Let $W = (Z + Z^T)/2$ and generate $Z_K$ with the spectral clustering algorithm.
Step 7: Using the $K$ selected bands as input, train a classification model with SVM or KNN, and calculate the evaluation metrics obtained by selecting $K$ bands.
Step 8: If $K < 50$, then $K \leftarrow K + 5$ and go to Step 6. Otherwise, $\mathrm{OA}^{*} = \max_K \mathrm{OA}_K$ and $K^{*} = \arg\max_K \mathrm{OA}_K$.
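Steps 6–8 amount to a simple model-selection sweep over the number of selected bands. A compact sketch is given below, assuming callables for the clustering step (e.g., cluster_bands above), the back-end classifier (e.g., backend_classify above), and an overall-accuracy function oa(y_true, y_pred).

```python
def sweep_bands(F, Z_hat, train_idx, test_idx, y_train, y_test, cluster, classify, oa):
    """Steps 6-8: try K = 5, 10, ..., 50 bands and keep the K with the best overall accuracy."""
    best = (-1.0, None, None)                          # (OA, K, selected bands)
    for K in range(5, 55, 5):
        bands = cluster(Z_hat, F, K)                   # Step 6: spectral clustering into K bands
        F_K = F[:, bands]                              # data re-expressed with the K selected bands
        y_pred = classify(F_K[train_idx], y_train, F_K[test_idx])   # Step 7: SVM or KNN
        score = oa(y_test, y_pred)
        if score > best[0]:                            # Step 8: keep the largest OA and its K
            best = (score, K, bands)
    return best
```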

4. Experiments

Having presented the explainable two-layer mode HSI classification method, we are ready to test its performance on three HSI datasets.

4.1. Datasets

The three HSI datasets in our experiments are TeaFarm, Salinas, and Indian Pines, whose pseudo-color and ground-truth images are shown in Figure 3, Figure 4 and Figure 5.
TeaFarm: Located in the tea tree planting base of Fanglu village, Changzhou, Jiangsu Province, the data contain 348 × 512 pixels with a spatial resolution of 2.25 m. The dataset consists of 80 spectral bands covering a spectral range of 417–855 nm.
It includes 10 land cover classes and is available at https://www.geodoi.ac.cn/edoi.aspx?DOI=10.3974/geodb.2017.03.04.V1 (accessed on 21 May 2025).
Salinas: The data have a size of 512 × 217 with a spatial resolution of 3.7 m. The dataset contains 224 contiguous spectral bands. After removing 20 water absorption bands (108–112, 154–167, 224), 204 bands are actually used for training. The dataset includes 16 types of land cover classes.
Indian Pines: Imaged by the AVIRIS spectrometer, with a wavelength range of 0.4–2.5 μm, the data size is 145 × 145, and there are 224 spectral bands, of which 200 are used for experiments. The dataset contains 16 types of land cover classes.

4.2. Comparison Algorithms

The algorithm proposed in this article is compared numerically with the following five band selection algorithms. After band selection in the front-end learning layer, the same classifier is used for classification, and the performance of the six band selection algorithms (the five baselines plus CALM) is compared. These methods are summarized as follows.
ASPS_MN [34]: The core idea is to divide the HSI cube into multiple sub-cubes via an adaptive subspace partition strategy and then select the band with minimum noise in each sub-cube as the representative band to reduce the data dimension while retaining more useful information.
SEASP [35]: By calculating the correlation coefficient between adjacent bands to identify the orderliness between bands, a subset of characteristic bands is selected from the partitioned subspace using information entropy as the metric.
S4P [36]: The spatial structure of HSI is captured through superpixel segmentation, and the spectral correlation between bands is learned using a self-representation model. An adaptive and weighted multi-graph fusion term is designed to generate a unified similarity graph between different superpixels, and the importance of the bands is measured by imposing an l 2,1 norm on the self-representation coefficient matrix.
RLFFC [37]: Spatial information is obtained through superpixel segmentation, and the separability among bands is enhanced by a latent feature fusion strategy, thereby improving the performance of band selection.
GRSC [22]: The spatial information of HSI is preserved through superpixel segmentation, and representative bands are selected by combining spectral correlation and spatial structure information using a self-representation subspace clustering model, thereby reducing data redundancy and noisy bands.

4.3. Experimental Setup

In the experiments, three metrics, OA, AA, and Kappa, are used as the evaluation criteria, and two typical traditional classifiers, KNN and SVM, are also used. The number of selected bands is systematically varied from 5 to 50, with increments of 5, to observe the impact of band selection on classification performance. The parameter k of KNN is set to 5 in our experiments. As for the SVM classifier, the cross-validation process determines that the Radial Basis Function (RBF) with the kernel parameter γ = 0.5 performs best for our classification tasks. In order to reduce the sampling random bias, each classification algorithm is run 10 times, and the average classification accuracy is reported.
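For reference, the three metrics can be computed from the confusion matrix as follows; this is a standard implementation shown for clarity, not code from the paper (labels are assumed to be integers 0, …, n_classes − 1).

```python
import numpy as np

def classification_metrics(y_true, y_pred, n_classes):
    """Overall accuracy (OA), average accuracy (AA) and the kappa coefficient."""
    C = np.zeros((n_classes, n_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):                   # confusion matrix: rows = truth, cols = prediction
        C[t, p] += 1
    total = C.sum()
    oa = np.trace(C) / total                           # fraction of correctly classified samples
    aa = np.mean(np.diag(C) / C.sum(axis=1))           # mean of the per-class accuracies
    p_e = (C.sum(axis=1) * C.sum(axis=0)).sum() / total ** 2   # chance agreement
    kappa = (oa - p_e) / (1.0 - p_e)
    return oa, aa, kappa
```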
There are two critical parameters, $\lambda_1$ and $\lambda_2$, in our model. We tune these parameters over a wide range of values, from $10^{-4}$ to $10^{4}$, and determine the settings that yield the best classification results, namely $\lambda_1 = 0.5$ and $\lambda_2 = 0.1$.
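A sketch of such a search is shown below, assuming a hypothetical run_pipeline(lam1, lam2) helper that trains and evaluates one setting and returns its OA. The grid here is sampled at powers of ten for brevity; the paper's search is evidently finer, since the reported optimum is λ1 = 0.5 and λ2 = 0.1.

```python
import itertools

def tune_lambdas(run_pipeline):
    """Grid-search lambda_1 and lambda_2 and return the pair with the highest OA."""
    grid = [10.0 ** e for e in range(-4, 5)]       # 1e-4, 1e-3, ..., 1e4 (coarse grid)
    best_oa, best_pair = float("-inf"), None
    for lam1, lam2 in itertools.product(grid, grid):
        oa = run_pipeline(lam1, lam2)              # hypothetical helper: train + evaluate once
        if oa > best_oa:
            best_oa, best_pair = oa, (lam1, lam2)
    return best_pair, best_oa
```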

4.4. Experimental Results

Table 1 provides a detailed list of OA, AA, and Kappa values obtained using different band selection algorithms on different datasets.
During the classification stage, a stratified sampling approach is used to construct the training and testing sets. For each band configuration, 70% of the samples from each class are randomly selected to form the training set, which ensures that the training data are representative of the entire dataset. The remaining 30% of the samples are allocated to the testing set, which is used to evaluate the model’s performance on unseen data.
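This split can be reproduced, for example, with scikit-learn's stratified train_test_split (an illustration with toy stand-in data, not the authors' code):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-ins for the band-selected features and class labels (for illustration only).
X = np.random.rand(1000, 15)                 # 1000 samples, 15 selected bands
y = np.random.randint(0, 10, size=1000)      # 10 land cover classes

# stratify=y keeps the 70/30 proportion within every class.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, stratify=y, random_state=0
)
```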
The results show that, after introducing the regularization terms, CALM improves on S4P and GRSC in all three evaluation metrics. Compared with ASPS_MN, introducing superpixel segmentation makes the band selection results more effective. Under all classifiers and conditions, the OA of CALM is generally high, with values approaching or exceeding 0.9. As shown in Table 1, CALM outperforms the other methods on most metrics, i.e., OA, AA, and Kappa. Taking the TeaFarm dataset as an example, with the SVM classifier the OA of CALM increases by 6.88%, 1.86%, 1.71%, 5.39%, and 2.09% over ASPS_MN, S4P, GRSC, RLFFC, and SEASP, respectively; AA increases by 11.53%, 11.41%, 11.52%, 17.81%, and 12.35%; and Kappa increases by 25.08%, 2.68%, 2.48%, 7.83%, and 3.02%.
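Note that these gains are absolute differences in the metric values taken directly from Table 1, expressed in percentage points; for example, for TeaFarm with the SVM classifier:

```python
# OA values for the TeaFarm dataset with the SVM classifier, taken from Table 1.
baselines = {"ASPS_MN": 0.8802, "S4P": 0.9304, "GRSC": 0.9319, "RLFFC": 0.8951, "SEASP": 0.9281}
calm_oa = 0.9490
for name, oa in baselines.items():
    print(f"{name}: +{100 * (calm_oa - oa):.2f} percentage points")
# ASPS_MN: +6.88, S4P: +1.86, GRSC: +1.71, RLFFC: +5.39, SEASP: +2.09
```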
Figure 6, Figure 7, Figure 8, Figure 9, Figure 10 and Figure 11 show the classification maps obtained with the different methods. Taking the TeaFarm dataset as an example, in Figure 6 and Figure 7, the areas classified as bamboo forest (red) and grassland (orange) are less affected by other categories and exhibit higher color purity in the classification map. In the tea tree area, the distribution of tea trees is identified more accurately, with fewer misclassifications and missed classifications. Compared with the other methods, the CALM algorithm has higher classification accuracy in complex regions and significantly reduces speckle noise. The classification results indicate the effectiveness of the CALM method, which is well suited to subsequent hyperspectral data classification.
As is well known, raw hyperspectral images often contain noise and a large amount of redundant information, which can significantly negatively impact classification accuracy [38]. Therefore, it is not easy to consistently achieve optimal performance for HSI classification using a complete set of bands. On the contrary, selecting the appropriate number of bands can improve classification performance to a certain extent.
The optimal number of bands for the three HSI datasets is obtained from the OA curves shown in Figure 12, Figure 13 and Figure 14, which plot the OA values for different numbers of selected bands.
Figure 12, Figure 13 and Figure 14 show that the OA values increase noticeably when the number of selected bands increases from 5 to 10 for all six band selection methods. The CALM method stays stable, and its OA values are better than those of the other methods.

5. Conclusions

This paper proposes a two-layer mode HSI classification method divided into front-end and back-end learning stages. To reduce the dimensionality of the HSI data, the representation learning in the front end combines superpixel segmentation with a minimization model defined by the $\ell_{2,1}$ norm and graph regularization, which yields the re-expression of the HSI data for the back-end classification. After the back end implements the classification, the number of bands selected by the front end is determined by the classification accuracy.
Introducing a two-layer mode machine learning method, together with the use of explicit mathematical models for data re-expression, helps improve the interpretability of end-to-end deep learning models and provides explainable features for analyzing the numerical results of related applications.

Author Contributions

Conceptualization, W.C., J.C. and L.S.; data curation, W.C. and S.Y.; investigation, W.C., J.C. and L.S.; validation, S.Y. and L.S.; formal analysis, S.Y. and L.S.; software, W.C. and S.Y.; supervision, J.C.; visualization, W.C.; writing—original draft, W.C.; funding acquisition, L.S.; methodology, W.C. and L.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research is funded by the Strategic Priority Research Program of Chinese Academy of Sciences, Grant No. XDB 0900201.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article.

Acknowledgments

We would like to thank the anonymous referees for their helpful comments.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Cheng, M.-F.; Mukundan, A.; Karmakar, R.; Valappil, M.A.E.; Jouhar, J.; Wang, H.C. Modern Trends and Recent Applications of Hyperspectral Imaging: A Review. Technologies 2025, 13, 170. [Google Scholar] [CrossRef]
  2. Zhang, Y.; Wu, W.; Zhou, X.; Cheng, J.H. Non-Destructive Detection of Soybean Storage Quality Using Hyperspectral Imaging Technology. Molecules 2025, 30, 1357. [Google Scholar] [CrossRef]
  3. Zhang, Z.; Huang, L.; Wang, Q.; Jiang, L.; Qi, Y.; Wang, S.; Gu, Y. UAV Hyperspectral Remote Sensing Image Classification: A Systematic Review. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 3099–3124. [Google Scholar] [CrossRef]
  4. Pushpalatha, V.; Mallikarjuna, P.B.; Mahendra, H.N.; Subramoniam, S.R.; Mallikarjunaswamy, S. Land use and land cover classification for change detection studies using convolutional neural network. Appl. Comput. Geosci. 2025, 25, 100227. [Google Scholar] [CrossRef]
  5. Valme, D.; Rassõlkin, A.; Liyanage, D.C. From ADAS to Material-Informed Inspection: Review of Hyperspectral Imaging Applications on Mobile Ground Robots. Sensors 2025, 25, 2346. [Google Scholar] [CrossRef]
  6. Ma, K.; Yao, C.; Liu, B.; Hu, Q.; Li, S.; He, P.; Han, J. Segment Anything Model-Based Hyperspectral Image Classification for Small Samples. Remote Sens. 2025, 17, 1349. [Google Scholar] [CrossRef]
  7. Hennessy, A.; Clarke, K.; Lewis, M. Hyperspectral Classification of Plants: A Review of Waveband Selection Generalisability. Remote Sens. 2020, 12, 113. [Google Scholar] [CrossRef]
  8. Mianji, F.A.; Zhang, Y. Robust hyperspectral classification using relevance vector machine. IEEE Trans. Geosci. Remote Sens. 2011, 49, 2100–2112. [Google Scholar] [CrossRef]
  9. Zhao, X.; Ma, J.; Wang, L.; Zhang, Z.; Ding, Y.; Xiao, X. A review of hyperspectral image classification based on graph neural networks. Artif. Intell. Rev. 2025, 58, 172. [Google Scholar] [CrossRef]
  10. He, Z.; Xia, K.; Zhang, J.; Wang, S.; Yin, Z. An enhanced semi-supervised support Vector Machine Algorithm for spectral-spatial hyperspectral image classification. Pattern Recognit. Image Anal. 2024, 34, 199–211. [Google Scholar]
  11. Chen, C.; Yuan, X.; Gan, S.; Kang, X.; Luo, W.; Li, R.; Gao, S. A new strategy based on multi-source remote sensing data for improving the accuracy of land use/cover change classification. Sci. Rep. 2024, 14, 26855. [Google Scholar] [CrossRef] [PubMed]
  12. Banerjee, A.; Swain, S.; Rout, M.; Bandyopadhyay, M. Composite spectral spatial pixel CNN for land-use hyperspectral image classification with hybrid activation function. Multimed. Tools Appl. 2024, 84, 10527–10550. [Google Scholar] [CrossRef]
  13. Liu, J.; Lan, J.; Zeng, Y.; Luo, W.; Zhuang, Z.; Zou, J. Explainability Feature Bands Adaptive Selection for Hyperspectral Image Classification. Remote Sens. 2025, 17, 1620. [Google Scholar] [CrossRef]
  14. Hassija, V.; Chamola, V.; Mahapatra, A.; Singal, A.; Goel, D.; Huang, K.; Hussain, A. Interpreting black-box models: A review on explainable artificial intelligence. Cogn. Comput. 2024, 16, 45–74. [Google Scholar] [CrossRef]
  15. Contreras, J.; Bocklitz, T. Explainable artificial intelligence for spectroscopy data: A review. Pflügers Arch. Eur. J. Physiol. 2024, 477, 603–615. [Google Scholar] [CrossRef]
  16. Li, R.; Zheng, S.; Duan, C.; Yang, Y.; Wang, X. Classification of hyperspectral image based on double-branch dual-attention mechanism network. Remote Sens. 2020, 12, 582. [Google Scholar] [CrossRef]
  17. Yu, H.; Zhang, H.; Liu, Y.; Zheng, K.; Xu, Z.; Xiao, C. Dual-channel convolution network with image-based global learning framework for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 2021, 19, 1–5. [Google Scholar] [CrossRef]
  18. Li, T.; Zhang, X.; Zhang, S.; Wang, L. Self-supervised learning with a dual-branch ResNet for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [Google Scholar] [CrossRef]
  19. Wang, C.; Zhan, C.; Lu, B.; Yang, W.; Zhang, Y.; Wang, G.; Zhao, Z. SSFAN: A Compact and Efficient Spectral-Spatial Feature Extraction and Attention-Based Neural Network for Hyperspectral Image Classification. Remote Sens. 2024, 16, 4202. [Google Scholar] [CrossRef]
  20. Tang, X.; Zhang, K.; Zhou, X.; Zeng, L.; Huang, S. Enhancing Binary Convolutional Neural Networks for Hyperspectral Image Classification. Remote Sens. 2024, 16, 4398. [Google Scholar] [CrossRef]
  21. Zhao, H.; Lu, Z.; Sun, S.; Wang, P.; Jia, T.; Xie, Y.; Xu, F. Classification of Large Scale Hyperspectral Remote Sensing Images Based on LS3EU-Net++. Remote Sens. 2025, 17, 872. [Google Scholar] [CrossRef]
  22. Wang, J.; Tang, C.; Zheng, X.; Liu, X.; Zhang, W.; Zhu, E. Graph regularized spatial–spectral subspace clustering for hyperspectral band selection. Neural Netw. 2022, 153, 292–302. [Google Scholar] [CrossRef]
  23. Wang, Y.; Ma, H.; Yang, Y.; Zhao, E.; Song, M.; Yu, C. Self-supervised deep multi-level representation learning fusion-based maximum entropy subspace clustering for hyperspectral band selection. Remote Sens. 2024, 16, 224. [Google Scholar] [CrossRef]
  24. Wang, Q.; Li, Q.; Li, X. A fast neighborhood grouping method for hyperspectral band selection. IEEE Trans. Geosci. Remote Sens. 2020, 59, 5028–5039. [Google Scholar] [CrossRef]
  25. Liu, K.; Chen, Y.; Chen, T. A band subset selection approach based on sparse self-representation and band grouping for hyperspectral image classification. Remote Sens. 2022, 14, 5686. [Google Scholar] [CrossRef]
  26. Zhao, H.; Bruzzone, L.; Guan, R.; Zhou, F.; Yang, C. Spectral-spatial genetic algorithm-based unsupervised band selection for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2021, 59, 9616–9632. [Google Scholar] [CrossRef]
  27. Habermann, M.; Fremont, V.; Shiguemori, E.H. Supervised band selection in hyperspectral images using single-layer neural networks. Int. J. Remote Sens. 2019, 40, 3900–3926. [Google Scholar] [CrossRef]
  28. Elhamifar, E.; Vidal, R. Sparse subspace clustering: Algorithm, theory, and applications. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 2765–2781. [Google Scholar] [CrossRef]
  29. Sun, W.; Zhang, L.; Du, B.; Li, W.; Lai, Y.M. Band selection using improved sparse subspace clustering for hyperspectral imagery classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 2784–2797. [Google Scholar] [CrossRef]
  30. Ng, A.; Jordan, M.; Weiss, Y. On spectral clustering: Analysis and an algorithm. Adv. Neural Inf. Process. Syst. 2001, 14, 849–856. [Google Scholar]
  31. Guo, T.; Han, C.; Li, M. Fusion of front-end and back-end learning based on layer-by-layer data re-representation. Sci. Sin. Inform. 2019, 49, 739–759. (In Chinese) [Google Scholar] [CrossRef]
  32. Liu, M.; Tuzel, O.; Ramalingam, S.; Chellappa, R. Entropy rate superpixel segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Colorado Springs, CO, USA, 20–25 June 2011; pp. 2097–2104. [Google Scholar]
  33. Li, X.; Chen, J.; Zhao, L.; Li, H.; Wang, J.; Sun, L.; Guo, S.; Chen, P.; Zhao, X. Superpixel segmentation based on anisotropic diffusion model for object-oriented remote sensing image classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 17, 7621–7639. [Google Scholar] [CrossRef]
  34. Wang, Q.; Li, Q.; Li, X. Hyperspectral band selection via adaptive subspace partition strategy. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 4940–4950. [Google Scholar] [CrossRef]
  35. Tang, C.; Wang, J. A hyperspectral band selection method via adjacent subspace partition. Tientsin Univ. J. 2022, 55, 255–262. [Google Scholar]
  36. Tang, C.; Wang, J.; Zheng, X.; Liu, X.; Xie, W.; Li, X.; Zhu, X. Spatial and spectral structure preserved self-representation for unsupervised hyperspectral band selection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–13. [Google Scholar] [CrossRef]
  37. Wang, J.; Tang, C.; Li, Z.; Liu, X.; Zhang, W.; Zhu, E.; Wang, L. Hyperspectral band selection via region-aware latent features fusion based clustering. Inf. Fusion 2022, 79, 162–173. [Google Scholar] [CrossRef]
  38. Wang, H.; Yu, G.; Cheng, J.; Zhang, Z.; Wang, X.; Xu, Y. Fast hyperspectral image classification with strong noise robustness based on minimum noise fraction. Remote Sens. 2024, 16, 3782. [Google Scholar] [CrossRef]
Figure 1. Structure diagram of the two-layer mode machine learning method.
Figure 2. Band selection model based on superpixel segmentation.
Figure 3. TeaFarm: (a) pseudo-color image; (b) ground-truth image; (c) class names.
Figure 4. Salinas: (a) pseudo-color image; (b) ground-truth image; (c) class names.
Figure 5. Indian Pines: (a) pseudo-color image; (b) ground-truth image; (c) class names.
Figure 6. TeaFarm dataset classification results using the KNN classifier: (a) ASPS_MN; (b) S4P; (c) GRSC; (d) RLFFC; (e) SEASP; (f) CALM; (g) color labels.
Figure 7. TeaFarm dataset classification results using the SVM classifier: (a) ASPS_MN; (b) S4P; (c) GRSC; (d) RLFFC; (e) SEASP; (f) CALM; (g) color labels.
Figure 8. Salinas dataset classification results using the KNN classifier: (a) ASPS_MN; (b) S4P; (c) GRSC; (d) RLFFC; (e) SEASP; (f) CALM; (g) color labels.
Figure 9. Salinas dataset classification results using the SVM classifier: (a) ASPS_MN; (b) S4P; (c) GRSC; (d) RLFFC; (e) SEASP; (f) CALM; (g) color labels.
Figure 10. Indian Pines dataset classification results using the KNN classifier: (a) ASPS_MN; (b) S4P; (c) GRSC; (d) RLFFC; (e) SEASP; (f) CALM; (g) color labels.
Figure 11. Indian Pines dataset classification results using the SVM classifier: (a) ASPS_MN; (b) S4P; (c) GRSC; (d) RLFFC; (e) SEASP; (f) CALM; (g) color labels.
Figure 12. OA of the two classifiers with varied numbers of selected bands on the TeaFarm dataset: (a) OA obtained by KNN; (b) OA obtained by SVM.
Figure 13. OA of the two classifiers with varied numbers of selected bands on the Salinas dataset: (a) OA obtained by KNN; (b) OA obtained by SVM.
Figure 14. OA of the two classifiers with varied numbers of selected bands on the Indian Pines dataset: (a) OA obtained by KNN; (b) OA obtained by SVM.
Table 1. Classification results of different methods with three metrics on different datasets.

Dataset      | Classifier | Metric | ASPS_MN | S4P    | GRSC   | RLFFC  | SEASP  | CALM
TeaFarm      | KNN        | OA     | 0.9012  | 0.9417 | 0.9508 | 0.9437 | 0.9445 | 0.9522
             |            | AA     | 0.7747  | 0.8576 | 0.8638 | 0.8633 | 0.8647 | 0.8902
             |            | Kappa  | 0.8932  | 0.9152 | 0.9140 | 0.9180 | 0.9194 | 0.9302
             | SVM        | OA     | 0.8802  | 0.9304 | 0.9319 | 0.8951 | 0.9281 | 0.9490
             |            | AA     | 0.7479  | 0.7491 | 0.7480 | 0.6851 | 0.7397 | 0.8632
             |            | Kappa  | 0.6747  | 0.8987 | 0.9007 | 0.8472 | 0.8953 | 0.9255
Salinas      | KNN        | OA     | 0.8282  | 0.9150 | 0.9242 | 0.9074 | 0.9201 | 0.9260
             |            | AA     | 0.8830  | 0.9565 | 0.9629 | 0.9537 | 0.9610 | 0.9637
             |            | Kappa  | 0.8172  | 0.9075 | 0.9174 | 0.8994 | 0.9129 | 0.9193
             | SVM        | OA     | 0.8318  | 0.9318 | 0.9458 | 0.8908 | 0.9478 | 0.9469
             |            | AA     | 0.8849  | 0.9673 | 0.9738 | 0.9320 | 0.9721 | 0.9757
             |            | Kappa  | 0.8210  | 0.9255 | 0.9259 | 0.8812 | 0.9418 | 0.9417
Indian Pines | KNN        | OA     | 0.7184  | 0.7678 | 0.8644 | 0.7446 | 0.7653 | 0.8363
             |            | AA     | 0.7324  | 0.7678 | 0.8000 | 0.6824 | 0.7150 | 0.8285
             |            | Kappa  | 0.6885  | 0.7479 | 0.7807 | 0.7234 | 0.7449 | 0.8285
             | SVM        | OA     | 0.7552  | 0.8616 | 0.8971 | 0.8009 | 0.8799 | 0.9016
             |            | AA     | 0.7284  | 0.8626 | 0.8976 | 0.7412 | 0.8852 | 0.8989
             |            | Kappa  | 0.7352  | 0.8477 | 0.8857 | 0.7821 | 0.8671 | 0.8907
NOTE: The best results on the quality index are labeled in bold.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
