Article

Hyperspectral Band Selection via Tensor Low Rankness and Generalized 3DTV †

Department of Mathematics, University of Kentucky, Lexington, KY 40506, USA
* Author to whom correspondence should be addressed.
† This article is a revised and expanded version of a paper entitled “Hyperspectral Band Selection based on Generalized 3DTV and Tensor CUR Decomposition”, which was presented at the Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, 27–30 October 2024.
Remote Sens. 2025, 17(4), 567; https://doi.org/10.3390/rs17040567
Submission received: 19 December 2024 / Revised: 30 January 2025 / Accepted: 5 February 2025 / Published: 7 February 2025

Abstract

Hyperspectral band selection plays a key role in reducing the high dimensionality of data while maintaining essential details. However, existing band selection methods often encounter challenges, such as high memory consumption, the need for data matricization that disrupts inherent data structures, and difficulties in preserving crucial spatial–spectral relationships. To address these challenges, we propose a tensor-based band selection model using Generalized 3D Total Variation (G3DTV), which utilizes the $\ell_1^p$ norm to promote smoothness across the spatial and spectral dimensions. Based on the Alternating Direction Method of Multipliers (ADMM), we develop an efficient hyperspectral band selection algorithm in which the tensor low-rank structure is captured through tensor CUR decomposition, significantly improving computational efficiency. Numerical experiments on benchmark datasets demonstrate that our method outperforms other state-of-the-art approaches. In addition, we provide practical guidelines for parameter tuning in both noise-free and noisy data scenarios. We also discuss computational complexity trade-offs, explore parameter selection using grid search and Bayesian Optimization, and extend our analysis to evaluate performance with additional classifiers. These results further validate the robustness and accuracy of the proposed model.


1. Introduction

Hyperspectral Imaging (HSI) is an advanced remote sensing technology that captures detailed spectral information across hundreds or thousands of narrow and contiguous wavelength bands. This capability provides far more comprehensive data than conventional RGB imagery, enabling in-depth insights into the physical and chemical properties of observed objects. The rich spectral data acquired through HSI supports a wide range of applications, including remote sensing, environmental monitoring, precision agriculture, biomedical imaging, and cultural heritage preservation. However, the high-dimensional nature of HSI data poses significant challenges, such as data redundancy and increased computational complexity, which can hinder efficient data processing and analysis. These issues necessitate the development and application of effective dimensionality reduction techniques to optimize data utility and usability.
To address these challenges, hyperspectral band selection focuses on identifying and retaining a subset of the most informative spectral bands while discarding redundant or less significant ones. The goal of band selection is to preserve the essential spectral characteristics of the data while substantially reducing the dimensionality. This process not only alleviates computational burdens but also helps mitigate issues related to the curse of dimensionality [1], thereby enhancing the efficiency and accuracy of subsequent analysis and classification tasks. A detailed categorization of band selection methods can be found in [2].
Recent advancements in supervised hyperspectral band selection methods have leveraged deep learning techniques to improve performance. These approaches often utilize convolutional neural networks (CNNs) and attention mechanisms to identify the most informative spectral bands. For instance, the CNN Embedded Genetic Algorithm (CNNeGA) method [3] combines the feature extraction capabilities of CNNs with the optimization power of genetic algorithms (GAs). In this approach, a CNN is trained to extract spectral–spatial features from hyperspectral images, which are then used as inputs to a GA for band selection. Additionally, the Deep Reinforcement Learning for Semisupervised Hyperspectral Band Selection method [4] employs a deep reinforcement learning framework for band selection. In this approach, an agent is trained to select the most informative bands through interactions with the data environment, which is guided by a reward signal that reflects classification performance.
Despite these advancements in supervised methods, unsupervised approaches offer several compelling advantages for hyperspectral band selection. They can operate without labeled training data, which are often scarce or expensive to obtain in hyperspectral imaging applications. This characteristic provides significant flexibility and generalizability, as these techniques are not biased towards specific tasks or classes. In this work, we aim to develop an unsupervised hybrid band selection approach based on smoothness and sparsity but without prior knowledge of label information. This approach can be particularly beneficial when labeled data are insufficient or unavailable, leveraging the inherent strengths of unsupervised methods to extract meaningful spectral information from hyperspectral imagery.
Many band selection methods convert hyperspectral data into a matrix by treating each band image as a high-dimensional point and then identifying the most important bands based on their geometric or statistical similarities. For example, ranking-based methods such as Maximum-Variance Principal Component Analysis (MVPCA) [5] prioritize bands by leveraging variance-based criteria. The enhanced Fast Density Peak-based Clustering (E-FDPC) algorithm [6] aims to identify density peaks within the spectral space, effectively selecting the most discriminative bands. Additionally, the Optimal Neighborhood Reconstruction (ONR) method [7] selects bands by reconstructing local neighborhoods, focusing on minimizing reconstruction error while preserving both local and global spectral structures. To automatically determine the minimum number of necessary bands, the Fast Neighborhood Grouping (FNGBS) method [8] employs a coarse–fine strategy and partitions the hyperspectral image into several groups based on local density and information entropy. As an improved version of E-FDPC, the Similarity-based Ranking Strategy with Structural Similarity (SR-SSIM) [9] offers another unique perspective on band selection, prioritizing bands based on their contribution to the overall structural similarity of the hyperspectral image. The SR-SSIM preserves both the spectral and spatial structures of hyperspectral data, maintaining the essential characteristics of the original dataset through the selected bands. However, transforming hyperspectral data into their matrix form can disrupt the spatial–spectral relationships inherent in the original tensor structure, potentially reducing the effectiveness of subsequent analysis. These limitations highlight the need for efficient and memory-conscious methods capable of directly operating on the tensor structure of the hyperspectral data while preserving their inherent spatial–spectral relationships.
To further improve the efficiency and accuracy of the aforementioned methods, recent band selection approaches integrate spatial and spectral smoothness information into the selection process. For instance, graph-based methods explore the spatial smoothness encoded in a Laplacian graph. The Marginalized Graph Self-Representation (MGSR) method [10] uses superpixel segmentation to create a region-wise similarity graph and learns the band importance by solving a minimization problem based on the sparse self-representation of the hyperspectral data under the graph structure. Building on this concept, the Tensor Graph Self-Representation (TGSR) method [11] utilizes the inherent tensor structure of hyperspectral data, simultaneously exploiting spectral–spatial relationships to provide a more comprehensive representation for band selection. To enhance the efficiency of graph-based frameworks, a Fast and Robust Principal Component Analysis on Laplacian Graph (FRPCALG) method [12] was proposed, which incorporates a graph regularization term based on a band-wise Laplacian graph into a robust PCA framework. This integration preserves spectral correlation while maintaining a sparse and low-rank representation of the data matrix. Recently, the Matrix Hyperspectral Band Selection based on CUR decomposition approach (MHBSCUR) [13] was proposed to integrate matrix CUR decomposition into the graph-based framework. Here, the matrix CUR decomposition is a low-rank factorization method that approximates a given matrix by selecting a small subset of its rows and columns, improving algorithmic efficiency. However, graph-based strategies often incur additional computational overhead due to the graph construction process. Furthermore, storing graph information can require a significant amount of memory, especially for large-scale hyperspectral datasets. These challenges may limit their applicability to smaller datasets or necessitate substantial computational resources. As an alternative to graph-based regularization, Total Variation (TV) [14] offers promising performance and has been widely used in signal and image processing to promote smoothness while preserving features such as edges. TV has been extended from 2D to 3D, as in 3DTV [15], and applied to hyperspectral data recovery and processing, such as hyperspectral image denoising [16], super-resolution reconstruction [17], and unmixing [18,19,20], due to its ability to preserve sharp edges while reducing noise. To address the staircase artifacts caused by traditional TV, several generalizations and extensions of TV have been introduced to improve reconstruction accuracy, such as Total Generalized Variation (TGV) [21] and Higher-Degree Total Variation (HDTV) [22]. The fast Fourier transform enables the efficient implementation of finite difference operators in TV, even for hyperspectral data. However, TV and its variants have not been fully explored in the context of hyperspectral band selection.
To address the aforementioned challenges, we propose a novel tensor-based hyperspectral band selection model [23] that decomposes hyperspectral data into a low-rank, smooth component and a sparse component. Specifically, we develop a Generalized 3D Total Variation (G3DTV) regularization, which applies the $\ell_1^p$ norm to derivatives to enhance the smoothness across both spatial and spectral domains. Different from 3D total variation [15] involving high-order derivatives, G3DTV exploits the high sparsity of first-order derivatives. By promoting smooth transitions in spectral response among selected bands, our model demonstrates improved robustness to outliers and noise, leading to more reliable results in practical applications. Using the Alternating Direction Method of Multipliers (ADMM) [24,25,26], we develop an efficient algorithm to solve the proposed model, resulting in three types of subproblems. Two of the subproblems admit closed-form solutions based on the proximal operator of the $\ell_1^p$ norm when $p = 1, 2, 3, 4$. For the subproblem involving the low-rank constraint, in light of [13,27,28], we employ a tensor CUR decomposition to rewrite a low-rank tensor as a product of three smaller factor tensors, which are updated efficiently within a gradient descent framework. In this work, we adopt the t-CUR decomposition [29] to accelerate the computation, which differs from the tensor CUR decompositions extended from the matrix case, such as the Fiber and Chidori CUR decompositions [30,31,32,33]. The innovative integration of low rankness, captured by t-CUR, and smoothness, represented by G3DTV, enables our algorithm to effectively characterize the underlying complex geometry of HSI data. This makes our method particularly well suited for handling large-scale datasets while ensuring robust band selection performance.
The remainder of this paper is organized as follows: Section 2 introduces fundamental concepts and definitions in tensor algebra, along with a detailed definition of the proposed G3DTV regularization, providing the necessary theoretical background for understanding the proposed method. Section 3 presents our novel band selection method in detail, outlining the ADMM-based derivation of our algorithm, its implementation, and its key features. Section 4 demonstrates the performance of the proposed method through a series of comprehensive numerical experiments, in which we evaluate our algorithm on benchmark remote sensing datasets and compare its performance with state-of-the-art band selection techniques. In Section 5, we discuss parameter selection methods, including grid search and Bayesian Optimization, analyze the impact of noise on the performance of our algorithm, and explore alternative classifiers to demonstrate the robustness of our approach. Finally, conclusions and future work are presented in Section 6.

2. Preliminaries

In this section, we introduce fundamental definitions and notation for third-order tensors [34]. Unless otherwise specified, let $\mathcal{A} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ and $\mathcal{B} \in \mathbb{R}^{n_2 \times l \times n_3}$, with the $k$th frontal slice (or face) of $\mathcal{A}$ denoted by $A_k = \mathcal{A}(:,:,k)$. We also denote the fast Fourier transform (FFT) of $\mathcal{A}$ along the third dimension by $\mathcal{A}_f = \mathrm{fft}(\mathcal{A}, [\,], 3)$ and the inverse FFT of $\mathcal{A}_f$ by $\mathrm{ifft}(\mathcal{A}_f, [\,], 3)$. We define the identity tensor $\mathcal{I}$ as the tensor whose first frontal slice is the identity matrix and whose other frontal slices are all zeros. Thus, $\mathcal{I}_f = \mathrm{fft}(\mathcal{I}, [\,], 3)$ is a tensor in which every frontal slice is the identity matrix. The zero tensor is denoted by $\mathcal{O}$. For a finite set $I$, we use $|I|$ to denote its cardinality. We also define $[n] = \{1, 2, \ldots, n\}$.
In addition, we define three commonly used tensor norms. The $\ell_1$ norm of a tensor $\mathcal{A}$, denoted by $\|\mathcal{A}\|_1$, is the sum of the absolute values of all its entries. The Frobenius norm of $\mathcal{A}$, denoted by $\|\mathcal{A}\|_F$, is the square root of the sum of the squared absolute values of all its entries; this norm is analogous to the Frobenius norm for matrices. The $\ell_\infty$ norm of $\mathcal{A}$, denoted by $\|\mathcal{A}\|_\infty$, is the maximum absolute value among its entries.
Definition 1 
([34]). The block circulant operator, denoted by $\mathrm{bcirc}(\cdot)$, converts a tensor to a block circulant matrix and is defined as follows:
$$\mathrm{bcirc}(\mathcal{A}) := \begin{bmatrix} A_1 & A_{n_3} & \cdots & A_2 \\ A_2 & A_1 & \cdots & A_3 \\ \vdots & \vdots & \ddots & \vdots \\ A_{n_3} & A_{n_3-1} & \cdots & A_1 \end{bmatrix} \in \mathbb{R}^{n_1 n_3 \times n_2 n_3}.$$
Definition 2 
([34]). The operator $\mathrm{unfold}(\cdot)$ and its inverse $\mathrm{fold}(\cdot)$ for the conversion between tensors and matrices are defined as
$$\mathrm{unfold}(\mathcal{A}) = \begin{bmatrix} A_1 \\ A_2 \\ \vdots \\ A_{n_3} \end{bmatrix} \in \mathbb{R}^{n_1 n_3 \times n_2}, \qquad \mathrm{fold}\left( \begin{bmatrix} A_1 \\ A_2 \\ \vdots \\ A_{n_3} \end{bmatrix} \right) = \mathcal{A}.$$
Definition 3 
([34]). The block diagonal matrix form of $\mathcal{A}$, denoted by $\mathrm{bdiag}(\mathcal{A})$, is defined as a block diagonal matrix with diagonal blocks $A_1, \ldots, A_{n_3}$, i.e.,
$$\mathrm{bdiag}(\mathcal{A}) = \begin{bmatrix} A_1 & & & \\ & A_2 & & \\ & & \ddots & \\ & & & A_{n_3} \end{bmatrix}.$$
Definition 4 
([34]). The t-product of the tensors $\mathcal{A}$ and $\mathcal{B}$ is defined as
$$\mathcal{A} * \mathcal{B} := \mathrm{fold}\big(\mathrm{bcirc}(\mathcal{A})\, \mathrm{unfold}(\mathcal{B})\big).$$
The t-product can also be converted to matrix multiplication in the Fourier domain: $\mathcal{C} = \mathcal{A} * \mathcal{B}$ is equivalent to $\mathrm{bdiag}(\mathcal{C}_f) = \mathrm{bdiag}(\mathcal{A}_f)\, \mathrm{bdiag}(\mathcal{B}_f)$. Note that, given the appropriate dimensions, $\mathcal{A} * \mathcal{I} = \mathcal{I} * \mathcal{A} = \mathcal{A}$.
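To make the Fourier-domain equivalence concrete, the following MATLAB sketch computes the t-product slice by slice; the helper name tprod is our own label (not from [34]), and real-valued inputs of compatible sizes are assumed.

```matlab
function C = tprod(A, B)
% t-product C = A * B (Definition 4), computed in the Fourier domain:
% transform along mode 3, multiply matching frontal slices, transform back.
% Sizes: A is n1 x n2 x n3, B is n2 x l x n3, so C is n1 x l x n3.
Af = fft(A, [], 3);
Bf = fft(B, [], 3);
Cf = zeros(size(A, 1), size(B, 2), size(A, 3));
for k = 1:size(A, 3)
    Cf(:, :, k) = Af(:, :, k) * Bf(:, :, k);
end
C = real(ifft(Cf, [], 3));  % real inputs give a real result up to round-off
end
```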
Definition 5 
([35]). A tensor $\mathcal{A} \in \mathbb{R}^{n \times n \times n_3}$ is orthogonal if $\mathcal{A}^* * \mathcal{A} = \mathcal{A} * \mathcal{A}^* = \mathcal{I}$, where $\mathcal{A}^*$ is the complex conjugate transpose of $\mathcal{A}$, defined by $(\mathcal{A}^*)_f(:,:,i_3) = (\mathcal{A}_f(:,:,i_3))^*$ for all $i_3 \in [n_3]$.
Definition 6 
([29]). The pseudoinverse (or Moore–Penrose inverse) of the tensor $\mathcal{A}$, denoted by $\mathcal{A}^\dagger$, is defined by taking the pseudoinverse of each frontal slice in the Fourier domain:
$$\mathcal{A}^\dagger = \mathrm{ifft}\left( \mathrm{fold}\left( \begin{bmatrix} (\mathcal{A}_f)_1^\dagger \\ (\mathcal{A}_f)_2^\dagger \\ \vdots \\ (\mathcal{A}_f)_{n_3}^\dagger \end{bmatrix} \right), [\,], 3 \right). \tag{1}$$
Definition 7 
([35]). A tensor $\mathcal{A}$ is f-diagonal if each frontal slice $\mathcal{A}(:,:,i_3)$ is diagonal for all $i_3 \in [n_3]$.
Definition 8 
([35]). The tensor Singular Value Decomposition (t-SVD) induced by the t-product is $\mathcal{A} = \mathcal{U} * \mathcal{S} * \mathcal{V}^*$, where $\mathcal{U} \in \mathbb{R}^{n_1 \times n_1 \times n_3}$ and $\mathcal{V} \in \mathbb{R}^{n_2 \times n_2 \times n_3}$ are orthogonal, and the core tensor $\mathcal{S} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ is f-diagonal. Moreover, the t-SVD rank of the tensor $\mathcal{A}$ is defined as $\mathrm{rank}_t(\mathcal{A}) = |\{ i : \mathcal{S}(i,i,1) \neq 0 \}|$.
Definition 9 
([29]). The multirank of a tensor $\mathcal{A}$ is defined as the vector $\mathrm{rank}_m(\mathcal{A}) = (r_1, \ldots, r_{n_3})$, where $r_k = \mathrm{rank}(\mathcal{A}_f(:,:,k))$.
Definition 10 
([29]). Consider a tensor $\mathcal{Y} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$. Let $I \subseteq [n_1]$ and $J \subseteq [n_2]$ be index subsets, and define $\mathcal{R} = \mathcal{Y}(I,:,:)$, $\mathcal{C} = \mathcal{Y}(:,J,:)$, and $\mathcal{U} = \mathcal{Y}(I,J,:)$. The t-CUR decomposition of $\mathcal{Y}$ is $\hat{\mathcal{Y}} = \mathcal{C} * \mathcal{U}^\dagger * \mathcal{R}$, where $\mathcal{U}^\dagger$ is the pseudoinverse of $\mathcal{U}$ as defined in (1) above. Figure 1 depicts this low-rank decomposition, where $s_r = |I|$ and $s_c = |J|$.
Definition 11 
([29]). Consider a tensor $\mathcal{Y} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ with multirank $\mathrm{rank}_m(\mathcal{Y}) = r_m$ and t-SVD rank $\mathrm{rank}_t(\mathcal{Y}) = r_t$. If the index sets $I$ and $J$, as defined in the t-CUR decomposition in Definition 10, satisfy $|I|, |J| \geq r_t$, and the core tensor $\mathcal{U}$ in the t-CUR decomposition has multirank $\mathrm{rank}_m(\mathcal{U}) = r_m$, then the recovery is exact, i.e., $\mathcal{Y} = \hat{\mathcal{Y}} = \mathcal{C} * \mathcal{U}^\dagger * \mathcal{R}$.
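As a hedged illustration, the t-CUR approximation of Definition 10 can be sketched in MATLAB as follows, combining uniformly sampled index sets with the slice-wise pseudoinverse of Definition 6; tprod is the t-product sketch above, and the data tensor Y and sample sizes sr, sc are assumed given.

```matlab
[n1, n2, n3] = size(Y);
I = randperm(n1, sr);            % row index set,    |I| = sr
J = randperm(n2, sc);            % column index set, |J| = sc
C = Y(:, J, :);                  % n1 x sc x n3
R = Y(I, :, :);                  % sr x n2 x n3
U = Y(I, J, :);                  % sr x sc x n3

% Pseudoinverse of U: pinv of each frontal slice in the Fourier domain.
Uf  = fft(U, [], 3);
Upf = zeros(sc, sr, n3);
for k = 1:n3
    Upf(:, :, k) = pinv(Uf(:, :, k));
end
Udag = real(ifft(Upf, [], 3));

Yhat = tprod(tprod(C, Udag), R); % t-CUR approximation C * U^dagger * R
```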
Definition 12. 
The proximal operator of $\|\cdot\|_1^p$ is defined as
$$\mathrm{prox}_{\lambda \|\cdot\|_1^p}(\mathcal{Z}) = \operatorname*{argmin}_{\mathcal{X} \in \mathbb{R}^{n_2 \times l \times n_3}} \frac{1}{2} \|\mathcal{X} - \mathcal{Z}\|_F^2 + \lambda \|\mathcal{X}\|_1^p, \tag{2}$$
where $p > 0$ is an integer.
Note that the proximal operator $\mathrm{prox}_{\lambda \|\cdot\|_1^p}(\mathcal{Z})$ for $p = 1, 2, 3, 4$ can be computed using Algorithms 1 and 2 in [36]. If $p = 1$, it reduces to the soft thresholding operator [37].
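For the simplest case $p = 1$, the proximal operator acts entrywise, and a one-line MATLAB sketch suffices:

```matlab
% Soft thresholding: prox of lambda*||.||_1, applied entrywise to a tensor Z.
soft = @(Z, lambda) sign(Z) .* max(abs(Z) - lambda, 0);
```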
In addition, to describe smoothness while preserving edge sharpness, the traditional two-dimensional total variation [14] has been extended to 3D images, as in 3DTV [15] and the spatial–spectral total variation for hyperspectral images [16]. Motivated by our recent work on the power of the $\ell_1$ norm regularizer [38], we propose a novel generalized 3D total variation regularization to further enhance the sparsity of derivatives.
Definition 13. 
The Generalized 3D Total Variation (G3DTV) regularization of a third-order tensor $\mathcal{U}$ is defined as
$$\|\mathcal{U}\|_{\mathrm{G3DTV}} = \sum_{i=1}^{3} \|D_i \mathcal{U}\|_1^p, \tag{3}$$
where $p \geq 1$ is an integer, and $D_i$ represents the finite difference operator along the $i$th mode.
In our experiments, $D_i$ was set as the forward difference operator with Neumann boundary conditions.
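A minimal MATLAB sketch of this choice of $D_i$ and of evaluating the G3DTV value is given below; the helper names g3dtv and fdiff are ours, and the Neumann condition is imposed by zeroing the difference at the last index along each mode.

```matlab
function val = g3dtv(U, p)
% G3DTV value of a third-order tensor U (Definition 13).
val = 0;
for i = 1:3
    Di  = fdiff(U, i);
    val = val + sum(abs(Di(:)))^p;   % ||D_i U||_1^p
end
end

function Du = fdiff(U, dim)
% Forward difference along mode `dim` with Neumann boundary conditions:
% Du(j) = U(j+1) - U(j) for j < n, and Du(n) = 0 at the boundary.
Du = diff(U, 1, dim);
sz = size(U);
sz(dim) = 1;
Du = cat(dim, Du, zeros(sz));
end
```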

3. Proposed Method

Equipped with the basic knowledge of tensors, we present a novel tensor-based hyperspectral band selection approach, which consists of two phases. In Phase 1, we develop a tensor-based model that decomposes the input tensor $\mathcal{Y}$ into a low-rank and smooth component $\mathcal{B}$ and a sparse component $\mathcal{S}$. In Phase 2, we perform k-means clustering on the frontal slices of $\mathcal{B}$ to identify the most important clusters and store the indices of the bands closest to the cluster centroids in an index set $Q$. The algorithm then outputs the band-selected tensor $\mathcal{Y}(:,:,Q)$, and we evaluate the accuracy using two data classification methods, i.e., K-Nearest Neighbors (KNNs) and Support Vector Machine (SVM). The entire pipeline is illustrated in Figure 2. It is worth noting that the use of k-means clustering [39] and performance evaluation through classification algorithms are well-established practices in the band selection literature. In what follows, we dive into the details of Phase 1 of the algorithm, which is the core of our novel contributions.
As depicted in Figure 2, we begin with an initial hyperspectral tensor $\mathcal{Y} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ with $n_3$ spectral bands, each composed of $n_1 \times n_2$ spatial pixels. We introduce an indicator function to handle the rank constraint. Let $\Pi = \{ \mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3} \mid \mathrm{rank}_t(\mathcal{X}) \leq r \}$. Then, we introduce an indicator function $\chi_\Pi$ defined as $\chi_\Pi(\mathcal{X}) = 0$ if $\mathcal{X} \in \Pi$ and $+\infty$ otherwise. Under the assumption that the most informative bands preserve spectral relationships and smooth transitions across neighboring bands, we decompose $\mathcal{Y}$ into the sum of a spatial–spectral smooth tensor $\mathcal{B}$, which is low-rank, and a sparse tensor $\mathcal{S}$ via the following model:
$$\min_{\mathcal{B}, \mathcal{S}} \frac{1}{2} \|\mathcal{B} + \mathcal{S} - \mathcal{Y}\|_F^2 + \lambda_1 \|\mathcal{S}\|_1^q + \lambda_2 \|\mathcal{B}\|_{\mathrm{G3DTV}} + \chi_\Pi(\mathcal{B}), \tag{4}$$
where $q \geq 1$ is an integer. The positive parameter $\lambda_1$ is the regularization parameter controlling the sparsity of the outlier tensor $\mathcal{S}$, and $\lambda_2 > 0$ controls the spatial–spectral smoothness of the tensor $\mathcal{B}$. Here, the third term is the G3DTV of $\mathcal{B}$ as defined in Definition 13, i.e., $\|\mathcal{B}\|_{\mathrm{G3DTV}} = \sum_{i=1}^3 \|D_i \mathcal{B}\|_1^p$. The parameters $p$ in the G3DTV and $q$ in the sparsity regularizer are positive integers. Our numerical experiments have shown that $p = 2$ and $q = 1$ lead to the best performance. In order to apply the ADMM framework to minimize (4), we introduce the auxiliary variables $\mathcal{X}_i$ and rewrite (4) as
$$\min_{\mathcal{B}, \mathcal{S}, \mathcal{X}_i} \frac{1}{2} \|\mathcal{B} + \mathcal{S} - \mathcal{Y}\|_F^2 + \lambda_1 \|\mathcal{S}\|_1^q + \lambda_2 \sum_{i=1}^3 \|\mathcal{X}_i\|_1^p + \chi_\Pi(\mathcal{B}) \quad \text{s.t.} \quad \mathcal{X}_i = D_i \mathcal{B} \ \text{for} \ i = 1, 2, 3. \tag{5}$$
Then, the augmented Lagrangian reads as
$$\mathcal{L} = \frac{1}{2} \|\mathcal{B} + \mathcal{S} - \mathcal{Y}\|_F^2 + \lambda_1 \|\mathcal{S}\|_1^q + \chi_\Pi(\mathcal{B}) + \lambda_2 \sum_{i=1}^3 \|\mathcal{X}_i\|_1^p + \frac{\beta}{2} \sum_{i=1}^3 \|D_i \mathcal{B} - \mathcal{X}_i + \widetilde{\mathcal{X}}_i\|_F^2.$$
Here, $\widetilde{\mathcal{X}}_i$ are the dual variables, and $\beta > 0$ is the penalty parameter. Applying the ADMM algorithm leads to solving three types of subproblems in each iteration. That is, by minimizing $\mathcal{L}$ with respect to $\mathcal{B}$, $\mathcal{S}$, and $\mathcal{X}_i$ at each iteration, we obtain the following algorithm:
$$\begin{aligned}
\mathcal{B} &\leftarrow \operatorname*{argmin}_{\mathcal{B}} \ \frac{1}{2} \|\mathcal{B} + \mathcal{S} - \mathcal{Y}\|_F^2 + \chi_\Pi(\mathcal{B}) + \frac{\beta}{2} \sum_{i=1}^3 \|D_i \mathcal{B} - \mathcal{X}_i + \widetilde{\mathcal{X}}_i\|_F^2; \\
\mathcal{S} &\leftarrow \operatorname*{argmin}_{\mathcal{S}} \ \frac{1}{2} \|\mathcal{B} + \mathcal{S} - \mathcal{Y}\|_F^2 + \lambda_1 \|\mathcal{S}\|_1^q; \\
\mathcal{X}_i &\leftarrow \operatorname*{argmin}_{\mathcal{X}_i} \ \lambda_2 \|\mathcal{X}_i\|_1^p + \frac{\beta}{2} \|D_i \mathcal{B} - \mathcal{X}_i + \widetilde{\mathcal{X}}_i\|_F^2; \\
\widetilde{\mathcal{X}}_i &\leftarrow \widetilde{\mathcal{X}}_i + D_i \mathcal{B} - \mathcal{X}_i.
\end{aligned}$$
Here, $\mathcal{X}_i$ and $\widetilde{\mathcal{X}}_i$ are updated for $i = 1, 2, 3$. The $\mathcal{B}$ subproblem is essentially a rank-constrained least-squares problem, which has no closed-form solution. Inspired by the fast gradient descent method in robust PCA [27] and its variants [13,28], we employ gradient descent while maintaining the rank constraint. At the $j$th iteration, we denote the respective estimates of $\mathcal{B}$, $\mathcal{S}$, $\mathcal{X}_i$ from the previous iteration by $\mathcal{B}^j$, $\mathcal{S}^j$, $\mathcal{X}_i^j$ and then define the objective function $f$ of the $\mathcal{B}$ subproblem without the indicator function as follows:
$$f(\mathcal{B}) := \frac{1}{2} \|\mathcal{B} + \mathcal{S}^j - \mathcal{Y}\|_F^2 + \frac{\beta}{2} \sum_{i=1}^3 \|D_i \mathcal{B} - \mathcal{X}_i^j + \widetilde{\mathcal{X}}_i^j\|_F^2.$$
Then, $\mathcal{B}$ is updated via $\mathcal{B}^{j+1} = \operatorname*{argmin}_{\mathrm{rank}_t(\mathcal{B}) \leq r} f(\mathcal{B})$. To handle the rank constraint, we employ the t-CUR decomposition [29] instead of the skinny t-SVD to reduce computational costs, especially for large tensors. By applying gradient descent with step size $\tau > 0$, we obtain the updating scheme for $\mathcal{B}$ as $\mathcal{B}^{j+1} = \mathcal{B}^j - \tau \nabla f(\mathcal{B}^j)$, where the gradient is
$$\nabla f(\mathcal{B}^j) = \mathcal{B}^j + \mathcal{S}^j - \mathcal{Y} + \beta \sum_{i=1}^3 D_i^T \left( D_i \mathcal{B}^j - \mathcal{X}_i^j + \widetilde{\mathcal{X}}_i^j \right).$$
Then, the factor tensors in the t-CUR decomposition of $\mathcal{B}^{j+1}$ are updated as
$$\mathcal{C}^{j+1} = \mathcal{C}^j - \tau \nabla f(\mathcal{B}^j)(:, J, :), \quad \mathcal{R}^{j+1} = \mathcal{R}^j - \tau \nabla f(\mathcal{B}^j)(I, :, :), \quad \mathcal{U}^{j+1} = \frac{1}{2} \left( \mathcal{C}^{j+1}(I, :, :) + \mathcal{R}^{j+1}(:, J, :) \right). \tag{6}$$
Here, $I$ and $J$ are the respective row and column index sets. Based on Definition 10, we update $\mathcal{B}$ as
$$\mathcal{B}^{j+1} = \mathcal{C}^{j+1} * (\mathcal{U}^{j+1})^\dagger * \mathcal{R}^{j+1}. \tag{7}$$
Both the $\mathcal{S}$ subproblem and the $\mathcal{X}_i$ subproblem have closed-form solutions given by the proximal operators in Definition 12. By fixing the other variables in the respective $\mathcal{S}$ and $\mathcal{X}_i$ subproblems in (5), we update $\mathcal{S}$ and $\mathcal{X}_i$ as
$$\mathcal{S}^{j+1} = \mathrm{prox}_{\lambda_1 \|\cdot\|_1^q}(\mathcal{Y} - \mathcal{B}^{j+1}), \tag{8}$$
$$\mathcal{X}_i^{j+1} = \mathrm{prox}_{\frac{\lambda_2}{\beta} \|\cdot\|_1^p}(D_i \mathcal{B}^{j+1} + \widetilde{\mathcal{X}}_i^j), \quad i = 1, 2, 3. \tag{9}$$
The algorithm terminates when the convergence condition is met, i.e., when the maximum absolute error between two successive solutions falls below a tolerance:
$$\|\mathcal{B}^{j+1} - \mathcal{B}^j\|_\infty < \varepsilon, \tag{10}$$
where $\varepsilon$ is a predefined tolerance, e.g., $10^{-5}$ in our experiments. While other band selection methods may use a combination of multiple convergence criteria, such as changes in the objective function value or gradient norms, we stick to this single criterion to save computational time and simplify the procedure. Finally, we apply a clustering method such as k-means to the frontal slices of $\widetilde{\mathcal{B}}$ to find the desired $k$ clusters. The fiber indices of the bands closest to the cluster centroids are stored in the set $Q$. Thus, the corresponding bands from the original tensor $\mathcal{Y}$ constitute the desired subset of bands.
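For the special case $p = q = 1$, both proximal updates reduce to the soft thresholding sketched in Section 2, and one ADMM pass can be written compactly in MATLAB as follows; soft and fdiff refer to the earlier sketches, the cell arrays X and Xt hold the auxiliary and dual tensors, and Y, B, Bprev, lambda1, lambda2, and beta are assumed given (general $p$ would use the prox from [36] instead).

```matlab
% One ADMM pass of the closed-form updates (8)-(9) and the dual step,
% specialized to p = q = 1 so every prox is a soft-thresholding.
S = soft(Y - B, lambda1);                              % S-update (8)
for i = 1:3
    X{i}  = soft(fdiff(B, i) + Xt{i}, lambda2 / beta); % X_i-update (9)
    Xt{i} = Xt{i} + fdiff(B, i) - X{i};                % dual update
end
% Stopping rule (10): maximum absolute change between successive iterates.
converged = max(abs(B(:) - Bprev(:))) < 1e-5;
```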
The main computational cost of Phase 1 in Algorithm 1 is due to the update of $\mathcal{B}$ using the t-product. The per-iteration complexity is $O(n_1 n_2 n_3 \log(n_3) + n_1 s_r s_c n_3)$. When $s_c s_r \ll n_2$, the term $O(n_1 n_2 n_3 \log(n_3))$ dominates the complexity. This creates a computational trade-off when applying the t-CUR decomposition, as selecting larger values for $s_c$ and $s_r$ leads to increased computational complexity. In Phase 2, the application of k-means clustering requires $O(T_k k n_1 n_2 n_3)$ operations, provided we run a fixed number $T_k$ of iterations in the k-means clustering. The complexity of this step is directly proportional to the number of selected bands $k$ and the dimensions of the hyperspectral image. Therefore, the overall computational complexity of the algorithm is $O(n_1 n_2 n_3 \log(n_3) + T_k k n_1 n_2 n_3)$.
Algorithm 1 Hyperspectral Band Selection Based on Tensor CUR Decomposition (THBSCUR)
  • Input: $\mathcal{Y} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, maximum number of iterations $T$, numbers of sampled rows and columns $s_r$ and $s_c$, number of desired bands $k$, step size $\tau$, parameters $\lambda_1, \lambda_2, \beta$, and tolerance $\varepsilon$
  • Output: The index set $Q$ of the desired bands.
  • 1. Optimize the model in (4) using ADMM:
  • Initialize: $\mathcal{C}^0, \mathcal{U}^0, \mathcal{R}^0, \mathcal{B}^0, \mathcal{S}^0, \mathcal{X}_i^0 = \mathcal{O}$
  • for $j = 0, 1, 2, \ldots, T-1$ do
  •     Update $\mathcal{C}^{j+1}$, $\mathcal{U}^{j+1}$, and $\mathcal{R}^{j+1}$ as in (6)
  •     Update $\mathcal{B}^{j+1}$ as in (7)
  •     Update $\mathcal{S}^{j+1}$ by solving (8)
  •     Update $\mathcal{X}_i^{j+1}$ by solving (9)
  •     Update $\widetilde{\mathcal{X}}_i^{j+1} = \widetilde{\mathcal{X}}_i^j + D_i \mathcal{B}^{j+1} - \mathcal{X}_i^{j+1}$, $i = 1, 2, 3$
  •     Check the convergence condition (10)
  •     if converged then
  •         Exit and set $\widetilde{\mathcal{B}} = \mathcal{B}^{j+1}$
  •     end if
  • end for
  • Set $\widetilde{\mathcal{B}} = \mathcal{B}^{T}$
  • 2. Cluster the frontal slices of $\widetilde{\mathcal{B}}$ using a clustering method such as k-means or spectral clustering to find the index set $Q$, which indicates the bands closest to the $k$ cluster centroids.

4. Experimental Data and Results

4.1. Experimental Setup

In our numerical experiments, we evaluated the proposed method using three publicly available HSI datasets visualized in Figure 3: Indian Pines, Salinas-A, and Pavia University (the data are publicly available at https://www.ehu.eus/ccwintco/index.php/Hyperspectral_Remote_Sensing_Scenes, accessed on 18 December 2024). To assess the effectiveness of our approach, we compared it with several other state-of-the-art band selection methods, including E-FDPC [6], SR-SSIM [9], FNGBS [8], MGSR [10], MHBSCUR [13], and TGSR [11]. The memory complexity of the TGSR algorithm is $O(V n_1^2 n_2^2 + V n_1 n_2 n_3 + n_3^2)$, where $V$ refers to the number of unique superpixel regions. This reflects the dependence of the algorithm on both the spatial dimensions and the spectral bands, with a significant contribution from the quadratic scaling of pixel interactions in each superpixel region. In contrast, the proposed THBSCUR algorithm has a complexity of $O(n_1 n_2 n_3 + s_r s_c n_3 + k n_1 n_2)$, which simplifies to $O(n_1 n_2 n_3)$ when considering the dominant factor of the tensor dimensions. The memory complexity of MHBSCUR is $O(n_1 n_2 n_3 + n_3^2 + n_1^2 n_2^2)$, where the Laplacian graph computation dominates the complexity, but efficient data structures reduce the practical memory usage. The MGSR algorithm has a memory complexity of $O(n_1^2 n_2^2 + n_3^2 + n_1 n_2 n_3)$, highlighting its quadratic scaling in the spatial dimensions and its dependence on both the spectral and spatial features. E-FDPC and FNGBS both exhibit a simpler memory complexity of $O(n_3^2)$, indicating that their performance depends primarily on the spectral dimension. Similarly, the complexity of the SR-SSIM algorithm, $O(n_3^2 + n_1 n_2 n_3)$, is influenced by both spectral and spatial features, with the spectral component dominating.
For Algorithm 1, the specific rows and columns were randomly selected at the beginning of each trial and then fixed throughout the algorithm. This selection was implemented using a random permutation, where the Mersenne Twister generator seed [40] was initialized to 1 at the start of each trial. For each dataset, classification approach, and targeted number of bands, we fine-tuned the parameters $\lambda_1$, $\lambda_2$, $\beta$, and $\tau$ using Bayesian Optimization (BO). Refer to Section 5 for further discussion on parameter tuning.
We evaluated the performance of these methods through classification tests using SVM and KNN, measuring their effectiveness with the Overall Accuracy (OA) metric, which is defined as the ratio of correctly classified samples to the total number of samples in the test set. For the classification setup, we randomly selected 90% of the samples from each dataset for training and the remaining 10% for testing. To reduce the randomness effect, each classification test was repeated 50 times with different training data. The number of bands tested ranged from 3 to 30, increasing in increments of three.
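A hedged MATLAB sketch of one such classification trial is shown below; X (the pixel-by-feature matrix built from the selected bands) and labels (the ground-truth classes) are hypothetical variable names, and fitcecoc wraps binary SVMs into a multiclass classifier (fitcknn would play the analogous role for KNNs).

```matlab
% One classification trial: 90% training / 10% testing, scored by OA.
cv   = cvpartition(labels, 'HoldOut', 0.10);
mdl  = fitcecoc(X(training(cv), :), labels(training(cv)));  % multiclass SVM
pred = predict(mdl, X(test(cv), :));
OA   = mean(pred == labels(test(cv)));  % Overall Accuracy
```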
All numerical experiments were conducted using MATLAB 2023b on a desktop computer equipped with an Intel i7-1065G7 CPU, 12GB RAM, and running Windows 11.

4.2. Experiment 1: Indian Pines

In the first experiment, we tested the Indian Pines dataset, acquired by the Airborne Visible InfraRed Imaging Spectrometer (AVIRIS) sensor over Northwestern Indiana, which contains 145 × 145 pixels, 200 spectral bands, and 16 distinct classes. Figure 4 and Figure 5 plot the OA curves produced by SVM and KNNs, respectively, for this dataset. In general, as the number of selected bands increased, the overall accuracy of all methods plateaued. The proposed THBSCUR outperformed the state-of-the-art methods in terms of OA when SVM or KNNs was used to classify the results for a high number of bands. In this case, SVM also produced slightly higher classification accuracy than KNNs. The running times for each method in seconds, averaged over the number of selected bands $k$, are 0.0382 for E-FDPC, 0.1417 for FNGBS, 30.1334 for SR-SSIM, 15.3760 for MHBSCUR, 19.7766 for MGSR, 32.2180 for TGSR, and 169.5353 for the proposed method. While our method required a longer running time, the BO-optimized THBSCUR algorithm achieved comparable or superior classification accuracy across different datasets and classifiers. The consistent improvement in performance observed with our method, especially when considering the optimization across various numbers of selected bands, highlights its robustness and effectiveness. This competitive edge over state-of-the-art methods like TGSR further validates the significance of our approach.

4.3. Experiment 2: Salinas-A

In the second experiment, we tested the Salinas-A dataset, a subset of a larger image captured by the AVIRIS sensor in California, consisting of 86 × 83 pixels, 204 bands, and six classes. Figure 6 and Figure 7 plot the OA curves produced by SVM and KNNs, respectively, for the Salinas-A dataset. The running times for each method in seconds, averaged over the number of selected bands $k$, are 0.0107 for E-FDPC, 0.0502 for FNGBS, 22.673 for SR-SSIM, 6.1636 for MHBSCUR, 5.9491 for MGSR, 39.2225 for TGSR, and 75.1455 for the proposed method. Despite its longer running time, the proposed method exhibited consistently superior performance with both the SVM and KNNs classifiers.

4.4. Experiment 3: Pavia University

In the third experiment, we evaluated the Pavia University dataset, which was acquired by the Reflective Optics System Imaging Spectrometer (ROSIS) sensor over the urban area of Pavia, Italy. This dataset consists of 610 × 340 pixels with 103 spectral bands. The Pavia University scene is characterized by its complex urban environment, featuring nine distinct land-cover classes. This dataset is particularly challenging due to its urban complexity and the spectral similarity between some of the classes, such as asphalt and bitumen. The high spatial resolution combined with the diverse urban classes makes Pavia University a valuable benchmark for evaluating hyperspectral image classification algorithms in urban environments.
Figure 8 and Figure 9 plot the OA curves produced by SVM and KNNs, respectively. The running times for each method in seconds, averaged over the number of selected bands $k$, are 0.2537 for E-FDPC, 0.9218 for FNGBS, 138.8463 for SR-SSIM, $3.3029 \times 10^3$ for MHBSCUR, 56.0402 for TGSR, and 654.5938 for the proposed method. Overall, THBSCUR performed comparably to or outperformed the other methods, especially when the number of selected bands was low. The MGSR method was excluded here due to computational limitations: the construction of the adjacency matrix during the preprocessing stage of MGSR led to an out-of-memory error, making it infeasible for this experiment. This also highlights the computational challenges that can arise when dealing with large, high-dimensional hyperspectral datasets.

5. Discussion

In this section, we discuss key aspects of the proposed THBSCUR algorithm that may impact its performance in hyperspectral band selection. We begin by providing detailed guidance on parameter selection using both a traditional grid search method and Bayesian Optimization. We offer insights into optimal parameter ranges for different datasets and discuss the impact of noise on these choices. We then demonstrate the performance of our algorithm in the noisy setting, providing recommendations for parameter adjustments. Finally, we extend our analysis to include additional classifiers, such as the convolutional neural network (CNN) [41], to further validate the effectiveness of our method across different classification approaches in calculating overall accuracy. Throughout this discussion, we aim to provide a comprehensive understanding of the proposed algorithm, practical implementation considerations, and its robustness across various scenarios and classification techniques.

5.1. Parameter Selection via Bayesian Optimization

In the previous sections, we presented the numerical experiment results for the proposed THBSCUR method, which were obtained using Bayesian Optimization [42] for parameter tuning. In this section, we will delve deeper into the details of this optimization process and compare it to the exhaustive grid search commonly used in practice.
Bayesian Optimization-based parameter selection is a powerful and efficient strategy for tuning parameters in unsupervised algorithms. It is particularly useful in machine learning, where the objective function can be the performance of a model, such as loss or accuracy, and the parameters to be optimized may include regularization coefficients and learning rates. Unlike a grid search which evaluates the objective function exhaustively, Bayesian Optimization uses a probabilistic model and acquisition function to explore the parameter space by prioritizing the most promising regions based on prior evaluations, often leading to faster convergence.
In our approach, the parameters to be optimized are $\beta$, $\lambda_1$, $\lambda_2$, and $\tau$, which are critical to the performance of THBSCUR. Next, we consider a Bayesian Optimization problem that maximizes the overall classification accuracy of the THBSCUR-based feature selection on hyperspectral data. That is, we solve the following problem:
$$\{\beta^*, \lambda_1^*, \lambda_2^*, \tau^*\} = \operatorname*{argmax}_{(\beta, \lambda_1, \lambda_2, \tau) \in \Gamma} \mathrm{OA}(\beta, \lambda_1, \lambda_2, \tau, k),$$
where the parameter space $\Gamma = [a, b] \times [c, d] \times [e, f] \times [g, h]$ (refer to Section 5.2 for more details), $k$ represents the number of spectral bands selected, and OA is defined in Section 4.1. Since there is no analytical expression for the objective function, a probabilistic surrogate model is used to approximate the unknown OA for various combinations of parameters. The surrogate model is a Gaussian Process (GP), which is fully specified by a mean function and a covariance function. The GP model captures both the estimated value of the objective function, which in this case is the overall accuracy, and the uncertainty around this estimate.
The next step in Bayesian Optimization involves strategically selecting the next set of parameters to evaluate. This is accomplished by minimizing an acquisition function, which balances the exploration of uncertain areas with the exploitation of promising regions. Common choices for the acquisition function include the Expected Improvement (EI), Probability of Improvement (PI), and Lower Confidence Bound (LCB).
In our experiments, we used the “bayesopt” function in MATLAB to implement Bayesian Optimization and adopted the “expected-improvement-plus” acquisition function, which is an enhanced version of the EI to prevent the overexploitation of some areas.
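A minimal sketch of this setup is given below; runTHBSCUR is a hypothetical wrapper that runs Algorithm 1 plus classification and returns the OA for a given parameter setting, the search ranges follow Section 5.2, and the evaluation budget is an illustrative choice.

```matlab
% Parameter tuning for THBSCUR via MATLAB's bayesopt.
vars = [optimizableVariable('beta',    [0.1, 10]), ...
        optimizableVariable('tau',     [0.1, 10]), ...
        optimizableVariable('lambda1', [1e-4, 0.01], 'Transform', 'log'), ...
        optimizableVariable('lambda2', [1e-4, 0.01], 'Transform', 'log')];
% bayesopt minimizes its objective, so we negate the overall accuracy.
objfun  = @(x) -runTHBSCUR(x.beta, x.tau, x.lambda1, x.lambda2, k);
results = bayesopt(objfun, vars, ...
    'AcquisitionFunctionName', 'expected-improvement-plus', ...
    'MaxObjectiveEvaluations', 30);
bestParams = results.XAtMinObjective;  % table of tuned beta, tau, lambda1, lambda2
```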

5.2. Optimal Parameters for Grid Search and Bayesian Optimization

In this section, we discuss optimal parameters for each test dataset using grid search and Bayesian Optimization, along with general guidelines. Before delving into specific tuning processes, it is important to understand the key parameters in the tensor CUR decomposition that influence the performance of our band selection method.
In the tensor CUR decomposition, the numbers of selected columns and rows, denoted by $s_c$ and $s_r$, are key factors influencing both the approximation quality and the computational efficiency. There are a few common approaches to determining these parameters. One simple approach is to choose a target rank $k$, i.e., the number of bands, for the approximation. Another approach, which we adopt here, is to choose $s_c$ and $s_r$ as small multiples of $k$, namely $O(k \log(n_1 n_2))$ and $O(k \log(n_2))$, respectively [43]. Throughout our experiments, we set $s_c = \mathrm{round}(k \log(n_1 n_2))$ and $s_r = \mathrm{round}(k \log(n_2))$. The logarithmic factor provides a slight oversampling beyond the rank $k$, which helps to ensure that enough information is captured to accurately represent the $k$-dimensional subspace of the data. This oversampling accounts for potential noise or minor variations in the data that might not be captured by selecting exactly $k$ columns and rows.
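In MATLAB, this sampling choice amounts to two lines (a minimal sketch; $n_1$, $n_2$, and $k$ are assumed given):

```matlab
% Oversampled sample sizes for the t-CUR decomposition.
sc = round(k * log(n1 * n2));   % number of sampled columns
sr = round(k * log(n2));        % number of sampled rows
```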
For the THBSCUR algorithm, we compared two methods of parameter tuning for the parameters $\lambda_1$, $\lambda_2$, $\beta$, and $\tau$: grid search and BO. The ranges for the grid search approach were defined as follows:
  • $\lambda_1, \lambda_2 \in \{10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}, 1, 10, 10^2, 10^3\}$;
  • $\beta, \tau \in \{10^{-1}, 1, 10, 10^2\}$.
As the number of selected bands increases, the optimal choices for $\lambda_1$ and $\lambda_2$ become stable, while $\beta$ and $\tau$ require more careful tuning. We suggest starting with $\beta$ and $\tau$ set to 1 and then fine-tuning based on the specific dataset. Importantly, multiple combinations of parameter choices may lead to optimal overall accuracy. It is worth noting that the optimal parameters found through grid search may differ from those identified using Bayesian Optimization, as the latter explores the parameter space more efficiently and can potentially discover better combinations.
The optimal parameters for the SVM classifier on the Indian Pines dataset, obtained by grid search and presented in Table 1, reveal distinct patterns across different numbers of selected bands. The parameter $\beta$ demonstrated the highest variability, with a Standard Deviation (SD) of 40.9585, indicating that the algorithm is particularly sensitive to the number of selected bands.
The parameter $\lambda_1$ remained constant at $10^{-4}$ for all $k$ values, indicating that the sparsity control for one aspect of the decomposition is consistent across different band selection scenarios. In contrast, $\lambda_2$ showed some variation, with a standard deviation of $3.1561 \times 10^{-2}$, primarily due to a single outlier value of 0.1 at $k = 30$. The parameter $\tau$, which governs the step size in the $\mathcal{B}$ update, displayed considerable variability, with a standard deviation of 4.6126, taking on values of 0.1, 1, and 10 across different $k$ values.
These results indicate that grid search can identify discrete, widely spaced optimal values for $\beta$ and $\tau$ while maintaining more consistent values for $\lambda_1$ and $\lambda_2$ (with one exception). This discretization is a characteristic of the grid search method, which may miss fine-grained optimal parameter values that could exist between the tested grid points. The high variability in $\beta$ and $\tau$ highlights the need for careful parameter tuning in the THBSCUR algorithm for different band selection scenarios.
To allow for a finer-grained exploration of the parameter space, we additionally employed BO to refine the parameter selection process for the THBSCUR algorithm. In our implementation, the ranges for the four optimizable variables were defined as follows:
  • $\beta, \tau \in [0.1, 10]$;
  • $\lambda_1, \lambda_2 \in [10^{-4}, 0.01]$.
These ranges were chosen based on insights gained from the initial grid search on the THBSCUR algorithm. The Bayesian Optimization approach complements grid search by providing a more refined exploration of the parameter space, uncovering optimal parameter combinations that were missed in the discrete grid search. For implementation details, refer to Section 5.1.
The optimal parameters for the SVM classifier on the Indian Pines dataset, as shown in Table 2, reveal interesting patterns across different numbers of selected bands. The parameter $\beta$ exhibited the highest variability, with a standard deviation of 3.8651. The parameters $\lambda_1$ and $\lambda_2$, which control the sparsity of the decomposition, showed relatively consistent values across different $k$, with means of $4.6 \times 10^{-3}$ and $3.5 \times 10^{-3}$, respectively. The parameter $\tau$, governing the step size in the iterative process, also displayed high variability, with a standard deviation of 3.8816. This variability in $\beta$ and $\tau$ indicates that these parameters are particularly sensitive to the number of selected bands, while the sparsity-controlling parameters $\lambda_1$ and $\lambda_2$ remain more stable.
Figure 10 and Figure 11 demonstrate the success of the BO results with the THBSCUR algorithm applied to the Indian Pines dataset. The graphs show a consistent improvement in the classification accuracy across all numbers of selected bands when compared to grid search. This improvement is evident for both the SVM and KNNs classifiers, with BO-optimized parameters yielding higher accuracy levels regardless of the number of bands selected. This consistent superiority of the BO-tuned method underscores its effectiveness in fine-tuning the THBSCUR algorithm parameters, leading to more robust and accurate hyperspectral band selection across various scenarios.

5.3. Effects of Noise

In addition to the optimal parameter ranges outlined in Table 1 and Table 2, it is important to consider the effects of noise on parameter selection for THBSCUR. Hyperspectral data are often polluted by noise, commonly modeled as Gaussian noise, which can significantly impact the performance of band selection algorithms. To address this, we applied parameter adjustments to accommodate the deviations from the original signal introduced by the noise.
In scenarios where Gaussian noise with varying standard deviation levels is introduced into the dataset, we propose the following guidelines for selecting 15 bands from the Indian Pines dataset. The parameters $\lambda_1$ and $\lambda_2$, which control the trade-off between the sparsity term and the G3DTV regularization term, may require adjustment to mitigate the effects of noise. For noisy cases, $\beta = 1$ and $\tau = 10^{-4}$ consistently yield optimal results across different noise levels. Notably, for each noise level, multiple combinations of $\lambda_1$ and $\lambda_2$ can lead to optimal outcomes. Tuning $\lambda_2$ is particularly critical, as optimal results can still be achieved with varying $\lambda_1$ values. The optimal parameters for $\sigma = 1, 2, 3, 4, 5, 6$ are presented in Table 3.

5.4. Method Selection for Phase 2

In Phase 2 of our algorithm, we apply k-means clustering to group the spectral bands and ultimately generate a final index set of $k$ representative bands. This process begins with the refined band representation $\widetilde{\mathcal{B}}$ obtained in the previous phase and treats each frontal slice of $\widetilde{\mathcal{B}}$ as a data point for clustering. The k-means algorithm begins by randomly choosing $k$ cluster centers from the band data. Each band is then assigned to the nearest cluster center, and the cluster centers are updated as the mean of all bands assigned to each cluster. This iterative process continues until convergence, resulting in a set of representative bands that capture the essential information from the original spectral data. By default, the k-means algorithm in MATLAB uses the Euclidean distance. However, we opted to use the Manhattan distance instead, as the Euclidean distance-based approach failed to converge in some of our experiments. This modification significantly improved the robustness and stability of the clustering process, particularly because the Manhattan distance is more robust in the presence of outliers [44,45].
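A hedged sketch of this clustering step is given below; Btilde denotes the Phase-1 output, the Replicates option is an illustrative choice, and the representative-band extraction mirrors the description above.

```matlab
% Treat each frontal slice (band) of Btilde as one observation and run
% k-means with the Manhattan (cityblock) distance.
bands      = reshape(Btilde, [], size(Btilde, 3))';   % n3 x (n1*n2)
[idx, Ctr] = kmeans(bands, k, 'Distance', 'cityblock', 'Replicates', 5);

% For each cluster, keep the index of the band closest to its centroid.
Q = zeros(k, 1);
for c = 1:k
    members = find(idx == c);
    [~, m]  = min(vecnorm(bands(members, :) - Ctr(c, :), 1, 2));
    Q(c)    = members(m);
end
```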
To evaluate the stability and robustness of the k-means clustering process, we conducted 50 trials with random centroid initializations and visualized the results using an envelope plot in Figure 12. This plot highlights the variability in performance by displaying the maximum, minimum, and mean SVM accuracies across all trials for the Indian Pines dataset. The shaded region between the max and min values represents the range of possible outcomes, while a dashed line indicates the mean accuracy. In our experiments, the envelope plot consistently demonstrated a narrow band, signifying minimal variation across trials and showcasing the stability of our k-means-based band selection.
While k-means is effective for hyperspectral band selection, there are several alternative clustering methods that could be considered for band selection. One advantage of k-means is its relative high speed, making it an efficient choice for clustering in this context; alternative methods may not offer the same level of computational efficiency. To explore the effectiveness of other clustering approaches, we implemented and tested three additional methods: spectral clustering, fuzzy c-means [46], and Density-based Spatial Clustering of Applications with Noise (DBSCAN) [47] using MATLAB implementations. Table 4 details the computational complexity of the methods mentioned in this section.
We first implemented spectral clustering using the “spectralcluster” function in MATLAB, which leverages graph-based techniques to partition the spectral bands. Spectral clustering transforms the data into a lower-dimensional space using the eigenvalues of a similarity matrix, capturing complex relationships between bands that may not be evident in the original high-dimensional space. When applied to hyperspectral band selection, spectral clustering produced results comparable to those of the k-means approach. Figure 13 compares the performance of the spectral clustering method with that of the k-means method on the Indian Pines dataset.
The fuzzy c-means algorithm, which allows for soft cluster assignments, showed moderate performance. With optimized parameters, it achieved an overall accuracy of 0.6 for the Indian Pines dataset using an SVM classifier to assess the performance. This result, while not surpassing our k-means approach, demonstrates that fuzzy clustering could be a viable alternative for hyperspectral band selection.
In contrast, the DBSCAN algorithm struggled to perform well in this context. Despite its ability to discover clusters, DBSCAN only averaged about 0.2 for overall accuracy across our experiments. This poor performance can be attributed to several factors, including the high-dimensional nature of hyperspectral data and the sensitivity of the algorithm to its input parameters (epsilon and minimum points). The density-based approach of DBSCAN may not be well suited to the spectral characteristics of hyperspectral bands, where the concept of density in high-dimensional space becomes less intuitive.
These results underscore the effectiveness of our k-means-based approach for hyperspectral band selection. While fuzzy c-means shows potential with moderate performance, and spectral clustering demonstrates competitive results, the significant underperformance of DBSCAN highlights the challenges of applying density-based clustering to this specific problem domain. Our findings suggest that centroid-based clustering methods such as k-means and fuzzy c-means, along with graph-based clustering methods like spectral clustering, are well suited for hyperspectral band selection tasks.

5.5. Classifier Selection for Evaluation

In addition to SVM and KNNs, we also calculated the overall accuracy with CNN as the classifier. In this experiment, we evaluated the performance of the previously considered methods on the Indian Pines dataset. Figure 14 plots the resulting OA curves. For the Indian Pines dataset, it is difficult to distinguish the performance of the different methods, as the accuracy curves are closely clustered. However, the proposed method demonstrated high overall accuracy across different numbers of selected bands as compared to the state-of-the-art methods. Interestingly, CNN classification in this case reported lower overall accuracy than SVM or KNNs classification for the same selected bands.
The CNN used here consists of two convolutional layers with 6 and 16 filters, respectively, each followed by ReLU activation, batch normalization, and max pooling. These layers extract hierarchical spectral features from the input bands. The convolutional layers are followed by three fully connected layers of sizes 120, 84, and $k$ (the number of bands), with ReLU activations between them. The final layer uses softmax activation for multiclass classification. The network was trained using the Adam optimizer for 25 epochs, with a learning rate schedule that dropped the rate by a factor of 0.1 every 15 epochs. After training, the network classified the test data, and the overall accuracy was calculated as the proportion of correctly classified samples.
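The following Deep Learning Toolbox sketch mirrors this description; the 1 × 3 kernel size, the 1 × k input shape (one pixel's k selected bands), and sizing the final fully connected layer to the number of classes (which the softmax output requires) are assumptions of this sketch rather than details from the original experiments.

```matlab
layers = [
    imageInputLayer([1 k 1])                          % one pixel's k bands
    convolution2dLayer([1 3], 6, 'Padding', 'same')   % 6 filters
    reluLayer
    batchNormalizationLayer
    maxPooling2dLayer([1 2], 'Stride', [1 2])
    convolution2dLayer([1 3], 16, 'Padding', 'same')  % 16 filters
    reluLayer
    batchNormalizationLayer
    maxPooling2dLayer([1 2], 'Stride', [1 2])
    fullyConnectedLayer(120)
    reluLayer
    fullyConnectedLayer(84)
    reluLayer
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];
options = trainingOptions('adam', ...
    'MaxEpochs', 25, ...
    'LearnRateSchedule', 'piecewise', ...
    'LearnRateDropFactor', 0.1, ...
    'LearnRateDropPeriod', 15);
% XTrain: 1 x k x 1 x N array of band vectors; YTrain: categorical labels.
net = trainNetwork(XTrain, YTrain, layers, options);
```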
It is worth noting that while CNN demonstrated lower accuracy than SVM and KNNs, it still yielded decent results. This performance could potentially be improved through further optimization of the network architecture and training process.
Other classifiers that have been used to evaluate the performance of hyperspectral band selection in the literature include Classification and Regression Trees (CART) and Linear Discriminant Analysis (LDA) [48,49,50]. CART is a non-parametric decision tree learning technique that produces either classification or regression trees depending on whether the dependent variable is categorical or numeric [51]. It recursively partitions the feature space into subsets where the instances share similar values of the target variable. LDA is a method used to find a linear combination of features that characterizes or separates two or more classes of objects or events [52]. The resulting combination may be used as a linear classifier or for dimensionality reduction before later classification.
Both CART and LDA have been applied to hyperspectral data classification tasks to assess the effectiveness of band selection methods, as they offer different approaches to the classification problem and can provide insights into the discriminative power of the selected spectral bands. Ultimately, we chose to report results from SVM and KNNs due to their robust performance across various datasets and their ability to achieve consistent performance evaluation without extensive hyperparameter tuning or architectural modifications.

6. Conclusions

The high dimensionality and redundancy inherent in hyperspectral imaging data demand effective band selection strategies to ensure computational efficiency and data interpretability. This work introduces a novel tensor-based band selection approach that leverages G3DTV regularization to preserve spatial–spectral smoothness while employing tensor CUR decomposition to enhance processing efficiency and maintain low rankness. Our method stands out by preserving the tensor structure of HSI data, thereby avoiding inefficient conversions between tensors and matrices, and directly addresses spectral redundancy in the tensor form. This approach maintains the multidimensional integrity of the data, making it well suited for a wide range of datasets encountered in practical applications.
In our discussion, we emphasize several key contributions. We highlight the effectiveness of Bayesian Optimization for parameter tuning, which significantly improves the robustness and performance of our method. In addition, we underscore the importance of careful parameter selection, providing insights into how different parameters impact algorithm performance. We also address the challenges posed by noisy HSI data, proposing modifications to enhance resilience in real-world scenarios. Furthermore, we explore alternative methods for the second phase of our algorithm, which offers potential avenues for further performance improvements.
In future work, we aim to develop an accelerated version of our algorithm to further improve its efficiency. We also plan to investigate various importance sampling schemes within the t-CUR decomposition and examine how gradient tensor sparsity affects band selection accuracy.

Author Contributions

Conceptualization, J.Q.; methodology, J.Q.; software, K.H.; validation, K.H.; formal analysis, K.H.; investigation, K.H.; data curation, K.H.; writing—original draft preparation, K.H.; writing—review and editing, K.H. and J.Q.; supervision, J.Q.; project administration, J.Q.; funding acquisition, J.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially funded by NSF grant DMS-1941197.

Data Availability Statement

The datasets analyzed in this study are publicly available and can be accessed through the following link: https://www.ehu.eus/ccwintco/index.php, accessed on 18 December 2024. These datasets are widely used benchmarks in hyperspectral image analysis and are freely available for research purposes. No new data were created in this study. Demo codes for our algorithm are available at https://github.com/khenneberger/THBSCUR, accessed on 18 December 2024.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Bellman, R. Dynamic Programming; Press Princeton: Princeton, NJ, USA, 1957; Volume 39. [Google Scholar]
  2. Sun, W.; Du, Q. Hyperspectral band selection: A review. IEEE Geosci. Remote Sens. Mag. 2019, 7, 118–139. [Google Scholar] [CrossRef]
  3. Esmaeili, M.; Abbasi-Moghadam, D.; Sharifi, A.; Tariq, A.; Li, Q. Hyperspectral image band selection based on CNN embedded GA (CNNeGA). IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 1927–1950. [Google Scholar] [CrossRef]
  4. Feng, J.; Li, D.; Gu, J.; Cao, X.; Shang, R.; Zhang, X.; Jiao, L. Deep reinforcement learning for semisupervised hyperspectral band selection. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5501719. [Google Scholar] [CrossRef]
  5. Chang, C.I.; Du, Q.; Sun, T.L.; Althouse, M.L. A joint band prioritization and band-decorrelation approach to band selection for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 1999, 37, 2631–2641. [Google Scholar] [CrossRef]
  6. Jia, S.; Tang, G.; Zhu, J.; Li, Q. A novel ranking-based clustering approach for hyperspectral band selection. IEEE Trans. Geosci. Remote Sens. 2015, 54, 88–102. [Google Scholar] [CrossRef]
  7. Wang, Q.; Zhang, F.; Li, X. Hyperspectral band selection via optimal neighborhood reconstruction. IEEE Trans. Geosci. Remote Sens. 2020, 58, 8465–8476. [Google Scholar] [CrossRef]
  8. Wang, Q.; Li, Q.; Li, X. A fast neighborhood grouping method for hyperspectral band selection. IEEE Trans. Geosci. Remote Sens. 2021, 59, 5028–5039. [Google Scholar] [CrossRef]
  9. Xu, B.; Li, X.; Hou, W.; Wang, Y.; Wei, Y. A similarity-based ranking method for hyperspectral band selection. IEEE Trans. Geosci. Remote Sens. 2021, 59, 9585–9599. [Google Scholar] [CrossRef]
  10. Zhang, Y.; Wang, X.; Jiang, X.; Zhou, Y. Marginalized graph self-representation for unsupervised hyperspectral band selection. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5516712. [Google Scholar] [CrossRef]
  11. Zhang, Y.; Qi, J.; Wang, X.; Cai, Z.; Peng, J.; Zhou, Y. Tensorial global-local graph self-representation for hyperspectral band selection. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 13213–13225. [Google Scholar] [CrossRef]
  12. Sun, W.; Du, Q. Graph-regularized fast and robust principal component analysis for hyperspectral band selection. IEEE Trans. Geosci. Remote Sens. 2018, 56, 3185–3195. [Google Scholar] [CrossRef]
  13. Henneberger, K.; Huang, L.; Qin, J. Hyperspectral band selection based on matrix CUR decomposition. In Proceedings of the 2023 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Pasadena, CA, USA, 16–21 July 2023; pp. 7380–7383. [Google Scholar]
  14. Rudin, L.I.; Osher, S.; Fatemi, E. Nonlinear total variation based noise removal algorithms. Phys. D Nonlinear Phenom. 1992, 60, 259–268. [Google Scholar] [CrossRef]
  15. Persson, M.; Bone, D.; Elmqvist, H. Total variation norm for three-dimensional iterative reconstruction in limited view angle tomography. Phys. Med. Biol. 2001, 46, 853. [Google Scholar] [CrossRef]
  16. Aggarwal, H.K.; Majumdar, A. Hyperspectral image denoising using spatio-spectral total variation. IEEE Geosci. Remote Sens. Lett. 2016, 13, 442–446. [Google Scholar] [CrossRef]
  17. Qin, J.; Yanovsky, I. Robust super-resolution image reconstruction method for geometrically deformed remote sensing images. In Proceedings of the 2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; pp. 8050–8053. [Google Scholar]
  18. Iordache, M.D.; Bioucas-Dias, J.M.; Plaza, A. Total variation spatial regularization for sparse hyperspectral unmixing. IEEE Trans. Geosci. Remote Sens. 2012, 50, 4484–4502. [Google Scholar] [CrossRef]
  19. Feng, X.R.; Li, H.C.; Li, J.; Du, Q.; Plaza, A.; Emery, W.J. Hyperspectral unmixing using sparsity-constrained deep nonnegative matrix factorization with total variation. IEEE Trans. Geosci. Remote Sens. 2018, 56, 6245–6257. [Google Scholar] [CrossRef]
  20. Qin, J.; Lee, H.; Chi, J.T.; Drumetz, L.; Chanussot, J.; Lou, Y.; Bertozzi, A.L. Blind hyperspectral unmixing based on graph total variation regularization. IEEE Trans. Geosci. Remote Sens. 2020, 59, 3338–3351. [Google Scholar] [CrossRef]
  21. Bredies, K.; Kunisch, K.; Pock, T. Total generalized variation. SIAM J. Imaging Sci. 2010, 3, 492–526. [Google Scholar] [CrossRef]
  22. Hu, Y.; Jacob, M. Higher degree total variation (HDTV) regularization for image recovery. IEEE Trans. Image Process. 2012, 21, 2559–2571. [Google Scholar] [CrossRef]
  23. Henneberger, K.; Qin, J. Hyperspectral Band Selection Based on Generalized 3DTV and Tensor CUR Decomposition. In Proceedings of the Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, USA, 27–30 October 2024. [Google Scholar]
  24. Glowinski, R.; Marroco, A. Sur l’approximation, par éléments finis d’ordre un, et la résolution, par pénalisation-dualité d’une classe de problèmes de Dirichlet non linéaires. Rev. Française d’automatique Inform. Rech. Opérationnelle. Anal. Numérique 1975, 9, 41–76. [Google Scholar] [CrossRef]
  25. Gabay, D.; Mercier, B. A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 1976, 2, 17–40. [Google Scholar] [CrossRef]
  26. Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers; Foundations and Trends® in Machine Learning; Now Publishers Inc.: Delft, The Netherlands, 2011; Volume 3, pp. 1–122. [Google Scholar]
  27. Yi, X.; Park, D.; Chen, Y.; Caramanis, C. Fast algorithms for robust PCA via gradient descent. In Proceedings of the 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain, 5–10 December 2016; Volume 29. [Google Scholar]
  28. Cai, H.; Huang, L.; Li, P.; Needell, D. Matrix completion with cross-concentrated sampling: Bridging uniform sampling and CUR sampling. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 10100–10113. [Google Scholar] [CrossRef]
  29. Chen, J.; Wei, Y.; Xu, Y. Tensor CUR decomposition under T-product and its perturbation. Numer. Funct. Anal. Optim. 2022, 43, 698–722. [Google Scholar] [CrossRef]
  30. Mahoney, M.W.; Maggioni, M.; Drineas, P. Tensor-CUR decompositions for tensor-based data. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA, 20–23 August 2006; pp. 327–336. [Google Scholar]
  31. Caiafa, C.F.; Cichocki, A. Generalizing the column–row matrix decomposition to multi-way arrays. Linear Algebra Its Appl. 2010, 433, 557–573. [Google Scholar] [CrossRef]
  32. Cai, H.; Hamm, K.; Huang, L.; Needell, D. Mode-wise tensor decompositions: Multi-dimensional generalizations of CUR decompositions. J. Mach. Learn. Res. 2021, 22, 1–36. [Google Scholar]
  33. Cai, H.; Chao, Z.; Huang, L.; Needell, D. Fast robust tensor principal component analysis via fiber CUR decomposition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 189–197. [Google Scholar]
  34. Kilmer, M.E.; Martin, C.D. Factorization strategies for third-order tensors. Linear Algebra Its Appl. 2011, 435, 641–658. [Google Scholar] [CrossRef]
  35. Qin, W.; Wang, H.; Zhang, F.; Wang, J.; Luo, X.; Huang, T. Low-rank high-order tensor completion with applications in visual data. IEEE Trans. Image Process. 2022, 31, 2433–2448. [Google Scholar] [CrossRef]
  36. Prater-Bennette, A.; Shen, L.; Tripp, E.E. A constructive approach for computing the proximity operator of the p-th power of the ℓ1 norm. Appl. Comput. Harmon. Anal. 2023, 67, 101572. [Google Scholar] [CrossRef]
  37. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 1996, 58, 267–288. [Google Scholar] [CrossRef]
  38. Henneberger, K.; Qin, J. Power of ℓ1-Norm Regularized Kaczmarz Algorithms for High-Order Tensor Recovery. arXiv 2024, arXiv:2405.08275. [Google Scholar]
  39. MacQueen, J. Some methods for classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability; University of California Press: Berkeley, CA, USA, 1967. [Google Scholar]
  40. MathWorks. Random Number Generator (Mersenne Twister). 2015. Available online: https://www.mathworks.com/help/matlab/ref/rng.html (accessed on 18 December 2024).
  41. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
  42. Snoek, J.; Larochelle, H.; Adams, R.P. Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2012; Volume 25. [Google Scholar]
  43. Bien, J.; Xu, Y.; Mahoney, M.W. CUR from a sparse optimization viewpoint. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2010; Volume 23. [Google Scholar]
  44. Alliney, S. Digital filters as absolute norm regularizers. IEEE Trans. Signal Process. 1992, 40, 1548–1562. [Google Scholar] [CrossRef]
  45. Chan, T.F.; Esedoglu, S. Aspects of total variation regularized L1 function approximation. SIAM J. Appl. Math. 2005, 65, 1817–1837. [Google Scholar] [CrossRef]
  46. Bezdek, J.C. Pattern Recognition with Fuzzy Objective Function Algorithms; Springer Science & Business Media: Berlin, Germany, 2013. [Google Scholar]
  47. Ester, M.; Kriegel, H.P.; Sander, J.; Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD’96), Portland, OR, USA, 2–4 August 1996; pp. 226–231. [Google Scholar]
  48. Zhang, F.; Wang, Q.; Li, X. Optimal neighboring reconstruction for hyperspectral band selection. In Proceedings of the 2018 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Valencia, Spain, 22–27 July 2018; pp. 4709–4712. [Google Scholar]
  49. Yuan, Y.; Zheng, X.; Lu, X. Discovering diverse subset for unsupervised hyperspectral band selection. IEEE Trans. Image Process. 2016, 26, 51–64. [Google Scholar] [CrossRef]
  50. Hennessy, A.; Clarke, K.; Lewis, M. Hyperspectral classification of plants: A review of waveband selection generalisability. Remote Sens. 2020, 12, 113. [Google Scholar] [CrossRef]
  51. Breiman, L.; Friedman, J.; Stone, C.J.; Olshen, R.A. Classification and Regression Trees; CRC Press: Boca Raton, FL, USA, 1984. [Google Scholar]
  52. Fisher, R.A. The use of multiple measurements in taxonomic problems. Ann. Eugen. 1936, 7, 179–188. [Google Scholar] [CrossRef]
Figure 1. A visualization of the t-CUR decomposition where * represents the t-product as defined in Definition 4.
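As an aside for readers who wish to experiment with the decomposition in Figure 1, a minimal NumPy sketch of the t-product is given below. It implements the usual Fourier-domain realization (FFT along the third mode, frontal-slice matrix products, inverse FFT), which is a standard way to compute the t-product of Definition 4; the function name and tensor shapes are illustrative.

```python
# A minimal sketch of the t-product A * B for third-order tensors,
# computed facewise in the Fourier domain along the third (tubal) mode.
import numpy as np

def t_product(A, B):
    """t-product of A (n1 x n2 x n3) with B (n2 x n4 x n3)."""
    assert A.shape[1] == B.shape[0] and A.shape[2] == B.shape[2]
    Af = np.fft.fft(A, axis=2)               # FFT along the tubal mode
    Bf = np.fft.fft(B, axis=2)
    Cf = np.einsum('ijk,jlk->ilk', Af, Bf)   # per-slice matrix products
    return np.real(np.fft.ifft(Cf, axis=2))  # back to the original domain

# Example: C = A * B has shape (4, 5, 6)
A = np.random.rand(4, 3, 6)
B = np.random.rand(3, 5, 6)
C = t_product(A, B)
print(C.shape)
```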
Figure 2. Flowchart of the proposed method.
Figure 3. A visual representation of the 10th hyperspectral band from each test dataset. Each dataset is displayed at its original spatial scale with the same color range: Indian Pines is 145 × 145 pixels, Salinas-A is 86 × 83 pixels, and Pavia University is 610 × 340 pixels.
Figure 4. Overall accuracy with SVM for the Indian Pines dataset.
Figure 5. Overall accuracy with KNNs for the Indian Pines dataset.
Figure 6. Overall accuracy of SVM for the Salinas-A dataset.
Figure 7. Overall accuracy of KNNs for the Salinas-A dataset.
Figure 8. Overall accuracy of SVM for the Pavia University dataset.
Figure 9. Overall accuracy of KNNs for the Pavia University dataset.
Figure 10. SVM overall accuracy for the Indian Pines dataset using grid search and BO.
Figure 11. KNNs overall accuracy for the Indian Pines dataset using grid search and BO.
Figure 12. SVM overall accuracy envelope for the Indian Pines dataset over 50 trials using random k-means initialization.
Figure 13. SVM overall accuracy for the Indian Pines dataset using k-means and spectral clustering in Phase 2 of our algorithm.
Figure 14. CNN overall accuracy for the Indian Pines dataset.
Table 1. Optimal SVM grid search parameters for Indian Pines with different numbers of selected bands k, including the mean and Standard Deviation (SD).

| k    | β       | λ₁          | λ₂           | τ      |
|------|---------|-------------|--------------|--------|
| 3    | 1       | 1.0 × 10⁻⁴  | 1.0 × 10⁻⁴   | 1      |
| 6    | 10      | 1.0 × 10⁻⁴  | 1.0 × 10⁻⁴   | 0.1    |
| 9    | 100     | 1.0 × 10⁻⁴  | 1.0 × 10⁻⁴   | 0.1    |
| 12   | 1       | 1.0 × 10⁻⁴  | 1.0 × 10⁻⁴   | 10     |
| 15   | 1       | 1.0 × 10⁻⁴  | 1.0 × 10⁻⁴   | 10     |
| 18   | 10      | 1.0 × 10⁻⁴  | 1.0 × 10⁻³   | 0.1    |
| 21   | 100     | 1.0 × 10⁻⁴  | 1.0 × 10⁻⁴   | 1      |
| 24   | 1       | 1.0 × 10⁻⁴  | 1.0 × 10⁻⁴   | 1      |
| 27   | 1       | 1.0 × 10⁻⁴  | 1.0 × 10⁻⁴   | 10     |
| 30   | 1       | 1.0 × 10⁻⁴  | 1.0 × 10⁻¹   | 0.1    |
| Mean | 32.4000 | 1.0 × 10⁻⁴  | 1.1 × 10⁻²   | 3.3400 |
| SD   | 40.9585 | 0           | 3.1561 × 10⁻² | 4.6126 |
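For reference, an exhaustive search of this kind can be written as below. The candidate grids are inferred from the optima reported in Table 1 and are an assumption about the exact grid used; select_bands and overall_accuracy are the same hypothetical helpers as in the BO sketch above.

```python
# A minimal grid-search sketch over (beta, lambda1, lambda2, tau).
# The grids are inferred from the optima in Table 1 (an assumption);
# select_bands, overall_accuracy, X, and y are hypothetical, as before.
from itertools import product

betas    = [1, 10, 100]
lambda1s = [1e-4, 1e-3, 1e-2, 1e-1]
lambda2s = [1e-4, 1e-3, 1e-2, 1e-1]
taus     = [0.1, 1, 10]

best_oa, best_params = -1.0, None
for beta, lam1, lam2, tau in product(betas, lambda1s, lambda2s, taus):
    bands = select_bands(X, k=15, beta=beta, lam1=lam1, lam2=lam2, tau=tau)
    oa = overall_accuracy(X, y, bands)
    if oa > best_oa:
        best_oa, best_params = oa, (beta, lam1, lam2, tau)

print(f"Best OA {best_oa:.4f} at (beta, lambda1, lambda2, tau) = {best_params}")
```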
Table 2. Optimal SVM BO parameters of our algorithm for the Indian Pines dataset with different numbers of selected bands k, including the mean and Standard Deviation (SD).

| k    | β      | λ₁          | λ₂          | τ      |
|------|--------|-------------|-------------|--------|
| 3    | 2.9918 | 5.4 × 10⁻³  | 1.7 × 10⁻³  | 9.7053 |
| 6    | 9.624  | 9.4 × 10⁻³  | 2.5 × 10⁻³  | 0.1224 |
| 9    | 0.1192 | 9.9 × 10⁻³  | 6.8 × 10⁻³  | 5.0883 |
| 12   | 0.1247 | 3.2 × 10⁻⁴  | 1.3 × 10⁻⁴  | 0.1345 |
| 15   | 0.1077 | 2.0 × 10⁻³  | 8.0 × 10⁻³  | 1.0986 |
| 18   | 9.913  | 8.4 × 10⁻³  | 7.0 × 10⁻³  | 0.7634 |
| 21   | 1.5704 | 4.2 × 10⁻³  | 1.7 × 10⁻³  | 0.1980 |
| 24   | 3.422  | 4.7 × 10⁻³  | 3.4 × 10⁻³  | 6.4772 |
| 27   | 0.2138 | 1.0 × 10⁻⁴  | 8.4 × 10⁻⁴  | 8.7834 |
| 30   | 0.1398 | 1.7 × 10⁻³  | 3.2 × 10⁻³  | 0.1031 |
| Mean | 2.8226 | 4.6 × 10⁻³  | 3.5 × 10⁻³  | 3.2474 |
| SD   | 3.8651 | 3.7 × 10⁻³  | 2.8 × 10⁻³  | 3.8816 |
Table 3. Optimal parameters of our algorithm for Indian Pines with Gaussian noise.

| Parameter | σ = 1 | σ = 2 | σ = 3 | σ = 4 | σ = 5 | σ = 6 |
|-----------|-------|-------|-------|-------|-------|-------|
| λ₁        | 10⁻³  | 10⁻³  | 10⁻³  | 10⁻³  | 10⁻³  | 10⁻¹  |
| λ₂        | 10    | 10    | 10⁻²  | 10⁻²  | 10⁻³  | 10⁻²  |
| β         | 1     | 1     | 1     | 1     | 1     | 1     |
| τ         | 10⁻⁴  | 10⁻⁴  | 10⁻⁴  | 10⁻⁴  | 10⁻⁴  | 10⁻⁴  |
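The noisy setting of Table 3 can be reproduced by perturbing the data cube with zero-mean Gaussian noise before running the algorithm. A minimal sketch follows, with the caveat that how σ maps onto the cube's dynamic range is an assumption made here for illustration.

```python
# A minimal sketch: perturb an HSI cube X with zero-mean Gaussian noise.
# How sigma relates to the data's dynamic range is an assumption here.
import numpy as np

def add_gaussian_noise(X, sigma, seed=0):
    rng = np.random.default_rng(seed)  # reproducible noise realization
    return X + sigma * rng.standard_normal(X.shape)

X_noisy = add_gaussian_noise(X, sigma=3)  # e.g., the sigma = 3 column of Table 3
```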
Table 4. Computational time complexity of various methods for Phase 2 of the algorithm on data of size n₁ × n₂ × n₃. Here, Tₖ denotes the number of iterations, and k is the number of classes.

| Method              | Time Complexity  |
|---------------------|------------------|
| k-means             | O(Tₖ n₁n₂n₃ k)   |
| spectral clustering | O((n₁n₂n₃)³)     |
| DBSCAN              | O((n₁n₂n₃)²)     |
| fuzzy c-means       | O(Tₖ n₁n₂n₃ k)   |
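To illustrate how the Phase 2 alternatives in Table 4 can be swapped in, the sketch below groups the n₃ band signatures with either k-means or spectral clustering and keeps the band nearest each cluster center. Flattening each band to a vector of length n₁n₂ and the nearest-to-center selection rule are illustrative assumptions, and scikit-learn is just one possible backend; this is a sketch, not our released implementation.

```python
# A minimal sketch of Phase 2: cluster the n3 band signatures and keep one
# representative band per cluster. Flattening bands to length-(n1*n2) vectors
# and choosing the band closest to each center are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans, SpectralClustering

def cluster_bands(X, k, method="kmeans"):
    n1, n2, n3 = X.shape
    signatures = X.reshape(n1 * n2, n3).T  # one row per spectral band
    if method == "kmeans":
        model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(signatures)
        labels, centers = model.labels_, model.cluster_centers_
    else:
        labels = SpectralClustering(n_clusters=k, random_state=0).fit_predict(signatures)
        centers = np.stack([signatures[labels == c].mean(axis=0) for c in range(k)])
    selected = []
    for c in range(k):
        idx = np.where(labels == c)[0]
        dists = np.linalg.norm(signatures[idx] - centers[c], axis=1)
        selected.append(idx[np.argmin(dists)])  # band nearest the center
    return sorted(selected)

bands = cluster_bands(X, k=15)  # X is the (possibly denoised) HSI cube
```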