Advanced Hyperspectral Image Analysis: Superpixelwise Multiscale Adaptive T-HOSVD for 3D Feature Extraction

Hyperspectral images (HSIs) possess an inherent three-order structure, prompting increased interest in extracting 3D features. Tensor analysis and low-rank representations, notably truncated higher-order SVD (T-HOSVD), have gained prominence for this purpose. However, determining the optimal order and addressing sensitivity to changes in data distribution remain challenging. To tackle these issues, this paper introduces an unsupervised Superpixelwise Multiscale Adaptive T-HOSVD (SmaT-HOSVD) method. Leveraging superpixel segmentation, the algorithm identifies homogeneous regions, facilitating the extraction of local features to enhance spatial contextual information within the image. Subsequently, T-HOSVD is adaptively applied to the obtained superpixel blocks for feature extraction and fusion across different scales. SmaT-HOSVD harnesses superpixel blocks and low-rank representations to extract 3D features, effectively capturing both spectral and spatial information of HSIs. By integrating optimal-rank estimation and multiscale fusion strategies, it acquires more comprehensive low-rank information and mitigates sensitivity to data variations. Notably, when trained on subsets comprising 2%, 1%, and 1% of the Indian Pines, University of Pavia, and Salinas datasets, respectively, SmaT-HOSVD achieves impressive overall accuracies of 93.31%, 97.21%, and 99.25%, while maintaining excellent efficiency. Future research will explore SmaT-HOSVD’s applicability in deep-sea HSI classification and pursue additional avenues for advancing the field.


Introduction
Hyperspectral images (HSIs) are acquired through dedicated sensors such as AVIRIS, Hyperion, and CHRIS, capturing numerous spectral bands across Earth's surface [1,2]. Distinct species exhibit spectral separability, reflecting unique spectral information characteristics [3]. Unlike conventional images, HSI extends beyond spatial features, offering a detailed one-dimensional spectral profile per pixel [4]. At the same time, hyperspectral imaging has wide-ranging potential across various electromagnetic frequency ranges, such as X-ray, ultraviolet, visible and near-infrared, and terahertz [5][6][7][8][9]. With applications spanning agriculture, geology, ecology, and mineralogy [10][11][12][13], HSI faces challenges in classification due to its high spectral resolution and noise, potentially leading to the curse of dimensionality and reduced classification accuracy, especially with limited training samples [14,15]. Hence, efficient feature extraction is pivotal for HSI classification tasks.
Various feature extraction methods have emerged, from classical linear techniques like principal component analysis (PCA) and linear discriminant analysis (LDA) to nonlinear approaches such as neighborhood preserving embedding (NPE) and locality preserving projections (LPP) [16][17][18][19]. However, these methods often neglect spatial detail and concentrate solely on spectral information, limiting their effectiveness [20]. To address this, approaches integrating spatial and spectral information have gained prominence, categorized into segmentation-based, decision-based fusion, and feature fusion methods [21]. In segmentation-based methods, superpixelwise PCA (SuperPCA) and superpixelwise adaptive singular spectral analysis (SpaSSA) emphasize localized feature extraction with superpixel segmentation [22,23]. Decision-based fusion methods like robust matrix discriminative analysis (RMDA) combine multiple classification decisions for improved integration [19]. Yet, these methods often treat spectral and spatial information independently, overlooking their intrinsic connection within HSI [20].
Recognizing HSI's cubic form encompassing spatial and spectral dimensions [21], feature extraction in 3D has emerged, notably through deep learning and tensor-based methods. Deep learning, exemplified by 3D convolutional neural networks (3DCNNs) and transformer-based networks like SpectralFormer and the spectral-spatial transformer network (SSTN), has shown promise in extracting deeper features [24][25][26][27]. Despite the substantial progress made by the aforementioned methods, they share common challenges inherent in deep learning, such as the need to determine a large number of hyperparameters and the lack of model interpretability. The tensor-based approach exploits the low-rank structure of HSI, with algorithms such as Tucker decomposition and CANDECOMP/PARAFAC (CP) efficiently extracting 3D features [28,29]. Fu et al. [20] introduced tensor singular spectral analysis (TensorSSA) to extract global and low-rank features of HSI by constructing the global trajectory tensor of HSI. Nonetheless, the low-rank representation of the tensor is sensitive to initialization and parameter selection and is less universally applicable across all types of data, indicating its limitations.
One notable approach is truncated higher-order singular value decomposition (T-HOSVD), a form of Tucker decomposition that compresses tensors for denoising and approximate low-rank representation [30,31]. However, T-HOSVD faces challenges in handling diverse and complex HSI data, along with the difficulty of determining optimal ranks [32]. Therefore, it is necessary to explore an improved T-HOSVD method that can adaptively perform rank estimation while reducing data sensitivity and making the algorithm more robust for better 3D feature extraction of HSI.
To overcome these challenges, this paper proposes a novel method, Superpixelwise Multiscale Adaptive T-HOSVD (SmaT-HOSVD), which integrates local features and low-rank attributes of HSI for 3D feature extraction. Spectral and spatial features are extracted in 3D through superpixels and Multiscale Adaptive T-HOSVD to obtain more comprehensive low-rank information and reduce sensitivity to data, thereby enhancing classification tasks.
The primary contributions of SmaT-HOSVD include the following: (1) Introducing SmaT-HOSVD, a 3D feature extraction method utilizing superpixels and Multiscale Adaptive T-HOSVD, which fully utilizes spectral and spatial information and enhances the comprehensiveness of features; (2) Employing adaptive rank estimation based on the energy distribution of data, enhancing noise separation and inter-class distinction. In addition, a low-rank fusion strategy is utilized to generate multilevel features to enhance the comprehensiveness of the features and the robustness of the algorithm; (3) Demonstrating low time complexity and memory footprint, enhancing computational efficiency, and validating performance across multiple public datasets.

Related Works
We adopt italicized letters for scalars (e.g., x and X), bold letters for matrices (e.g., X), and calligraphic letters for tensors (e.g., X). R denotes the real number field. X ∈ R^(I_1 × ⋯ × I_n × ⋯ × I_N) denotes an Nth-order tensor.

Notations and Definitions
Definition 1 (n-mode flattening matrix). An n-mode flattening matrix is obtained by unfolding a tensor along its I_n dimension and then flattening it to a two-dimensional array. For a tensor X ∈ R^(I_1 × ⋯ × I_n × ⋯ × I_N), the resulting matrix is denoted as X_(n) ∈ R^(I_n × (I_1 ⋯ I_{n−1} I_{n+1} ⋯ I_N)).

Definition 2 (n-mode product [33]). The n-mode product of a tensor X ∈ R^(I_1 × ⋯ × I_N) with a matrix U ∈ R^(J × I_n), denoted X ×_n U, yields a tensor of size I_1 × ⋯ × I_{n−1} × J × I_{n+1} × ⋯ × I_N. This operation can also be expressed through the matrix form: (X ×_n U)_(n) = U X_(n).

Definition 3 (singular value decomposition (SVD) [34]). For the matrix X ∈ R^(n_1 × n_2), the SVD of X is given by X = U Σ V*, where U is an n_1 × n_1 complex unitary matrix, Σ is an n_1 × n_2 rectangular diagonal matrix, V is an n_2 × n_2 complex unitary matrix, and V* is the conjugate transpose of V.
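For illustration, the n-mode flattening and n-mode product above can be sketched in NumPy. These are hypothetical helpers, not part of the original method; any unfolding ordering is valid as long as `fold` inverts it consistently.

```python
import numpy as np

def unfold(tensor, mode):
    """n-mode flattening: move `mode` to the front, then reshape to a matrix
    of size I_n x (product of the remaining dimensions)."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def fold(matrix, mode, shape):
    """Inverse of unfold: rebuild a tensor of the given shape."""
    full_shape = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(matrix.reshape(full_shape), 0, mode)

def mode_product(tensor, matrix, mode):
    """n-mode product X x_n U, realized via (X x_n U)_(n) = U X_(n)."""
    shape = list(tensor.shape)
    shape[mode] = matrix.shape[0]
    return fold(matrix @ unfold(tensor, mode), mode, shape)
```

The matrix identity in Definition 2 is exactly what `mode_product` exploits: the product is computed on the flattened matrix and then folded back.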

T-HOSVD
The truncated higher-order singular value decomposition (T-HOSVD) [30] is a tensor decomposition technique that has been successfully applied to denoising [35], face recognition [36], and various other applications. Notably, T-HOSVD exhibits outstanding performance in HSI.
For HSI X ∈ R^(I_1 × I_2 × I_3) with a multilinear rank denoted by rank-(R_1, R_2, R_3), we can determine the optimal approximation X̂ of X by solving an optimization problem:

min_{X̂} ∥X − X̂∥_F  subject to  rank(X̂_(n)) ≤ R_n, n = 1, 2, 3,

where (R_1, R_2, R_3) is the reduced multilinear rank, and ∥·∥_F is the Frobenius norm.
To solve this optimization problem, we can employ a simple method, namely, T-HOSVD.
The detailed steps of T-HOSVD are as follows.
Unfolding X by n-mode gives the matrix X_(n); compute a rank-R_n truncated SVD, X_(n) ≈ U_n Σ_n V_n^T, and store the top R_n left singular vectors U_n ∈ R^(I_n × R_n); then compute the core tensor G = X ×_1 U_1^T ×_2 U_2^T ×_3 U_3^T by the n-mode product. Finally, the approximation tensor can be reformulated as X̂ = G ×_1 U_1 ×_2 U_2 ×_3 U_3.

Tucker decomposition offers numerous advantages and finds applications across various fields [37]. De Lathauwer and Vandewalle [38] have investigated the application of the Tucker decomposition in the field of signal processing. Vasilescu and Terzopoulos [39] were pioneers in introducing the Tucker decomposition for applications in computer vision. As a specific instance of orthogonal Tucker decomposition, T-HOSVD serves as a generalization of matrix singular value decomposition. While T-HOSVD does not offer an optimal multilinear rank approximation [40], in practical applications, it tends to produce satisfactory solutions when optimal solutions with small errors exist. However, T-HOSVD encounters two significant challenges in the field of HSI processing. Firstly, HSIs often contain diverse targets with multilayered and complex information, making T-HOSVD susceptible to variations in data distribution and reducing its robustness. Secondly, determining the optimal rank for an HSI remains a challenging problem. Typically, the optimal rank is computed by transforming rank estimation into an iterative optimization problem with constraints. However, this approach is time-consuming, and tuning certain parameters can be difficult. To enhance the solution and minimize errors, we incorporate adaptive low-rank estimation and multiscale fusion strategies in our approach. These strategies contribute to a more comprehensive and robust extraction of features from HSI, as detailed in Section 3.2.
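A minimal NumPy sketch of these T-HOSVD steps follows (an illustrative implementation, not the authors' code):

```python
import numpy as np

def t_hosvd(X, ranks):
    """Truncated HOSVD of a 3rd-order tensor X with multilinear rank `ranks`.
    Returns the core tensor G and factor matrices (U1, U2, U3); the
    rank-(R1, R2, R3) approximation is G x_1 U1 x_2 U2 x_3 U3."""
    Us = []
    for n, Rn in enumerate(ranks):
        Xn = np.moveaxis(X, n, 0).reshape(X.shape[n], -1)   # n-mode unfolding
        U, _, _ = np.linalg.svd(Xn, full_matrices=False)
        Us.append(U[:, :Rn])                                # top R_n left singular vectors
    # core tensor G = X x_1 U1^T x_2 U2^T x_3 U3^T
    G = np.einsum('ijk,ia,jb,kc->abc', X, Us[0], Us[1], Us[2])
    return G, Us

def reconstruct(G, Us):
    """Approximation X_hat = G x_1 U1 x_2 U2 x_3 U3."""
    return np.einsum('abc,ia,jb,kc->ijk', G, Us[0], Us[1], Us[2])
```

If X has exact multilinear rank (R_1, R_2, R_3), truncating at that rank reconstructs X exactly; otherwise the result is the usual quasi-optimal T-HOSVD approximation.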

Methods
The flowchart of the proposed SmaT-HOSVD method is depicted in Figure 1, comprising the following three parts: (1) Superpixel segmentation; (2) Superpixelwise Multiscale Adaptive T-HOSVD; (3) Classification. The details of the proposed SmaT-HOSVD are described as follows.
Sensors 2024, 24, 4072

Superpixel Segmentation
Superpixel segmentation involves partitioning pixels into several irregular pixel blocks known as superpixels [41]. The boundaries of these superpixels align relatively well with the actual boundaries of the image, facilitating subsequent task processing [42]. Superpixel algorithms can be broadly categorized into two types: graph-based superpixel algorithms and gradient descent-based superpixel algorithms. Classic algorithms include entropy rate superpixel (ERS) [43] and simple linear iterative clustering (SLIC) [41], which are widely used in HSI preprocessing.
ERS transforms the superpixel segmentation problem into solving the objective function with the largest entropy rate on the graph. In ERS, images are mapped to a graph, where each pixel is represented as a vertex, and edge weights represent pairwise similarities [42]. By solving the objective function with the largest entropy rate on the graph, k connected subgraphs are generated, where k is the number of superpixels. SLIC is akin to a clustering method, initializing the superpixel centers and conducting local iterations based on the similarity of surrounding pixels, ultimately obtaining the superpixels. However, SLIC ignores global image properties and is insensitive to pixel content [42]. Considering these factors, we opted to use ERS to obtain superpixels. In our method, principal component analysis (PCA) is applied to the HSI, and the first principal component (PC), I_f, is utilized for superpixel segmentation. This is because the first PC contains the primary information of the HSI. By applying ERS to I_f, the HSI can be segmented into superpixel blocks:

X = X_1 ∪ X_2 ∪ ⋯ ∪ X_S,

where S denotes the number of superpixels, and X_k is the kth superpixel.
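The PCA step that produces I_f can be sketched as follows. ERS itself is not available in standard Python libraries, so only the first-PC projection is shown; the function name is illustrative.

```python
import numpy as np

def first_pc(hsi):
    """Project an (H, W, B) hyperspectral cube onto its first principal
    component, yielding the single-band image I_f used for segmentation."""
    H, W, B = hsi.shape
    pixels = hsi.reshape(-1, B).astype(float)
    pixels -= pixels.mean(axis=0)                 # center each band
    # eigenvector of the band covariance with the largest eigenvalue
    cov = pixels.T @ pixels / (pixels.shape[0] - 1)
    vals, vecs = np.linalg.eigh(cov)
    w = vecs[:, -1]                               # eigh sorts eigenvalues ascending
    return (pixels @ w).reshape(H, W)
```

The resulting single-band image would then be fed to the ERS segmenter to obtain the superpixel map.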

Superpixelwise Multiscale Adaptive T-HOSVD
In reality, determining the rank of T-HOSVD is not a straightforward task. Therefore, we prefer to employ a simpler method to adaptively determine the rank-(R_1, R_2, R_3). During the SVD of a matrix, the larger singular values often contain more intrinsic information of the matrix. Based on this principle, we propose a simple strategy to automatically estimate the rank R_n.
For HSI X ∈ R^(I_1 × I_2 × I_3), unfolding X by n-mode gives a matrix X_(n). Let σ_i be the singular values of X_(n), arranged from large to small, and let C be the sum of singular values:

C = Σ_i σ_i.

The rank R_n can be determined by

R_n = min{ i : sum(σ_k, k = 1, 2, ⋯, i) / C ≥ T },

where T is a given threshold, and sum(σ_k, k = 1, 2, ⋯, i) is the sum of the first i singular values of X_(n). Therefore, we can set the thresholds (T_1, T_2, T_3) based on experience to adaptively determine the rank-(R_1, R_2, R_3).
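This rank-estimation rule can be sketched as follows (an illustrative helper; `Xn` is an n-mode unfolding):

```python
import numpy as np

def adaptive_rank(Xn, T):
    """Smallest i such that the first i singular values of the unfolding Xn
    carry at least a fraction T of the total singular-value sum C."""
    s = np.linalg.svd(Xn, compute_uv=False)       # sorted large -> small
    energy = np.cumsum(s) / s.sum()               # sum(sigma_k, k<=i) / C
    return int(np.searchsorted(energy, T) + 1)
```

A higher threshold T keeps more singular values and hence a larger rank; a lower T discards more of the tail, which is where noise tends to concentrate.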
For a given HSI X, we can set different thresholds T_m, m = 1, 2, ⋯, M, where M is the number of different scales. For T-HOSVD at different scales, different structures and characteristics will be revealed [44]. Ranks of different sizes can reflect various levels of intrinsic information in HSI. Therefore, the rank of one scale often cannot entirely capture the information. It is necessary to integrate sufficiently complementary and relevant information. Multiscale fusion becomes an effective method.
Then, by superimposing the X_m along the third dimension, the result of Multiscale T-HOSVD can be determined by

Y = cat_3(X_1, X_2, ⋯, X_M) ∈ R^(I_1 × I_2 × M·I_3),

where X_m ∈ R^(I_1 × I_2 × I_3), m = 1, 2, ⋯, M, represents the T-HOSVD representation at scale m, and cat_3 denotes concatenation along the third mode. However, simply fusing T-HOSVD representations of different scales together will inevitably have negative effects. On the one hand, the contributions of different scales are treated equally, while the importance of different scales is ignored; on the other hand, stacking along the third dimension will cause the curse of dimensionality, which may significantly impact the classification results [45].
In order to better extract information at different scales and enable multiscale fusion to better display the intrinsic structure and characteristics of HSI, we can perform PCA on Y to achieve dimensionality reduction:

Y′ = Y ×_3 W^T,

where W is the transformation matrix composed of the eigenvectors corresponding to the first n eigenvalues of the covariance matrix Y_(3) Y_(3)^T, and Y_(3) is the 3-mode flattening matrix of Y. Finally, we obtain the result Y′ ∈ R^(I_1 × I_2 × n) of the Multiscale Adaptive T-HOSVD.
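A sketch of this stacking-plus-PCA fusion follows. It is illustrative only: it takes precomputed per-scale representations X_m as input and adds band-mean centering, a common PCA convention not stated explicitly above.

```python
import numpy as np

def multiscale_fuse(reps, n):
    """Stack the per-scale tensors X_m (each H x W x B) along the third mode,
    then reduce the stacked spectral dimension M*B to n by projecting onto
    the top-n eigenvectors of the mode-3 covariance."""
    Y = np.concatenate(reps, axis=2)              # H x W x (M*B)
    H, W, D = Y.shape
    Y3 = Y.reshape(-1, D)
    Y3 = Y3 - Y3.mean(axis=0)                     # center the stacked bands
    cov = Y3.T @ Y3
    vals, vecs = np.linalg.eigh(cov)
    W_mat = vecs[:, ::-1][:, :n]                  # top-n eigenvectors (eigh sorts ascending)
    return (Y3 @ W_mat).reshape(H, W, n)
```

Because the output bands are projections onto distinct eigenvectors of the covariance, they are mutually decorrelated, which is the point of the reduction.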
By integrating HSI representations from adaptive T-HOSVD at multiple scales, complementary and relevant information can be effectively captured. Figure 2 illustrates the application of Multiscale Adaptive T-HOSVD to HSI. Among them, we use "Feature Image" color pictures to better represent the extracted features. Although HSI can capture a significant amount of useful information, its practical effectiveness is often limited due to the richness and complexity of the contained data. However, superpixel blocks derived from superpixel segmentation are highly homogeneous regions with similar structures and features. Therefore, applying Multiscale Adaptive T-HOSVD within these homogeneous regions tends to extract more effective features. We refer to this process as SmaT-HOSVD for superpixel blocks, as shown in Figure 3.
Since superpixels are typically irregular regions, preprocessing these irregularly shaped superpixels is necessary. For each superpixel, we need to identify the smallest bounding rectangle that encompasses the superpixel, which may include some adjacent pixels from other superpixels. All bands within this rectangular region are then extracted to form the 3D tensor required for Multiscale Adaptive T-HOSVD. This 3D tensor is processed using Multiscale Adaptive T-HOSVD, and the original superpixel is replaced with the resulting low-rank tensor.
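The bounding-rectangle preprocessing can be sketched as below (helper names are illustrative; the returned mask records which pixels of the rectangle truly belong to superpixel k, so that only those are replaced afterwards):

```python
import numpy as np

def superpixel_patch(hsi, labels, k):
    """Extract the smallest bounding rectangle enclosing superpixel k
    (labels is an H x W superpixel map) across all spectral bands, giving
    the regular 3D tensor required by Multiscale Adaptive T-HOSVD."""
    rows, cols = np.nonzero(labels == k)
    r0, r1 = rows.min(), rows.max() + 1
    c0, c1 = cols.min(), cols.max() + 1
    patch = hsi[r0:r1, c0:c1, :]                  # rectangle, all bands
    mask = labels[r0:r1, c0:c1] == k              # which pixels belong to k
    return patch, mask, (r0, r1, c0, c1)
```

Pixels inside the rectangle but outside the mask come from neighboring superpixels; they take part in the decomposition but should not overwrite their own superpixels when writing the result back.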
This approach can effectively reduce intra-region differences while highlighting inter-class differences. Additionally, to the best of our knowledge, this is the first time an improved version of T-HOSVD has been applied to superpixel blocks. This method can adaptively estimate the optimal rank for different superpixel blocks and effectively utilize the spatial-spectral correlation of pixels in local images.

Classification
After feature extraction from the HSI to obtain the feature tensor, the next step is to feed it into the classifier for classification. Support vector machine (SVM) [46] is a classification algorithm whose basic idea is to map the data into a high-dimensional space and find a hyperplane in that space, maximizing the distance of all data points to the hyperplane. SVM has demonstrated good performance in the classification of remote sensing data [47]. Therefore, we decided to use SVM as our classifier.
In addition, the key of SVM is to choose the type of kernel function, mainly the linear kernel, polynomial kernel, Gaussian radial basis function (RBF), and so on. Following the suggestion from [48], we decided to use RBF as our kernel function.
Algorithm 1 for the proposed method is shown below.

Algorithm 1: SmaT-HOSVD
INPUT: HSI X ∈ R^(I_1 × I_2 × I_3) and label set L
1: The first principal component X′ is obtained by applying PCA to X
2: Apply Equation (7) to get superpixels I_f
3: While k ≤ S (k initialized to 1) do
4: Fill X_k to get the standard tensor X′_k
5: Apply Equations (8) and (9) to compute the optimal rank of X′_k
6: Apply Equations (5) and (6) for X′_k to obtain the low-rank representations Y_1 and Y_2
7: Apply Equation (10) to fuse the multiscale representations
8: Apply Equation (11) to get the result Y
9: Update the superpixel X_k with the result Y
10: k ← k + 1
11: End while
12: Update X ∈ R^(I_1 × I_2 × n) with the superpixels I_f
13: Train and test on X and L using SVM
OUTPUT: Classification results
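The per-superpixel loop of Algorithm 1 can be sketched end-to-end in NumPy. This is a simplified single-scale stand-in (one energy threshold T rather than the full multiscale fusion and final PCA step), with illustrative names; it assumes a precomputed superpixel label map.

```python
import numpy as np

def adaptive_t_hosvd(X, T):
    """Stand-in for steps 4-8: pick each rank R_n from the singular-value
    energy threshold T, then apply T-HOSVD to the 3D patch X."""
    Us = []
    for n in range(3):
        Xn = np.moveaxis(X, n, 0).reshape(X.shape[n], -1)
        U, s, _ = np.linalg.svd(Xn, full_matrices=False)
        Rn = int(np.searchsorted(np.cumsum(s) / s.sum(), T) + 1)
        Us.append(U[:, :Rn])
    G = np.einsum('ijk,ia,jb,kc->abc', X, Us[0], Us[1], Us[2])
    return np.einsum('abc,ia,jb,kc->ijk', G, Us[0], Us[1], Us[2])

def smat_hosvd_features(hsi, labels, T=0.95):
    """Steps 3-12 (sketch): low-rank-process each superpixel's bounding
    rectangle and write only the superpixel's own pixels back."""
    out = hsi.astype(float).copy()
    for k in np.unique(labels):
        rows, cols = np.nonzero(labels == k)
        r0, r1 = rows.min(), rows.max() + 1
        c0, c1 = cols.min(), cols.max() + 1
        patch = adaptive_t_hosvd(out[r0:r1, c0:c1, :], T)
        mask = labels[r0:r1, c0:c1] == k
        out[r0:r1, c0:c1][mask] = patch[mask]
    return out
```

The resulting feature cube would then go to the SVM classifier of step 13.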

Datasets
In our experiments, we chose three publicly available hyperspectral datasets, namely Indian Pines, Pavia University, and Salinas, to evaluate the reliability of our model.

Experimental Setup
We employed SVM classifiers to validate the effectiveness of SmaT-HOSVD. Following the methodology of previous work [49], we utilized five evaluation metrics: producer accuracy (PA), overall accuracy (OA), average accuracy (AA), kappa coefficient (Kappa), and running time (s). To mitigate systematic errors and reduce random errors, all experiments were independently conducted ten times, and the average values were computed. Both training and test sets were randomly selected without any repetition in either. The training set comprised 2% (Indian Pines), 1% (Pavia University), and 1% (Salinas) of the samples in each category, respectively.
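The metrics OA, AA, and Kappa can be computed from a confusion matrix as follows (a standard sketch, not the authors' evaluation code):

```python
import numpy as np

def classification_metrics(y_true, y_pred, n_classes):
    """Overall accuracy (OA), average accuracy (AA) and Cohen's kappa from a
    confusion matrix C, where C[i, j] counts samples of true class i
    predicted as class j."""
    C = np.zeros((n_classes, n_classes), dtype=float)
    for t, p in zip(y_true, y_pred):
        C[t, p] += 1
    total = C.sum()
    oa = np.trace(C) / total
    aa = np.mean(np.diag(C) / C.sum(axis=1))          # mean per-class (producer) accuracy
    pe = (C.sum(axis=0) @ C.sum(axis=1)) / total**2   # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa
```

PA for an individual class is the corresponding entry of `np.diag(C) / C.sum(axis=1)`.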
To validate the effectiveness of the proposed method, we compared it with seven state-of-the-art HSI feature-extraction methods. Considering that SmaT-HOSVD utilizes superpixel segmentation and tensor low-rank representation, we selected two methods based on superpixel segmentation, i.e., SuperPCA and SpaSSA, and one tensor-based method, TensorSSA. In addition, due to the wide application of deep learning, we also consider two deep learning methods, i.e., SpectralFormer and SSTN. Moreover, we also selected T-HOSVD and SVM to highlight the effectiveness of SmaT-HOSVD.
(1) A support vector machine (SVM) is employed as a classifier with a Gaussian radial basis function (RBF) serving as its kernel function. To determine the hyperparameters, we utilize five-fold cross-validation. (2) We utilize T-HOSVD for feature extraction and compare it with our proposed SmaT-HOSVD. The classifier employed is SVM, and the rank-(R_1, R_2, R_3) is set to 60. (3) SpectralFormer [26] is a neural network based on the transformer, utilizing a patch-wise version with a patch size of 1 × 3 and trained for 300 epochs. (4) SSTN is employed with a batch size of 32 and trained for 100 epochs, while keeping other hyperparameters consistent with the specifications in [27]. (5) SuperPCA relies on dimensionality reduction through superpixel segmentation, employing the same parameters as specified in [22]. (6) SpaSSA is grounded in feature extraction with superpixel segmentation, and the optimal parameters specified in [23] are utilized. (7) TensorSSA extracts spectral-spatial features through a 3D approach, and the optimal parameters specified in [20] are employed. (8) For our proposed method, the optimal parameters are detailed in Section 4.3. SVM is employed for classification.
The experiments were configured in MATLAB R2021a on a Windows 10 64-bit platform with an AMD Ryzen 7 5800H (3.2 GHz) CPU, manufactured by Advanced Micro Devices, Inc. (AMD), and 16 GB of memory. Deep learning implementation was carried out on the Ubuntu 20.04 platform using the PyTorch 1.10.2 framework with an NVIDIA GeForce RTX 3050Ti GPU.

Comparisons with State-of-the-Arts Models
Analyzing the classification results obtained using random selection and different training sets is crucial for understanding the impact of varying training data sizes on model performance. In this subsection, we conduct extensive experiments to compare the proposed method with state-of-the-art methods as detailed in Section 4.2.
Figure 4 illustrates that the accuracy of all methods generally improves as the percentage of training samples increases. Notably, the proposed SmaT-HOSVD consistently outperforms the seven comparative methods, particularly when dealing with small sample sizes (e.g., 1% IP, 0.5% PU, and 0.2% SD), achieving excellent performance. Comparatively, the state-of-the-art deep learning method SSTN and feature extraction method SuperPCA yield suboptimal results across the three datasets, falling short of SmaT-HOSVD, highlighting the effectiveness and advancement of SmaT-HOSVD. Additionally, other methods exhibit varying performance across different datasets. For instance, TensorSSA performs well on the PU dataset, ranking just behind SSTN and SmaT-HOSVD, but its classification performance on the IP and SD datasets is mediocre. In contrast, SmaT-HOSVD consistently achieves the best results across all three datasets, underscoring the robustness and efficacy of the proposed method.
(2) Quantitative Evaluation: To thoroughly evaluate the effectiveness of the proposed method, we provide a detailed comparison of producer accuracy (PA) for each category in Tables 1-3, along with the three evaluation metrics OA, AA, and Kappa. The proposed SmaT-HOSVD consistently achieves the highest accuracy in most classes and outperforms other methods in terms of overall metrics. Taking the IP dataset in Table 1 as an example for analysis, firstly, in terms of OA, SmaT-HOSVD improves by 27.83%, 27.78%, 33.08%, 4.24%, 1.81%, 14.35%, and 7.35% compared to SVM, T-HOSVD, SpectralFormer, SSTN, SuperPCA, SpaSSA, and TensorSSA, respectively. Also, on AA and Kappa, SmaT-HOSVD improved to varying degrees. It is worth noting that the PA of SuperPCA is not much different from that of SmaT-HOSVD, and the processing time of SuperPCA is shorter. This is because, during the processing of SmaT-HOSVD, too many neighboring pixels are added to some small superpixel blocks, which introduces noise and results in a degradation of the effect. Compared to SuperPCA, the time complexity of SmaT-HOSVD is higher, but overall, across all three datasets, SmaT-HOSVD achieves better performance. On the PA of each class, SmaT-HOSVD's accuracy is basically at the top of the list, above 80% for 5 classes and above 90% for 11 classes. The superior performance of the SmaT-HOSVD method is also observed in Tables 2 and 3 for the Pavia University and Salinas datasets, respectively. When comparing methods, both SuperPCA and SpaSSA demonstrate higher classification accuracy than the classical SVM. This improvement can be attributed to their utilization of local features, with SuperPCA exhibiting particularly notable performance in this regard. Unfortunately, SuperPCA does not perform well on the PU dataset, possibly due to the limitations of PCA-based feature extraction, especially for small targets. The deep learning method SpectralFormer is less effective than SVM, primarily because of the limited training
samples that prevent adequate model training. In contrast, SSTN achieves better results by leveraging spectral-spatial features more comprehensively. The tensor-based method TensorSSA effectively extracts low-rank information but does not effectively utilize important local features, resulting in moderate performance. In contrast, SmaT-HOSVD demonstrates the best performance across all datasets by effectively extracting both local and spectral-spatial features. Our SmaT-HOSVD is able to efficiently extract both spatial and spectral features thanks to the superpixel blocks and low-rank representation. As shown in Figures 5j, 6j and 7j, good classification results are basically obtained for targets in large or regular categories. This is due to the fact that SmaT-HOSVD effectively utilizes the spatial information through the superpixel blocks and obtains the main spectral information through the low-rank representation, which enhances the intra-class consistency and highlights the inter-class differences. However, the classification results are poor at some class edges and in irregular mini-classes, because too many pixels from other classes are added, introducing noise during processing and making the extracted features unable to characterize the intra-class information well. Intuitive analysis shows that the SmaT-HOSVD method can effectively reduce noise and improve classification accuracy compared with other methods. TensorSSA requires performing the decomposition of the trajectory tensor, which is a time-consuming process. In comparison to SuperPCA and T-HOSVD, SmaT-HOSVD requires more time due to the additional tasks involved. However, despite this, SmaT-HOSVD is relatively efficient and performs well in terms of computational efficiency compared to many other methods.
Overall, the proposed SmaT-HOSVD method strikes a balance between runtime efficiency and classification accuracy, making it a viable choice for HSI classification.

Parameter Analysis
In the proposed SmaT-HOSVD method, certain parameters require manual configuration. These parameters include the number of superpixels (S) in ERS-based superpixel segmentation, the dimension (n) for PCA dimensionality reduction, and the threshold (T) in T-HOSVD. To evaluate the impact of these parameters and determine optimal values, we conducted parameter sensitivity experiments on SmaT-HOSVD.
(1) Number of superpixels (S): Through superpixel segmentation, we can obtain different homogeneous regions in which similar pixels are more likely to fall into the same class [22]. The parameter S determines the level of segmentation for the superpixels.
A larger S leads to finer segmentation with smaller superpixels, whereas a smaller S results in coarser segmentation with larger superpixels. The degree of superpixel segmentation significantly influences the extracted features. Due to variations in spatial resolution and dataset size, the optimal number of superpixels (S) differs for each dataset. Although ideal classification accuracy requires over-segmentation of the scene, the complexity of land-cover distribution still calls for dataset-specific analysis [50]. As shown in Figure 8, the best results in our approach are obtained with S set to 30, 25, and 90 for the Indian Pines, Pavia University, and Salinas datasets, respectively. For example, in the IP dataset, a superpixel number S below 50 achieves good results, because the number of homogeneous regions in the IP dataset is around 30. When S is too large, the image becomes over-segmented, and the obtained superpixel blocks are fragmented and narrow, rendering the algorithm ineffective.
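To make the superpixel-block step concrete, here is a minimal sketch (not the authors' implementation) of gathering each superpixel's pixels into a rectangular subtensor, given a label map produced by any segmentation method such as ERS; the function name `extract_superpixel_blocks` is our own.

```python
import numpy as np

def extract_superpixel_blocks(hsi, labels):
    """Extract a bounding-box subtensor for each superpixel region.

    hsi    : (H, W, B) hyperspectral cube
    labels : (H, W) integer superpixel map (e.g., from ERS segmentation)
    Returns a dict mapping label -> (block, mask), where block is the
    bounding-box crop of the cube and mask marks the region's own pixels.
    """
    blocks = {}
    for lab in np.unique(labels):
        rows, cols = np.where(labels == lab)
        r0, r1 = rows.min(), rows.max() + 1
        c0, c1 = cols.min(), cols.max() + 1
        block = hsi[r0:r1, c0:c1, :]          # rectangular crop of the cube
        mask = labels[r0:r1, c0:c1] == lab    # which crop pixels belong here
        blocks[lab] = (block, mask)
    return blocks
```

Each block can then be fed to the tensor decomposition independently, which is what lets the method operate on homogeneous regions rather than on the whole scene.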

(2) Reduction dimensionality (n): The reduced dimensionality n directly influences how much information is retained in the data. A larger n generally retains more information but may also retain more noise; conversely, a smaller n emphasizes the main components of the information, potentially losing important details. Because of the diversity of the data, the optimal n varies per dataset: the ideal value filters out noise while preserving essential information. As shown in Figure 9, our approach works best on the Indian Pines, University of Pavia, and Salinas datasets when n is set to 20, 20, and 15, respectively. Across datasets, the algorithm achieves its best classification performance with a reduced dimensionality of roughly 15 to 25. This indicates that, for the obtained low-rank tensor, reducing to around 20 dimensions retains the most dominant information and removes noise, which is consistent with the conclusion in [45]. When the reduced dimensionality is larger, the retained information is redundant and contains noise, degrading performance. At the same time, a reduced dimensionality of around 20 performs well on all datasets, which further illustrates the stability of our algorithm.
(3) Thresholds T1 and T2: The choice of thresholds (T1 and T2) significantly affects the extracted features. A larger threshold-(T1, T2, T3) results in a higher rank-(R1, R2, R3), preserving more common features of the HSI. The level of low-rank features extracted varies with T: as T increases, more dimensions are retained in the tensor decomposition, preserving more information, so the extracted features tend to be richer but may also contain some noise. Conversely, as T decreases, the decomposition retains fewer dimensions, and some details and features may be lost. Furthermore, to simplify parameter settings, the first two components of each threshold are set to the same value, while the third is chosen independently. To obtain features at different levels, T2 is generally set higher than T1, with T2 as the larger threshold and T1 as the smaller one.
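The role of n can be illustrated with a minimal PCA sketch in NumPy (a generic stand-in, not the paper's specific PCA routine): the per-pixel spectra are centered and projected onto the top-n principal axes.

```python
import numpy as np

def pca_reduce(hsi, n):
    """Project each pixel's spectrum onto the top-n principal components.

    hsi : (H, W, B) cube; returns an (H, W, n) reduced cube.
    """
    H, W, B = hsi.shape
    X = hsi.reshape(-1, B).astype(float)
    X = X - X.mean(axis=0)                   # center the spectra
    # Principal axes come from the SVD of the centered data matrix.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return (X @ Vt[:n].T).reshape(H, W, n)
```

Setting n in the 15 to 25 range discussed above keeps the dominant spectral variation while discarding the noisy trailing components.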

From Figure 10, it is evident that the larger threshold T2 has the more significant impact on the proposed method and is a key parameter, while T1 serves as an auxiliary parameter with less impact. The optimal T1 and T2 differ across datasets. In our method, for the Indian Pines dataset, T1 is set to (0.8, 0.8, 0.9) and T2 to (0.85, 0.85, 0.95); for the Pavia University dataset, T1 is set to (0.8, 0.8, 0.8) and T2 to (0.8, 0.8, 0.95); and for the Salinas dataset, T1 is set to (0.8, 0.8, 0.8) and T2 to (0.9, 0.9, 0.95). Our proposed method is robust to the choice of thresholds, as evidenced by the minimal variation in OA on Indian Pines, which is only 0.018. This result underscores the idea that different thresholds can offer complementary structural information.
Determining the number of superpixels and the optimal rank is a critical and open question in the proposed method, which aims to segment the HSI and perform low-rank representation. In this paper, we set the number of superpixels experimentally and determined the optimal rank using an adaptive estimation strategy to achieve the best performance. The segmentation scale of an HSI varies with the number of homogeneous regions across datasets. Therefore, in our experiments, we determined the optimal number of superpixel blocks by continuous adjustment, adapting to the specific characteristics of each dataset. The rank can be determined from the percentage of total energy, as implemented in (8) and (9). Determining the optimal rank can thus be likened to setting a threshold for a given HSI: since the leading singular values in a singular value decomposition typically contain the majority of the energy, we can fine-tune the threshold to identify the optimal rank. In this way, both the number of superpixels and the optimal rank can be determined for different datasets.
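As a reference point, a minimal truncated HOSVD can be sketched in NumPy: each mode is unfolded, the leading singular vectors are kept, and the core tensor and low-rank reconstruction follow. This is a generic T-HOSVD sketch under our own naming, not the paper's exact code.

```python
import numpy as np

def t_hosvd(X, ranks):
    """Truncated HOSVD of an order-3 tensor X with per-mode ranks."""
    # Factor matrices: leading left singular vectors of each mode unfolding.
    factors = []
    for mode, r in enumerate(ranks):
        M = np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)  # mode-n unfolding
        U, _, _ = np.linalg.svd(M, full_matrices=False)
        factors.append(U[:, :r])
    # Core tensor: contract X with U_n^T along each mode.
    core = X
    for mode, U in enumerate(factors):
        core = np.moveaxis(np.tensordot(U.T, np.moveaxis(core, mode, 0), axes=1), 0, mode)
    # Low-rank reconstruction: expand the core back with each U_n.
    Xhat = core
    for mode, U in enumerate(factors):
        Xhat = np.moveaxis(np.tensordot(U, np.moveaxis(Xhat, mode, 0), axes=1), 0, mode)
    return Xhat, core, factors
```

With full ranks the reconstruction is exact; truncating the ranks keeps only the dominant energy, which is what the energy-threshold rule above exploits.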

Ablation Study
To compare the effectiveness of the proposed SmaT-HOSVD against T-HOSVD, we incorporate two key enhancements: ERS for superpixel segmentation and a multiscale adaptive low-rank representation strategy for T-HOSVD. We aim to validate these operations across three datasets. In the pixel-processing stage, we evaluate two scenarios: using the raw HSI without superpixel segmentation versus using ERS for superpixel segmentation. For the low-rank representation stage, we compare the original T-HOSVD with T-HOSVD employing the multiscale adaptive strategy. Specifically, we set rank-(R1, R2, R3) to 60 for the original T-HOSVD. However, due to the varying sizes and shapes of superpixel blocks, smaller blocks may not reach a length dimension of 60. Therefore, for the low-rank representation of superpixel blocks, we set rank-(R1, R2, R3) to 10. In our experiments, we select 2% of samples from the IP, 1% from the PU, and 1% from the SD datasets for training.
As shown in Table 4, the combination of superpixel segmentation and multiscale adaptive T-HOSVD (SmaT-HOSVD) achieves the highest classification performance. Comparing the raw HSI with superpixel blocks, each processed by T-HOSVD, shows that the latter yields higher classification accuracy across all three datasets. This improvement can be attributed to the tendency of superpixel blocks to contain similar or identical class targets, enabling the low-rank representation to mitigate noise interference and enhance intra-class similarity. The extracted features thus differ more from one class to another, highlighting the gaps between classes and facilitating the subsequent classification task. With the adaptive rank estimation and multiscale fusion strategy added to T-HOSVD, classification accuracies improve significantly over the original T-HOSVD. This is because adaptive rank estimation can dynamically adjust the optimal rank according to the distribution of target information, while fusing features from different scales yields a more comprehensive representation. Moreover, this adaptive rank estimation and multiscale fusion adds little computation and is highly efficient. As depicted in Table 4, Superpixelwise Multiscale Adaptive T-HOSVD outperforms traditional T-HOSVD across the three datasets. The combination of superpixel segmentation and low-rank representation better removes HSI noise and extracts effective features [51,52]. The superpixel blocks obtained by segmentation vary in shape and size, and a fixed rank setting may not accurately reflect each block's information distribution. Therefore, adaptive rank estimation and multiscale fusion are particularly well suited to the low-rank representation of superpixel blocks, enabling the extracted features to reflect the information of each homogeneous region and thus enhancing classification. Overall, leveraging superpixel blocks and multiscale adaptive T-HOSVD (SmaT-HOSVD) yields superior performance in HSI classification.
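The adaptive rank estimation and two-threshold fusion evaluated above can be sketched as follows. `adaptive_ranks` picks, per mode, the smallest rank whose leading singular values capture the energy fraction T (in the spirit of Equations (8) and (9)), and `multiscale_fusion` averages the reconstructions at the coarse (T1) and fine (T2) thresholds; the simple averaging is our assumption, and the paper's fusion step may differ.

```python
import numpy as np

def _truncate(X, ranks):
    """Project X onto the leading singular vectors of each mode and back."""
    Xhat = X
    for mode, r in enumerate(ranks):
        M = np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)
        U, _, _ = np.linalg.svd(M, full_matrices=False)
        P = U[:, :r] @ U[:, :r].T            # mode-wise projector
        Xhat = np.moveaxis(np.tensordot(P, np.moveaxis(Xhat, mode, 0), axes=1), 0, mode)
    return Xhat

def adaptive_ranks(X, thresholds):
    """Smallest per-mode ranks whose singular values keep fraction T of energy."""
    ranks = []
    for mode, T in enumerate(thresholds):
        M = np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)
        s = np.linalg.svd(M, compute_uv=False)
        energy = np.cumsum(s ** 2) / s.dot(s)
        ranks.append(int(np.searchsorted(energy, T) + 1))
    return tuple(ranks)

def multiscale_fusion(X, T1, T2):
    """Fuse low-rank reconstructions at a coarse (T1) and a fine (T2) threshold."""
    coarse = _truncate(X, adaptive_ranks(X, T1))
    fine = _truncate(X, adaptive_ranks(X, T2))
    return (coarse + fine) / 2.0
```

Because the ranks are derived from each block's own singular-value spectrum, blocks of different shapes and sizes automatically receive different ranks, which is the point of the adaptive strategy.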

Analysis between SmaT-HOSVD and Classification
In this paper, we utilize a combination of SmaT-HOSVD and an SVM classifier to achieve ground-object classification. SmaT-HOSVD is employed for 3D feature extraction to obtain spectral-spatial features, aiming to reduce inter-class noise and enhance intra-class consistency for improved classification. SmaT-HOSVD operates as an unsupervised 3D feature-extraction method on HSI superpixel blocks without the need for labeled training data. This method effectively enhances inter-class differences and promotes intra-class consistency, thereby improving classification performance. Supervised feature-extraction methods have been shown to further improve classification performance [53,54], and incorporating the sample prior information into SmaT-HOSVD is also worth exploring.
However, there is a potential risk associated with SmaT-HOSVD. Superpixel blocks obtained from segmentation may contain a number of different objects, and the noise introduced during processing can inadvertently blur the differences between dissimilar objects while increasing their apparent agreement, leading to misclassification. This requires us to prevent superpixel blocks from containing too much noise, either by finding a more appropriate superpixel segmentation method or by designing an adaptive superpixel-number estimation algorithm. Meanwhile, for narrow and irregular superpixel blocks, padding them into regular tensors may introduce too many dissimilar pixels, so that the extracted features contain noise and classification suffers. In subsequent work, we will seek an adaptive superpixel-division strategy instead of simple division to avoid excessive noise degrading the results.
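The overall feature-extraction-then-classification pipeline can be sketched end to end; here a simple nearest-centroid classifier stands in for the SVM so the example stays dependency-free (the paper itself uses an SVM, and the function names are ours).

```python
import numpy as np

def nearest_centroid_fit(features, labels):
    """Class centroids from labeled training pixels (stand-in for the SVM)."""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def nearest_centroid_predict(features, classes, centroids):
    """Assign each pixel the class of its nearest centroid."""
    d = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]
```

Because SmaT-HOSVD tightens intra-class clusters and widens inter-class gaps, even such a simple distance-based classifier benefits from the extracted features; the SVM used in the paper exploits the same geometry with a learned margin.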

Conclusions
HSIs, with their inherent three-dimensional nature, possess both spectral and spatial characteristics. Traditional methods often focus on either spatial or spectral features, or attempt a simple fusion of the two. Consequently, there is a growing trend towards 3D or tensor methods in HSI classification. However, these approaches frequently concentrate on global features and may overlook homogeneous regions, which could provide more effective spatial and spectral features. To address these limitations, this paper introduces the SmaT-HOSVD method for 3D feature extraction in HSI.
The proposed SmaT-HOSVD method partitions the HSI into homogeneous clusters through superpixel segmentation to obtain superpixel blocks. It then performs adaptive rank estimation and multiscale fusion on these blocks to acquire low-rank representations for each homogeneous cluster, facilitating 3D feature extraction of homogeneous areas. SmaT-HOSVD effectively utilizes local features and low-rank representations to capture the low-rank intrinsic features of HSI, thereby reducing redundancy and noise. This enhances the similarity within classes and widens the gap between different classes. By extracting features from HSI in 3D, SmaT-HOSVD obtains spectral-spatial information that facilitates subsequent HSI classification tasks. SmaT-HOSVD surpasses several state-of-the-art methods on three public datasets with an impressive overall accuracy improvement of 12% to 20%. Moreover, even with very little training data, SmaT-HOSVD achieves good overall accuracy while maintaining low time complexity, giving it great potential in practical applications.
In future work, we plan to explore better ways of incorporating irregular superpixels into HOSVD, avoiding the use of excessive noise pixels to fill irregular superpixels and ensuring that irregularities are handled appropriately. In addition, given the spectral-spatial variation of HSI and the instability of illumination in the deep sea [55], we will study the application of SmaT-HOSVD to the deep-sea manganese nodule classification task to improve its classification accuracy.

Figure 2. Schematic diagram of Multiscale Adaptive T-HOSVD usage on HSI Global.

Figure 4. OA obtained by different methods with different training percentages over (a) IP, (b) PU, and (c) SD datasets.

(3) Qualitative Evaluation: Classification maps for the IP, PU, and SD data are illustrated in Figures 5-7. The analysis primarily focuses on the IP dataset presented in Figure 5. SVM and T-HOSVD exhibit significant classification noise across all categories, with an exceptionally high number of errors. SpectralFormer performs unsatisfactorily, showing high accuracy only in a few classes while remaining extremely noisy in most; this outcome is likely attributable to limited training samples and underfitting of the model. SSTN is significantly more effective, reducing speckle-like classification errors, but still misclassifies certain areas. SuperPCA achieves a high accuracy rate primarily because superpixel blocks provide localized features and enhance intra-class similarity, although some classes are still contaminated by speckle. Both SpaSSA and TensorSSA also exhibit significant classification noise.

(4) Analysis of Running Time: Tables 1-3 likewise give the running times for all methods. T-HOSVD, SuperPCA, and SmaT-HOSVD are the fastest, basically within the same order of magnitude. The training times for SpectralFormer and SSTN are prolonged due to model complexity and the high number of training iterations, even when GPUs are used. SpaSSA exhibits a longer runtime because feature extraction must be carried out individually for each band, which is time-consuming.

Figure 8. The influence of parameter S on overall accuracy (OA) concerning (a) IP with 2% training, (b) PU with 1% training, and (c) SD with 1% training.

Figure 9. The influence of reduction dimensionality n on overall accuracy (OA) concerning (a) IP with 2% training, (b) PU with 1% training, and (c) SD with 1% training.

Figure 10. The influence of parameters T1 and T2 on overall accuracy (OA) concerning (a) IP with 2% training, (b) PU with 1% training, and (c) SD with 1% training.

(1) Indian Pines: This dataset was acquired by the AVIRIS imaging spectrometer over the Indian Pines test site in northwest Indiana. It has a spatial size of 145 × 145 pixels and comprises 224 bands of reflectance data spanning 400 nm to 2500 nm. Approximately two-thirds of the scene is dedicated to agricultural areas, while one-third is predominantly covered by forest or vegetation. To focus on relevant information, bands that do not capture water reflections were excluded, leaving 200 bands for the study.
(2) Pavia University: Collected by the ROSIS sensor during a flight over Pavia, northern Italy, this dataset comprises 610 × 340 pixels in 115 consecutive bands within the wavelength range of 430 nm to 860 nm. Twelve of the bands are affected by noise, so the remaining 103 bands were selected for the study.
(3) Salinas: Captured by the AVIRIS imaging spectrometer, which is manufactured by NASA's Jet Propulsion Laboratory (JPL), this dataset images the Salinas Valley in California, USA. The image size is 512 × 217 pixels, and it includes 224 bands; bands that do not capture water reflections were excluded, leaving 204 bands for the study. The scene includes areas with vegetables, bare soil, and vineyards.

Table 1. Classification results obtained by different methods for the IP dataset (2% training percentage). Bold represents the best result, and the color represents the different categories.

Table 2. Classification results obtained by different methods for the PU dataset (1% training percentage). Bold represents the best result, and the color represents the different categories.

Table 3. Classification results obtained by different methods for the SA dataset (1% training percentage). Bold represents the best result, and the color represents the different categories.


Table 4. OAs (%) of different combinations of pixel processing and low-rank representation on three datasets.