1. Introduction
Hyperspectral imaging (HSI) has transformed remote sensing by combining imaging with fine-grained spectroscopy [1]. The technique characterizes surface objects by simultaneously capturing data across numerous narrow spectral bands over a wide area [1,2]. Despite wide-ranging applications, including military defense, atmospheric research, urban planning, vegetation ecology, and environmental surveillance [3,4,5], HSI faces challenges such as spectral redundancy, scarce annotated samples, and significant within-class variation. Addressing these issues is crucial for unlocking the full potential of HSI data analysis [6].
The classification of HSI remains a central challenge in remote sensing for several reasons. First, the scarcity of ground truth, labeled samples hinders accurate classification. Acquiring labeled samples for training classifiers is especially difficult in remote or inaccessible regions, and this scarcity hampers both the development and validation of classification models and their ability to generalize, leading to suboptimal performance in real-world scenarios. Second, the high dimensionality of HSI data results in sparse data distributions and the curse of dimensionality [7]. With hundreds of spectral bands, modeling the data distribution becomes difficult. Third, spectral variability adds further complexity: the spectral response of observed materials can be altered substantially by atmospheric variations, illumination changes, and environmental conditions. Addressing these challenges calls for approaches that integrate advanced machine learning techniques, such as deep learning architectures, to extract both spectral and spatial features and thereby improve the accuracy and reliability of HSI classification for diverse applications in remote sensing and beyond.
Because each HSI pixel typically spans more than 100 spectral bands, high dimensionality leads to data sparsity and increased computational complexity in classification tasks. To address this, dimensionality reduction methods such as Linear Discriminant Analysis (LDA) [8], Principal Component Analysis (PCA) [9], and Minimum Noise Fraction (MNF) [10] have been employed. LDA maximizes class separability, PCA extracts orthogonal components that capture the maximum variance in the data, and MNF suppresses noise to enhance the signal-to-noise ratio. These methods nevertheless have limitations: information can be lost, particularly with PCA, which prioritizes variance over class discrimination, and their computational cost can be prohibitive for large-scale datasets. Efficient dimensionality reduction techniques are therefore needed that preserve discriminative information while minimizing computational overhead. Approaches based on machine learning, such as autoencoders and deep neural networks, offer promising avenues for dimensionality reduction in HSI analysis, balancing computational efficiency against the preservation of essential spectral characteristics and thereby facilitating accurate and scalable HSI classification.
Traditional methods rely predominantly on spectral information and neglect spatial data. Approaches such as the Support Vector Machine (SVM) [11] and Multinomial Logistic Regression (MLR) [12] often restrict their analysis to spectral features, which limits their ability to capture the full characteristics of the data, and they fall short in robustness and completeness of feature extraction. The emergence of Convolutional Neural Networks (CNNs) marked a transformative shift in HSI classification: these models learn discriminative patterns directly from raw data. CNN-based architectures, including 2D CNNs [13], 3D CNNs [14], and region-based CNNs, have proven effective at integrating spectral and spatial dimensions, substantially improving the accuracy of HSI interpretation. Challenges nonetheless persist. While 2D CNNs capture spatial information well, they struggle to extract discriminative feature maps along the spectral dimension. Conversely, 3D CNNs, though promising, incur considerable computational cost due to extensive convolution operations, and their deeper variants require larger training sets than the limited publicly available HSI datasets can provide. Furthermore, the stacked 3D convolutions used in many 3D CNN architectures complicate optimization [15], impeding direct loss optimization through such nonlinear structures. These challenges motivate continued research into more efficient and scalable deep learning architectures tailored to HSI classification.
Recent studies underscore the importance of integrating multi-scale spatial characteristics to improve accuracy, initially in standard RGB image segmentation. Models such as PSPNet [16] and the Inception module combine features across scales to capture fine detail and improve overall performance. Building on this foundation, our investigation draws on Spectral Dilated Convolutions (SDC) [17], inspired by the Dilated Residual Network (DRN) [18]. SDC broadens the range of captured wavelengths, while Multiple Spectral Resolution (MSR) [19] incorporates several levels of detail within the spectral dimension. MSR modules employ a series of 3D convolutional branches tailored to specific spectral widths, extracting high-level spectral information from HSIs. This approach improves the efficiency and precision of HSI analysis by enabling the extraction of spectral properties at multiple scales.
Although advanced classification techniques such as multi-branch CNN models represent a significant step forward for HSI, they still face difficulties in training and classification with limited data [20,21]. Despite their effectiveness in combining spectral and spatial information, their performance can be constrained by the scarcity of labeled data available for model training, particularly in real-time applications. Ongoing research must therefore address the need for more robust methodologies that can extract crucial features and classify data with limited labeled samples.
To overcome the challenges inherent in HSI classification, we present a model named Compact Multi-Branch Deep Learning (CMD). At its core, CMD addresses HSI analysis through three key components. A central element is its approach to dimensionality reduction: by integrating Factor Analysis (FA) and Minimum Noise Fraction (MNF), CMD combines the strengths of each technique to construct a comprehensive and informative feature space. This integration lets CMD draw spectral features from two distinct projections, providing a richer dataset for classification tasks. By extracting the top features from both FA and MNF and fusing them, CMD streamlines the data and enhances the discriminative power of the classification model, improving both the accuracy and the efficiency of HSI classification for tasks such as remote sensing and environmental monitoring.
Building upon this dimensionality reduction synergy, CMD incorporates a tailored multi-branch deep learning model. The design reduces the number of trainable parameters, shortening training time without compromising classification accuracy, and integrates spectral and spatial attributes to uncover intricate patterns and relationships in the data. The primary contributions of this study are as follows:
Our approach harmonizes FA and MNF to address the high dimensionality of HSI data while preserving essential features, enabling efficient and accurate classification.
The CMD model modifies conventional deep learning architectures, reducing the number of trainable parameters for faster training while still capturing intricate HSI patterns, which yields superior classification performance.
The CMD framework integrates dimensionality reduction with the modified deep learning model, prioritizing crucial features and minimizing noise to deliver accurate predictions and improve the overall robustness of HSI classification methodologies.
The remainder of this paper is organized as follows. Section 2 describes the methodologies used in the proposed CMD method and reviews the related literature. Section 3 presents the datasets and experimental setup, including dataset descriptions, hyperparameters, and configurations. Section 4 provides an extensive analysis and discussion of the results. Section 5 concludes the paper by summarizing the key outcomes and outlining avenues for future research. This structure aims to give a holistic view of the CMD framework's development, application, and implications for HSI classification.
2. Materials and Methods
2.1. Minimum Noise Fraction (MNF)
The MNF technique is a widely used dimensionality reduction tool for hyperspectral imagery, enhancing data clarity and extracting the essential signal content [22]. Applied to hyperspectral datasets, MNF serves as a valuable preprocessing step, particularly because of its ability to suppress noise.
To describe the MNF process, we begin with a hyperspectral data matrix X of dimensions m × n, where m represents spectral bands and n represents pixels. The initial step calculates the mean vector μ across bands for each pixel and subtracts it from X, resulting in a mean-adjusted data matrix X̄. This adjustment centers the data around their mean, giving a clearer representation of the underlying signal patterns. The subsequent step derives the covariance matrix W from the transformed X̄, capturing the interdependencies among spectral bands. The essence of MNF lies in the eigenvalue decomposition of W, expressed as [23]:

W = V L Vᵗ (1)
Here, W is the covariance matrix, V is the matrix of eigenvectors, L is the diagonal matrix of eigenvalues, and Vᵗ denotes the transpose of V. The eigenvalues and eigenvectors obtained from this decomposition carry crucial information about the underlying spectral characteristics of the hyperspectral data. MNF then employs a transformation matrix T, defined as:

T = Vᵗ (2)
This transformation matrix is applied to the original hyperspectral data matrix X, yielding a set of transformed data vectors Y:

Y = T X (3)
The transformed data vectors Y have the property that the first few components carry the maximum variance, effectively highlighting essential signal patterns while suppressing noise. The transformed data can then be expressed as a product of two matrices:

Y = Q √Λ (4)
Here, Q is the matrix of transformed data vectors and √Λ is the square root of the diagonal matrix of eigenvalues. MNF is particularly valuable because it reduces noise and improves the interpretability of hyperspectral data. Its use of eigenvalue decomposition and a transformation matrix lets it capture the interdependencies among spectral bands and accentuate crucial signal patterns while mitigating noise, addressing the difficulty of extracting key informative features from hyperspectral datasets. The pseudocode of the MNF algorithm, together with Equations (1)–(4), is presented in Algorithm 1 and provides a practical recipe for MNF-based dimensionality reduction of hyperspectral data.
Algorithm 1: Pseudocode for dimensionality reduction of HSI data using MNF
- 1. Input: Original hyperspectral data matrix X of dimensions m × n, where m represents spectral bands and n represents pixels.
- 2. Initialization: Calculate the mean vector μ across bands for each pixel. Compute the mean-adjusted data matrix X̄ by subtracting μ from each pixel in X.
- 3. Derive covariance matrix: Compute the covariance matrix W of X̄, capturing the interdependencies among spectral bands (Equation (1)).
- 4. Eigenvalue decomposition: Perform eigenvalue decomposition of W to obtain the eigenvectors V and eigenvalues L, and form the transformation matrix T (Equation (2)).
- 5. Apply transformation: Compute the matrix product of T and X, yielding the transformed data vectors Y (Equation (3)).
- 6. Express transformed data vectors: Compute the square root of the diagonal matrix of eigenvalues, √Λ, and express Y as the product of the two matrices Q and √Λ (Equation (4)).
- 7. Output: Transformed data vectors Y.
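As a complement to Algorithm 1, the following minimal Python sketch illustrates the procedure under simplifying assumptions: the band covariance of the mean-centered data is eigendecomposed and the pixels are projected onto the leading eigenvectors, while the explicit noise-covariance estimation of a full MNF implementation is omitted. Function and variable names are illustrative rather than taken from the original code.

```python
import numpy as np

def mnf_reduce(cube, n_components=30):
    """Project an HSI cube (H, W, B) onto its leading MNF-style components.

    Simplified sketch: bands are mean-centered, the band covariance is
    eigendecomposed, and pixels are projected onto the eigenvectors with
    the largest eigenvalues.
    """
    h, w, b = cube.shape
    X = cube.reshape(-1, b).astype(np.float64)      # pixels x bands
    X_bar = X - X.mean(axis=0, keepdims=True)       # mean-adjusted data matrix
    W_cov = np.cov(X_bar, rowvar=False)             # band covariance matrix
    eigvals, eigvecs = np.linalg.eigh(W_cov)        # eigenvalue decomposition
    order = np.argsort(eigvals)[::-1]               # sort by decreasing eigenvalue
    V = eigvecs[:, order[:n_components]]
    Y = X_bar @ V                                   # transformed data vectors
    return Y.reshape(h, w, n_components)
```

For a cube of shape (145, 145, 200), such as Indian Pines, mnf_reduce(cube, 30) would return a (145, 145, 30) array ready for subsequent patch extraction.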
2.2. Factor Analysis
Factor Analysis is a statistical methodology that seeks to reveal the underlying relationships among observed variables by expressing them in terms of latent variables, referred to as factors. Its goal is to reduce a large number of primary variables to a smaller set of components through a transformation of those variables [24]. This process captures the most common data variances of the original dataset and enables a more succinct representation of its underlying structure, improving the interpretability and efficiency of subsequent analyses. Fundamentally, FA measures the proportion of data variability attributable to shared factors [25]. To illustrate, consider a collection of observable random variables W = (W1, W2, …, Wn) with a corresponding mean vector σ = (σ1, σ2, …, σn). The fundamental equation governing FA takes the form [26]:

W − σ = Λ P + ε
In this scenario, P = (P1, P2, …, Pn) represents a vector of latent factor scores, ε = (ε1, ε2, …, εn) is a vector of latent error terms, and Λ = (Λ1, Λ2, …, Λn) denotes the factor loadings matrix. For FA, the covariance matrix of the observable random variable W is estimated as:

Cov(W) = Λ Λᵗ + φ
Here, φ is a diagonal matrix. The sum of squared loading values in Λ that forms the pth diagonal element of Λ Λᵗ is the pth communality, which gives the proportion of variability that the common factors account for. The pth diagonal element of φ is the pth specific variance, representing characteristics unique to that variable.
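As an illustration, per-pixel latent factor scores can be obtained with an off-the-shelf Factor Analysis implementation. The sketch below uses scikit-learn's FactorAnalysis; the component count and the reshaping convention are illustrative choices, not settings reported in this work.

```python
from sklearn.decomposition import FactorAnalysis

def fa_reduce(cube, n_components=30):
    """Reduce the spectral dimension of an HSI cube (H, W, B) with Factor Analysis."""
    h, w, b = cube.shape
    X = cube.reshape(-1, b)                              # pixels x bands (observed variables W)
    fa = FactorAnalysis(n_components=n_components, random_state=0)
    P = fa.fit_transform(X)                              # latent factor scores per pixel
    return P.reshape(h, w, n_components)
```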
2.3. Proposed Crossover Dimensionality Reduction
To enhance the effectiveness of machine learning models, our proposed crossover dimensionality reduction method combines the merits of MNF and FA. The motivation behind this integration is to address the shortcomings of single dimensionality reduction approaches, such as PCA, which may fail to extract the most informative features. MNF is proficient at preserving the variance of the data, yet it may be limited in maintaining correlations among features. Conversely, FA excels at preserving interrelationships among variables but is less effective at conserving the inherent variability of the dataset. Our proposed method therefore integrates MNF with FA to harness the complementary strengths of both techniques. Mathematically, this integration can be expressed as:

X = A + B + E
Here, X represents the original hyperspectral data matrix, A is the matrix from FA capturing interrelationships, B is the matrix from MNF capturing significant elements explaining data variation, and E represents the residual error matrix. The combination of A and B results in a reduced-dimensional representation that balances the preservation of interrelationships and variability in the dataset.
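A minimal sketch of one way to realize this crossover reduction, assuming the fa_reduce and mnf_reduce helpers sketched in the previous subsections, is to concatenate the leading FA and MNF components along the spectral axis; the split of components between the two methods (15 + 15 here) is purely illustrative.

```python
import numpy as np

def crossover_reduce(cube, n_fa=15, n_mnf=15):
    """Concatenate the leading FA and MNF components into one fused feature cube."""
    fa_features = fa_reduce(cube, n_components=n_fa)     # interrelationship features (A)
    mnf_features = mnf_reduce(cube, n_components=n_mnf)  # variance-ordered features (B)
    return np.concatenate([fa_features, mnf_features], axis=-1)
```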
Empirically, the proposed method outperforms both MNF and FA applied individually across diverse datasets, a superiority we attribute to integrating the advantages of both techniques. Using fewer features than either individual approach helps mitigate overfitting and improves generalization, and the integrated method is more resilient to background noise than its individual components. The synergistic fusion of MNF and FA thus emerges as a potent strategy for dimensionality reduction in hyperspectral data, paving the way for more effective machine learning models.
2.4. Proposed Multi-Branch Deep Learning Approach
Among deep learning models, the CNN applies a sequence of filters to extract a diverse array of features from images; these features are then processed through multiple layers, enabling the network to classify or segment images based on what it has extracted. CNN architectures play a pivotal role in HSI classification. For instance, SpectralNET [27] adopts a wavelet CNN architecture that applies 2D CNNs over four levels of decomposition, aiming to extract both spectral and spatial features from HSI data. While 2D CNNs excel at capturing spatial information, they do not fully exploit the abundant spectral information inherent in HSI data. In contrast, 3D CNNs can extract spectral and spatial information simultaneously, enabling more comprehensive feature extraction, but their application to HSI classification brings its own challenges. To address these, the Fast and Compact 3D CNN [28] was proposed, integrating incremental PCA for spectral feature reduction. Even so, both incremental PCA and the 3D CNN architecture proved time-consuming and yielded suboptimal results, particularly when trained with limited samples. To mitigate these limitations, the HSI data cube can be fragmented into overlapping 3D patches, which improves the efficiency and effectiveness of feature extraction and thereby classification accuracy, especially with limited training samples. To encompass the spatial window and all T spectral bands, a collection of contiguous 3D patches is devised. The convolution operation of the 3D CNN across the three dimensions can be written as:

v(l,r)^(x,y,z) = φ( b(l,r) + Σ_k Σ_i Σ_j Σ_{d=1..γ} w(l,r,k)^(i,j,d) · v(l−1,k)^(x+i, y+j, z+d) )

Here, γ represents the size of the spectral dimension of the 3D kernel, k denotes the count of kernels (feature maps) in the preceding layer, and the convolutional kernel w(l,r) is linked to the feature map in the rth position of the lth layer, with bias b(l,r) and activation function φ.
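As an illustration of the patch-based preparation described above, the following Python sketch extracts overlapping window-sized patches centered on every labeled pixel. The window size, padding mode, and the convention that label 0 marks background are assumptions, not the exact CMD settings.

```python
import numpy as np

def extract_patches(cube, labels, window=11):
    """Split an HSI cube (H, W, B) into overlapping window x window x B patches,
    one centered on each labeled pixel (label 0 is treated as background)."""
    margin = window // 2
    padded = np.pad(cube, ((margin, margin), (margin, margin), (0, 0)), mode="reflect")
    patches, targets = [], []
    for i in range(cube.shape[0]):
        for j in range(cube.shape[1]):
            if labels[i, j] == 0:                 # skip unlabeled pixels
                continue
            patches.append(padded[i:i + window, j:j + window, :])
            targets.append(labels[i, j] - 1)      # shift class indices to start at 0
    return np.asarray(patches), np.asarray(targets)
```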
In a separate investigation, a three-branch CNN termed Tri-CNN was introduced for spectral-spatial feature extraction, coupled with PCA as a feature reduction technique. This approach was limited because PCA struggles to handle the nonlinear features present in HSI. Moreover, the Tri-CNN model first extracted spectral features, then spatial features, and finally combined both across its three branches; relying on spectral features alone proved insufficient to improve classification outcomes significantly. To address these deficiencies, a spatial feature extractor and a spectral-spatial feature extractor were combined with a spectral-only feature extractor to form a three-branch feature fusion network, strengthening the extraction of spectral characteristics and the overall feature extraction process. Multi-branch CNNs extend conventional CNNs by incorporating multiple convolutional branches, each specialized in learning distinct features from the input, which fosters a more comprehensive representation of the data and improves prediction accuracy. As depicted in Figure 1, the CMD model architecture first captures spectral features and then employs multiple convolution layers for spatial-spectral feature extraction.
Each branch consists of three convolution layers with 8, 16, and 32 filters, respectively. The first branch incorporates two 3D convolution layers and one 2D convolution layer, with kernel sizes of 3 × 3 × 5 and 3 × 3 × 1 for the first two layers and 3 × 3 for the third. The second branch comprises one 3D convolution layer and two 2D convolution layers, with a kernel size of 3 × 3 × 5 for the first layer and 3 × 3 for the following two. The third branch is dedicated to spatial features and houses three 2D convolution layers, with kernel sizes of 3 × 3 for the first two layers and 3 × 1 for the last.
Efficient feature extraction in the CMD model is achieved by deliberately using small convolution kernels, as advocated in previous research [29]. Despite their compact size, these kernels enhance computational efficiency while still capturing intricate patterns within hyperspectral data. After the convolution blocks, the outputs of the different branches are concatenated and flattened, converting the multidimensional features into one-dimensional vectors and ensuring a coherent flow of information from both spectral and spatial dimensions. To counter overfitting, the fully connected dense layers incorporate two dropout regularizations, preventing the model from relying too heavily on specific features. The final stage is the classification layer, where the learned features are used to make predictions. This design positions the CMD model as an efficient tool for HSI analysis, able to distill complex information into meaningful classifications.
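The following Keras sketch outlines a three-branch network following the filter counts and kernel sizes listed above; the input patch size, the dense-layer widths, the dropout rates, and the reshaping used to move from 3D to 2D convolutions are assumptions, so the exact CMD architecture may differ.

```python
from tensorflow.keras import layers, models

def build_cmd_like_model(input_shape=(11, 11, 30, 1), num_classes=16):
    """Three-branch CNN sketch with 8, 16, and 32 filters per branch."""
    inp = layers.Input(shape=input_shape)

    # Branch 1: two 3D convolutions followed by a 2D convolution.
    b1 = layers.Conv3D(8, (3, 3, 5), activation="relu")(inp)
    b1 = layers.Conv3D(16, (3, 3, 1), activation="relu")(b1)
    b1 = layers.Reshape((b1.shape[1], b1.shape[2], -1))(b1)   # fold spectral axis into channels
    b1 = layers.Conv2D(32, (3, 3), activation="relu")(b1)

    # Branch 2: one 3D convolution followed by two 2D convolutions.
    b2 = layers.Conv3D(8, (3, 3, 5), activation="relu")(inp)
    b2 = layers.Reshape((b2.shape[1], b2.shape[2], -1))(b2)
    b2 = layers.Conv2D(16, (3, 3), activation="relu")(b2)
    b2 = layers.Conv2D(32, (3, 3), activation="relu")(b2)

    # Branch 3: spatial-only branch with three 2D convolutions.
    b3 = layers.Reshape((input_shape[0], input_shape[1], -1))(inp)
    b3 = layers.Conv2D(8, (3, 3), activation="relu")(b3)
    b3 = layers.Conv2D(16, (3, 3), activation="relu")(b3)
    b3 = layers.Conv2D(32, (3, 1), activation="relu")(b3)

    # Concatenate the flattened branch outputs, apply dropout-regularized
    # dense layers, and classify.
    merged = layers.Concatenate()([layers.Flatten()(b1),
                                   layers.Flatten()(b2),
                                   layers.Flatten()(b3)])
    x = layers.Dense(256, activation="relu")(merged)
    x = layers.Dropout(0.4)(x)
    x = layers.Dense(128, activation="relu")(x)
    x = layers.Dropout(0.4)(x)
    out = layers.Dense(num_classes, activation="softmax")(x)
    return models.Model(inp, out)
```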
3. Dataset and Experimental Analysis
3.1. Dataset Details
In this study, a diverse set of HSI datasets was chosen to ensure a comprehensive evaluation of the proposed CMD model. The Salinas Scene (SA), Pavia University (PU), Kennedy Space Center (KSC), and Indian Pines (IP) datasets collectively contribute to the richness and diversity of the data analyzed [30].
The Salinas Scene (SA) dataset covers an agricultural scene with 16 distinct classes and provides detailed spectral information across its spatial extent, allowing a thorough characterization of various land cover categories. The Pavia University (PU) dataset captures an urban university environment; its distinct classes add to the diversity of the study and enable a nuanced examination of surface features within the scene. The Kennedy Space Center (KSC) dataset records the hyperspectral signature of the Kennedy Space Center area across 13 distinct classes, each representing different features and materials found within that environment. The Indian Pines (IP) dataset, acquired over an agricultural area, features 16 distinct classes and provides insight into the spectral variations associated with different crop types and land cover features.
The inclusion of these four diverse datasets, SA, PU, KSC, and IP, ensures a robust evaluation of the proposed model across varying landscapes and class distributions, enhancing the generalizability and applicability of the study's findings in HSI analysis. Further information is provided in Table 1.
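For reference, these benchmark scenes are commonly distributed as MATLAB .mat files. The loading sketch below is illustrative; the file and variable key names depend on the download source.

```python
import scipy.io as sio

def load_scene(data_path, gt_path, data_key, gt_key):
    """Load a hyperspectral cube (H, W, B) and its ground-truth map (H, W) from .mat files."""
    cube = sio.loadmat(data_path)[data_key]
    labels = sio.loadmat(gt_path)[gt_key]
    return cube, labels

# Example (file and key names are illustrative):
# cube, labels = load_scene("Indian_pines_corrected.mat", "Indian_pines_gt.mat",
#                           "indian_pines_corrected", "indian_pines_gt")
```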
3.2. Experimental Hyperparameters and Configuration
In conducting our experiments for HSI classification, we leveraged the computing capabilities of Google Colab, an accessible cloud-based platform. The experiments were run in a Python 3.8 environment, with version details recorded for reproducibility. TensorFlow 2.4 was employed to take advantage of its latest features and optimizations, and GPU acceleration on Google Colab further expedited model training through parallel processing.
To ensure a fair and consistent comparison across experiments, we adhered to a standardized patch extraction process: 3D patches of uniform dimensions were systematically extracted from the input hyperspectral volumes, a spatial-spectral configuration that allows a comprehensive analysis of local features. The deep learning model at the core of the experiments features three branches of convolutional layers followed by two fully connected layers; pooling layers were deliberately omitted to retain pixel-level detail. The model has a total of 663,760 trainable parameters, striking a balance between model complexity and computational efficiency.
For model training we adopted the Adam optimizer, chosen for its adaptive learning rate. The mini-batch size was set to 256, balancing memory efficiency and convergence. A learning rate of 0.001 with a decay rate of 10⁻⁶ governed the fine-tuning of model parameters over 100 epochs, an epoch count chosen to reach convergence while avoiding overfitting on the available data. The experiments were orchestrated on the Google Colab platform, taking advantage of its integration with Jupyter notebooks, and the GPU-optimized TensorFlow framework enabled efficient execution in the cloud environment. A concise overview of the CMD model and its hyperparameters can be found in Table 2 and Table 3.
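A Keras compile-and-fit sketch reflecting the stated hyperparameters (Adam with learning rate 0.001 and decay 10⁻⁶, mini-batch size 256, 100 epochs) is given below; the model, the data variables, the loss function, and the validation split are placeholders rather than reported settings.

```python
import tensorflow as tf

# Assumes model = build_cmd_like_model(...) and (X_train, y_train) produced by
# the patch extraction step; names are illustrative.
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001, decay=1e-6)
model.compile(optimizer=optimizer,
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
history = model.fit(X_train, y_train,
                    batch_size=256,
                    epochs=100,
                    validation_split=0.1)
```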
In summary, our experimental setup on Google Colab comprised a careful selection of configurations: the Python and TensorFlow versions, GPU acceleration, patch extraction dimensions, model architecture, and training parameters. These choices were made to ensure reproducibility, fair comparison, and strong performance on the challenging task of HSI classification.