Article

HyperKon: A Self-Supervised Contrastive Network for Hyperspectral Image Analysis

by Daniel La’ah Ayuba 1,*, Jean-Yves Guillemaut 1, Belen Marti-Cardona 2 and Oscar Mendez 1
1 Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, Guildford, Surrey GU2 7XH, UK
2 Centre for Environmental Health and Engineering, University of Surrey, Guildford, Surrey GU2 7XH, UK
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(18), 3399; https://doi.org/10.3390/rs16183399
Submission received: 26 July 2024 / Revised: 4 September 2024 / Accepted: 10 September 2024 / Published: 12 September 2024
(This article belongs to the Special Issue Advances in Hyperspectral Remote Sensing Image Processing)

Abstract

The use of a pretrained image classification model (trained on cats and dogs, for example) as a perceptual loss function for hyperspectral super-resolution and pansharpening tasks is surprisingly effective. However, RGB-based networks do not take full advantage of the spectral information in hyperspectral data. This motivated the creation of HyperKon, a dedicated hyperspectral Convolutional Neural Network backbone built with self-supervised contrastive representation learning. HyperKon uniquely leverages the high spectral continuity, range, and resolution of hyperspectral data through a spectral attention mechanism. We also perform a thorough ablation study of different layer types, evaluating how well each learns representations from hyperspectral data. Notably, HyperKon achieves a remarkable 98% Top-1 retrieval accuracy and surpasses traditional RGB-trained backbones in both pansharpening and image classification tasks. These results highlight the potential of hyperspectral-native backbones and herald a paradigm shift in hyperspectral image analysis.

1. Introduction

Hyperspectral images (HSIs), with their ability to capture detailed spectral information across hundreds of contiguous bands, have rapidly advanced the capabilities of remote sensing analysis in various domains, including agriculture, mineralogy, and environmental monitoring [1,2]. These high-dimensional data offer a rich representation of scenes, enabling finer material distinctions than traditional RGB and multispectral images [3]. However, the exploitation of HSIs, particularly using deep learning techniques initially designed for RGB images, presents considerable challenges [4].
Convolutional Neural Networks (CNNs), which dominate the computer vision landscape, are predominantly trained and evaluated on RGB data [5]. These models often struggle to generalize effectively to hyperspectral data due to the vast difference in spectral resolution and the unique characteristics of HSIs [5,6,7]. Moreover, the scarcity of hyperspectral-native backbones means that RGB-pretrained models must undergo extensive fine-tuning before they can be applied in this domain.
Hyperspectral-native CNN architectures, including 3D-CNNs [8], spatial–spectral residual CNNs [9,10], and hybrid dilated convolution networks [11], were developed to effectively process spectral–spatial information in hyperspectral data. These models have made significant progress by utilizing attention-based mechanisms that focus on the most relevant spectral and spatial features, thereby enhancing interpretability and boosting performance in various downstream tasks [12]. Hybrid CNN–Transformer architectures, which combine the strengths of local and global feature analysis through spectral–spatial and self-attention layers [13], have also emerged as powerful tools in this domain. However, despite their promising performance, these hybrid architectures still face challenges when dealing with the high-dimensional nature of hyperspectral data. In contrast, existing CNN layers such as the Squeeze and Excitation Block (SEB) [14] and Convolutional Block Attention Module (CBAM) [15] offer efficient solutions for processing high-dimensional data but have been largely unexplored in the hyperspectral domain. Our work presents an ablation study that demonstrates how these layers not only excel at handling high-dimensional hyperspectral data but also bring the benefits of attention mechanisms into traditional CNN architectures.
Recent advancements in the field have led to the development of Remote Sensing Foundation Models (RSFMs), which represent significant progress in the processing and analysis of remote sensing data, including HSIs. These models, drawing inspiration from the achievements of large language models in natural language processing, aim to provide a flexible and reliable foundation for a range of remote sensing applications. One notable example is SpectralGPT [16], which, although trained on multispectral images with 12 channels, adapts the generative pretrained transformer (GPT) architecture for spectral data analysis. SpectralGPT’s training on only 12 bands likely reflects the challenges of scaling transformer architectures to the hundreds of channels typically found in HSIs. Scaling transformers to such a large number of channels leads to a quadratic increase in the model’s number of parameters, making them computationally expensive and challenging to implement effectively in hyperspectral contexts. In contrast, modern attention mechanisms that operate within CNN architectures, such as the SEB, offer similar advantages of attention found in transformers but with a significantly smaller number of parameters.
The emergence of self-supervised learning in remote sensing, exemplified by models like SeCo [17], has opened new avenues for leveraging large amounts of unlabeled satellite imagery. These approaches enable the learning of rich, transferable representations from diverse remote sensing data sources, potentially improving the performance of downstream tasks in HSI analysis. Furthermore, the development of multimodal foundation models for remote sensing, such as the works by [18,19,20], demonstrates the potential for integrating various data sources to create more comprehensive and robust models for Earth observation tasks.
Self-supervised contrastive learning has emerged as a powerful paradigm in remote sensing, addressing the challenge of limited labeled data. Recent works have adapted contrastive learning frameworks for remote sensing applications [21], demonstrating improved performance in land cover change detection tasks. In the hyperspectral domain, approaches like [22,23] have shown promising results in learning spectral–spatial features without relying on labeled data. The integration of contrastive learning with attention mechanisms [24] has further improved the ability to capture long-range dependencies in hyperspectral data, leading to more robust models.
Despite these advances, several challenges persist in HSI analysis. The high dimensionality and distinct properties of hyperspectral data continue to pose computational challenges, particularly in resource-constrained environments like onboard satellite systems. The increasing demand for real-time remote sensing data analysis, as exemplified by missions like Intuition-1 [25], necessitates the development of efficient, lightweight models capable of processing hyperspectral data in orbit. To address these constraints and the scarcity of hyperspectral-native solutions, we introduce HyperKon, a self-supervised contrastive network trained solely on HSIs from the Environmental Mapping and Analysis Program (EnMAP) [26]. Unlike generic CNN backbones and newer RSFMs, HyperKon is trained on HSIs with 224 channels while maintaining a compact architecture.
The main contributions of this paper are:
  • HyperKon: a hyperspectral-native CNN backbone that can learn useful representations from large amounts of unlabeled data.
  • EnHyperSet-1: an EnMAP dataset curated for use in precision agriculture and other deep learning projects.
  • Hyperspectral perceptual loss: a novel perceptual loss function minimizing errors in the spectral domain.
  • Demonstration that the representations learned by HyperKon improve performance in hyperspectral downstream tasks.
The remainder of this paper is organized as follows. Section 2 describes the materials and methodology, including the HyperKon architecture and contrastive learning approach. Section 3 presents experimental results and evaluations. Section 4 offers discussions on the findings, and Section 5 concludes the paper with future research directions.

2. Materials and Methods

2.1. Dataset

This study introduces EnHyperSet-1, a hyperspectral dataset curated from the EnMAP mission [26] for deep learning applications. The dataset comprises 800 scenes (200 Level 1B, 200 Level 1C, 400 Level 2A), each with an average size of 1300 × 1200 pixels and 224 spectral bands ranging from 420 nm to 2450 nm. Each 30 m × 30 m pixel captures spectral information at a spectral resolution of 6.5 nm to 12 nm. EnHyperSet-1 features diverse global urban, forest, and agricultural scenes, as summarized in Table 1. For analysis, we extracted 160 × 160 pixel patches using a sliding window with a 5% overlap buffer, with edge patches zero-padded when necessary. The choice of 160 × 160 pixel patches was driven by the need to maintain consistency with existing networks and backbones used in hyperspectral image processing, such as hyperspectral pansharpening [27], while also balancing spatial detail and computational efficiency. Although the dataset consisted of data from different processing levels, these levels were not conflated or used as one. Instead, they were strategically leveraged for contrastive learning, enabling the network to learn more robust representations. A detailed explanation of how these different processing levels were utilized in the contrastive learning process is provided in Section 2.3.2.
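For illustration, the following is a minimal sketch of the sliding-window patch extraction described above, assuming a scene already loaded as a NumPy array of shape (H, W, 224); the exact EnMAP I/O and overlap bookkeeping used to build EnHyperSet-1 may differ.

```python
# A minimal sketch of 160 x 160 patch extraction with ~5% overlap and zero-padded
# edge patches; the scene shape below is a small synthetic stand-in (real EnMAP
# scenes in EnHyperSet-1 average about 1300 x 1200 pixels with 224 bands).
import numpy as np

def extract_patches(scene: np.ndarray, patch: int = 160, overlap: float = 0.05) -> np.ndarray:
    """Slide a window over an (H, W, C) hyperspectral scene, zero-padding edge patches."""
    stride = int(patch * (1.0 - overlap))          # 5% overlap -> stride of 152 px
    h, w, c = scene.shape
    patches = []
    for top in range(0, h, stride):
        for left in range(0, w, stride):
            tile = scene[top:top + patch, left:left + patch, :]
            if tile.shape[:2] != (patch, patch):   # zero-pad patches at the scene edges
                padded = np.zeros((patch, patch, c), dtype=scene.dtype)
                padded[:tile.shape[0], :tile.shape[1], :] = tile
                tile = padded
            patches.append(tile)
    return np.stack(patches)

dummy_scene = np.zeros((400, 380, 224), dtype=np.float32)
print(extract_patches(dummy_scene).shape)          # (n_patches, 160, 160, 224)
```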
HyperKon was pretrained on EnHyperSet-1 using self-supervised contrastive learning with the Normalized Temperature-Scaled Cross Entropy (NT-Xent) loss [28]. The network was trained for 1000 epochs with a batch size of 32, using the Adam optimizer (initial learning rate: 1 × 10−4) and a StepLR scheduler. This methodology enabled effective representation learning from unlabeled hyperspectral data while addressing the unique challenges of different processing levels.
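The optimization setup quoted above (Adam with an initial learning rate of 1 × 10⁻⁴, a StepLR schedule, 1000 epochs, batch size 32) could be configured roughly as follows; the StepLR step size and decay factor are assumptions, since they are not specified in the text, and the linear layer is only a placeholder for the HyperKon encoder.

```python
# A minimal sketch of the stated pretraining optimization setup; the loss shown is a
# dummy stand-in for the NT-Xent objective computed on EnHyperSet-1 mini-batches.
import torch

model = torch.nn.Linear(224, 128)  # placeholder for the HyperKon encoder
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.5)  # assumed values

for epoch in range(1000):
    # one epoch over EnHyperSet-1 mini-batches (size 32) with the NT-Xent loss would go here
    loss = model(torch.randn(32, 224)).pow(2).mean()   # dummy loss for illustration only
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```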

2.2. Hyperspectral Backbone Architecture

The HyperKon architecture (Figure 1) is based on ResNeXt’s multibranch cardinality [29] and employs parallel paths to improve representational and computational efficiency. The ResNeXt architecture has fewer parameters, making it less sensitive to learning rates and other hyperparameters, especially when compared to its predecessor, ResNet [30].
Developing an effective feature extractor for this network required overcoming two main challenges: capturing the complex, high-dimensional spectral and spatial features of hyperspectral data and addressing the computational constraints associated with such high-dimensional data. In our model, we fine-tuned the feature maps to better reflect channel interdependencies using a specialized architecture based on the SEB [14]. This approach processed an input feature map $X \in \mathbb{R}^{H \times W \times C}$ to calculate channelwise statistics, allowing the model to more effectively highlight relevant spectral and spatial details within the hyperspectral data. This targeted recalibration ensured the network’s computations were directly aligned with the critical features of the data, leading to more efficient and accurate analysis.
The excitation operation then used a gating mechanism:
$$ s_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} x_{ijc}, \qquad e_c = \sigma(\beta \cdot s_c + \gamma), $$
where $H$ and $W$ represent the height and width of the input feature map, respectively. The term $x_{ijc}$ represents the value of the feature map $X$ at spatial position $(i, j)$ in channel $c$. Here, $\sigma$ is the sigmoid activation function, and $\beta$ and $\gamma$ are trainable parameters. The recalibrated feature map $X' \in \mathbb{R}^{H \times W \times C}$ was then given by:
$$ x'_{ijc} = e_c \cdot x_{ijc} $$
This recalibration emphasized channels with high interdependencies and suppressed others. The feature extractor provided preliminary feature representations which, while powerful, might not be fine-tuned to the specific challenges presented by hyperspectral data. To address this, we introduced a projection head. Given the deep feature representation $F \in \mathbb{R}^{D}$ from the feature extractor, the projection head transformed it into a new space $Z \in \mathbb{R}^{D}$:
$$ Z = W_2 \, \sigma(W_1 F + b_1) + b_2 $$
where $W_1 \in \mathbb{R}^{D \times D}$, $W_2 \in \mathbb{R}^{D \times D}$, $b_1 \in \mathbb{R}^{D}$, and $b_2 \in \mathbb{R}^{D}$ are trainable parameters, and $\sigma$ is a non-linear activation function. This transformation aided in emphasizing discriminative features for hyperspectral data.
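A minimal PyTorch sketch of the channelwise recalibration and projection head defined above is given below; the global-average-pooling statistic, sigmoid gate, and layer widths are illustrative assumptions rather than the exact HyperKon configuration.

```python
# A minimal sketch of the channel recalibration (s_c, e_c, x'_ijc) and the projection
# head Z = W2 * sigma(W1 F + b1) + b2; sizes are placeholders, not the paper's exact values.
import torch
import torch.nn as nn

class SpectralSqueezeExcite(nn.Module):
    """Recalibrates each of the C spectral channels with a learned gate e_c."""
    def __init__(self, channels: int):
        super().__init__()
        self.beta = nn.Parameter(torch.ones(channels))    # trainable scale
        self.gamma = nn.Parameter(torch.zeros(channels))  # trainable shift

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (B, C, H, W)
        s = x.mean(dim=(2, 3))                             # channelwise statistics s_c
        e = torch.sigmoid(self.beta * s + self.gamma)      # excitation gate e_c
        return x * e[:, :, None, None]                     # x'_ijc = e_c * x_ijc

class ProjectionHead(nn.Module):
    """Maps backbone features F to the contrastive space Z via two linear layers."""
    def __init__(self, dim: int = 2048):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        return self.net(f)

# Example on a batch of 224-band feature maps
x = torch.randn(2, 224, 160, 160)
print(SpectralSqueezeExcite(224)(x).shape)  # torch.Size([2, 224, 160, 160])
```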

2.3. Contrastive Learning

2.3.1. Self-Supervised Contrastive Loss

While Triplet loss [31] and InfoNCE [32] can be effective, a careful selection of samples is especially challenging for hyperspectral data. The NT-Xent loss [28], which builds upon the InfoNCE concept with a softmax output layer, incorporates an L2 normalization of embeddings and typically functions as a symmetric loss. It encourages the network to discriminate between augmented versions (positive pairs) of the same hyperspectral sample and dissimilar samples (negative pairs) within a mini-batch. The primary motivation for selecting NT-Xent loss is its efficiency in contexts where generating explicit negative samples for HSIs is challenging. Positive pairs are typically generated through data augmentation techniques like random spectral scaling, spatial cropping, and random channel permutations [33]. While methods like MoCo [34] and SwAV [35] introduce momentum queues and memory banks to maintain a large set of negatives, NT-Xent relies solely on the current mini-batch for negative sampling. This computational efficiency is particularly beneficial for training on hyperspectral data, which often have high dimensionality due to the large number of spectral bands (≥200 bands). Furthermore, it is adaptable to varying batch sizes, uses cosine similarity for feature vector orientation, and incorporates a temperature parameter that controls the influence of the discrimination task on the learning process [36].
The NT-Xent loss is defined as:
$$ \ell(z_i, z_j) = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{\exp(\mathrm{sim}(z_i, z_i')/\tau)}{\sum_{j=1}^{2N} \exp(\mathrm{sim}(z_i, z_j)/\tau)} $$
where $N$ is the number of samples, $z_i$ is the feature representation of sample $i$, $\mathrm{sim}(z_i, z_j) = z_i^{\top} z_j$ is the cosine similarity between $z_i$ and $z_j$, $\tau$ is a temperature parameter that controls the smoothness of the distribution, and $z_i'$ refers to the positive feature representation for the query sample $i$.
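The loss can be implemented compactly; the sketch below follows the formulation above, assuming L2-normalized embeddings and treating all other samples in the 2N-sized batch as negatives. The temperature value and batch layout are illustrative assumptions.

```python
# A minimal sketch of NT-Xent for a batch of positive pairs (z1[i], z2[i]).
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)    # (2N, D); dot products = cosine sim
    sim = z @ z.T / tau                                    # pairwise similarities
    n = z1.shape[0]
    mask = torch.eye(2 * n, dtype=torch.bool)
    sim = sim.masked_fill(mask, float("-inf"))             # exclude self-similarity
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])  # positive of i is i +/- n
    return F.cross_entropy(sim, targets)                   # symmetric NT-Xent over the batch

# Example: embeddings of 32 augmented pairs
loss = nt_xent(torch.randn(32, 128), torch.randn(32, 128))
print(loss.item())
```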

2.3.2. HSI Contrastive Sampling

Self-supervised contrastive learning relies heavily on data augmentation to generate informative positive and negative sample pairs for training [37]. In our approach, we leveraged the unique characteristics of hyperspectral data by utilizing different processing levels (e.g., 1B, 1C, 2A) to create naturally augmented samples before applying traditional transformations.
Let $x_i^l \in I_k^h$ denote a patch taken from a series of HSIs $\{I_k^h\}_{k=0}^{S}$, where $S$ represents the dataset size and $l$ represents the processing level. For each anchor patch $x_i^{l_1}$, we selected its positive pair $x_i^{l_2}$ from the same spatial region but at a different processing level. This approach ensured that positive pairs maintained spatial content while introducing processing-level variations.
These patches then underwent a series of spectral and spatial transformations, denoted by $A = \{a_1, a_2, \ldots, a_n\}$, which were applied to the patches $x_i^l$ as follows:
$$ \tilde{x}_i^l = a_n(\cdots a_2(a_1(x_i^l)) \cdots) $$
To further enhance the learning process, we leveraged the concept of hard negative mining during triplet selection [38]. This strategy focused on choosing negative pairs within the same batch that were most similar to the anchor patch. Selecting such “hard negatives” forced the model to learn more discriminative features that distinguish between even subtle spectral variations, which is crucial for HSI data due to their high dimensionality and potential spectral redundancy.
Mathematically, we used an encoder function $f$ to map each patch $x$ to its corresponding embedding $z = f(x)$. The cosine similarity function $s(z_i, z_j)$ then measured the similarity between embeddings $z_i$ and $z_j$ for a batch of patches. During hard negative mining, for a given anchor patch $x_a$ and its positive pair $x^{+}$, the hardest negative patch $x^{-}$ was selected as:
$$ x^{-} = \underset{x_k \in \{x_1, x_2, \ldots, x_B\} \setminus \{x_a, x^{+}\}}{\operatorname{argmax}} \; s(f(x_a), f(x_k)) $$
This means that the patch $x_k$ with the highest similarity score to the anchor patch $x_a$ was chosen from all patches in the batch, with the exception of the positive pair $x^{+}$ and the anchor patch $x_a$ itself (as depicted in Figure 2). By combining the use of different processing levels with hard negative mining and traditional augmentations, our approach emphasized the importance of targeted data augmentation in HSIs. This method leveraged the advantages of self-supervised learning, improving task performance by learning detailed representations from large, unlabeled datasets while preserving the unique spectral characteristics of hyperspectral data.
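The in-batch hard negative selection can be sketched as follows; the embeddings, batch layout, and shapes are placeholder assumptions used only to illustrate the argmax over all non-anchor, non-positive patches.

```python
# A minimal sketch of hard negative mining: for each anchor embedding, pick the most
# similar embedding in the batch that is neither the anchor itself nor its positive pair.
import torch
import torch.nn.functional as F

def hardest_negatives(z_anchor: torch.Tensor, z_pos: torch.Tensor) -> torch.Tensor:
    """z_anchor, z_pos: (B, D) embeddings of anchors and their positives (same index = pair)."""
    za = F.normalize(z_anchor, dim=1)
    zp = F.normalize(z_pos, dim=1)
    bank = torch.cat([za, zp], dim=0)                  # all candidate patches in the batch
    sim = za @ bank.T                                  # cosine similarity s(f(x_a), f(x_k))
    b = za.shape[0]
    idx = torch.arange(b)
    sim[idx, idx] = float("-inf")                      # exclude the anchor itself
    sim[idx, idx + b] = float("-inf")                  # exclude the positive pair
    return sim.argmax(dim=1)                           # index of the hardest negative per anchor

print(hardest_negatives(torch.randn(8, 128), torch.randn(8, 128)))
```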

2.4. Hyperspectral Perceptual Loss

The HyperSpectral Perceptual Loss (HSPL) function is designed to quantify the spectral differences between a predicted hyperspectral image ($\hat{I}$) and its reference ($I_{ref}$) in the feature embedding space. Unlike traditional loss functions that operate directly on pixel values, the HSPL leveraged the rich spectral information captured by our pretrained HyperKon network. This approach allowed for a more nuanced comparison of hyperspectral data, capturing complex spectral relationships that may not have been evident in simple pixelwise comparisons.
The HSPL is computed as follows:
$$ \mathcal{L}_{hspl} = \sum_{l=1}^{N} \frac{1}{C_l H_l W_l} \left\| f_h(I_{ref}, l) - f_h(\hat{I}, l) \right\|_F $$
where $f_h(\cdot, l)$ represents the feature maps extracted by the $l$th layer of the pretrained network, $N$ is the total number of layers considered, and $C_l$, $H_l$, and $W_l$ are the number of channels, height, and width of the feature maps at layer $l$, respectively. The Frobenius norm $\|\cdot\|_F$ quantifies the difference between the feature maps, effectively capturing the perceptual difference between the images in the hyperspectral domain.
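A minimal sketch of this computation is given below; the two-layer feature extractor is only a stand-in for the frozen, pretrained HyperKon backbone, and the layer widths are illustrative assumptions.

```python
# A minimal sketch of the HSPL: Frobenius-norm differences between feature maps at
# several layers of a (frozen) feature extractor, normalized by C_l * H_l * W_l.
import torch
import torch.nn as nn

class TinyExtractor(nn.Module):
    """Placeholder feature extractor standing in for the pretrained HyperKon backbone."""
    def __init__(self, bands: int = 224):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Sequential(nn.Conv2d(bands, 64, 3, padding=1), nn.ReLU()),
            nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU()),
        ])

    def features(self, x):
        feats = []
        for layer in self.layers:
            x = layer(x)
            feats.append(x)
        return feats

def hspl(net: TinyExtractor, pred: torch.Tensor, ref: torch.Tensor) -> torch.Tensor:
    loss = 0.0
    for f_pred, f_ref in zip(net.features(pred), net.features(ref)):
        _, c, h, w = f_pred.shape
        loss = loss + torch.norm(f_ref - f_pred, p="fro") / (c * h * w)
    return loss

net = TinyExtractor()
print(hspl(net, torch.randn(1, 224, 64, 64), torch.randn(1, 224, 64, 64)).item())
```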
Figure 3 illustrates the conceptual difference between the HSPL and traditional RGB-based perceptual losses. While RGB losses are limited to three broad spectral bands, the HSPL leverages the full spectral range of hyperspectral data, providing a more comprehensive measure of spectral similarity.
In terms of computational complexity, the HSPL does incur additional overhead compared to simpler pixelwise losses due to the need to compute feature maps across multiple layers. However, this increased complexity is offset by the improved performance and more meaningful loss calculations, especially for tasks that require accurate preservation of spectral characteristics.

3. Results

3.1. Ablation Study

We evaluated multiple versions of the model, each integrating different architectural components, to see how well each component of our HyperKon architecture performed and how it impacted the network’s capacity to learn meaningful representations from hyperspectral data.
Figure 4 presents the results of this ablation study, showing the Top-1 HSI retrieval accuracy for each model variant during the pretraining phase. We experimented with several key components: 3D convolutions to capture spatial–spectral relationships, Depthwise Separable Convolutions (DSC) for efficient feature extraction, the Convolutional Block Attention Module (CBAM) to enhance feature refinement, and the Squeeze and Excitation Block (SEB) for adaptive feature recalibration. The results clearly demonstrated the impact of each component on the model’s performance. Notably, the version incorporating the SEB achieved the highest accuracy, surpassing other configurations. This suggested that the adaptive feature recalibration provided by the SEB was particularly effective in capturing the complex spectral–spatial relationships in hyperspectral data.

Band Attention

We investigated various dimensionality reduction techniques commonly applied to hyperspectral data [39,40,41], including Principal Component Analysis (PCA), manual, and full band selection. These techniques are crucial in HSI processing for streamlining computation and refining feature extraction by discarding redundant information.
As shown in Figure 5, the 3D convolution model outperformed both the SEB and CBAM models when dimensionality was reduced using PCA. While the SEB and CBAM generally excelled at selecting and prioritizing information across the entire spectrum of hyperspectral bands, their advantage diminished when PCA was applied, potentially rendering their roles redundant. In contrast, 3D convolutions demonstrated superior performance when band selection was conducted a priori using PCA with 112 components, likely because 3D convolutions can effectively learn from both contiguous and non-contiguous spectral channels. This flexibility allowed 3D convolutions to adapt to the structure of the data, regardless of the continuity of the spectral bands.
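As a point of reference, spectral dimensionality reduction with PCA to 112 components, as used in this comparison, can be sketched as follows; this assumes scikit-learn is available and is not the exact preprocessing pipeline used in the experiments.

```python
# A minimal sketch of PCA-based spectral dimensionality reduction of an HSI cube.
import numpy as np
from sklearn.decomposition import PCA

def reduce_bands(cube: np.ndarray, n_components: int = 112) -> np.ndarray:
    """Project an (H, W, C) hyperspectral cube onto its first n_components spectral PCs."""
    h, w, c = cube.shape
    flat = cube.reshape(-1, c)                       # treat each pixel as a C-dimensional spectrum
    reduced = PCA(n_components=n_components).fit_transform(flat)
    return reduced.reshape(h, w, n_components)

cube = np.random.rand(160, 160, 224).astype(np.float32)
print(reduce_bands(cube).shape)                      # (160, 160, 112)
```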
As suggested in Figure 6, manual band selection gave 3D convolutions a distinct advantage. Most notably, however, the SEB and CBAM clearly surpassed 3D convolution when no band selection was carried out. This observation supported our hypothesis that when band selection is not performed a priori, attention mechanisms such as the SEB and CBAM are able to focus on the most relevant spectral bands across the entire channel spectrum. This clearly demonstrated their robustness and ability to manage the high dimensionality common to hyperspectral data.

3.2. Super-Resolution

For the super-resolution evaluation, we utilized the Pavia Center [42], Botswana [43], Chikusei [44], and EnMAP [26] datasets, each featuring different spectral bands. Specific bands from the EnHyperSet-1 dataset were selected to match the wavelengths of each dataset, optimizing the training of HyperKon. The data preparation adhered to methodologies outlined in [27,45]. The results from HyperKon were benchmarked against traditional RGB-trained backbones using metrics [46,47,48] such as the Correlation Coefficient (CC), Spectral Angle Mapper (SAM), Root-Mean-Square Error (RMSE), Erreur Relative Globale Adimensionnelle de Synthèse (ERGAS), and Peak Signal-to-Noise Ratio (PSNR).
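For reference, two of these metrics, SAM and PSNR, can be computed as sketched below on (H, W, C) arrays; CC, RMSE, and ERGAS follow the same per-pixel and per-band pattern. These are generic formulations, not the exact evaluation code used here.

```python
# A minimal sketch of SAM (in degrees) and PSNR for hyperspectral image pairs.
import numpy as np

def sam(pred: np.ndarray, ref: np.ndarray, eps: float = 1e-8) -> float:
    """Mean spectral angle (degrees) between per-pixel spectra."""
    dot = (pred * ref).sum(axis=-1)
    norms = np.linalg.norm(pred, axis=-1) * np.linalg.norm(ref, axis=-1)
    angles = np.arccos(np.clip(dot / (norms + eps), -1.0, 1.0))
    return float(np.degrees(angles).mean())

def psnr(pred: np.ndarray, ref: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio for images scaled to [0, max_val]."""
    mse = float(np.mean((pred - ref) ** 2))
    return float(10.0 * np.log10(max_val ** 2 / mse))

ref = np.random.rand(64, 64, 102)
pred = np.clip(ref + 0.01 * np.random.randn(*ref.shape), 0, 1)
print(sam(pred, ref), psnr(pred, ref))
```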
Table 2 shows the average quantitative results for the Pavia Center dataset. HyperKon achieved superior performance across multiple metrics compared to other methods, demonstrating its effectiveness in hyperspectral super-resolution. A broader comparison of the average quantitative pansharpening results for RGB-native vs. HSI-native perceptual loss across multiple datasets is presented in Table 3. HyperKon consistently achieved the best results, indicating the advantages of a hyperspectral-native approach. Figure 7 provides visual results generated by different pansharpening algorithms, including HyperKon, for the Pavia Center, Botswana, Chikusei, and EnMAP datasets. Mean Absolute Error (MAE) heatmaps show that HyperKon had the lowest error across all spectral bands, further highlighting its superior performance.
Figure 7. Visual results generated by different pansharpening algorithms (HyperPNN [49], DARN [45], GPPNN [50], HyperTransformer [51], HyperKon (ours), and ground truth) for Pavia Center [42], Botswana [43], Chikusei [44], and EnMAP [26] datasets. MAE denotes the (normalized) Mean Absolute Error across all spectral bands.
Table 2. Average quantitative results: Pavia Center [42] *.
Pavia Center Dataset
Method | CC ↑ | SAM ↓ | RMSE ↓ | ERGAS ↓ | PSNR ↑
HySure [52] | 0.966 | 6.13 | 1.8 | 3.77 | 35.91
HyperPNN [49] | 0.967 | 6.09 | 1.67 | 3.82 | 36.7
PanNet [53] | 0.968 | 6.36 | 1.83 | 3.89 | 35.61
DARN [45] | 0.969 | 6.43 | 1.56 | 3.95 | 37.3
HyperKite [27] | 0.98 | 5.61 | 1.29 | 2.85 | 38.65
SIPSA [54] | 0.948 | 5.27 | 2.38 | 4.52 | 33.65
GPPNN [50] | 0.963 | 6.52 | 1.91 | 4.05 | 35.36
HyperTransformer [51] | 0.9881 | 4.1494 | 0.9862 | 0.5346 | 40.9525
HyperKon | 0.9883 | 3.9551 | 0.9369 | 0.5152 | 41.9808
* Best values are in bold, 2nd-best values are underlined. RMSE values are ×10⁻². ↑ means a higher value is better; ↓ means a smaller value is better.
Table 3. Comparison of the average quantitative pansharpening results for RGB-native vs. HSI-native perceptual loss *.
Metric | Botswana (RGB-Native) | Botswana (HS-Native) | Chikusei (RGB-Native) | Chikusei (HS-Native) | Pavia (RGB-Native) | Pavia (HS-Native)
CC ↑ | 0.9104 | 0.9411 | 0.9801 | 0.9777 | 0.9881 | 0.9883
SAM ↓ | 3.1459 | 2.5798 | 2.2547 | 2.4192 | 4.1494 | 3.9551
RMSE ↓ | 0.0233 | 0.0193 | 0.0123 | 0.0131 | 0.0098 | 0.0093
ERGAS ↓ | 0.6753 | 0.5249 | 0.8662 | 0.9193 | 0.5346 | 0.5152
PSNR ↑ | 27.3925 | 29.4128 | 36.8861 | 36.2889 | 40.9525 | 41.9808
* Best values are in bold. ↑ means a higher value is better; ↓ means a smaller value is better.

3.3. Transfer Learning Capability of HyperKon

In addition to the super-resolution task, the performance of the HyperKon network was evaluated on the HSI classification task using the Indian Pines, Pavia University, and Salinas Scene datasets [55]. These datasets, covering a wide range of crops, serve as an excellent benchmark for assessing the accuracy and proficiency of the HyperKon network in classifying different crops. The frozen backbone of the HyperKon network, which had been pretrained on a variety of hyperspectral data, was utilized for this task. For these transfer learning experiments, we employed a patch size of 25 × 25 pixels, which provided a suitable balance between spatial context and computational efficiency for the downstream classification task.
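The frozen-backbone setup can be sketched as follows; the small convolutional backbone, class count, and learning rate are placeholder assumptions, with the frozen module standing in for the pretrained HyperKon network.

```python
# A minimal sketch of transfer learning with a frozen backbone and a trainable
# classification head on 25 x 25 hyperspectral patches.
import torch
import torch.nn as nn

backbone = nn.Sequential(                      # stand-in for the pretrained HyperKon backbone
    nn.Conv2d(200, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1), nn.Flatten()
)
for p in backbone.parameters():
    p.requires_grad = False                    # freeze the pretrained weights

head = nn.Linear(64, 16)                       # e.g., 16 classes for Indian Pines
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

x = torch.randn(8, 200, 25, 25)                # batch of 25 x 25 patches with 200 bands
y = torch.randint(0, 16, (8,))
logits = head(backbone(x))
loss = nn.functional.cross_entropy(logits, y)
loss.backward()                                # gradients flow only into the head
optimizer.step()
print(loss.item())
```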
The results were juxtaposed with those of established methods such as SSAN [56], SSRN [57], RvT [58], HiT [59], SSFTT [60], and QSSPN [61]. The assessment utilized overall accuracy (OA), average accuracy (AA), and Kappa coefficient (Kappa) metrics. As shown in Table 4, HyperKon consistently matched or surpassed other networks, highlighting its robustness and adaptability to new data. The high accuracy demonstrated how the model could leverage discriminative features from its self-supervised learning phase to provide reliable predictions for HSI classification. The qualitative outcomes of the HSI classification can be viewed in Figure 8, and Figure 9 shows a zoomed-out prediction accuracy map for Indian Pines.

3.4. Model Efficiency Analysis

As shown in Table 4, HyperKon demonstrated higher classification accuracy across the three datasets, consistently outperforming other methods in overall accuracy, average accuracy, and Kappa coefficient. However, this performance came at a computational cost. HyperKon had a relatively high parameter count (5.54 M for Indian Pines, 4.08 M for Pavia University, 5.62 M for Salinas Scene) compared to more lightweight models like SSAN [56] and SSFTT [60]. Its FLOPs were also higher, ranging from 370.59 M to 1.32 G. Despite this, as evident from Table 5, HyperKon achieved near-perfect classification accuracy for most classes across all three datasets, with only a few classes in Indian Pines falling slightly below 100%. The model’s inference times were consistently low (around 0.0009 s per sample) across datasets, resulting in high throughput (1038–1088 samples/second). All experiments were carried out in a PyTorch [62] environment using an NVIDIA RTX 3090 GPU with 24 GB of memory.
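Throughput and per-sample inference time of the kind reported above can be measured roughly as follows; the model, batch size, and input shape are placeholders, and CUDA synchronization is included because asynchronous kernel launches would otherwise distort GPU timings.

```python
# A minimal sketch of inference-time and throughput measurement on GPU or CPU.
import time
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(200, 64, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 16))
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()
batch = torch.randn(256, 200, 25, 25, device=device)

with torch.no_grad():
    model(batch)                                     # warm-up pass
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    model(batch)
    if device == "cuda":
        torch.cuda.synchronize()
    elapsed = time.perf_counter() - start

print(f"avg inference time: {elapsed / batch.shape[0]:.6f} s/sample, "
      f"throughput: {batch.shape[0] / elapsed:.1f} samples/s")
```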

4. Discussion

4.1. Architectural Considerations

The ablation study results (Figure 4) highlight the importance of carefully selecting architectural components for hyperspectral data processing. The superior performance of the SEB over other configurations suggests that adaptive feature recalibration is particularly effective for capturing the complex spectral–spatial relationships in HSIs.

4.2. Self-Supervised Learning for Hyperspectral Data

The success of HyperKon in learning meaningful representations from unlabeled hyperspectral data emphasizes the potential of self-supervised learning approaches in remote sensing. By achieving high Top-1 retrieval accuracy during pretraining (Figure 4), our model demonstrated its ability to capture important spectral–spatial features without the need for extensive labeled datasets. This is particularly valuable in the hyperspectral domain, where labeled data are often scarce and expensive to obtain [63,64].

4.3. Transfer Learning and Downstream Task Performance

The strong performance of HyperKon in downstream tasks, particularly in hyperspectral super-resolution (Table 2 and Table 3) and image classification (Table 4), demonstrates the transferability of the learned representations. This is a crucial finding, as it suggests that self-supervised pretraining on diverse hyperspectral data can lead to robust features that generalize well to specific tasks and datasets. The superior performance of HyperKon compared to RGB-native approaches in pansharpening tasks (Table 3) highlights the benefits of developing hyperspectral-native models. This aligns with previous hypotheses suggesting that models designed for RGB imagery may not fully exploit the rich spectral information available in hyperspectral data.

4.4. Hyperspectral Perceptual Loss

The introduction of the HSPL represents a novel contribution to the field of HSI processing. By focusing on spectral differences across all bands, rather than just pixelwise differences, the HSPL provides a more comprehensive measure of similarity for HSIs. The improved performance observed when using the HSPL (Figure 3) suggests that this approach could be valuable for a range of HSI processing tasks beyond super-resolution, such as image fusion or denoising [65].

4.5. Limitations and Future Work

While HyperKon demonstrates excellent performance across various tasks, there are several areas for potential improvement and future research. Future work could explore techniques such as network pruning or quantization to reduce computational requirements without sacrificing performance [66]. While HyperKon focuses on hyperspectral data, future research could explore integrating information from other sensor modalities, such as LiDAR or SAR, to create more comprehensive representations of Earth observation data. The current model does not explicitly account for temporal changes in hyperspectral imagery. Incorporating temporal information could enhance the model’s ability to capture dynamic processes such as vegetation phenology.

5. Conclusions

This study presented HyperKon, a self-supervised contrastive network developed for HSI analysis. HyperKon uses a unique hyperspectral-native convolutional architecture and the novel HSPL function to enhance performance in hyperspectral super-resolution and classification applications. The experimental results suggested that HyperKon outperformed traditional RGB-trained models and other state-of-the-art approaches, demonstrating its ability to preserve spectral integrity and capture complex spectral–spatial relationships. The successful application of self-supervised contrastive learning allowed for robust feature extraction from large volumes of unlabeled hyperspectral data.
The creation of the EnHyperSet-1 dataset, a comprehensive collection of high-resolution HSIs, further contributes to advancing research in this field. The dataset supports the development of models like HyperKon, which are capable of handling the high dimensionality and unique characteristics of hyperspectral data. Future research directions will involve integrating multi-modal data, improving model interpretability, and optimizing training and inference pipelines for resource-constrained environments. These efforts will ensure that HyperKon remains a valuable tool in the constantly evolving field of remote sensing.

Author Contributions

Conceptualization, D.L.A., B.M.-C. and O.M.; methodology, D.L.A.; software, D.L.A.; validation, D.L.A., B.M.-C. and O.M.; formal analysis, D.L.A.; investigation, D.L.A.; resources, O.M. and B.M.-C.; data curation, D.L.A. and B.M.-C.; writing—original draft preparation, D.L.A.; writing—review and editing, D.L.A., B.M.-C., J.-Y.G. and O.M.; visualization, D.L.A.; supervision, B.M.-C., J.-Y.G. and O.M.; project administration, O.M.; funding acquisition, O.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

All EnMAP data are freely available through the EnMAP data access portal at the following link: https://www.enmap.org/data_access/. The EnMAP data are licensed products of the German Aerospace Center (DLR), all rights reserved. The utility tool to create EnHyperSet-1 is available here: https://github.com/kleffy/enhyperset. The Indian Pines, Pavia University, and Salinas datasets are available here: https://www.ehu.eus/ccwintco/index.php/Hyperspectral_Remote_Sensing_Scenes.

Acknowledgments

This research is part of a PhD study funded by Sixteen Sands Ltd. (https://sixteensands.com/). The authors express their gratitude to Abayomi Awobokun for his generous support and insightful discussions.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
MAE	Mean Absolute Error
RMSE	Root-Mean-Square Error
PSNR	Peak Signal-to-Noise Ratio
CC	Correlation Coefficient
SAM	Spectral Angle Mapper
ERGAS	Erreur Relative Globale Adimensionnelle de Synthèse
CNN	Convolutional Neural Network
DSC	Depthwise Separable Convolutions
SEB	Squeeze and Excitation Block
CBAM	Convolutional Block Attention Module
PCA	Principal Component Analysis
HSI	Hyperspectral image
NT-Xent	Normalized Temperature-Scaled Cross Entropy
HSPL	HyperSpectral Perceptual Loss
RSFM	Remote Sensing Foundation Model
EnMAP	Environmental Mapping and Analysis Program

References

  1. Cheng, C.; Zhao, B. Prospect of application of hyperspectral imaging technology in public security. In Proceedings of the International Conference on Applications and Techniques in Cyber Security and Intelligence ATCI 2018: Applications and Techniques in Cyber Security and Intelligence; Springer: Berlin/Heidelberg, Germany, 2019; pp. 299–304. [Google Scholar]
  2. Brisco, B.; Brown, R.; Hirose, T.; McNairn, H.; Staenz, K. Precision agriculture and the role of remote sensing: A review. Can. J. Remote. Sens. 1998, 24, 315–327. [Google Scholar] [CrossRef]
  3. da Lomba Magalhães, M.J. Hyperspectral Image Fusion—A Comprehensive Review. Master’s Thesis, University of Eastern Finland, Kuopio, Finland, 2022. [Google Scholar]
  4. Yu, S.; Jia, S.; Xu, C. Convolutional neural networks for hyperspectral image classification. Neurocomputing 2017, 219, 88–98. [Google Scholar] [CrossRef]
  5. Signoroni, A.; Savardi, M.; Baronio, A.; Benini, S. Deep learning meets hyperspectral image analysis: A multidisciplinary review. J. Imaging 2019, 5, 52. [Google Scholar] [CrossRef]
  6. Shi, C.; Sun, J.; Wang, L. Hyperspectral image classification based on spectral multiscale convolutional neural network. Remote. Sens. 2022, 14, 1951. [Google Scholar] [CrossRef]
  7. Bouchoucha, R.; Braiek, H.B.; Khomh, F.; Bouzidi, S.; Zaatour, R. Robustness assessment of hyperspectral image CNNs using metamorphic testing. Inf. Softw. Technol. 2023, 162, 107281. [Google Scholar] [CrossRef]
  8. Li, Y.; Zhang, H.; Shen, Q. Spectral–spatial classification of hyperspectral imagery with 3D convolutional neural network. Remote. Sens. 2017, 9, 67. [Google Scholar] [CrossRef]
  9. Feng, F.; Wang, S.; Wang, C.; Zhang, J. Learning deep hierarchical spatial–spectral features for hyperspectral image classification based on residual 3D-2D CNN. Sensors 2019, 19, 5276. [Google Scholar] [CrossRef]
  10. Lu, Z.; Xu, B.; Sun, L.; Zhan, T.; Tang, S. 3-D channel and spatial attention based multiscale spatial–spectral residual network for hyperspectral image classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens. 2020, 13, 4311–4324. [Google Scholar] [CrossRef]
  11. Li, C.; Qiu, Z.; Cao, X.; Chen, Z.; Gao, H.; Hua, Z. Hybrid dilated convolution with multi-scale residual fusion network for hyperspectral image classification. Micromachines 2021, 12, 545. [Google Scholar] [CrossRef]
  12. Gbodjo, Y.J.E.; Ienco, D.; Leroux, L.; Interdonato, R.; Gaetano, R.; Ndao, B. Object-based multi-temporal and multi-source land cover mapping leveraging hierarchical class relationships. Remote. Sens. 2020, 12, 2814. [Google Scholar] [CrossRef]
  13. Li, Q.; Zhong, R.; Du, X.; Du, Y. TransUNetCD: A hybrid transformer network for change detection in optical remote-sensing images. IEEE Trans. Geosci. Remote. Sens. 2022, 60, 5622519. [Google Scholar] [CrossRef]
  14. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
  15. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–28 October 2018; pp. 3–19. [Google Scholar]
  16. Hong, D.; Zhang, B.; Li, X.; Li, Y.; Li, C.; Yao, J.; Yokoya, N.; Li, H.; Ghamisi, P.; Jia, X.; et al. SpectralGPT: Spectral remote sensing foundation model. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 5227–5244. [Google Scholar] [CrossRef]
  17. Manas, O.; Lacoste, A.; Giró-i Nieto, X.; Vazquez, D.; Rodriguez, P. Seasonal contrast: Unsupervised pre-training from uncurated remote sensing data. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 9414–9423. [Google Scholar]
  18. He, X.; Chen, Y.; Huang, L.; Hong, D.; Du, Q. Foundation model-based multimodal remote sensing data classification. IEEE Trans. Geosci. Remote. Sens. 2023, 62, 5502117. [Google Scholar] [CrossRef]
  19. Guo, X.; Lao, J.; Dang, B.; Zhang, Y.; Yu, L.; Ru, L.; Zhong, L.; Huang, Z.; Wu, K.; Hu, D.; et al. Skysense: A multi-modal remote sensing foundation model towards universal interpretation for earth observation imagery. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2024; pp. 27672–27683. [Google Scholar]
  20. Yan, Z.; Li, J.; Li, X.; Zhou, R.; Zhang, W.; Feng, Y.; Diao, W.; Fu, K.; Sun, X. RingMo-SAM: A foundation model for segment anything in multimodal remote-sensing images. IEEE Trans. Geosci. Remote. Sens. 2023, 61, 1–16. [Google Scholar] [CrossRef]
  21. Dong, H.; Ma, W.; Wu, Y.; Zhang, J.; Jiao, L. Self-supervised representation learning for remote sensing image change detection based on temporal prediction. Remote. Sens. 2020, 12, 1868. [Google Scholar] [CrossRef]
  22. Hou, S.; Shi, H.; Cao, X.; Zhang, X.; Jiao, L. Hyperspectral imagery classification based on contrastive learning. IEEE Trans. Geosci. Remote. Sens. 2021, 60, 1–13. [Google Scholar] [CrossRef]
  23. Huang, L.; Chen, Y.; He, X. Spectral–spatial masked transformer with supervised and contrastive learning for hyperspectral image classification. IEEE Trans. Geosci. Remote. Sens. 2023, 61, 1–18. [Google Scholar] [CrossRef]
  24. Hu, X.; Li, T.; Zhou, T.; Liu, Y.; Peng, Y. Contrastive learning based on transformer for hyperspectral image classification. Appl. Sci. 2021, 11, 8670. [Google Scholar] [CrossRef]
  25. Nalepa, J.; Myller, M.; Cwiek, M.; Zak, L.; Lakota, T.; Tulczyjew, L.; Kawulok, M. Towards on-board hyperspectral satellite image segmentation: Understanding robustness of deep learning through simulating acquisition conditions. Remote. Sens. 2021, 13, 1532. [Google Scholar] [CrossRef]
  26. Storch, T.; Honold, H.P.; Chabrillat, S.; Habermeyer, M.; Tucker, P.; Brell, M.; Ohndorf, A.; Wirth, K.; Betz, M.; Kuchler, M.; et al. The EnMAP imaging spectroscopy mission towards operations. Remote. Sens. Environ. 2023, 294, 113632. [Google Scholar] [CrossRef]
  27. Bandara, W.G.C.; Valanarasu, J.M.J.; Patel, V.M. Hyperspectral pansharpening based on improved deep image prior and residual reconstruction. IEEE Trans. Geosci. Remote. Sens. 2021, 60, 1–16. [Google Scholar] [CrossRef]
  28. Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. In Proceedings of the International Conference on Machine Learning, PMLR, Honolulu, HI, USA, 23–29 July 2020; pp. 1597–1607. [Google Scholar]
  29. Wu, P.; Cui, Z.; Gan, Z.; Liu, F. Three-dimensional resnext network using feature fusion and label smoothing for hyperspectral image classification. Sensors 2020, 20, 1652. [Google Scholar] [CrossRef] [PubMed]
  30. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26–30 June 2016; pp. 770–778. [Google Scholar]
  31. Schroff, F.; Kalenichenko, D.; Philbin, J. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 815–823. [Google Scholar]
  32. Oord, A.v.d.; Li, Y.; Vinyals, O. Representation learning with contrastive predictive coding. arXiv 2018, arXiv:1807.03748. [Google Scholar]
  33. Ohri, K.; Kumar, M. Review on self-supervised image recognition using deep neural networks. Knowl.-Based Syst. 2021, 224, 107090. [Google Scholar] [CrossRef]
  34. He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9729–9738. [Google Scholar]
  35. Caron, M.; Misra, I.; Mairal, J.; Goyal, P.; Bojanowski, P.; Joulin, A. Unsupervised learning of visual features by contrasting cluster assignments. Adv. Neural Inf. Process. Syst. 2020, 33, 9912–9924. [Google Scholar]
  36. Le-Khac, P.H.; Healy, G.; Smeaton, A.F. Contrastive representation learning: A framework and review. IEEE Access 2020, 8, 193907–193934. [Google Scholar] [CrossRef]
  37. Purushwalkam, S.; Gupta, A. Demystifying contrastive self-supervised learning: Invariances, augmentations and dataset biases. Adv. Neural Inf. Process. Syst. 2020, 33, 3407–3418. [Google Scholar]
  38. Robinson, J.; Chuang, C.Y.; Sra, S.; Jegelka, S. Contrastive learning with hard negative samples. arXiv 2020, arXiv:2010.04592. [Google Scholar]
  39. Li, W.; Feng, F.; Li, H.; Du, Q. Discriminant analysis-based dimension reduction for hyperspectral image classification: A survey of the most recent advances and an experimental comparison of different techniques. IEEE Geosci. Remote. Sens. Mag. 2018, 6, 15–34. [Google Scholar] [CrossRef]
  40. Kumar, B.; Dikshit, O.; Gupta, A.; Singh, M.K. Feature extraction for hyperspectral image classification: A review. Int. J. Remote. Sens. 2020, 41, 6248–6287. [Google Scholar] [CrossRef]
  41. Zhang, L.; Luo, F. Review on graph learning for dimensionality reduction of hyperspectral image. Geo-Spat. Inf. Sci. 2020, 23, 98–106. [Google Scholar] [CrossRef]
  42. Plaza, A.; Benediktsson, J.A.; Boardman, J.W.; Brazile, J.; Bruzzone, L.; Camps-Valls, G.; Chanussot, J.; Fauvel, M.; Gamba, P.; Gualtieri, A.; et al. Recent advances in techniques for hyperspectral image processing. Remote. Sens. Environ. 2009, 113, S110–S122. [Google Scholar] [CrossRef]
  43. Ungar, S.G.; Pearlman, J.S.; Mendenhall, J.A.; Reuter, D. Overview of the earth observing one (EO-1) mission. IEEE Trans. Geosci. Remote. Sens. 2003, 41, 1149–1159. [Google Scholar] [CrossRef]
  44. Yokoya, N.; Iwasaki, A. Airborne Hyperspectral Data over Chikusei; Technical Report SAL-2016-05-27; University of Tokyo: Tokyo, Japan, 2016; Volume 5. [Google Scholar]
  45. Zheng, Y.; Li, J.; Li, Y.; Guo, J.; Wu, X.; Chanussot, J. Hyperspectral pansharpening using deep prior and dual attention residual network. IEEE Trans. Geosci. Remote. Sens. 2020, 58, 8059–8076. [Google Scholar] [CrossRef]
  46. Singh, A.K.; Kumar, H.; Kadambi, G.R.; Kishore, J.; Shuttleworth, J.; Manikandan, J. Quality metrics evaluation of hyperspectral images. Int. Arch. Photogramm. Remote. Sens. Spat. Inf. Sci. 2014, 40, 1221–1226. [Google Scholar] [CrossRef]
  47. Deborah, H.; Richard, N.; Hardeberg, J.Y. A comprehensive evaluation of spectral distance functions and metrics for hyperspectral image processing. IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens. 2015, 8, 3224–3234. [Google Scholar] [CrossRef]
  48. Chaithra, C.; Taranath, N.; Darshan, L.; Subbaraya, C. A Survey on Image Fusion Techniques and Performance Metrics. In Proceedings of the 2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA), IEEE, Coimbatore, India, 29–31 March 2018; pp. 995–999. [Google Scholar]
  49. He, L.; Zhu, J.; Li, J.; Plaza, A.; Chanussot, J.; Li, B. HyperPNN: Hyperspectral pansharpening via spectrally predictive convolutional neural networks. IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens. 2019, 12, 3092–3100. [Google Scholar] [CrossRef]
  50. Xu, S.; Zhang, J.; Zhao, Z.; Sun, K.; Liu, J.; Zhang, C. Deep gradient projection networks for pan-sharpening. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 21–25 June 2021; pp. 1366–1375. [Google Scholar]
  51. Bandara, W.G.C.; Patel, V.M. HyperTransformer: A textural and spectral feature fusion transformer for pansharpening. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 1767–1777. [Google Scholar]
  52. Simoes, M.; Bioucas-Dias, J.; Almeida, L.B.; Chanussot, J. A convex formulation for hyperspectral image superresolution via subspace-based regularization. IEEE Trans. Geosci. Remote. Sens. 2014, 53, 3373–3388. [Google Scholar] [CrossRef]
  53. Yang, J.; Fu, X.; Hu, Y.; Huang, Y.; Ding, X.; Paisley, J. PanNet: A deep network architecture for pan-sharpening. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 5449–5457. [Google Scholar]
  54. Lee, J.; Seo, S.; Kim, M. Sipsa-net: Shift-invariant pan sharpening with moving object alignment for satellite imagery. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 21–25 June 2021; pp. 10166–10174. [Google Scholar]
  55. Green, R.O.; Eastwood, M.L.; Sarture, C.M.; Chrien, T.G.; Aronsson, M.; Chippendale, B.J.; Faust, J.A.; Pavri, B.E.; Chovit, C.J.; Solis, M.; et al. Imaging Spectroscopy and the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS). Remote. Sens. Environ. 1998, 65, 227–248. [Google Scholar] [CrossRef]
  56. Sun, H.; Zheng, X.; Lu, X.; Wu, S. Spectral–spatial attention network for hyperspectral image classification. IEEE Trans. Geosci. Remote. Sens. 2019, 58, 3232–3245. [Google Scholar] [CrossRef]
  57. Zhong, Z.; Li, J.; Luo, Z.; Chapman, M. Spectral–spatial residual network for hyperspectral image classification: A 3-D deep learning framework. IEEE Trans. Geosci. Remote. Sens. 2017, 56, 847–858. [Google Scholar] [CrossRef]
  58. Heo, B.; Yun, S.; Han, D.; Chun, S.; Choe, J.; Oh, S.J. Rethinking spatial dimensions of vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 11936–11945. [Google Scholar]
  59. Yang, X.; Cao, W.; Lu, Y.; Zhou, Y. Hyperspectral image transformer classification networks. IEEE Trans. Geosci. Remote. Sens. 2022, 60, 1–15. [Google Scholar] [CrossRef]
  60. Sun, L.; Zhao, G.; Zheng, Y.; Wu, Z. Spectral–spatial feature tokenization transformer for hyperspectral image classification. IEEE Trans. Geosci. Remote. Sens. 2022, 60, 1–14. [Google Scholar] [CrossRef]
  61. Zhang, J.; Zhang, Y.; Zhou, Y. Quantum-Inspired Spectral-Spatial Pyramid Network for Hyperspectral Image Classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 9925–9934. [Google Scholar]
  62. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. Pytorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 2019, 32. [Google Scholar]
  63. Ayush, K.; Uzkent, B.; Meng, C.; Tanmay, K.; Burke, M.; Lobell, D.; Ermon, S. Geography-aware self-supervised learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 10181–10190. [Google Scholar]
  64. Ou, X.; Liu, L.; Tan, S.; Zhang, G.; Li, W.; Tu, B. A hyperspectral image change detection framework with self-supervised contrastive learning pretrained model. IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens. 2022, 15, 7724–7740. [Google Scholar] [CrossRef]
  65. Loncan, L.; de Almeida, L.B.; Bioucas-Dias, J.M.; Briottet, X.; Chanussot, J.; Dobigeon, N.; Fabre, S.; Liao, W.; Licciardi, G.A.; Simoes, M.; et al. Hyperspectral pansharpening: A review. IEEE Geosci. Remote. Sens. Mag. 2015, 3, 27–46. [Google Scholar] [CrossRef]
  66. Liang, T.; Glossner, J.; Wang, L.; Shi, S.; Zhang, X. Pruning and quantization for deep neural network acceleration: A survey. Neurocomputing 2021, 461, 370–403. [Google Scholar] [CrossRef]
Figure 1. General Overview of HyperKon System Architecture.
Figure 2. Illustration of HSI contrastive sampling.
Figure 3. Conceptual comparison of HSI vs. RGB perceptual loss.
Figure 4. Top-1 HSI retrieval accuracy achieved by various versions of the HyperKon model during the pretraining phase. The performance of each version is presented as a bar in the chart, illustrating how the integration of different components, such as 3D convolutions, DSC, the CBAM, and the SEB, affected the model’s accuracy. The chart underscores the superior performance of the SEB.
Figure 5. Top-1 HSI retrieval accuracy for the 3D Conv, SEB, and CBAM models following dimensionality reduction using PCA. The graph indicates a superior performance by the 3D convolution model.
Figure 6. Top-1 HSI retrieval accuracy for the 3D Conv, SEB, and CBAM models when manual band selection was employed. It shows an initial advantage for 3D Conv, but over time, the SEB and CBAM models catch up to similar levels of performance.
Figure 8. HyperKon image classification visualization for Indian Pines, Pavia University, and Salinas datasets. (a) Predicted classification map generated by HyperKon, (b) Predicted classification map with masked regions (showing only labeled areas), (c) predicted accuracy map: green for correct predictions, red for incorrect predictions, and black for unlabeled areas, (d) ground-truth classification map, and (e) original RGB image.
Figure 9. Zoom-out predicted accuracy map for Indian Pines: green for correct predictions, red for incorrect predictions, and black for unlabeled areas.
Figure 9. Zoom-out predicted accuracy map for Indian Pines: green for correct predictions, red for incorrect predictions, and black for unlabeled areas.
Remotesensing 16 03399 g009
Table 1. A comparison to popular hyperspectral datasets *.
Dataset | Number of Bands | Size | Spectral Range | Number of Images | Spatial Resolution | Imaging Location | Platform Type
Indian Pines | 200 | 145 × 145 | 400–2500 nm | 1 | 30 m | Indiana, USA | Airborne
Pavia Centre | 102 | 1096 × 1096 | 430–860 nm | 1 | 1.3 m | Pavia, Italy | Airborne
Salinas | 204 | 512 × 217 | 360–2500 nm | 1 | 3.7 m | Salinas Valley, CA, USA | Airborne
Harvard | 31 | 1392 × 1040 | 420–720 nm | 50 | - | Harvard, USA | Airborne
Botswana | 145 | 1476 × 256 | 400–2500 nm | 1 | 30 m | Botswana | Airborne
Chikusei | 100 | 2517 × 2335 | 263–1018 nm | 1 | 2.5 m | Chikusei, Japan | Airborne
EnHyperSet-1 | 224 | 1300 × 1200 | 420–2450 nm | 800 | 30 m | Global, on demand | Spaceborne
* Best values are in bold.
Table 4. Comparative analysis of performance metrics and computational requirements for various hyperspectral image classification methods across three datasets *.
Dataset | Metric | SSAN [56] | SSRN [57] | RvT [58] | HiT [59] | SSFTT [60] | QSSPN-3 [61] | HyperKon
IP | OA (%) | 89.46 | 91.85 | 83.85 | 90.59 | 96.35 | 95.87 | 98.77
IP | AA (%) | 85.99 | 81.51 | 79.67 | 86.71 | 89.99 | 96.40 | 97.82
IP | Kappa (%) | 88.04 | 90.73 | 81.68 | 89.27 | 95.82 | 95.34 | 98.60
IP | Params. | 148.83 K | 735.88 K | 10.78 M | 49.60 M | 148.50 K | 910.50 K | 5.54 M
IP | FLOPs | 7.88 M | 212.48 M | 17.83 M | 345.88 M | 3.66 M | 34.54 M | 1.27 G
PU | OA (%) | 99.15 | 99.63 | 97.37 | 99.43 | 99.52 | 99.71 | 99.89
PU | AA (%) | 98.70 | 99.29 | 95.86 | 99.09 | 99.20 | 99.43 | 99.76
PU | Kappa (%) | 98.87 | 99.51 | 96.52 | 99.24 | 99.36 | 99.61 | 99.86
PU | Params. | 94.63 K | 396.99 K | 9.77 M | 42.41 M | 148.03 K | 609.16 K | 4.08 M
PU | FLOPs | 5.57 M | 108.04 M | 16.83 M | 190.85 M | 3.66 M | 10.25 M | 370.59 M
SA | OA (%) | 98.92 | 99.31 | 98.11 | 99.38 | 99.53 | 99.66 | 100.00
SA | AA (%) | 99.33 | 99.70 | 98.83 | 99.70 | 99.72 | 99.81 | 100.00
SA | Kappa (%) | 98.80 | 99.23 | 97.90 | 99.31 | 99.47 | 99.63 | 100.00
SA | Params. | 149.71 K | 750 K | 10.82 M | 50 M | 148.50 K | 926.90 K | 5.62 M
SA | FLOPs | 7.97 M | 216.84 M | 17.80 M | 354.42 M | 3.66 M | 35.87 M | 1.32 G
* Best values are in bold, 2nd best values are underlined.
Table 5. Classwise accuracies and performance metrics for the HyperKon model on Indian Pines, Pavia University, and Salinas datasets.
Indian Pines | | Pavia University | | Salinas |
Class | Acc. (%) | Class | Acc. (%) | Class | Acc. (%)
Alfalfa | 100.00 | Asphalt | 100.00 | Broccoli_green_weeds_1 | 100.00
Corn-notill | 99.72 | Meadows | 100.00 | Broccoli_green_weeds_2 | 100.00
Corn-mintill | 99.16 | Gravel | 100.00 | Fallow | 100.00
Corn | 100.00 | Trees | 99.87 | Fallow_rough_plow | 100.00
Grass-pasture | 99.59 | Painted metal sheets | 100.00 | Fallow_smooth | 100.00
Grass-trees | 100.00 | Bare Soil | 100.00 | Stubble | 100.00
Grass-pasture-mowed | 100.00 | Bitumen | 99.85 | Celery | 100.00
Hay-windrowed | 100.00 | Self-blocking bricks | 99.95 | Grapes_untrained | 100.00
Oats | 95.00 | Shadows | 99.58 | Soil_vinyard_develop | 100.00
Soybean-notill | 99.69 | | | Corn_senesced_green | 100.00
Soybean-mintill | 99.80 | | | Lettuce_romaine_4wk | 100.00
Soybean-clean | 99.49 | | | Lettuce_romaine_5wk | 100.00
Wheat | 100.00 | | | Lettuce_romaine_6wk | 100.00
Woods | 99.29 | | | Lettuce_romaine_7wk | 100.00
Buildings-grass-trees-drives | 100.00 | | | Vinyard_untrained | 100.00
Stone-steel-towers | 100.00 | | | Vinyard_vertical | 100.00
Performance Metrics
Total training time (s) | 2346.18 | Total training time (s) | 9251.00 | Total training time (s) | 12,172.18
Total test time (s) | 52.07 | Total test time (s) | 513.87 | Total test time (s) | 294.16
Avg. inference time (s) | 0.0009 | Avg. inference time (s) | 0.0009 | Avg. inference time (s) | 0.0010
Throughput (samples/s) | 1088.39 | Throughput (samples/s) | 1078.45 | Throughput (samples/s) | 1038.11