1. Introduction
A hyperspectral image (HSI), which is acquired over contiguous wavelengths of the electromagnetic (EM) spectrum, is a rich data source for a wide range of real-world remote sensing applications, including agriculture, geology, mining, and military surveillance [1,2]. An HSI is organized as a hypercube and often has hundreds of contiguous, narrow spectral bands [3,4]. Because each of these image bands records different intensities for the ground cover, each band is treated as an individual feature [5,6,7].
An HSI has two spatial dimensions and one spectral dimension, which together form its three-dimensional spectral-spatial structure (see Figure S1 in the Supplementary Files) [5,6]. In this context, each spectral band is referred to as a feature for classification, since it contains a distinct response of the ground surface [7]. A high-dimensional HSI (i.e., an HSI with hundreds of image bands or features) presents four essential obstacles to successful classification. First, because the hyperspectral sensor collects the images in continuous and contiguous spectral ranges, neighboring image bands are highly correlated and certain image bands carry less discriminating information [5,8]. Second, the spectral bands are not equally important, as they are captured at different wavelengths of the EM spectrum [9]. Third, training samples are scarce for some classes [10], which gives rise to the Hughes phenomenon, or curse of dimensionality [11]. The Hughes phenomenon describes the fact that classification accuracy initially rises steadily as the number of spectral bands or dimensions rises, but falls sharply after the number of bands reaches a certain level. Finally, processing the entire original HSI is computationally expensive [7].
Effective feature (band) reduction is necessary to address the aforementioned issues: it lowers the dimensionality of the HSI and creates a suitable feature subspace that improves the classification results [12,13,14,15,16]. For the accurate classification of HSIs, feature reduction (FR) techniques based on feature extraction (FE) and/or feature selection (FS) can be used. FE maps the original HSI into a new space with a dimensionality of K, lower than the dimensionality of the original space, using a linear or nonlinear transformation [3]. Dimensionality reduction methods are most commonly either unsupervised or supervised. Unsupervised methods make no use of prior knowledge, whereas supervised methods are designed to preserve previously known information (the ground truth). The most widely used unsupervised linear FE approach is principal component analysis (PCA) [17,18,19]. It is based on the observation that adjacent bands are highly correlated, and it uses global statistics to remove the correlation between bands [20,21]. It is often claimed that PCA is well suited to data compression but not to extracting the most informative features for classification [22,23,24,25,26,27,28,29]. The reasons are: (i) PCA may not capture detailed local statistics, as it models the overall characteristics of the entire HSI; (ii) the top principal components (PCs), or transformed features, may not always contain the informative structure of the entire HSI (i.e., the ranking is biased toward PCs with high variance); and (iii) PCA incurs a high computational cost for high-volume hyperspectral data, as it considers the global statistics [30,31,32].
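To make the global nature of classical PCA concrete, the following minimal NumPy sketch flattens a data cube into a pixels-by-bands matrix, estimates a single band-to-band covariance matrix over all pixels, and projects onto the leading eigenvectors. The function name `pca_reduce` and the synthetic cube are illustrative assumptions and are not taken from the experiments in this article.

```python
import numpy as np

def pca_reduce(cube, k):
    """Project a (rows, cols, bands) cube onto its top-k principal components."""
    rows, cols, bands = cube.shape
    X = cube.reshape(-1, bands).astype(float)   # one row per pixel, one column per band
    X -= X.mean(axis=0)                         # center every band
    cov = np.cov(X, rowvar=False)               # global band-to-band covariance
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]       # indices of the k largest eigenvalues
    pcs = X @ eigvecs[:, order]                 # transformed features (PCs)
    explained = eigvals[order] / eigvals.sum()  # variance ratio of each kept PC
    return pcs.reshape(rows, cols, k), explained

# Synthetic stand-in for an HSI (assumption): a cumulative sum along the
# spectral axis makes neighboring bands strongly correlated.
rng = np.random.default_rng(0)
cube = np.cumsum(rng.normal(size=(20, 20, 30)), axis=2)
reduced, explained = pca_reduce(cube, k=3)
print(reduced.shape, explained.round(3))
```

Because the covariance matrix is computed once over every band and every pixel, the PCs are ordered purely by global variance, which is the behavior that points (i) to (iii) above refer to.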
To address the pitfalls of classical PCA, correlation-based segmented PCA (SPCA) was presented in [22], which applies conventional PCA to subgroups of the bands. The entire dataset is divided into multiple segments using the image’s band-to-band correlation matrix, and contiguous, strongly correlated bands are typically assigned to the same subgroup. However, this correlation-based segmentation strategy reflects only the linear relationships between bands when forming the subgroups. As such, correlation-based segmentation might not be adequate for applying classical PCA to large-volume HSIs with a huge number of bands for effective FE. By comparison, mutual information (MI) is a dependence metric that can capture both linear and nonlinear relationships between bands [9,33]. With this motivation, we propose a band grouping method that partitions the spectral bands using a band-to-band normalized MI (NMI) matrix for effective FE, which is called band grouping-based PCA (BgPCA). BgPCA first uses the NMI measure to divide the original bands into multiple groups and then applies conventional PCA separately to each group of the original image bands at a reduced computational cost.
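The two steps of BgPCA described above can be sketched as follows. This is a simplified illustration under stated assumptions rather than the exact procedure of the proposed method: NMI is estimated from a joint histogram and normalized as I(X;Y)/sqrt(H(X)H(Y)), which is only one of several common normalizations; bands are grouped contiguously, with a new group starting whenever the NMI between adjacent bands falls below a user-defined threshold; and the names `nmi`, `group_bands`, and `bgpca`, the bin count, and the threshold value are all hypothetical.

```python
import numpy as np

def nmi(a, b, bins=16):
    """Histogram estimate of NMI = I(a;b) / sqrt(H(a) * H(b)) (assumed normalization)."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    mi = np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz]))
    hx = -np.sum(px[px > 0] * np.log(px[px > 0]))
    hy = -np.sum(py[py > 0] * np.log(py[py > 0]))
    return mi / np.sqrt(hx * hy) if hx > 0 and hy > 0 else 0.0

def group_bands(X, threshold):
    """Contiguous grouping (assumption): cut where adjacent-band NMI < threshold."""
    groups, current = [], [0]
    for j in range(1, X.shape[1]):
        if nmi(X[:, j - 1], X[:, j]) < threshold:
            groups.append(current)
            current = [j]
        else:
            current.append(j)
    groups.append(current)
    return groups

def bgpca(X, threshold, k_per_group):
    """Apply conventional PCA separately to each band group and stack the PCs."""
    groups = group_bands(X, threshold)
    feats = []
    for g in groups:
        Xg = X[:, g] - X[:, g].mean(axis=0)
        cov = np.atleast_2d(np.cov(Xg, rowvar=False))  # small per-group covariance
        vals, vecs = np.linalg.eigh(cov)
        top = np.argsort(vals)[::-1][:k_per_group]
        feats.append(Xg @ vecs[:, top])
    return np.hstack(feats), groups

# Synthetic pixels-by-bands matrix (assumption): two blocks of five bands,
# each block driven by its own latent signal plus small noise.
rng = np.random.default_rng(0)
n = 2000
s1, s2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([s1 + 0.05 * rng.normal(size=n) for _ in range(5)] +
                    [s2 + 0.05 * rng.normal(size=n) for _ in range(5)])
features, groups = bgpca(X, threshold=0.4, k_per_group=2)
print(groups, features.shape)
```

Each eigendecomposition now operates on a covariance matrix whose size is set by the group rather than by the full band count, which is where the computational saving of a segmented transformation comes from.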
Because the segmented PCA is applied to the complete dataset, feature selection is still needed to choose an optimal subset of the extracted features. For FS, the subspace of effective features extracted by our BgPCA transformation is selected by restricting the NMI values of the transformed features to a specific range, thus meeting the minimum redundancy and maximum relevance (mRMR) criteria. The complete FR approach is accordingly called BgPCA-NMI, which significantly enhances the classification performance while reducing the computational cost. Although the proposed method performs well across different performance metrics, it has some limitations. A user-defined threshold is used to partition the complete HSI. This threshold could be optimized adaptively, and our future goal is to use a network model that automatically selects the threshold value from the dataset. In addition, the proposed method addresses only the spectral features, although data redundancy also exists in the spatial domain of the HSI. As such, in the future, a deep learning-based approach could be used to extract the spectral-spatial information [34,35] alongside our proposed FR technique to further improve the classification outcome. The main contributions of this study are listed below.
We propose an efficient MI-driven FR approach for the effective classification of HSIs.
We introduce an NMI-based band grouping strategy for FE, in which the classical PCA transformation is applied to each group of bands independently.
We propose an NMI-based mRMR FS method that operates on the features extracted by our proposed transformation.
We performed extensive experiments on two widely used benchmark HSI datasets captured by the AVIRIS and HYDICE sensors to validate our proposed FR approach.
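The NMI-based mRMR selection named in the contributions above could be sketched as a greedy procedure. The relevance term, the NMI between each transformed feature and the ground-truth labels, follows the description in this article. The redundancy term (the mean NMI with already selected features), the difference score, the binning, and the names `nmi_discrete` and `mrmr_select` are assumptions made for illustration only.

```python
import numpy as np

def nmi_discrete(a, b):
    """NMI = I(a;b) / sqrt(H(a) * H(b)) for two integer-coded 1-D arrays."""
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    joint = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(joint, (ai, bi), 1)                 # contingency table
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    mi = np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz]))
    hx = -np.sum(px[px > 0] * np.log(px[px > 0]))
    hy = -np.sum(py[py > 0] * np.log(py[py > 0]))
    return mi / np.sqrt(hx * hy) if hx > 0 and hy > 0 else 0.0

def mrmr_select(F, y, n_select, bins=16):
    """Greedy mRMR (assumed form): maximize relevance minus mean redundancy."""
    # Discretize every continuous feature into equal-width bins 0..bins-1.
    Q = np.column_stack([
        np.digitize(f, np.histogram_bin_edges(f, bins=bins)[1:-1]) for f in F.T
    ])
    n_feat = Q.shape[1]
    relevance = np.array([nmi_discrete(Q[:, j], y) for j in range(n_feat)])
    selected = [int(np.argmax(relevance))]        # start with the most relevant
    while len(selected) < n_select:
        best, best_score = None, -np.inf
        for j in range(n_feat):
            if j in selected:
                continue
            redundancy = np.mean([nmi_discrete(Q[:, j], Q[:, s]) for s in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected, relevance

# Synthetic example (assumption): four classes encoded by two binary factors.
rng = np.random.default_rng(0)
n = 4000
y = rng.integers(0, 4, size=n)
f0 = (y % 2) + 0.1 * rng.normal(size=n)    # informative about the first factor
f1 = f0 + 0.01 * rng.normal(size=n)        # near-duplicate of f0 (redundant)
f2 = (y // 2) + 0.1 * rng.normal(size=n)   # informative about the second factor
f3 = rng.normal(size=n)                    # pure noise (irrelevant)
F = np.column_stack([f0, f1, f2, f3])
selected, relevance = mrmr_select(F, y, n_select=2)
print(selected, relevance.round(3))
```

In this toy setting the redundancy penalty keeps the near-duplicate pair from being selected together, and the low relevance of the noise feature keeps it out of the subspace.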
The rest of the article is organized as follows. In Section 2, we first describe the proposed NMI-based band grouping strategy for applying classical PCA in a segmented manner. Next, the proposed FE method, BgPCA, is presented in detail. After that, we discuss the NMI-based mRMR FS criteria applied on top of our BgPCA transformation. Lastly, we present the complete FR method, BgPCA-NMI, at the end of Section 2. In Section 3, we analyze in detail the experiments conducted on two real HSI datasets using the proposed BgPCA-NMI FR approach and the state-of-the-art methods. Finally, Section 4 summarizes the outcomes and concludes the article.
4. Conclusions and Future Work
Because an HSI is a high-dimensional data cube, effective FE is necessary to achieve high classification performance while decreasing the computational cost. In this study, we partitioned the original HSI bands using the NMI measure, which handles nonlinear dependence, instead of the correlation used for segmentation in SPCA. For FE, the band-to-band NMI matrix of the HSI was first used to divide all the spectral bands into a number of groups, and PCA was then performed on each group of bands. As a result, the proposed FE approach extracted useful features while taking the local characteristics of the HSI dataset into account, and the computational cost of extracting the features decreased greatly. After that, the NMI between each transformed feature and the ground truth was used to select the subspace of informative features under the mRMR scheme. In comparison with traditional PCA and correlation-based SPCA, BgPCA-NMI increased the classification accuracy, as shown by the classification results and their analysis on two real HSI datasets, Indian Pines and Washington DC Mall. The proposed method, BgPCA-NMI, also reduced the computational cost.
The whole HSI was partitioned using a user-defined threshold. This threshold could be optimized adaptively, and in the future we want to use a network model that automatically chooses the threshold value from the dataset. In addition, the proposed method takes only the spectral characteristics into account, although there is also data redundancy in the HSI’s spatial domain. Therefore, in the future, our FR technique will be combined with a deep learning-based strategy to extract both the spectral and spatial information and further improve the classification results. Finally, in addition to our feature space analysis, other distance metrics or statistics, such as the Bhattacharyya distance or class compactness, could be computed within the PC space and the BgPCA space to better quantify the separation.