Recent advances in remote sensing technology have made the simultaneous acquisition of hundreds of spectral bands for each image pixel a reality. The resulting image, termed a hyperspectral image (HSI), provides much higher spectral resolution than a conventional Red-Green-Blue (RGB) image or a multispectral image (MSI). This can be attributed to the increased number of bands and the decreased bandwidth of each band. Consequently, HSI offers better discriminating ability, particularly for objects whose spectral signatures are similar in conventional images. As a result, HSI is attracting increasing attention in various thematic applications, including ecological science (e.g., biomass estimation and land cover change detection) [1] and precision agriculture [2] (e.g., estimation of crop parameters such as Leaf Area Index (LAI) and biomass, and crop health evaluation including drought, disease, grass and nutrition mapping).
In the aforementioned applications, an image classification process is usually involved to convert raw image data into higher-level, meaningful information. For example, in land cover classification, image pixels are classified into classes such as river, road and grass [1], while in precision agriculture, the field is spatially divided into classes such as normal crop and abnormal crop under various stress levels caused by drought, disease or pests. In HSI, an individual pixel is usually represented as a vector, each entry of which is the measurement in a specific band; hence, the length of the pixel vector equals the number of spectral bands. In pixel-wise remote sensing image classification, a pattern (feature) vector is usually extracted from the HSI pixel vector and treated as a data sample for training and classification [6].
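The pixel-vector representation can be illustrated with a minimal NumPy sketch, assuming the HSI is stored as a cube of shape (rows, cols, bands). The function name `cube_to_pixel_vectors` and the random toy cube are hypothetical; they only show that each pixel becomes one row whose length equals the number of bands.

```python
import numpy as np

def cube_to_pixel_vectors(cube):
    """Flatten a hypothetical HSI cube of shape (rows, cols, bands) into a
    matrix of pixel vectors with shape (rows * cols, bands)."""
    rows, cols, bands = cube.shape
    # Each row of the result is one pixel; each column is one spectral band.
    # With C ordering, pixel (r, c) ends up in row r * cols + c.
    return cube.reshape(rows * cols, bands)

# Toy cube (assumption): 4 x 5 pixels, 10 bands of random reflectance values.
rng = np.random.default_rng(0)
cube = rng.random((4, 5, 10))
X = cube_to_pixel_vectors(cube)
print(X.shape)  # (20, 10): 20 samples, each a 10-dimensional pixel vector
```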
HSI classification is far from easy. Compared with conventional images (e.g., RGB or MSI), the main challenges in HSI classification stem from the high spectral dimensionality: instead of the few spectral bands of RGB and MSI, an HSI may contain several hundred spectral bands of the same scene [7]. The increased spectral dimension makes estimating the parameters of a classification model very difficult. First, the number of parameters to be estimated grows with the number of bands. Second, the rate of convergence of statistical estimation algorithms decreases [8]. Moreover, extracting the right features is crucial in HSI classification, because non-representative features can harm the learning process: they require much more data to train [9].
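To make the growth in parameters concrete, consider, purely as an illustration (the classifier adopted in this paper is SVM, not this model), a Gaussian maximum-likelihood classifier in which each class is described by a d-dimensional mean vector and a d x d symmetric covariance matrix. The hypothetical helper below counts its free parameters, d + d(d+1)/2 per class, which grows quadratically with the number of bands d.

```python
def gaussian_param_count(n_bands, n_classes=1):
    """Free parameters of a Gaussian maximum-likelihood classifier (illustrative):
    per class, a mean vector (d values) plus the distinct entries of a
    symmetric d x d covariance matrix (d * (d + 1) / 2 values)."""
    d = n_bands
    per_class = d + d * (d + 1) // 2
    return n_classes * per_class

for d in (3, 10, 100, 200):
    print(d, gaussian_param_count(d))
# 3 9
# 10 65
# 100 5150
# 200 20300
```

Going from 10 to 200 bands multiplies the per-class count by roughly 300, while the number of labelled pixels available for estimation typically stays the same.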
It is well acknowledged in machine learning that the following factors play dominant roles in achieving successful classification:
Well-defined training samples: the training data should represent the data to be classified well;
Feature determination: features should reflect the information in the data as fully as possible while having an appropriate dimension;
Appropriate classification algorithms: the adopted algorithms should accommodate the volume of training data and the dimension of the pattern vector.
Quite often, only a small training dataset is available in HSI classification, because acquiring ground truth data is labour-intensive, expensive and time-consuming [10]. Field work with instrumentation, or even expensive laboratory tests, is usually required, together with a careful experimental setup or environmental control (e.g., of soil or vegetation moisture) [7]. Therefore, reducing the reliance on large training datasets would significantly reduce cost and time, and thus increase the applicability and availability of HSI-based remote sensing for a wide range of applications. In this paper, we assume that only a limited number of training samples is available for HSI classification.
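A limited training set is commonly simulated by drawing a small, fixed number of labelled pixels per class and testing on the remaining labelled pixels. The sketch below assumes this protocol; the helper `sample_per_class` and the toy labels are hypothetical, and this section does not state how the training sets were actually drawn in the experiments.

```python
import numpy as np

def sample_per_class(labels, n_per_class, seed=0):
    """Hypothetical helper: randomly pick up to n_per_class training indices
    from each class; all remaining indices form the test set."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)          # pixels belonging to class c
        k = min(n_per_class, idx.size)             # small classes give what they have
        train_idx.extend(rng.choice(idx, size=k, replace=False))
    train_idx = np.sort(np.array(train_idx, dtype=int))
    test_idx = np.setdiff1d(np.arange(labels.size), train_idx)
    return train_idx, test_idx

labels = np.repeat([0, 1, 2], [50, 30, 20])        # toy ground-truth labels (assumption)
train_idx, test_idx = sample_per_class(labels, n_per_class=10)
print(len(train_idx), len(test_idx))               # 30 70
print(np.bincount(labels[train_idx]))              # [10 10 10]
```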
A Support Vector Machine (SVM) is chosen as the baseline algorithm for HSI classification. Its effectiveness has been validated in remote sensing (see [7] for a comparison against other algorithms such as neural networks, multinomial logistic regression, random forests and deep learning). This is mainly because, compared with other classifiers (e.g., parametric classifiers or deep learning), SVM generalizes well even with small training sets. Consequently, this paper focuses on how to appropriately define the feature vector for pixel-wise HSI classification.
Engineering a good feature space is a prerequisite for high performance in many machine learning tasks. In pixel-wise HSI classification, a common approach is to treat all bands as features and leave the problem of identifying useful feature sets to the learning model. This simplistic approach does not always work well, particularly with a large number of bands and a comparatively small training dataset [12]. For classification tasks with high-dimensional data, it is acknowledged that feature extraction and selection are of vital importance, since such applications involve many irrelevant (e.g., weakly correlated or noisy) and redundant variables but comparatively few training examples [13]. The rationales for conducting feature selection and extraction in HSI classification are summarized as follows:
There are generally hundreds of bands in HSI (not all of them useful), while only a limited number of training samples is available;
Adjacent bands may be highly correlated [14], resulting in redundant features (see Figure 1);
Certain bands are dominated by sensor noise or are not relevant to the task of concern, and so contain little useful information.
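Redundancy between neighbouring bands can be checked by computing the Pearson correlation of each adjacent band pair. The sketch below assumes a pixel matrix X of shape (n_pixels, n_bands); the spectra are synthetic, generated from a single hypothetical latent factor plus small noise rather than from real HSI data, so the near-unity correlations only illustrate the effect.

```python
import numpy as np

def adjacent_band_correlation(X):
    """Pearson correlation between bands b and b+1 for every b.
    X has shape (n_pixels, n_bands); returns n_bands - 1 values."""
    C = np.corrcoef(X, rowvar=False)   # (n_bands, n_bands) band correlation matrix
    return np.diag(C, k=1)             # first superdiagonal: entries (b, b+1)

# Synthetic spectra (assumption): one latent factor per pixel, smooth band profile.
rng = np.random.default_rng(0)
n_pixels, n_bands = 200, 50
abundance = rng.random((n_pixels, 1))
profile = np.linspace(1.0, 2.0, n_bands)
X = abundance * profile + 0.01 * rng.standard_normal((n_pixels, n_bands))

r = adjacent_band_correlation(X)
print(r.shape, r.min())                # every adjacent pair is strongly correlated
```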
Feature selection and extraction identify the features (feature selection) or combinations of features (feature extraction) in a dataset that are most useful or relevant for constructing a predictive model. By doing so, unneeded, irrelevant and redundant variables can be removed, yielding predictors that are cost-efficient (owing to the reduced dimensionality) and offer good or even better predictive accuracy while requiring less training data [15]. Moreover, in certain applications such as agricultural crop monitoring, feature selection algorithms can pinpoint the most relevant bands [16], providing a better understanding of crop growing status. As a result, feature selection and feature extraction are drawing increasing research interest in various applications [15], including HSI analysis [13]. It was shown in [21] that the accuracy of the SVM classifier could be further increased by reducing data dimensionality. This idea was recently validated in HSI classification, where different feature selection algorithms were compared [22]. In addition, two types of feature extraction approaches were adopted in [23] for agricultural land use classification using airborne HSI. However, there is still a lack of research comparing different feature selection and feature extraction algorithms with a limited number of training samples in the context of HSI classification. Moreover, little research has quantitatively investigated the computational advantages of classifiers aided by dimension reduction.
The objective of the paper is to investigate the advantages of incorporating dimension reduction techniques into classifiers (e.g., SVM) for HSI classification with a limited number of training samples. To this end, several feature selection techniques, including Mutual Information (MI) and Minimal-Redundancy-Maximal-Relevance (MRMR), and feature extraction techniques, including Principal Component Analysis (PCA) and Kernel PCA, are elaborated and compared using real HSI datasets. More specifically, the contributions of the paper are as follows:
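As an illustration of the two feature selection criteria named above, the following NumPy sketch estimates MI with equal-width histograms (10 bins, an assumption) and ranks bands by their MI with the class label; MRMR then greedily adds the band that maximizes relevance minus mean redundancy with the bands already chosen (the "difference" form of the criterion). The estimator, bin count, function names and synthetic data are assumptions for illustration, not the configuration used in this paper.

```python
import numpy as np

def discretize(x, n_bins=10):
    """Map a continuous band to integer bins 0..n_bins-1 (equal-width bins)."""
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    return np.clip(np.digitize(x, edges[1:-1]), 0, n_bins - 1)

def mutual_info(a, b):
    """Plug-in mutual information (nats) between two non-negative integer arrays."""
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1.0)                  # joint histogram of (a, b)
    joint /= joint.sum()
    outer = joint.sum(axis=1, keepdims=True) * joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / outer[nz])))

def mi_ranking(X, y, n_bins=10):
    """Relevance of each band = MI(band; y); returns bands sorted by relevance."""
    Xd = np.column_stack([discretize(X[:, j], n_bins) for j in range(X.shape[1])])
    relevance = np.array([mutual_info(Xd[:, j], y) for j in range(X.shape[1])])
    return np.argsort(relevance)[::-1], relevance, Xd

def mrmr(X, y, k, n_bins=10):
    """Greedy MRMR, difference form: maximize relevance - mean redundancy."""
    order, relevance, Xd = mi_ranking(X, y, n_bins)
    selected = [int(order[0])]                     # start from the most relevant band
    while len(selected) < k:
        best, best_score = -1, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            redundancy = np.mean([mutual_info(Xd[:, j], Xd[:, s]) for s in selected])
            if relevance[j] - redundancy > best_score:
                best, best_score = j, relevance[j] - redundancy
        selected.append(best)
    return selected

# Synthetic example (assumption): 4 classes built from two independent bits.
rng = np.random.default_rng(42)
n = 600
y0, y1 = rng.integers(0, 2, n), rng.integers(0, 2, n)
y = 2 * y0 + y1
band0 = y0 + 0.1 * rng.standard_normal(n)          # informative about y0
band1 = band0 + 0.01 * rng.standard_normal(n)      # near-duplicate of band0
band2 = y1 + 0.4 * rng.standard_normal(n)          # informative about y1
band3 = rng.standard_normal(n)                     # pure noise
X = np.column_stack([band0, band1, band2, band3])

order, relevance, _ = mi_ranking(X, y)
print("MI ranking:", order)                        # bands 0 and 1 on top
print("MRMR picks:", mrmr(X, y, k=2))              # one of bands 0/1, then band 2
```

Because band 1 is a near-copy of band 0, plain MI ranking places both at the top, whereas MRMR penalizes the redundancy and picks the complementary band 2 second.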
Different dimension reduction techniques, including feature selection and feature extraction algorithms, are compared for HSI classification, and the most suitable combination, namely SVM with PCA, is identified.
Comparative experimental results on a real HSI dataset demonstrate that by augmenting SVM with a dimension reduction technique (i.e., PCA), comparable or even better classification performance than SVM using all bands can be achieved, particularly when the training set is small.
More importantly, reducing the feature dimension is found to substantially simplify the SVM models and thus reduce classification time while preserving performance, which is vital for real-time remote sensing applications.
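Why dimension reduction shortens testing time can be seen from the decision function of a kernel SVM, f(x) = Σ_i a_i K(s_i, x) + b, where s_i are the support vectors. With an RBF kernel, each test pixel needs its distance to every support vector, so the cost scales with n_test x n_sv x d. The NumPy sketch below assumes a binary RBF-SVM whose support vectors, coefficients a_i = α_i y_i, intercept and gamma have already been trained; the function names are hypothetical and `decision_cost` is only a rough operation count.

```python
import numpy as np

def rbf_svm_decision(X, support_vectors, dual_coef, intercept, gamma):
    """Binary kernel SVM decision values f(x) = sum_i a_i K(s_i, x) + b,
    with RBF kernel K(s, x) = exp(-gamma * ||s - x||^2).
    dual_coef holds a_i = alpha_i * y_i for each support vector s_i."""
    # Squared distances between every test pixel and every support vector,
    # shape (n_test, n_sv); forming X @ SV.T costs O(n_test * n_sv * d).
    sq_dist = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(support_vectors**2, axis=1)[None, :]
        - 2.0 * X @ support_vectors.T
    )
    K = np.exp(-gamma * np.maximum(sq_dist, 0.0))   # clip tiny negative round-off
    return K @ dual_coef + intercept

def decision_cost(n_test, n_sv, d):
    """Rough multiply-add count of the kernel evaluations above (hypothetical)."""
    return n_test * n_sv * d

# Toy model (assumption): two support vectors in a 2-band space.
sv = np.array([[0.0, 0.0], [1.0, 1.0]])
coef = np.array([1.0, -1.0])
x = np.array([[0.0, 0.0], [1.0, 1.0]])
f = rbf_svm_decision(x, sv, coef, intercept=0.0, gamma=0.5)
print(np.sign(f))                                  # [ 1. -1.]
```

Reducing d shrinks every kernel evaluation proportionally, and if the simplified model also has fewer support vectors, n_sv shrinks as well; the two reductions multiply.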
5. Conclusions and Future Work
In this paper, the problem of hyperspectral image (HSI) classification with limited training data is investigated. To remove irrelevant and redundant data, different approaches are elaborated and compared, including feature selection (i.e., mutual information and minimal-redundancy-maximal-relevance) and feature extraction (i.e., Principal Component Analysis (PCA) and kernel PCA). Comparative experimental results on a real HSI dataset (i.e., ROSIS data) demonstrate that PCA is the most effective dimension reduction approach in terms of classification performance, algorithmic complexity and computational load. Having identified PCA as a promising feature extraction approach, we further compare SVM with PCA against SVM using all bands under various training set sizes. These experiments demonstrate that, in comparison with the classic SVM algorithm, augmenting SVM with PCA yields predictors with better performance in terms of overall accuracy, average accuracy and kappa coefficient when the training set is small, and comparable performance when sufficient training data are available. In all cases, significantly less computational time (particularly in testing) is required, which is important for real-time classification using HSI, particularly on platforms with limited computational resources such as unmanned aerial vehicles. Since SVM with PCA requires fewer feature dimensions while guaranteeing good performance with a small training set, it is well suited to real remote sensing applications.
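For reference, PCA and kernel PCA can be sketched in NumPy as follows. PCA projects the centred pixel vectors onto the leading eigenvectors of the band covariance matrix; kernel PCA performs the analogous eigen-decomposition on a kernel matrix centred in feature space. The RBF kernel, the gamma heuristic, the function names and the synthetic low-rank data are assumptions for illustration; this is a textbook version, not the implementation used in the experiments.

```python
import numpy as np

def pca_fit(X, n_components):
    """Linear PCA: eigenvectors of the band covariance with the largest eigenvalues."""
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / (X.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigh returns ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, top], eigvals[top]

def pca_transform(X, mean, W):
    return (X - mean) @ W

def rbf_kernel(A, B, gamma):
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kpca_fit(X, n_components, gamma):
    """Kernel PCA with an RBF kernel: eigen-decompose the centred kernel matrix."""
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    J = np.full((n, n), 1.0 / n)
    Kc = K - J @ K - K @ J + J @ K @ J                # centre in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)
    top = np.argsort(eigvals)[::-1][:n_components]
    alphas = eigvecs[:, top] / np.sqrt(eigvals[top])  # unit-norm feature-space axes
    return {"X": X, "K": K, "alphas": alphas, "gamma": gamma}

def kpca_transform(model, Z):
    """Project new pixels Z, centring their kernel w.r.t. the training data."""
    X, K, gamma = model["X"], model["K"], model["gamma"]
    n, m = X.shape[0], Z.shape[0]
    Kz = rbf_kernel(Z, X, gamma)                      # (m, n)
    Jm = np.full((m, n), 1.0 / n)
    Jn = np.full((n, n), 1.0 / n)
    Kz_c = Kz - Jm @ K - Kz @ Jn + Jm @ K @ Jn
    return Kz_c @ model["alphas"]

# Synthetic data (assumption): 60 pixels, 20 bands driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.standard_normal((60, 2))
mixing = rng.standard_normal((2, 20))
X = latent @ mixing + 0.01 * rng.standard_normal((60, 20))

mean, W, lam = pca_fit(X, n_components=2)
Z_pca = pca_transform(X, mean, W)
ratio = lam.sum() / X.var(axis=0, ddof=1).sum()      # share of total variance retained
print(Z_pca.shape, ratio)

gamma = 1.0 / (X.shape[1] * X.var())                 # heuristic kernel width (assumption)
model = kpca_fit(X, n_components=3, gamma=gamma)
Z_kpca = kpca_transform(model, X)
print(Z_kpca.shape)
```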
This paper mainly focused on maximally exploiting the potential of pixel-wise (or spectral) HSI classification with limited training samples using feature selection and extraction techniques. With the advent of HSI with higher spatial resolution, obtained by using Unmanned Aerial Vehicles (UAVs) as the camera platform, further information such as spatial or even contextual information will be explored in the future. It should also be highlighted that the feature selection and extraction techniques discussed in this paper can equally be applied to efficiently capture spatial or contextual features. In addition, hybrid approaches that simultaneously exploit feature selection and feature extraction techniques can be considered to further improve performance.