Investigating the Effects of a Combined Spatial and Spectral Dimensionality Reduction Approach for Aerial Hyperspectral Target Detection Applications

Abstract: Target detection and classification is an important application of hyperspectral imaging in remote sensing. A wide range of algorithms for target detection in hyperspectral images has been developed in the last few decades. Hyperspectral images, by their nature, contain large quantities of redundant information and are therefore compressible. Dimensionality reduction is an effective means of both compressing and denoising data. Although spectral dimensionality reduction is prevalent in hyperspectral target detection applications, the spatial redundancy of a scene is rarely exploited. By applying simple spatial masking techniques as a preprocessing step to disregard pixels of definite disinterest, the subsequent spectral dimensionality reduction process is simpler, less costly and more informative. This paper proposes a processing pipeline to compress hyperspectral images both spatially and spectrally before applying target detection algorithms to the resultant scene. Combinations of several different spectral dimensionality reduction methods and target detection algorithms are evaluated within the proposed pipeline. We find that the Adaptive Cosine Estimator produces an improved F1 score and Matthews Correlation Coefficient when compared to unprocessed data. We also show that, by using the proposed pipeline, the data can be compressed by over 90% while target detection performance is maintained.


Introduction
Remote sensing from aerial and satellite platforms has become increasingly prevalent and is an important source of information in areas of research including disaster relief [1], determining land usage [2] and assessing vegetation health [3]. Remote sensing platforms are also often deployed in military and security applications such as change detection [4,5], target tracking [6] and classification. Target Detection (TD) from airborne imagery is a major challenge and active area of research within the disciplines of signal and image processing [7-9]. A wide range of TD algorithms of varying complexity has been developed over the last few decades [10], ranging from mathematical models to those based on more intuitive approaches such as angles or distances. The most notable difficulties in aerial TD are discussed in [11] and include sensor noise effects and atmospheric attenuation with its subsequent correction, both of which can lead to variability in the target signature. Depending on the system, remote sensing data can consist of high resolution RGB colour data, radar, multispectral, or hyperspectral images. The latter, while providing a great deal of useful information, often at wavelengths beyond the range of human vision, introduces a vast quantity of data which must be handled and processed. Dimensionality Reduction (DR) techniques offer methods of compressing and remapping this high dimensionality data into a reduced, and sometimes more informative, uncorrelated subspace.
As hyperspectral images contain high levels of redundancy, they are easily compressed using sparsity-based approaches [12] or by applying DR methods. Coupling spectral DR with TD in order to improve detection and classification rates has been covered widely in the literature [11,13-19] and has been shown to improve the performance of TD and classification algorithms.
In TD applications, often the targets are sparsely positioned in an imaged scene, therefore large amounts of spatial redundancy are exhibited. This spatial redundancy, like the spectral redundancy also present in hyperspectral images, can be exploited in order to attain increased performance and efficiency. In [18,19], we investigated using the Normalised Difference Vegetation Index (NDVI) as a spatial mask on the detected image in order to constrain the region of interest in the scene. In this paper, however, the spatial DR is applied prior to the calculation of the dimensionality reduced image in order to refine the subspace in which any TD is performed. NDVI and its variants are most often used in remote sensing applications to quickly and effectively assess vegetation health [3]. Other similar indices are used to detect water/snow in an image or for assessing how built upon an area is. However, such indices could be used to provide a measure of how informative a pixel may be or how likely it is to hold a target signature. Pixels are categorised as informative or non-informative with the non-informative pixels being discarded. By removing such pixels, the DR calculation can be simplified by reducing the number of samples, whilst also simplifying and suppressing the background class. As TD algorithms can be represented as a binary classification, improving the separation between target and background classes consequently improves TD performance [8]. While various information indices are commonly used in remote sensing tasks, to the best of the authors' knowledge, they have never been used to perform spatial DR or coupled with spectral DR in this way with the aim of improving hyperspectral TD applications.
In this paper, we investigate the use of coupled spatial and spectral DR for hyperspectral TD applications. With this approach, we aim to decrease both the spatial and spectral redundancy exhibited in hyperspectral images, improving the efficiency and performance of various benchmark TD algorithms. The proposed method was tested on two hyperspectral datasets containing multiple targets in varied scenes.

Materials and Method
In this section, we first introduce the notation used in this paper as well as the relevant background information on each of the datasets used. Secondly, the various spectral DR methods used are introduced, followed by the spatial DR method created for the purpose of TD. Finally, the various detection algorithms are described.

Notation
Hyperspectral images can most easily be represented as 3-dimensional datacubes, with two spatial dimensions and a third spectral dimension. Any hyperspectral image $X$ can be represented as $L$ individual greyscale images, each exposed at a particular wavelength or spectral band $\lambda_l$, $X_l : l \in \{1, 2, ..., L\}$, where $L$ represents the total number of spectral bands. Alternatively an image, $X$, can be thought of as $N$ individual pixels each comprised of an $L$-dimensional vector as seen in Equation (1):

$$X_{3D} = \begin{bmatrix} \mathbf{x}_{1,1} & \mathbf{x}_{2,1} & \cdots & \mathbf{x}_{i,1} \\ \mathbf{x}_{1,2} & \mathbf{x}_{2,2} & \cdots & \mathbf{x}_{i,2} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{x}_{1,j} & \mathbf{x}_{2,j} & \cdots & \mathbf{x}_{i,j} \end{bmatrix} \qquad (1)$$

where $i$ and $j$ represent the number of columns and rows in the hyperspectral datacube $X_{3D}$, respectively. Generally, when applying hyperspectral image processing algorithms to images, it is desirable for the image to be in a 2-dimensional matrix form, $X$. This is shown in Equation (2), where each column consists of a single pixel, $\mathbf{x}_i : i \in \{1, 2, ..., N\}$, represented by an $L$-dimensional vector, as seen in Equation (3).

$$X = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N] \qquad (2)$$

$$\mathbf{x}_i = [x_i(\lambda_1), x_i(\lambda_2), \ldots, x_i(\lambda_L)]^T \qquad (3)$$

The vector in Equation (3) represents a single hyperspectral pixel, or a single spectral measurement.
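The conversion between the datacube and matrix forms can be sketched in a few lines of NumPy. This is only an illustration: the (rows, columns, bands) axis order, the row-by-row pixel ordering and the function names `cube_to_matrix` and `matrix_to_cube` are assumptions made here, not details taken from the paper.

```python
import numpy as np

def cube_to_matrix(cube):
    """Flatten a (rows, cols, L) datacube into an (L, N) matrix, one pixel per column."""
    rows, cols, bands = cube.shape
    return cube.reshape(rows * cols, bands).T

def matrix_to_cube(X, rows, cols):
    """Inverse of cube_to_matrix: (L, N) matrix back to a (rows, cols, L) datacube."""
    return X.T.reshape(rows, cols, X.shape[0])

# Toy datacube (assumed layout): 2 rows, 3 columns, L = 4 bands, so N = 6 pixels.
cube = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
X = cube_to_matrix(cube)

# Column n of X holds the spectrum of the pixel at row n // cols, column n % cols.
pixel = X[:, 4]  # pixel at row 1, column 1
```

Keeping the inverse mapping alongside the forward one makes it straightforward to return detector outputs to image coordinates for display.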

Image Acquisition
Images from two sources have been used to validate the techniques described here. The first dataset "OP7", provided by BAE Systems, consists of three images acquired on the 18 May 2014 from an aerial platform flying at an altitude of approximately 0.78 km. The platform used a Visible and Near-InfraRed (VNIR) hyperspectral sensor with a spectral range of roughly 400-1000 nm.
The second set of images was supplied by the UK Defence Science and Technology Laboratory (DSTL) as part of the University Defense and Research Collaboration (UDRC) from the Selene trial. Part of this trial collected airborne hyperspectral imagery of large numbers of spectrally varied targets across a two week period between the 4th and 15th of August 2014 at an altitude between 0.9 and 1.05 km. A common region from a selection of seven images captured over this period was used so as to exhibit varied targets under different environmental conditions. The camera used was also in the VNIR range, with a similar spectral range of roughly 400 nm to 1000 nm, but with fewer spectral measurements and a much higher spatial resolution than the OP7 dataset.
Sample false-colour images from each of the datasets can be seen in Figure 1 along with cropped portions of the target area indicated by a red box.

Spectral Dimensionality Reduction Techniques
Due to the high correlation between successive bands in hyperspectral images, compression and DR techniques can be readily applied. In this section, we review four of the most common techniques which we have included in this analysis.

Principal Component Analysis
Principal Component Analysis (PCA) [20] is a classical method of DR. It seeks to remap highly correlated data into an uncorrelated space using a set of optimal orthogonal basis vectors, or Principal Components (PC), calculated from the input data. There are multiple ways of achieving this through both iterative and non-iterative algorithms; we have included two in this analysis, Eigenvalue Decomposition (EVD) and Non-linear Iterative Partial Least Squares (NIPALS). The EVD is a common method for performing PCA and consists of the matrix decomposition $\Sigma = U \Lambda U^T$, where $\Sigma$ is the covariance matrix of the data, the matrix $\Lambda$ is a diagonal matrix containing the eigenvalues of $\Sigma$, i.e., $\Lambda = \mathrm{diag}\{\lambda_1, \lambda_2, ..., \lambda_L\}$, and the matrix $U$ contains the related eigenvectors $[u_1, u_2, ..., u_L]$. The eigenvalues in $\Lambda$ are ordered such that $\lambda_1 > \lambda_2 > ... > \lambda_L$, so the $K$ largest eigenvalues correspond to the first $K$ eigenvectors. The first $K$ eigenvectors, or PCs, can be used as a set of basis vectors to transform the original data into an uncorrelated $K$-dimensional subspace, where $K < L$, which represents the most significant information contained in the data.
In some cases, such as those where the desired number of retained components is known, calculating every PC, as is required in an EVD, is unnecessary and best avoided. In these cases, iterative techniques can be used to calculate each successive PC in turn until the required number, $K$, has been reached. The NIPALS algorithm can be used to achieve this and consists of the decomposition $X = TP^T$, where $X$ is some mean-centred matrix, the columns of $T$ are the scores and the columns of $P$ are the loadings. $P$ forms an optimal transform matrix which can be used in an identical manner to the matrix of eigenvectors from an EVD in transforming input data into a dimensionality reduced subspace. An overview of the NIPALS algorithm can be found in [21].
In testing, both the EVD and NIPALS algorithms produced PCs with identical magnitudes, although some exhibited opposite polarity because the sign of a basis vector is arbitrary: a PC and its negation define the same axis. The EVD has no need to converge and is therefore faster while producing minimal error. With this in mind, only the EVD was used to perform PCA-based DR.
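The two routes to PCA described above, and the sign ambiguity between them, can be illustrated with a short NumPy sketch. It assumes pixels are stored as columns of an (L, N) matrix; the function names, the stopping tolerance, the starting vector for NIPALS and the synthetic data are all illustrative choices rather than the implementation used in this work.

```python
import numpy as np

def pca_evd(X, K):
    """PCA via eigendecomposition of the L x L covariance. X is (L, N), pixels in columns."""
    mean = X.mean(axis=1, keepdims=True)
    Xc = X - mean
    cov = Xc @ Xc.T / (X.shape[1] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:K]         # keep the K largest
    return eigvecs[:, order], eigvals[order], mean

def pca_nipals(X, K, tol=1e-12, max_iter=1000):
    """First K loading vectors via NIPALS with deflation (tolerance is an assumed value)."""
    R = (X - X.mean(axis=1, keepdims=True)).T     # (N, L): samples in rows
    loadings = []
    for _component in range(K):
        t = R[:, np.argmax(R.var(axis=0))].copy() # start from the highest-variance band
        for _iteration in range(max_iter):
            p = R.T @ t
            p /= np.linalg.norm(p)                # unit-length loading vector
            t_new = R @ p                         # updated scores
            converged = np.linalg.norm(t_new - t) <= tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        R = R - np.outer(t, p)                    # deflate: remove this component
        loadings.append(p)
    return np.column_stack(loadings)

# Synthetic, well-separated data (hypothetical): L = 6 bands, N = 500 pixels.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(6, 6)))
scales = np.array([10.0, 5.0, 2.0, 1.0, 0.5, 0.1])[:, None]
X = Q @ (scales * rng.normal(size=(6, 500)))

U, eigvals, mean = pca_evd(X, K=3)
P = pca_nipals(X, K=3)
Y = U.T @ (X - mean)                              # (K, N) reduced data

# Each NIPALS loading matches an EVD eigenvector up to sign.
alignment = np.abs(np.sum(U * P, axis=0))
```

Comparing `U` and `P` column by column through the absolute inner product removes the polarity difference, which is why `alignment` is close to one for every retained component.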

Maximum Noise Fraction
The Maximum Noise Fraction (MNF) [22] transform is similar in operation to PCA but also accounts for the noise present in input data [23]. Rather than ordering the PCs of an input image, $X$, by their variance, as in PCA, they are instead sorted by their estimated Signal-to-Noise Ratio (SNR). In MNF, it can be assumed that the covariance of the data, $\Sigma$, is a sum of the covariance of the signal, $\Sigma_s$, and the covariance of noise, $\Sigma_n$, i.e., $\Sigma = \Sigma_s + \Sigma_n$. The MNF transform seeks to maximise the calculated eigenvalues with respect to the estimated SNR and can be interpreted as two separate PCAs computed in turn, the first to noise whiten the data, and the second to calculate the PCs. The complete MNF algorithm is described in [22].
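The "two PCAs in turn" interpretation can be sketched as follows. This is a minimal illustration rather than the algorithm of [22]: the noise sample is assumed to be supplied by the caller, and the helper `shift_difference_noise`, which estimates noise from differences between horizontally adjacent pixels, is one common heuristic that is assumed here, not taken from the paper.

```python
import numpy as np

def shift_difference_noise(cube):
    """Assumed noise estimate: scaled differences of horizontally adjacent pixels."""
    diff = (cube[:, 1:, :] - cube[:, :-1, :]) / np.sqrt(2.0)
    return diff.reshape(-1, cube.shape[2]).T           # (L, M) noise samples

def mnf(X, noise, K):
    """MNF as noise whitening followed by PCA. X is (L, N); noise is (L, M)."""
    Xc = X - X.mean(axis=1, keepdims=True)
    Nc = noise - noise.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.T / (Xc.shape[1] - 1)
    cov_n = Nc @ Nc.T / (Nc.shape[1] - 1)
    # First PCA: whiten the noise so that F.T @ cov_n @ F = I.
    d, E = np.linalg.eigh(cov_n)
    F = E / np.sqrt(d)                                 # scales column k by 1/sqrt(d[k])
    # Second PCA: eigendecomposition of the noise-whitened data covariance.
    vals, V = np.linalg.eigh(F.T @ cov @ F)
    order = np.argsort(vals)[::-1][:K]                 # largest noise-adjusted variance first
    return F @ V[:, order], vals[order]

# Hypothetical data: a rank-one signal plus band-dependent noise, L = 5, N = 2000.
rng = np.random.default_rng(0)
L, N = 5, 2000
signal = 3.0 * np.outer(rng.normal(size=L), rng.normal(size=N))
noise = np.array([0.1, 0.5, 1.0, 2.0, 0.2])[:, None] * rng.normal(size=(L, N))
X = signal + noise

# Here the true noise is passed in directly, which is only possible for synthetic data.
W, adjusted_variance = mnf(X, noise, K=2)
Y = W.T @ (X - X.mean(axis=1, keepdims=True))          # (K, N) MNF components
```

With real imagery the noise is unknown, so the quality of the transform depends on how well the noise covariance is estimated.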

Folded Principal Component Analysis
With both PCA and MNF, as well as many other PCA-like methods, it is necessary to compute the full covariance matrix $\Sigma$. This covariance matrix is of size $L \times L$, where $L$ is equal to the number of spectral bands in an image. Therefore, for images with high spectral resolution it can be computationally expensive and time-consuming to compute. In order to circumvent this challenge, Folded Principal Component Analysis (FPCA) [24] seeks to reduce the size of the covariance matrix and also incorporate the correlation within spectra into the calculation. In order to perform FPCA, each of the $N$ mean-centred spectral vectors, $\mathbf{x}_n$, is folded into an $H \times W$ matrix, $A_n$, where $H \times W = L$ for some positive integers $H$ and $W$. A partial covariance matrix can be calculated as $\Sigma_n = A_n^T A_n$ and, using each of these $N$ partial covariance matrices, the full covariance matrix, $\Sigma_{FPCA}$, can be calculated as

$$\Sigma_{FPCA} = \frac{1}{N} \sum_{n=1}^{N} A_n^T A_n$$

Images can be projected into the FPCA domain by performing the EVD of $\Sigma_{FPCA}$ and using the resultant eigenvectors to project the input data into the PC space. Auxiliary target spectra can then be folded using the same $H$ and $W$ and projected using the eigenvectors of $\Sigma_{FPCA}$, before being unfolded again to be processed in the FPCA domain.
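The fold, accumulate and project steps can be sketched in NumPy as below. The row-by-row folding order, the 1/N normalisation of the accumulated covariance and the function names are assumptions for illustration; note that retaining k eigenvectors yields H × k features per pixel, so the output dimensionality is always a multiple of H.

```python
import numpy as np

def fpca_fit(X, H, W, k):
    """Estimate the W x W folded covariance from (L, N) data and keep k eigenvectors."""
    L, N = X.shape
    if H * W != L:
        raise ValueError("H * W must equal the number of bands L")
    mean = X.mean(axis=1, keepdims=True)
    A = (X - mean).T.reshape(N, H, W)              # fold each spectrum row by row
    cov = np.einsum('nhw,nhv->wv', A, A) / N       # average of the N partial covariances
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1][:k]
    return vecs[:, order], mean

def fpca_transform(X, V, mean, H, W):
    """Fold, project onto the retained eigenvectors, then unfold to (H * k, N)."""
    N = X.shape[1]
    A = (X - mean).T.reshape(N, H, W)
    return (A @ V).reshape(N, -1).T

# Hypothetical data: L = 12 bands folded as H = 3 by W = 4, N = 50 pixels.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 50))
V, mean = fpca_fit(X, H=3, W=4, k=2)
Y = fpca_transform(X, V, mean, H=3, W=4)           # 3 * 2 = 6 features per pixel

# A target spectrum is projected with the same fold, mean and eigenvectors.
target = rng.normal(size=(12, 1))
target_fpca = fpca_transform(target, V, mean, H=3, W=4)
```

The eigendecomposition here operates on a 4 × 4 matrix instead of a 12 × 12 one, which is the computational saving the folding is intended to deliver.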

Independent Component Analysis
Independent Component Analysis (ICA) is a common method for performing Blind Source Separation (BSS) used in DR. Unlike PCA, MNF or FPCA, ICA seeks to separate an ensemble of mixed signals into a set of finite distinct sources or Independent Components (IC). This is achieved by maximising the statistical independence of the calculated components [25]. As hyperspectral images are made up of a weighted sum of a set of finite pure spectra, or endmembers, it is possible to perform ICA to separate the mixed spectra into pure spectral endmembers. Multiple algorithms can be used to calculate the ICs of a set of data; two of the most widely used are the FastICA [26] algorithm and the Joint Approximation Diagonalization of Eigen-matrices (JADE) algorithm [27]. In this paper, the FastICA algorithm is used instead of the JADE algorithm as it reached convergence both faster and more reliably. In order to perform ICA-based DR, the required number of ICs to represent the data needs to be calculated. This is achieved by using the notion of Virtual Dimensionality (VD) [28], which estimates the number of spectrally distinct sources in the image. Using the method from [29], ICA-DR can be achieved with K ICs.
PCA and MNF are both classified as second order statistics-based transforms which can be insufficient in some applications [29]. ICA preserves higher order moments, such as skewness and kurtosis, which can aid in applications which require characterisation of subtle differences in signature such as classification or detection of small/rare targets. While it is possible that second-order statistics may be insufficient in preserving such characterising information it has not been the case with this application. Although it performs favourably when compared to other ICA algorithms such as JADE, FastICA is much slower than the other, non-iterative, methods for DR listed here. This is due to the need for multiple iterations to reach a convergence and is therefore another important consideration in its choice in any practical application.

Spatial Dimensionality Reduction Using Vegetation Indices
As well as exploiting the spectral redundancy exhibited in hyperspectral images, the spatial redundancy can also be utilised for TD through compression or by creating new features. By investigating the spectral properties of the scene, spatial areas of interest can be selected and areas of non-interest can be discarded from further processing, often saving on large computational costs. Vegetation Indices (VI) such as NDVI and its variants are of particular interest in TD applications as they offer simple and effective methods to discriminate between vegetative and non-vegetative pixels. Three NDVI variants were selected and tested in discriminating between the desired background of vegetation and the foreground of synthetic materials to which the target objects of interest belong. Each of the methods used in this work is listed in Table 1.
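As a rough illustration of VI-based masking, the sketch below computes a generic normalised difference between two bands and thresholds it to keep non-vegetative pixels. The standard NDVI form (NIR − Red)/(NIR + Red) is used; the band centres of 800 nm and 670 nm, the threshold of 0.3 and the function names are illustrative assumptions and are not the values from Table 1. Variants such as the red-edge indices would be obtained by passing different band centres.

```python
import numpy as np

def normalised_difference(cube, wavelengths, high_nm, low_nm):
    """(a - b) / (a + b) using the bands nearest to high_nm and low_nm."""
    wavelengths = np.asarray(wavelengths, dtype=float)
    high_idx = int(np.argmin(np.abs(wavelengths - high_nm)))
    low_idx = int(np.argmin(np.abs(wavelengths - low_nm)))
    a = cube[..., high_idx].astype(float)
    b = cube[..., low_idx].astype(float)
    denom = a + b
    index = np.zeros_like(denom)
    np.divide(a - b, denom, out=index, where=denom != 0)   # leave 0 where a + b == 0
    return index

def non_vegetation_mask(index, threshold):
    """True for pixels retained for further processing (index below the threshold)."""
    return index < threshold

# Toy 1 x 2 scene with two bands (hypothetical reflectances).
wavelengths = [670.0, 800.0]
cube = np.array([[[0.05, 0.50],     # vegetation-like: dark in red, bright in NIR
                  [0.30, 0.30]]])   # synthetic-like: flat spectrum
ndvi = normalised_difference(cube, wavelengths, high_nm=800.0, low_nm=670.0)
mask = non_vegetation_mask(ndvi, threshold=0.3)
```

Only the pixels where `mask` is true would be passed on to the spectral DR stage.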

Target Detection Algorithms
In this paper, five common classical methods for TD and Anomaly Detection (AD) are investigated for use in combination with spatial and spectral DR. These are the Adaptive Cosine Estimator (ACE) [34], Constrained Energy Minimisation (CEM) [13], the Spectral Angle Mapper (SAM) [35], Spectral Information Divergence (SID) [36], and the Reed-Xiaoli Detector (RXD) [37]. Each method, with the exception of the last, is a TD algorithm and, as such, requires a priori information about the targets to be detected in the form of a reference or ground truth spectrum. The final method, the RXD, does not require prior information about a target; it finds outlying or anomalous pixels within the image and is cited as the benchmark AD algorithm [11]. Whilst other TD algorithms such as Orthogonal Subspace Projection (OSP) [38,39] are often used to good effect [40-42], such methods require prior knowledge of the background, which may not be fully known and can therefore hinder performance in a TD application; hence they are left out of this analysis. ACE in particular has been shown to achieve favourable results in similar comparisons with other TD algorithms [11,14,17,43].
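For orientation, four of these detectors can be written compactly in NumPy using their commonly quoted textbook forms. These may differ in detail from the variants in the cited references, for example in how the background statistics are estimated; here they are simply computed from the whole scene, and SID is omitted. The function names and the synthetic scene are illustrative assumptions.

```python
import numpy as np

def ace(X, s):
    """Adaptive Cosine Estimator: squared cosine between target and pixel in whitened space."""
    mu = X.mean(axis=1, keepdims=True)
    Xc = X - mu
    sc = s.reshape(-1, 1) - mu
    cov_inv = np.linalg.inv(Xc @ Xc.T / (X.shape[1] - 1))
    num = (sc.T @ cov_inv @ Xc).ravel() ** 2
    den = (sc.T @ cov_inv @ sc).item() * np.sum(Xc * (cov_inv @ Xc), axis=0)
    return num / den

def cem(X, s):
    """Constrained Energy Minimisation: FIR filter with unit response to the target."""
    R_inv = np.linalg.inv(X @ X.T / X.shape[1])    # inverse sample correlation matrix
    w = R_inv @ s / (s @ R_inv @ s)
    return w @ X

def sam(X, s):
    """Spectral Angle Mapper: angle in radians, smaller means more similar."""
    cos = (s @ X) / (np.linalg.norm(s) * np.linalg.norm(X, axis=0))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def rxd(X):
    """Reed-Xiaoli anomaly detector: squared Mahalanobis distance from the scene mean."""
    Xc = X - X.mean(axis=1, keepdims=True)
    cov_inv = np.linalg.inv(Xc @ Xc.T / (X.shape[1] - 1))
    return np.sum(Xc * (cov_inv @ Xc), axis=0)

# Hypothetical scene: 200 background pixels with L = 5 bands and one implanted target.
rng = np.random.default_rng(0)
X = rng.uniform(0.2, 0.4, size=(5, 200))
s = np.array([0.9, 0.1, 0.9, 0.1, 0.9])
X[:, 7] = s

ace_scores, cem_scores = ace(X, s), cem(X, s)
sam_angles, rxd_scores = sam(X, s), rxd(X)
```

ACE, CEM and RXD return larger values for more target-like or anomalous pixels, whereas SAM returns an angle, so its output needs to be inverted or negated before thresholding in the same way.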

Performance Measures
In order to assess the performance of each of the TD algorithms, a number of measures are used in this paper. Each of the various TD and AD algorithms used returns a probability or confidence measure as to whether each pixel contains a target. By varying the threshold above which a pixel is classified as a target, the various behaviours and performance of a TD algorithm can be assessed. Both Receiver Operating Characteristic (ROC) curves [44] and Precision-Recall (PR) curves [45,46] are useful measures in determining an optimal operational threshold in order to maintain an acceptable False Alarm Rate (FAR). The Area Under the Curve (AUC) is a useful measure for comparing the ROC and PR behaviours of various algorithms. The ROC curve can be created by plotting the Probability of Detection ($P_d$) against the Probability of False Alarm ($P_{fa}$) at a series of thresholds.
Although ROC curves are a simple and effective way of rapidly visualising the performance of a classifier, it has been shown that ROC analysis can be flawed for unbalanced classes, as is the case for TD applications. In [45] it is shown that PR curves are more informative for unbalanced classes as they correctly evaluate the fraction of True Positives (TP) among positive predictions.
Along with these graph-based methods, four other methods of assessing each of the TD algorithms were used. Three measures commonly used in assessing binary classifier performance, the F1 score [46], Matthews Correlation Coefficient (MCC) [47] and balanced accuracy [48], were used. As TD algorithms can be represented as a binary classification between a positive target class and a negative background class, these measures are applied to assess how each algorithm performs. The final metric used in this work is the visibility measure [14]. Visibility is an indication of how distinct a target is from its background. This is useful in assessing how the detection can be affected by applying DR to input image data.
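The threshold-based measures follow directly from the confusion counts at a given threshold. The sketch below uses the standard definitions of precision, recall, F1, MCC and balanced accuracy; the function names, the convention of returning 0 when a denominator is zero and the six example scores are illustrative assumptions, and the visibility measure of [14] is not reproduced.

```python
import math

def confusion_counts(scores, truth, threshold):
    """Count TP, FP, TN, FN when scores above the threshold are declared targets."""
    tp = fp = tn = fn = 0
    for score, is_target in zip(scores, truth):
        detected = score > threshold
        if detected and is_target:
            tp += 1
        elif detected:
            fp += 1
        elif is_target:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def safe_div(a, b):
    """Return a / b, or 0.0 when the denominator is zero (assumed convention)."""
    return a / b if b else 0.0

def precision(tp, fp):
    return safe_div(tp, tp + fp)

def recall(tp, fn):
    return safe_div(tp, tp + fn)

def f1_score(tp, fp, fn):
    return safe_div(2 * tp, 2 * tp + fp + fn)

def mcc(tp, fp, tn, fn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return safe_div(tp * tn - fp * fn, denom)

def balanced_accuracy(tp, fp, tn, fn):
    return 0.5 * (safe_div(tp, tp + fn) + safe_div(tn, tn + fp))

# Hypothetical detector output for six pixels, three of which are targets.
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
truth = [True, True, True, False, False, False]
tp, fp, tn, fn = confusion_counts(scores, truth, threshold=0.5)
results = (f1_score(tp, fp, fn), mcc(tp, fp, tn, fn), balanced_accuracy(tp, fp, tn, fn))
```

Sweeping the threshold and recording `recall` against the false positive rate, or `precision` against `recall`, yields the points of the ROC and PR curves respectively.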

Proposed Methodology
In this paper, we are proposing a pipeline to improve TD in hyperspectral images by combining both spatial and spectral DR methods. This is achieved by performing a spatial DR on an input image, removing any vegetative, and therefore, non-target pixels, before projecting the subset of the image into a subspace using more traditional spectral DR methods. Any relevant ground truth target spectra can also be projected into the same subspace using the forward transform of each DR method. The TD can then be performed in the dimensionality reduced subspace. This pipeline is shown in Figure 2.
In previous work, [18,19], both NDVI and PCA were combined to improve the performance of a hyperspectral Hit-or-Miss Transform (HMT) for use as a TD algorithm. By reducing the spatial and spectral redundancy the computational overhead of the proposed algorithm was reduced. NDVI was used to mask the already dimensionality reduced data. However, this meant that the NDVI had no influence over the performance of the detection. When it is already known that the target is non-vegetative, the application of NDVI masking prior to the use of spectral DR improves the performance of TD algorithms because a much more informative subset of pixels is exploited. Rather than using the spectral information of vegetation in the DR calculation, which can skew the resultant basis vectors away from representing desirable signatures, it is instead overlooked. The DR is targeted towards representing potentially more informative pixels. By suppressing the vegetative part of the background class, an improved separation between the target and remaining background can be achieved in the DR subspace. The aim is, that by reducing the number of samples in this way, the calculation of the dimensionality reduced data is not only simplified but also more useful information is retained in potentially fewer components.
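The ordering of the stages described above can be sketched end to end on a synthetic scene: mask first, then fit the spectral DR on the retained pixels only, project the target with the same transform, and detect in the subspace. PCA and ACE stand in for the spectral DR and TD stages; the band indices, VI threshold, spectra and noise level are all hypothetical values chosen for illustration.

```python
import numpy as np

def spatio_spectral_ace(cube, target, red_band, nir_band, vi_threshold, K):
    """Spatial masking, then PCA on the retained pixels, then ACE in the K-dim subspace."""
    rows, cols, L = cube.shape
    X = cube.reshape(-1, L).T.astype(float)                 # (L, N), pixels in columns

    # 1. Spatial DR: discard pixels whose vegetation index exceeds the threshold.
    red, nir = X[red_band], X[nir_band]
    vi = (nir - red) / np.maximum(nir + red, 1e-12)
    keep = vi < vi_threshold
    Xk = X[:, keep]

    # 2. Spectral DR: PCA basis estimated from the retained pixels only.
    mean = Xk.mean(axis=1, keepdims=True)
    Xc = Xk - mean
    vals, vecs = np.linalg.eigh(Xc @ Xc.T / (Xc.shape[1] - 1))
    U = vecs[:, np.argsort(vals)[::-1][:K]]
    Y = U.T @ Xc                                            # (K, n_kept), zero mean
    t = U.T @ (target.reshape(-1, 1) - mean)                # target in the same subspace

    # 3. Target detection: ACE using the subspace background covariance.
    cov_inv = np.linalg.inv(Y @ Y.T / (Y.shape[1] - 1))
    num = (t.T @ cov_inv @ Y).ravel() ** 2
    den = (t.T @ cov_inv @ t).item() * np.sum(Y * (cov_inv @ Y), axis=0)

    scores = np.zeros(rows * cols)                          # masked pixels score zero
    scores[keep] = num / den
    return scores.reshape(rows, cols), keep.reshape(rows, cols)

# Synthetic 20 x 20 scene with L = 8 bands: left half flat "road", right half "vegetation".
rng = np.random.default_rng(1)
vegetation = np.array([0.05, 0.08, 0.05, 0.10, 0.30, 0.50, 0.50, 0.50])
road = np.full(8, 0.30)
target = np.array([0.80, 0.10, 0.60, 0.20, 0.70, 0.10, 0.50, 0.30])
cube = np.empty((20, 20, 8))
cube[:, :10] = road
cube[:, 10:] = vegetation
cube += rng.normal(scale=0.01, size=cube.shape)
cube[5, 5] = target                                         # implant one target pixel

scores, kept = spatio_spectral_ace(cube, target, red_band=2, nir_band=6,
                                   vi_threshold=0.4, K=4)
peak = np.unravel_index(np.argmax(scores), scores.shape)
```

Because the mean, basis vectors and covariance are all estimated after masking, the vegetation pixels have no influence on the subspace in which the detector operates, which is the distinction from masking the already reduced data.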

Experimental Results
In this section, we will first investigate the optimal method of removing vegetative pixels and then discuss the achievable compression rates when combining both spectral DR and NDVI-based spatial DR. We then select the optimal detection algorithm for use with the proposed spatio-spectral DR pipeline shown in Figure 2. Then, we present a subset of the results gathered using both the OP7 dataset and the UDRC Selene Trial data. Finally, we investigate the effects of the various spatial and spectral DR schemes combined with the chosen TD algorithm. Each of the spectral DR methods, PCA, MNF, FPCA and ICA, are tested with K = 20 components retained and "Raw" refers to the full dimensionality image where L = 100 for the OP7 data and L = 80 for the Selene trial images.

Selection of the Optimal Vegetation Index for Spatial Dimensionality Reduction
In order to assess which VI gave the best separation between vegetative and non-vegetative pixels, the ground truth spectra of multiple green targets from the Selene dataset as well as the average spectra of a patch of vegetation were investigated. Figure 3 shows the test image used as well as the results of each of the three VIs. All three of the VIs give a good separation between vegetation and most other non-vegetative background pixels. However, some of the green targets present in the scene, despite exhibiting distinctly non-vegetative spectra, can produce a ratio similar to that of the surrounding grass; this is most apparent when using the basic NDVI. The regions investigated are indicated by the blue and orange elements in Figure 3, matching the colour of the plotted spectral signatures in Figure 4. Figure 4 shows the target spectra, background spectrum, and VI bands used to calculate the ratio of each VI result, respectively. Two of the targets from Figure 3a, green perspex (circled in blue) and green ceramic (circled in orange), were investigated for separation from the background when using VI-based spatial DR. Each of the three VIs investigated produces a ratio between the intensities of a pixel at two bands; the VI values produced for the two targets are shown in Table 2. From Figure 3b-d, Figure 4 and Table 2, it is possible to see that NDVI and the Red Edge Normalised Difference Vegetation Index (RENDVI) have lower separability between the "green perspex" target and the background when compared with that of the Normalised Difference Vegetation Index (red-edge) (NDVI re ). In fact, it can be observed that the green perspex target, pinpointed by the blue arrows in Figure 3b-d, is near indistinguishable from the background in Figure 3b, with only six of the seven targets having a low enough NDVI value to be reliably distinguished from the background.
Despite having a distinct spectral profile, as shown in Figure 4a, the green perspex has an almost identical NDVI value to the background (0.48 vs. 0.53) indicating the ratio between the two NDVI bands is nearly the same. By altering the Near-InfraRed (NIR) band to be placed in the red-edge portion of the spectrum, as is the case when using NDVI re , a much greater separation is achieved (0.09 vs. 0.39). This is due to the red-edge phenomenon, when the intensity of the background spectra rises sharply, reflecting NIR light. RENDVI, whilst successfully segmenting all seven targets in this example, creates a lower contrast between background and target when compared with NDVI re . As NDVI re provides the best separation between the most difficult targets and the background it is used to implement spatial DR in this paper.

Combining Spatial and Spectral DR for Hyperspectral Compression
Here we briefly discuss the effects on image size and compression when combining spatial and spectral DR techniques. NDVI re is used as a spatial mask, selecting pixels that are relevant and can be used in subsequent spectral DR and TD processes. By masking certain pixels they can be discarded from further processing, reducing the sample size. Then, by performing spectral DR, retaining K components from L spectral bands, the sample size is reduced further. By combining the remaining spatial and spectral components, a compressed representation of the relevant data is retained for further processing. Table 3 details the size of each of the images used in this paper, as well as their compressed spatial and spectral sizes along with the percentage of the original data retained after compression. The OP7 dataset images can first be compressed to 1.72% of their original size on average, as NDVI re selects a small proportion of the total pixels to process further. By retaining K = 20 components in the subsequent spectral DR stage, this is reduced further to an average of 0.34% of their original size. The images in the Selene trial have a much higher spatial resolution and a larger sample is retained after using the NDVI re spatial mask, as a large proportion of the pixels represent non-target and non-vegetative materials, as shown in Figure 1. The pixels retained after NDVI re represent an average of 18.45% of the original image and applying spectral DR, with K = 20, reduces this to 4.61% on average.
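The quoted percentages can be checked with a short worked calculation. Writing $N_{kept}$ for the number of pixels that survive the spatial mask (a symbol introduced here for illustration), and taking K = 20 with L = 100 for OP7 and L = 80 for Selene as stated at the start of this section, the fraction of the original data retained is the product of the spatial and spectral fractions:

$$\text{retained fraction} = \frac{N_{kept}}{N} \times \frac{K}{L}$$

$$\text{OP7: } 1.72\% \times \frac{20}{100} = 0.344\% \approx 0.34\%, \qquad \text{Selene: } 18.45\% \times \frac{20}{80} = 4.6125\% \approx 4.61\%$$

Both results agree with the averages reported from Table 3, which indicates that storage for the transform itself is not included in these figures.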

Comparison of the TD Algorithms Used
Each of the detection algorithms used was individually tested for its suitability when combined with the spatial and spectral DR schemes selected. In order to validate which algorithm performed optimally, the proposed method was applied to a subset of the Selene data. First, a ROC analysis was performed; examples of ROC curves for each combination of TD and DR algorithms are shown in Figure 5 for the full spatial scene and in Figure 6 when combined with NDVI re . Figures 5 and 6 show the upper left quadrant of the ROC curves in order to highlight the differences between each of the methods used. As previously stated, ROC analysis, in isolation, is insufficient for comparing unbalanced binary classifiers [45]. However, it is interesting to note the disparity between the ROC curves from each of the TD algorithm outputs. In Figure 5, each of the algorithms used has a near ideal ROC curve regardless of which spectral DR scheme is used when working on the full spatial scene. However, when spatial DR is employed, only the ACE and CEM algorithms remain near ideal, as seen in Figure 6. The AUC of the ROC curves increases for each spectral DR scheme when combined with NDVI re -based spatial DR and the ACE algorithm, as shown in Figure 6a. By simplifying the background, and therefore improving the covariance estimate, the ACE algorithm can achieve better separation between the known target and the estimated background. Similarly, by suppressing the background, the FIR filter weight estimation that is necessary for the CEM algorithm is simplified. This is reflected in the increased AUC values of the ROC curve when using CEM with NDVI re -based spatial DR, as shown in Figure 6b.
As well as ROC curves, PR curves were generated for each of the combinations of TD and DR algorithms with and without the NDVI re spatial DR. The PR curves of each of the TD algorithms when considering both the full spatial scene and with the application of NDVI re -based spatial DR are shown in Figure 7.
Investigating the PR curves shown in Figure 7 and the corresponding AUC values in Table 4, we see that ACE, CEM and SAM generally all yield high AUC values for each of the spectral DR schemes used. When NDVI re -based spatial DR is used in combination with the spectral DR, the AUC increases in almost every case, including on the raw data where no spectral DR is used. SID, when used on the full dimensionality data, provides an average AUC which is once again improved when using NDVI re -based spatial DR. The RXD performs well when using the full data and each of the spectral DR algorithms with the exception of PCA, where it fails to discriminate target materials. This is due to the fact that, mathematically, PCA is the inverse operation of the RXD [49]. PCA exploits the redundancy of hyperspectral images by only retaining the PCs corresponding to the largest eigenvalues, whereas the RXD works by investigating the anomalous data attributed to smaller eigenvalues, which have been discarded.
Both the ROC and PR analyses were performed on a per-target basis. The results shown in Figures 5-7 and Table 4 are from the detection of a single target; however, they are generally representative of the performance over every target present in the scene. Along with the ROC and PR curves, the other performance measures detailed in Section 2.6 were calculated for each of the targets in the scene. These measures were then averaged in order to obtain an overview of each TD algorithm's general performance, the results of which can be seen in Figure 8. Similar to the results drawn from Figures 5-7, each of the TD algorithms performs well when considering the AUC of the ROC curves. ACE and CEM give the highest AUC of the PR curves, with ACE, CEM, and SID each performing better when combined with spatial DR. Generally, using the spatial DR reduces the visibility, with the exception of CEM and the RXD, where it slightly increases.
ACE gives the highest visibility when considering both the full scene and when using spatial DR, indicating that, of the algorithms investigated, it is the best at separating the background from the target. ACE and SID display the best precision, with both methods improving when using spatial DR. ACE also displays the highest balanced accuracy, F1 score and MCC of each of the detectors tested. For these reasons, the remaining results in this paper are generated using the ACE algorithm solely. It is interesting to note that, as well as reducing the sample size for increased efficiency in each of the detection algorithms, the performance after the application of spatial DR is generally as good as, or better than, that obtained using the full scene, as seen in Figure 8.

Results on the OP7 Dataset
The first of two datasets used in this paper was provided by BAE Systems. It consists of three images of a forest scene, each portraying a common target area from overlapping views. The target area contains three calibration panels, one grey, one black and one white, which were used as the targets of interest. Figure 9a shows a false colour representation of one of the images with all three targets present in the scene. Figure 9c shows the same image masked using the NDVI re method detailed in Section 2.4. Figure 9b,d are enlarged views of the target areas of Figure 9a,c, respectively. Of the two datasets, OP7 is simpler as it contains fewer distinct materials and objects in the scene compared to the images from the Selene Trial dataset. The OP7 images also have a lower spatial resolution when compared to the Selene Trial data, with a Ground Sample Distance (GSD) of approximately 1 metre. As a result, roughly nine pixels per target contain pure spectra.
In order to assess how each TD algorithm's behaviour varied with the number of components retained using each DR scheme, the F1 score, MCC, balanced accuracy and visibility were calculated at various values for K between K = 10 and K = L, where L = 100 for the OP7 data as shown in Figure 10. It must be noted that, when using FPCA, the dip in performance in each measure is a consequence of an implementation limitation which results in the creation of a singular matrix. This restricts the choice of the number of retained components and is discussed further in Section 4.
As seen in Figure 10, both balanced accuracy and visibility are largely invariant to the number of components, K, retained. Interestingly, however, at lower values of K the visibility obtained with each spectral DR method is greater than that of the raw data. Conversely, the F1 score and MCC both vary as the number of components increases towards the original spectral dimensionality of the data, both with and without the application of spatial DR. This is to be expected: in the case where K = L, the data is functionally identical to the original, although it has been remapped, and no information has been lost in the DR operation. By using spatial DR prior to spectral DR, both the F1 score and MCC are increased above what is achieved using the raw full dimensionality data without spatial DR. When comparing TD performance on the full spatial dimensionality images with that on the NDVI re masked images, the F1 score and MCC both increase, with and without the application of spectral DR, and the average F1 score and MCC are considerably higher when spatial DR is applied. The removal of the vegetative background discards highly disparate observations and simplifies the problem of separating background from target. This increases the precision of the detection, as seen in Tables 5 and 6. By reducing the complexity of the background, the targets, although more similar to the remaining background, can be separated in the subspace more easily.
The MCC, in comparison to the F1 score, is slightly higher in both spatial DR cases as it takes into account the correct identification of the true negative class. The balanced accuracy drops slightly when NDVI re is applied. As the balanced accuracy is the average of the True Positive Rate (TPR) and True Negative Rate (TNR), the decrease in the size of the True Negative (TN) class, without a corresponding proportional decrease in FP, results in a lower balanced accuracy. Despite the increase in False Positive Rate (FPR) when using NDVI re , the absolute number of FP detections decreases. It can also be seen in Figure 10 that, by removing the easily separated vegetative background using NDVI re , the visibility of the targets decreases. This occurs because the materials which remain are, on average, more similar to the targets.
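The interaction between these measures can be illustrated with a short sketch that computes each of them from the four confusion matrix counts. The counts used in the example are hypothetical and were chosen only to mimic the situation described above, where masking removes most of the true negatives and some of the false positives; they are not taken from either dataset.

```python
import math

def detection_metrics(tp, fp, tn, fn):
    """Summary measures computed from confusion matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)              # True Positive Rate (TPR)
    tnr = tn / (tn + fp)                 # True Negative Rate (TNR)
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "precision": precision,
        "recall": recall,
        "fpr": 1.0 - tnr,                # False Positive Rate (FPR)
        "balanced_accuracy": 0.5 * (recall + tnr),
        "f1": f1,
        "mcc": mcc,
    }

# Hypothetical counts: the mask removes most TN pixels and half of the FP pixels.
full_scene = detection_metrics(tp=8, fp=20, tn=99971, fn=1)
masked_scene = detection_metrics(tp=8, fp=10, tn=4990, fn=1)
```

With these illustrative counts the absolute number of FP detections halves, yet the FPR rises because the TN class shrinks far more. Precision, F1 score and MCC therefore increase while the balanced accuracy falls slightly, which is the same pattern reported for the NDVI re masked scenes.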
Further comparisons were made by retaining 20 components from each of the spectral DR methods, as this provided a good balance of performance and compression. As shown in Figure 10, K = 20 components also gave clear improvements over the raw, full dimensionality, scene when combined with spatial DR. The improvement in detection when using spatial DR can be seen in Figure 11, where the detection map with NDVI re applied (Figure 11d) shows less confusion. The target is the brightest object in the scene in each case, indicating good separability from the background. In order to quantify this improvement, the ROC and PR curves for both the full and spatial dimensionality reduced images are shown in Figure 12 for each spectral DR method where K = 20. The ROC curves in Figure 12a,b are of the ACE detection statistics on the full scene and NDVI re masked scene, respectively. The AUC of the ROC curves alone is not significant as, regardless of the spatial and spectral DR used, it remains nearly identical. For the raw uncompressed data and for the PCA, MNF and ICA dimensionality reduced data, the AUC of the PR curves (Figure 12c,d) increases when spatial DR is applied. However, when applying FPCA, the AUC falls slightly.
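The differing behaviour of the two curve summaries can be reproduced with a small synthetic example. The sketch below computes the ROC AUC through its rank interpretation and uses average precision as a step-wise approximation to the area under the PR curve; both function names and all of the scores are hypothetical, and this is not the evaluation code used to produce Figure 12.

```python
import numpy as np

def roc_auc(scores, labels):
    """Probability that a random target pixel outscores a random background pixel."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

def average_precision(scores, labels):
    """Mean of the precision values at the rank of each target pixel."""
    order = np.argsort(-scores, kind="stable")
    hits = labels[order]
    precision_at_rank = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return float((precision_at_rank * hits).sum() / hits.sum())

def scene(targets, *negatives):
    """Stack target and background scores with matching 1/0 labels."""
    neg = np.concatenate(negatives)
    scores = np.concatenate([targets, neg])
    labels = np.concatenate([np.ones(targets.size, int), np.zeros(neg.size, int)])
    return scores, labels

# Hypothetical detector outputs.
targets = np.array([0.9, 0.8, 0.7, 0.6])
clutter = np.concatenate([[0.75], np.full(1000, 0.2)])           # survives the mask
vegetation = np.concatenate([[0.85, 0.65], np.full(1000, 0.1)])  # removed by the mask

full_scores, full_labels = scene(targets, clutter, vegetation)
masked_scores, masked_labels = scene(targets, clutter)
```

In this example the ROC AUC changes by less than 0.001 between the two scenes, because it is dominated by the large number of easily rejected background pixels. The average precision rises noticeably, because it depends only on the few false alarms that rank among the targets, two of which are removed by the mask.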
The results from Figures 10-12 are all calculated from a single target in order to display an example of the performance achieved. The average results for each target are shown in Table 5 when considering the full scene and in Table 6 when spatial DR has been applied.

In general, as shown in Tables 5 and 6, the AUC of the ROC curves is similar regardless of the spectral and spatial DR used. The AUC of the PR curves varies with the spectral DR used with each of the methods providing an average AUC. Generally, employing spectral DR maintains the performance when considering the full spatial scene, but when combined with spatial DR there is a slight decrease in the AUC of the PR curves. Applying NDVI re -based spatial DR improves the AUC when considering the full dimensionality data. The precision of the spatial DR coupled methods is increased in comparison to using the full spatial scene, as certain false positives are removed either directly via the masking operation or indirectly by improving the spectral DR calculation. The recall drops slightly; however, this may not be significant in TD applications, as one pixel on target can be sufficient for the identification and classification of an object of interest. Figure 10 shows a drop in the visibility and balanced accuracy measures when applying spatial DR, which is consistent for each of the targets, as shown in Figure 13. The visibility drops significantly when using the spatial DR as the highly dissimilar vegetative background is removed, making the average background and target spectra more similar. The balanced accuracy falls when using NDVI re as the TN class decreases without a corresponding drop in FP detections. The F1 score and MCC both increase when NDVI re -based spatial DR is applied, both for the full dimensionality data and for each spectral DR scheme used.
In nearly all of the measures tested, the full spectral dimensionality image, with and without spatial DR, performed the best of all methods on average, with the application of spatial DR tending to improve the performance. Each of the spectral DR methods employed retained only 20 components of the original 100, reducing the computational complexity and cost of performing the TD while maintaining similar performance.

Results on the UDRC Selene Dataset
The second of the two datasets used in this paper was provided by DSTL. It consists of seven images of a different forest scene with a large concrete area with metal containers, vehicles and other objects captured over the course of two weeks in August 2014. Each image covers a different view of this common target area containing between five and seven calibration panels of various colours and materials with a GSD of roughly 0.3 m. Figure 14c shows the image masked using the NDVI re method detailed in Section 2.4 with Figure 14b,d providing an enlarged view of the target area from Figure 14a,c, respectively. The same process of plotting the F1 score, MCC, balanced accuracy and visibility of a target from the OP7 data against the number of components, as in Figure 10, was applied to one of the target materials (green ceramic) present in the images from Figure 14. These graphs can be seen in Figure 15. As in Figure 10 using the OP7 data, the average F1 score and MCC both increase with the number of retained components until K = L. ICA and FPCA both perform well on average at K = 20 whereas both PCA and MNF require more components to represent the data fully. Applying the spatial DR to each of the spectral DR methods improves both their F1 score and MCC regardless of the number of components retained. Similar to the results from Section 3.4, the balanced accuracy and visibility are lowered when using spatial DR because of the reduced TN class and more similar average background signature. As in the results gathered from the OP7 dataset, applying spectral DR improves the balanced accuracy and visibility of the full spatial scene at lower values of K. The remaining results were obtained with K = 20 as it provided a good balance between detector performance and compression. The results shown in Figure 15 also indicate that improved performance could be obtained at K = 40 at the expense of compression rate. 
It must again be noted that FPCA requires more careful consideration when selecting the value of K in order to avoid the creation of a singular matrix and subsequent drops in performance as seen in Figure 15. This is discussed in detail in Section 4.
Similar to the results obtained on the OP7 dataset in Figure 11, removing the vegetation and simplifying the background class improves separation between the synthetic background and targets. Whilst there is an overall decrease in target visibility, as the average spectrum is more similar to the desired targets, there is less varied information to be represented, either in the full dimensionality image or in a dimensionality reduced subspace. This leads to less confusion in the detection image, as shown in Figure 16d, where the clutter present in the scene is less likely to be misidentified as a target when compared to Figure 16b. The ROC curves in Figure 17a,b are of the ACE detection statistics on the full scene and NDVI re -based spatial DR scene, respectively. The two sets of ROC curves are almost identical and do not provide definitive results, but indicate a small improvement when using the spatial DR. Comparing the PR curves in Figure 17c,d shows that when each spectral DR scheme is used in conjunction with spatial DR the AUC is increased by 10-15%. The average results for each target in the Selene dataset are shown in Table 7 when considering the full scene and in Table 8 when spatial DR has been applied. The average performance of the ACE detector when combined with each spatial and spectral DR method used is shown in Figure 18.

In general, from Tables 7 and 8, the AUC of both the ROC and PR curves is similar regardless of the spectral and spatial DR used. By applying NDVI re -based spatial DR, the precision of the spatial DR coupled methods increases in comparison to using the full spatial scene, with the recall dropping slightly. As seen in Figure 15, there is a decrease in visibility of the target when using the spatial DR as the highly dissimilar vegetative background is removed. Figure 18 shows that, on average, the visibility drops slightly for each of the spectral DR methods when NDVI re is applied and K = 20 components are retained.
The balanced accuracy also decreases slightly due to the reduction in the size of the TN class. Both the F1 score and MCC are improved when using spatial DR in all methods tested. The full dimensionality images with and without spatial DR have the best performance. However, of the spectral DR methods used, MNF, FPCA and ICA perform similarly despite retaining the equivalent of only 25% of the total spectral components. When combined with spatial DR, both ICA and FPCA maintain similar levels of performance compared to the full dimensionality image with no spatial DR applied. Applying the proposed method to the Selene dataset (Figure 18 and Tables 7 and 8) allows for improved results; however, these improvements are not as significant as those achieved from the processing of the OP7 dataset. This can be attributed to the increased complexity of the Selene trial images when compared to the OP7 data. The performance can be improved further by retaining additional DR components, as shown in Figure 15, albeit at the expense of compression and therefore at an increased computational cost.

Discussion
The proposed NDVI re -based spatial DR is relatively simple, requiring information from only two wavelengths, and can be readily applied to TD and other similar applications. By using NDVI re it is possible to detect varied spectral targets composed of metals, plastics and other synthetic materials against a vegetative background. NDVI variants allow for the discrimination between vegetative and non-vegetative pixels due to known material characteristics in the red-edge portion of the spectrum. Other VIs were not considered here, as exploiting the red-edge portion was determined to be the key component of this method, but they may provide alternative insights and allow for improved detection of additional materials in alternative environments. By combining both spatial and spectral DR, the computational complexity and memory requirements can be reduced whilst maintaining, or in some cases improving upon, detection performance, as shown in Figures 13 and 18. Using spatial DR had little effect on the AUC of the ROC or PR curves; the main improvements came from the increased F1 score, MCC and precision. On average, there is a slight reduction in recall and balanced accuracy; however, one correctly detected and classified pixel per target may be sufficient for certain applications.
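The simplicity of a two-wavelength mask can be seen in the following minimal sketch. It assumes the index takes the usual normalised-difference form, (NIR − RE)/(NIR + RE), between a near-infrared band and a red-edge band, and that pixels whose index falls below a threshold are kept as non-vegetation. The exact definition is given in Section 2.4; the band indices, the threshold value and the function name `ndvi_re_mask` are hypothetical and serve only to illustrate the idea.

```python
import numpy as np

def ndvi_re_mask(cube, nir_band, red_edge_band, threshold=0.3, eps=1e-12):
    """Mask vegetation in a hyperspectral cube of shape (rows, cols, L).

    Assumptions of this sketch: a normalised-difference index between one
    near-infrared band and one red-edge band, and a fixed threshold.
    """
    nir = cube[..., nir_band].astype(float)
    red_edge = cube[..., red_edge_band].astype(float)
    index = (nir - red_edge) / (nir + red_edge + eps)
    keep = index < threshold       # True for pixels treated as non-vegetation
    return keep, cube[keep]        # boolean mask and (N_kept x L) pixel list
```

Only the retained pixel list needs to be passed to the spectral DR and TD stages, while the boolean mask allows the detection scores to be placed back into the original image geometry.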
The complexity and performance of the spectral DR methods utilised vary. PCA is the simplest method used but also requires the most spectral components to be retained in order to be competitive. Applying the spatial DR and simplifying the background prior to performing spectral DR improved the performance of all methods, most notably PCA, which was competitive in both datasets with the addition of spatial DR. MNF can be conceptualised as two PCAs, one for noise reduction and the second to transform the noise whitened data into the reduced subspace. This extra noise removal step offers a distinct advantage when compared to PCA and allows it to perform similarly to FPCA and ICA. FPCA performed favourably in both datasets and is efficient given the simplification when calculating the partial covariance matrix. However, when using FPCA the choices of the number of components, K, and the height, H, and width, W, of the folded matrix are far more sensitive than in the other methods and are subject to two rules:

1. K must be a factor of the total number of wavelengths L; or
2. When selecting the folding parameters H and W, L > (H − 1)W.

In any case where the first rule is true, the expression in the second rule will automatically be valid. H was selected to be half the value of K in order to adapt with the changing number of components. However, the folded array must be padded with zeros in order to fulfil the expression H × W = L. If these zeros form an entire row of the covariance matrix, they produce a zero component in both the projected image and target. When these interact in the TD algorithms, usually through an inner product, a singular matrix results. As inverse matrices are prevalent in the implementations of the TD algorithms used, singular matrices completely suppress the detection. This phenomenon caused the undulating behaviours present in Figures 10 and 15 and informed the choice of the number of DR components in order to compare each TD algorithm.

ICA is the most complicated and computationally expensive method, but performed well on both datasets. Only the full dimensionality data, with and without spatial DR, outperformed the ICA-based methods. In general the spectral DR methods, whilst increasing the balanced accuracy and visibility when smaller numbers of components are retained, decrease the F1 score and MCC when compared with the raw full dimensionality data. Both FPCA and ICA offer consistent and improved detection when combined with ACE and NDVI re -based spatial DR. In general, the most impressive results are obtained using the ACE TD algorithm, which corroborates the conclusions of other similar works investigating this topic [11,14,17].
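The second folding rule can be checked before running FPCA, as in the short sketch below. It assumes that the L bands fill the folded matrix row by row and that W is chosen as the smallest width able to hold all L bands, W = ceil(L/H); both are assumptions of this illustration rather than details confirmed above, and the function name `fpca_fold_check` is hypothetical.

```python
import math

def fpca_fold_check(L, H):
    """Return (W, padding, ok) for folding L bands into an H x W matrix.

    Assumptions of this sketch: bands are placed row by row and
    W = ceil(L / H) is the smallest width that holds all L bands.
    """
    W = math.ceil(L / H)
    padding = H * W - L            # number of zeros appended to fill the matrix
    ok = L > (H - 1) * W           # rule 2; False when a whole row is padding
    return W, padding, ok
```

Under these assumptions, L = 100 and H = 10 gives W = 10 with no padding, whereas H = 16 gives W = 7 and 12 padded zeros, which is more than one full row and so violates the rule. The condition L > (H − 1)W is equivalent to requiring fewer than W padded zeros.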
The methods detailed here offered improvement to the TD performance on both datasets considered. However, greater improvements were obtained on the simpler dataset. Increasing the number of spectral DR components retained to account for the increased variability in the Selene dataset would improve the performance. This is at the expense of the compression rates that can be achieved at lower values of K. On average, applying NDVI re -based spatial DR increases precision and slightly decreases the recall of the TD algorithms used. The visibilities of the targets decrease as background pixels which are dissimilar to the targets are not considered. The average background signature, after applying the NDVI re -based spatial DR, becomes more similar to the target signatures. However, applying spectral DR and mapping the data into a more informative subspace can alleviate this issue.

Conclusions
DR is a tool often employed in various hyperspectral imaging applications, usually to reduce the number of spectral bands present in an image due to its high spectral redundancy. However, known spatial redundancies are rarely exploited. This paper provides an investigation into how spatial DR can be utilised in a TD application. We have shown that in each case tested using multiple spectral DR schemes, the addition of a spatial DR pre-processing stage improved the performance of the TD algorithm considered. By applying both spatial and spectral DR the complexity of the data is reduced and computational cost and memory requirements can be lowered.
We used robust, classical TD/AD and DR algorithms in order to assess the proposed method. The provision of a priori information gives the TD algorithms an advantage over AD algorithms such as the RXD. Whilst the RXD correctly identifies the anomalous pixels, it fails to discriminate between specific target spectra, resulting in low precision. Therefore, AD is insufficient for the application we are proposing. Of the detection methods tested, the ACE algorithm performs the best both when considering the full spatial scene and when applying the NDVI re -based spatial DR, especially when combined with the FPCA and ICA DR algorithms.
We have shown that the proposed pipeline can compress an input image by >90% whilst maintaining the detector performance seen in the processing of the raw images. This pipeline is readily applicable in TD scenarios where the predominant background consists of vegetative pixels. The proposed method may be adapted to suppress other, highly predictable, background signatures given an appropriate index. Indices such as the built-up index could provide the inverse to NDVI and its variants, masking non-vegetative pixels directly or, alternatively, providing auxiliary features. Additionally, multiple indices can be generated rapidly and combined to provide additional information about the pixels in a scene. Existing indices could also be used in the detection of camouflaged objects, or bespoke alternative measures may be developed.
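The way in which the two stages combine can be shown with a short calculation. The sketch below assumes that compression is measured as the fraction of data values removed, so that the retained fraction is the product of the fraction of pixels kept by the mask and the fraction of spectral components kept, K/L. It ignores the storage of the mask and of the projection matrix, and the pixel counts in the example are hypothetical.

```python
def compression_rate(pixels_kept, pixels_total, K, L):
    """Fraction of data values removed by combined spatial and spectral DR.

    Assumption of this sketch: compression is counted in data values only,
    ignoring the mask and the DR projection matrix.
    """
    retained = (pixels_kept / pixels_total) * (K / L)
    return 1.0 - retained

# K = 20 of L = 100 components, with a hypothetical mask keeping 40% of the pixels.
example = compression_rate(pixels_kept=40_000, pixels_total=100_000, K=20, L=100)
```

Retaining 20 of 100 components alone removes 80% of the values; under these assumptions, a mask that keeps fewer than half of the pixels is then enough to take the overall figure above 90%.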
Potential future work includes using an adaptive method for selecting the optimal number of components, K, to retain in each DR method. In PCA, MNF and FPCA, variations on scree plots [20] can be used to find the elbow point. Alternatively, the smallest value of K at which the retained components represent a sufficient percentage of the variance in the data could be chosen. Similarly for ICA, VD [28] can be used to estimate the number of spectrally distinct sources in the image, which would allow for the automation of this approach.
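The variance-based alternative could be sketched as follows for PCA. The function name `components_for_variance` and the default threshold of 99% are hypothetical choices, and this selection was not used to produce any of the results reported here.

```python
import numpy as np

def components_for_variance(pixels, fraction=0.99):
    """Smallest K whose leading principal components explain `fraction` of the variance.

    `pixels` is an (N x L) array of spectra; the threshold is a free parameter.
    """
    cov = np.cov(pixels, rowvar=False)
    eigvals = np.linalg.eigvalsh(cov)[::-1]     # eigenvalues in descending order
    eigvals = np.clip(eigvals, 0.0, None)       # guard against tiny negative values
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cumulative, fraction)) + 1
    return min(k, eigvals.size)                 # cap at L if rounding prevents reaching the threshold
```

The same cumulative curve can be plotted against K to locate an elbow point visually, which corresponds to the scree plot approach mentioned above.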
Although the proposed spatial DR approach has been tested on classical DR and TD/AD algorithms, more state-of-the-art approaches to spectral DR could be considered, as well as more complex detection algorithms. While the visibility of the target generally dropped when using spectral DR, the detection was improved, and so a measure which can determine how distinctive the target is within the reduced subspace would be of benefit. Along with spectral DR, other methods of spatial DR could be considered.
In order to avoid saturation of tables and results, the most informative and interesting results were included here. The full set of results generated from this work will be available online at a later date.