Siamese Network Ensembles for Hyperspectral Target Detection with Pseudo Data Generation

Abstract: Target detection in hyperspectral images (HSIs) aims to distinguish target pixels from the background using knowledge gleaned from prior spectra. Most traditional methods rely on specific assumptions and handcrafted classifiers. The failure of these simple models and assumptions limits detection performance under complicated background interference. Recently, many supervised deep learning detectors based on convolutional networks have outperformed traditional methods. However, these methods suffer from unstable detection, a heavy computational burden, and optimization difficulty. This paper proposes a Siamese fully connected target detector (SFCTD) that comprises nonlinear feature extraction modules (NFEMs) and cosine distance classifiers. The two NFEMs, which extract discriminative spectral features from input spectral pairs, are built from fully connected layers for efficient computation and share parameters to ease optimization. To address the few-sample problem, we propose a pseudo data generation method based on the linear mixing model and the assumption that background pixels dominate HSIs. To mitigate the impact of stochastic suboptimal initialization, we optimize several lightweight Siamese detectors in parallel and aggregate them as an ensemble at inference time. The network ensembles are more stable than every individual detector and achieve an outstanding balance between background suppression and detection rate. Experiments on multiple data sets demonstrate that the proposed detector is superior to state-of-the-art detectors.


Introduction
Hyperspectral imaging is a developing area of remote sensing in which a hyperspectral spectrometer collects hundreds of narrow, contiguous bands over a wide range of the electromagnetic spectrum [1]. Unlike target detection in natural images [2], hyperspectral target detection aims to distinguish specific target pixels from the background in a given HSI using little prior spectral information about the target, and it has long been a focus of remote sensing interpretation research [3].
In the past few decades, several classic hyperspectral target detectors have been proposed [3]. The spectral angle mapper (SAM) [4] and spectral information divergence (SID) [5] perform detection based on distance measurements. According to how the background is modeled, Zhang et al. [6] divided traditional detection models into structured and unstructured background models. Structured background models include constrained energy minimization (CEM) [7], orthogonal subspace projection (OSP) [8], and the target-constrained interference-minimized filter (TCIMF) [9], among others. Unstructured background models regard the background as samples from a multivariate Gaussian distribution; examples include the generalized likelihood ratio test (GLRT) [10], the adaptive coherence/cosine estimator (ACE) [11], and the adaptive matched filter (MF) [12]. Most traditional detectors are […] generated target and background spectra, supervised target detection methods [21,34,39,40] have achieved strong performance. However, convolution-based feature extraction modules stack many convolutional layers, which introduces a heavy computational burden. In addition, suboptimal initialization affects the performance of these supervised networks, and initializing them with random parameters leads to fluctuations in performance [41].
As shown in [42], Siamese networks, in which multiple branches share parameters, can perform recognition with little available data, which makes them well suited to target detection with a single prior spectrum. This paper proposes a supervised Siamese fully connected target detector (SFCTD) composed of nonlinear feature extraction modules (NFEMs) and cosine angle distance-based classifiers. The two NFEMs, which extract discriminative spectral features from input spectral pairs, are built from fully connected layers for efficient computation and share parameters to ease optimization. We use the cosine of the SAM angle as the discriminative criterion for optimizing the NFEM parameters. The cosine angle distance of each spectral feature pair represents the similarity of the input spectral pair and serves as the target confidence of the test spectrum. To address the few-sample problem, we propose a pseudo data generation method based on the linear mixing model and the assumption that background pixels dominate HSIs. To avoid the impact of suboptimal initialization and achieve stable detection, we optimize several Siamese detectors independently and detect targets with the network ensemble.
The contributions of our work are summarized as follows.
(1) A Siamese fully connected hyperspectral target detector (SFCTD) is proposed, consisting of nonlinear feature extraction modules and cosine angle distance-based classifiers.
(2) A pseudo data generation method is proposed to create numerous positive and negative spectral pairs with discrete similarity labels, i.e., 0 or 1. The SFCTD is effectively optimized with the generated spectral pairs.
(3) A detection ensemble method is proposed to improve detection performance and stability. The Siamese detector ensembles outperform other state-of-the-art algorithms in accuracy, recall, and background suppression, as validated on multiple complex HSI data sets.
The remainder of this paper is organized as follows. Sections 2.2 to 2.4 introduce the proposed hyperspectral detector. Sections 2.5 and 2.6 describe the experimental data sets and implementation details. Section 3 presents the experimental results and ablation studies of the proposed method. Sections 4 and 5 present the discussion and conclusions.

Notation
For convenience in the subsequent description, let $x_i \in \mathbb{R}^{l}$ denote the $i$-th spectral vector of the HSI $X \in \mathbb{R}^{n \times l}$ with prior target spectrum $x_{\text{prior}} \in \mathbb{R}^{l}$, where $n$ is the number of pixels in the HSI and $l$ is the number of spectral bands. The generated training data set $D$ consists of positive and negative spectral pairs, where $Y_i^+$ and $Y_i^-$ represent the positive and negative spectral pairs associated with the test spectrum $x_i$. The batch size of the training data set is denoted $b$, and the number of mini-batches for an HSI with $n$ pixels equals $n/b$. The NFEM is denoted $f$, and the test data pairs are denoted $D_{\text{test}}$.

Siamese Hyperspectral Target Detector
As shown in Figure 1, the proposed Siamese detector consists of two NFEMs with shared structure and parameters. Each input spectral pair consists of a prior spectrum and a test spectrum. We feed the two spectra of each pair into the two NFEMs separately and compute the cosine angle distance between the output features. This cosine value, i.e., the cosine of the SAM angle, represents the probability that the test spectrum belongs to the target category.
To extract features effectively from spectra, which are 1-D vectors, the proposed NFEMs use fully connected networks instead of 1-D convolutional networks. Although convolutional networks have fewer parameters than fully connected networks because convolution shares weights, they incur a much higher computational and memory burden. Specifically, each NFEM comprises a single batch norm layer followed by two fully connected blocks (FC blocks). Each FC block consists of a fully connected layer, a batch norm layer, and a nonlinear activation layer. We describe each component in turn in the following paragraphs.
Figure 1. The training stage pipeline of the Siamese detector, consisting of training data generation, spectral feature extraction, and computation of cosine distances and losses. $x_{\text{prior}}$, $y_i^+$, and $y_i^-$ represent the prior, target, and background spectra, respectively. $c_i$, which denotes $c_i^-$ or $c_i^+$, is the similarity of the input spectral pair $(x_{\text{prior}}, y_i^-)$ or $(x_{\text{prior}}, y_i^+)$. Data generation provides spectral pairs with similarity labels. Each spectral pair is fed separately to the weight-shared feature extraction modules. We optimize the network parameters with the cross-entropy loss between the angle distances and the similarity labels.
The amplitudes and waveforms of spectra in an HSI vary across positions because of different imaging and surface conditions, as shown in Figure 2a. Assuming that all spectra belonging to one category are independently sampled from the same multidimensional random distribution, the distributions of target and background spectra differ because of their different physical properties. However, a large variance and mean shift in the test spectral distribution may reduce the effectiveness of the feature extraction module. Hence, we preprocess the input spectral distribution at the beginning of feature extraction with a batch norm layer to reduce the shift and the optimization difficulty. Instead of normalizing the spectral distribution to zero mean and unit variance, which may not be ideal for optimizing the loss function, we use batch normalization (BN) to transform the original spectral distribution into a distribution with learnable statistical parameters. Specifically, for the spectral mini-batch $B_m = \{Y_i^+, Y_i^-\}_{i=b(m-1)+1}^{bm}$, $m \in \{1, 2, \ldots, n/b\}$, BN normalizes the distribution of $B_m$ (statistics are computed per band) and transforms it into a distribution with a learnable mean $\beta$ and variance $\gamma^2$. The target spectra after BN are shown in Figure 2b. The BN process, applied to each spectrum $y$ in $B_m$, is:
$$\mathrm{BN}(y) = \gamma \, \frac{y - \mu_m}{\sqrt{\sigma_m^2 + \epsilon}} + \beta, \tag{1}$$
where $\mu_m$ and $\sigma_m^2$ are the mean and variance of the input spectra in $B_m$, $\epsilon$ is a small constant for numerical stability, and $Y_i^+$, $Y_i^-$ are the positive and negative spectral pairs for supervised optimization. The method and purpose of generating these spectral pairs are detailed in Section 2.3. In the training stage, $\mu_m$ and $\sigma_m^2$ change with the forward propagation of each mini-batch, while $\beta$ and $\gamma$ are optimized by the loss function during backpropagation. In the testing stage, all parameters are fixed for every spectral pair. Experiments in Table 1 validate that BN enlarges the distribution difference between target and background spectra.
We note that the learned parameters of the batch norm layers change with the HSIs and prior spectra.
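The BN transform described above can be illustrated with a minimal plain-Python sketch. It assumes a mini-batch stored as a list of equal-length spectra and per-band parameters `gamma` and `beta`; the function name `batch_norm` and the value of `eps` are illustrative choices, not taken from the paper. Only training-mode statistics are computed; framework layers such as PyTorch's `torch.nn.BatchNorm1d` additionally track running estimates for the testing stage.

```python
import math


def batch_norm(batch, gamma, beta, eps=1e-5):
    """Training-mode batch normalization of a mini-batch of spectra.

    batch: list of spectra, each a list of l band values.
    gamma, beta: per-band learnable scale and shift (lists of length l).
    Statistics are computed separately for each band across the mini-batch.
    """
    size, num_bands = len(batch), len(batch[0])
    out = [[0.0] * num_bands for _ in range(size)]
    for k in range(num_bands):
        column = [spectrum[k] for spectrum in batch]
        mu = sum(column) / size                          # mini-batch mean mu_m
        var = sum((v - mu) ** 2 for v in column) / size  # biased variance sigma_m^2
        for i, v in enumerate(column):
            out[i][k] = gamma[k] * (v - mu) / math.sqrt(var + eps) + beta[k]
    return out


# Three two-band spectra whose bands differ strongly in offset and scale.
spectra = [[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]]
normalized = batch_norm(spectra, gamma=[2.0, 2.0], beta=[1.0, 1.0])
```

After the transform, each band has mean β and a variance just under γ² regardless of its original offset, which is the shift-reducing effect attributed to the preprocessing layer.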
Figure 2. Visualization of the spectral distribution normalization; panel (b) shows the spectra after the batch norm layers. The batch norm layer mitigates the mean and variance shift of the input spectral pairs. Each spectrum is drawn in the same color before and after the batch norm layers in (a,b); different colors represent different spectral samples.
After the BN operation, the spectra of each pair are fed separately to two FC blocks to generate discriminative spectral features. In each FC block, spectra pass through the fully connected layer, batch norm layer, and nonlinear activation layer in succession. A fully connected layer with weight $W = [w_1, w_2, \ldots, w_l]$, $w_k \in \mathbb{R}^{l}$, transforms a spectrum with $l$ bands into an $l$-dimensional low-level feature space. Each vector $w_k$, $k \in [1, l]$, serves as a linear classifier for the spectra of the test HSI, highlighting background or target spectra to improve spectral discriminability. Notably, the batch norm layers in the FC blocks play a different role from that of the preprocessing layer: placed before the nonlinear activation layer, as is usual, they shift the spectral features into the non-saturated range of the activation function. The Sigmoid layer helps the NFEMs extract nonlinear spectral features for accurate detection. Finally, through the preprocessing transformation and the two FC blocks, we obtain discriminative spectral features for computing the cosine angle distance.
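The NFEM forward pass (preprocessing BN, then two FC blocks of FC, BN, and Sigmoid) can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' code: the helper names (`nfem`, `linear`, `sigmoid`) are hypothetical, bias terms are omitted on the assumption that the following BN layer re-centres each feature, BN uses its initial values γ = 1 and β = 0, the output dimension of 3 is arbitrary, and the weights are drawn from a Gaussian with mean 0 and standard deviation 0.001 (the paper states the mean and standard deviation but not the distribution).

```python
import math
import random


def batch_norm(batch, eps=1e-5):
    """Training-mode BN per feature with gamma = 1 and beta = 0 (their initial values)."""
    size, dim = len(batch), len(batch[0])
    out = [row[:] for row in batch]
    for k in range(dim):
        col = [row[k] for row in batch]
        mu = sum(col) / size
        var = sum((v - mu) ** 2 for v in col) / size
        for i in range(size):
            out[i][k] = (batch[i][k] - mu) / math.sqrt(var + eps)
    return out


def linear(batch, weight):
    """Fully connected layer without bias: output k is the inner product <w_k, x>."""
    return [[sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in weight] for x in batch]


def sigmoid(batch):
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in batch]


def nfem(batch, weights):
    """Hypothetical NFEM: preprocessing BN, then one FC block per weight matrix."""
    h = batch_norm(batch)                           # preprocessing batch norm layer
    for weight in weights:                          # the paper uses two FC blocks
        h = sigmoid(batch_norm(linear(h, weight)))  # FC -> BN -> Sigmoid
    return h


rng = random.Random(0)
bands, feat_dim = 4, 3
# Assumed Gaussian initialization with mean 0 and standard deviation 0.001.
w1 = [[rng.gauss(0.0, 0.001) for _ in range(bands)] for _ in range(feat_dim)]
w2 = [[rng.gauss(0.0, 0.001) for _ in range(feat_dim)] for _ in range(feat_dim)]
spectra = [[rng.random() for _ in range(bands)] for _ in range(8)]
features = nfem(spectra, [w1, w2])
```

Even with tiny initial weights, the BN inside each FC block rescales the layer outputs to unit variance, so the Sigmoid receives inputs in its responsive range rather than values squeezed around zero.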
For the input spectral pair $(x_{\text{prior}}, y_i)$, the transformed feature pair is $(f(x_{\text{prior}}), f(y_i))$. Unlike [34,39], both of which use a fully connected layer to classify the feature difference of the input pairs, we derive a simple cosine angle distance classifier from the SAM measurement. Specifically, we use the cosine angle distance of the two output vectors, which equals the cosine of the SAM angle, as the classification confidence. The angle distance is simple to compute and easy to differentiate. The cosine angle distance-based classifier is:
$$c_i = \frac{\langle f(x_{\text{prior}}), f(y_i) \rangle}{\| f(x_{\text{prior}}) \|_2 \, \| f(y_i) \|_2}, \tag{2}$$
where $c_i$ is the cosine distance, $\| f(y_i) \|_2$ is the Euclidean norm of $f(y_i)$, and $\langle \cdot, \cdot \rangle$ denotes the inner product of two vectors. Because the final Sigmoid layer makes every feature entry positive, $c_i$ lies in $(0, 1]$ and can be treated directly as a probability. Compared with the subtraction of spectral pairs in [34] and of spectral features in [39], the cosine angle distance of the proposed method is magnitude-invariant. The similarity label of each spectral pair is a discrete value, 0 or 1, which supervises the cosine distance. Treating target detection as a binary classification problem, we use binary cross-entropy (BCE) to measure the distance between the similarity labels and the cosine similarities. The loss $L_m$ of mini-batch $B_m$ is:
$$L_m = -\frac{1}{2b} \sum_{i=b(m-1)+1}^{bm} \left[ \log c_i^+ + \log\left(1 - c_i^-\right) \right], \tag{3}$$
where $c_i^+$ denotes the similarity of a positive spectral pair, $c_i^-$ the similarity of a negative spectral pair, and $b$ the batch size. Note that mini-batch $B_m$ includes $b$ positive and $b$ negative spectral pairs generated from the same spectra of the HSI. A detailed description is given in Algorithm 1.
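The cosine classifier and the BCE objective described above can be sketched in plain Python. This is a minimal illustration: `cosine_confidence` and `bce_loss` are hypothetical names, the loss is averaged over all 2b pairs of a mini-batch, and the clamping constant `eps` is an assumption added so that `log(0)` never occurs when a confidence reaches exactly 0 or 1.

```python
import math


def cosine_confidence(u, v):
    """Cosine of the angle between two feature vectors (magnitude-invariant)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def bce_loss(pos_conf, neg_conf, eps=1e-7):
    """Mean BCE over positive pairs (label 1) and negative pairs (label 0).

    Confidences are clamped at eps so that log(0) never occurs when c = 1 or c = 0.
    """
    terms = [-math.log(max(c, eps)) for c in pos_conf]
    terms += [-math.log(max(1.0 - c, eps)) for c in neg_conf]
    return sum(terms) / len(terms)


prior_feat = [0.9, 0.1, 0.2]
target_feat = [0.8, 0.15, 0.25]       # similar direction to the prior
background_feat = [0.1, 0.9, 0.3]     # different direction
loss = bce_loss([cosine_confidence(prior_feat, target_feat)],
                [cosine_confidence(prior_feat, background_feat)])
```

Scaling either feature vector leaves `cosine_confidence` unchanged, which is the magnitude invariance the text contrasts with the subtraction-based classifiers of [34] and [39].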

Algorithm 1. Forward and backward propagation of the Siamese detectors.
1: Shuffle the order of the spectral pairs of data set $D$;
2: Feed each mini-batch $B_m = \{Y_i^+, Y_i^-\}_{i=b(m-1)+1}^{bm}$, $m \in \{1, 2, \ldots, n/b\}$, to the Siamese network, obtaining the transformed features $f_s(B_m)$;
3: Compute the cosine angle distances of the transformed feature pairs following Equation (2), obtaining $\{c_i^+, c_i^-\}_{i=b(m-1)+1}^{bm}$;
4: Compute the cross-entropy distance between the detection results and labels with BCE following Equation (3);
5: Optimize the parameters of $D_s$ with the BCE loss.
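The data side of Algorithm 1 (shuffling and splitting into mini-batches that each hold b positive and b negative pairs built from the same spectra) can be sketched as follows. The function name `make_minibatches` is hypothetical, each pair is represented only by its pixel index and label for brevity, and keeping a final partial batch when n is not divisible by b is an assumption.

```python
import random


def make_minibatches(num_pixels, batch_size, seed=0):
    """Shuffle pixel indices and split them into mini-batches B_m.

    Each index i yields one positive pair (x_prior, simulated target from x_i) with
    label 1 and one negative pair (x_prior, x_i) with label 0, so every mini-batch
    holds the same number of positive and negative pairs from the same spectra.
    A final partial batch is kept (an assumption).
    """
    order = list(range(num_pixels))
    random.Random(seed).shuffle(order)          # step 1: shuffle the data set
    batches = []
    for start in range(0, num_pixels, batch_size):
        idx = order[start:start + batch_size]
        batches.append({"positive": [(i, 1) for i in idx],
                        "negative": [(i, 0) for i in idx]})
    return batches


batches = make_minibatches(100, 32)   # 4 mini-batches: 32, 32, 32, and 4 pixels
```

Each mini-batch would then go through steps 2 to 5: feature extraction, cosine confidences, the BCE loss, and one optimizer update.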

Pseudo Data Generation Method
We generate numerous pseudo data, in the form of positive and negative spectral pairs, to optimize the Siamese detectors. The negative spectral pairs comprise prior and background spectra and are used to optimize the detector to filter out background pixels. The positive spectral pairs comprise prior and target spectra and help optimize the detector to distinguish target pixels with spectral variations. However, known target and background spectra cannot be obtained directly from the HSI. To solve this lack of data, we generate numerous background and target spectra based on the dominant background pixels and a linear mixing model (LMM), as illustrated in Figure 3. To obtain background spectra from the test HSI, which contains both target and background spectra, we assume that background pixels dominate the HSI. Based on this assumption, we treat each spectrum of the test HSI as a background spectrum that differs from the prior spectrum. Combining each background spectrum with the prior spectrum forms a negative spectral pair $Y_i^-$:
$$Y_i^- = (x_{\text{prior}}, x_i), \quad i \in \{1, 2, \ldots, n\}. \tag{4}$$
Although a few target spectra may be mistakenly labeled as background, this does not reduce detection performance because target pixels are far fewer than the correctly labeled background spectra. The effectiveness of the background spectra generation method is demonstrated by experiments on several real data sets, as shown in Figure 4. To create multiple target spectra in addition to the single prior spectrum, we generate simulated target spectra by mixing the prior spectrum with background spectra based on the LMM. The LMM assumes that a mixed spectrum $x$ is a linear combination of a target spectrum $e_t$ and a background spectrum $e_b$ with abundance coefficients $a_t$ and $a_b$, respectively.
The formula is as follows:
$$x = a_t e_t + a_b e_b. \tag{5}$$
Since test spectra differ in amplitude and may be much larger or smaller than the prior spectrum, we normalize the test spectra and adjust their amplitudes to that of the prior spectrum. The adjusted test spectra, multiplied by small random weights, are linearly mixed with the prior spectrum to generate simulated target spectra $x_i^{\text{mixed}}$. Figure 5 visualizes background spectra and their associated simulated target spectra. The simulated target spectra are generated as:
$$x_i^{\text{mixed}} = (1 - \lambda)\, x_{\text{prior}} + \lambda\, \tilde{x}_i, \tag{6}$$
where $\tilde{x}_i$ is the amplitude-adjusted test spectrum and $\lambda$ is the ratio of the background spectrum, which we set to 0.1 for all data sets. The abundances of the target and background endmembers are therefore 0.9 and 0.1, meaning that the resulting spectra are dominated by the target spectrum and can be regarded as target spectra.
Note that our target spectra generation method does not need to estimate the specific categories of the background spectra. Each spectrum $e_b$ is regarded as spectral noise added to the single prior spectrum. After obtaining the simulated target spectra, we combine the prior spectrum with each simulated target spectrum to generate positive data pairs $Y_i^+$:
$$Y_i^+ = (x_{\text{prior}}, x_i^{\text{mixed}}), \quad i \in \{1, 2, \ldots, n\}. \tag{7}$$
The training data set, composed of positive and negative data pairs, is divided into $n/b$ mini-batches with batch size $b$. In each mini-batch, we use the same HSI spectra to generate equal numbers of positive and negative samples. Although the prior spectra in each mini-batch are identical, all of them are fed to the feature extraction module in the training stage so that the batch norm layers are updated properly.
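The pseudo data generation can be sketched as follows, under stated assumptions. The helper names `amplitude_match` and `generate_pairs` are hypothetical; matching the L2 norm of each test spectrum to that of the prior is an assumed amplitude-adjustment rule (the text does not specify one); and the small random weighting mentioned in the text is not modeled, so the sketch uses only the fixed background ratio λ = 0.1.

```python
import math


def amplitude_match(x, prior):
    """Rescale x so its L2 norm equals that of the prior (assumed rule; x must be non-zero)."""
    scale = math.sqrt(sum(p * p for p in prior)) / math.sqrt(sum(v * v for v in x))
    return [v * scale for v in x]


def generate_pairs(spectra, prior, lam=0.1):
    """Return (positive, negative) lists of (prior, spectrum) pairs.

    Negative pairs: every HSI pixel is treated as background and paired with the prior.
    Positive pairs: the prior mixed with the amplitude-adjusted pixel at ratio lam.
    """
    positive, negative = [], []
    for x in spectra:
        negative.append((prior, x))
        x_adj = amplitude_match(x, prior)
        mixed = [(1.0 - lam) * p + lam * v for p, v in zip(prior, x_adj)]
        positive.append((prior, mixed))   # 0.9 target + 0.1 background for lam = 0.1
    return positive, negative


prior = [1.0, 0.0]
positive, negative = generate_pairs([[0.0, 5.0], [2.0, 2.0]], prior)
```

Because each pixel contributes exactly one negative and one positive pair, the two classes stay balanced without estimating background categories.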
Figure 5. Visualization of simulated target spectra generation; panel (b) shows the simulated spectra. Normalized background spectra are linearly mixed with the prior spectrum, generating augmented spectra. Each background spectrum and its associated simulated target spectrum are drawn in the same color in (a,b); different colors represent different spectral samples.

Detection Ensemble Method
The performance of deep-learning-based detectors varies with model initialization, and suboptimal parameter initialization impairs the optimization and performance of the proposed detector. Specifically, depending on the stochastic initialization and data set shuffling, a detector may perform better or worse than the average level. Although the probability of obtaining an ideal parameter initialization is moderate, it is difficult to identify the specific distribution of ideal initializations. To achieve stable detection, we propose a simple but effective ensemble method, as shown in Figure 6. Relying on the moderate probability of an ideal stochastic initialization, we optimize and aggregate multiple Siamese detectors, $D_1, \ldots, D_N$, to obtain a high-performance detector with higher probability. Specifically, the final detection map $C$ is generated by averaging the detection maps of the individual Siamese detectors:
$$C = \frac{1}{N} \sum_{s=1}^{N} C_s. \tag{8}$$
Experiments validate that the Siamese detector ensembles outperform every single detector, which means that the independently optimized detectors complement each other. As shown in Table 2, the ensemble result is also more stable than each single detection result.
In the training stage, each Siamese detector is initialized with different stochastic parameters and trained with a different shuffle of the data set, so that the Siamese detectors are independent. Compared with other convolution-based detectors [34,39,40], our detector is built on fully connected neural networks, and its computational burden is much lower. Hence, we can optimize multiple detectors in parallel on a GPU to improve detection stability. In the testing stage, we follow the pipeline in Algorithm 2. Before detection with the $N$ Siamese detectors, we generate test spectral pairs by combining the prior spectrum $x_{\text{prior}}$ with each test spectrum $x_i$. Then, the spectral pairs of the test data set $D_{\text{test}}$ are fed to the $N$ Siamese detectors to generate detection maps $C_1, \ldots, C_N$. We aggregate all the results by averaging them, a bagging-style combination, to obtain high-performance detection results without ground truth labels or manual intervention.
Algorithm 2. Testing stage of the Siamese detector ensembles.
1: Feed the test spectral pairs $D_{\text{test}}$ to each Siamese detector $D_s$, $s \in \{1, \ldots, N\}$, obtaining the transformed features;
2: Compute the cosine similarity of the transformed features, obtaining each cosine similarity map $C_s$, following Equation (2);
3: Average the similarity predictions of the $N$ detectors to obtain the final result $C$, following Equation (8).
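The testing stage can be sketched as below: every detector scores each pair (x_prior, x_i) with the cosine confidence, and the N detection maps are averaged. In this sketch the detectors are hypothetical callables that map one spectrum to a feature vector; trained NFEMs would instead process batches with their batch norm layers in inference mode.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def ensemble_detect(detectors, prior, spectra):
    """Score each test pair (x_prior, x_i) with every detector, then average the maps.

    detectors: list of N callables mapping one spectrum to a feature vector; these are
    hypothetical stand-ins for trained NFEMs.
    """
    maps = []
    for f in detectors:
        prior_feat = f(prior)
        maps.append([cosine(prior_feat, f(x)) for x in spectra])  # detection map C_s
    return [sum(scores) / len(maps) for scores in zip(*maps)]     # C = mean of C_1..C_N


toy_detectors = [lambda s: list(s), lambda s: [s[0], 2.0 * s[1]]]
final_map = ensemble_detect(toy_detectors, [1.0, 0.0], [[1.0, 0.0], [1.0, 1.0]])
```

For the second toy pixel the two detectors disagree (about 0.707 and 0.447), and the ensemble reports their mean; this averaging is what smooths out a single poorly initialized detector.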

Information of Experimental Data Sets
We used five real data sets and one synthetic data set to validate the proposed method; their pseudo-color images are shown in the first row of Figure 7. All the real HSIs were captured by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS). All the selected data sets have been used as experimental data in other published hyperspectral target detection studies.
Figure 7. Six test data sets and ground truth maps. The first row shows the pseudo-color images of the test data sets. The second and third rows are the ground truth maps and the prior spectra candidate maps. A 3 × 3 morphological erosion operation is applied to the ground truth maps to obtain the prior spectra candidate maps. The Cuprite and Synthetic data sets have pure endmember spectra as priors.
All the data sets provide target ground truth maps, but only two data sets (Cuprite and Synthetic) provide prior spectra from the USGS Digital Spectral Library [43]. For data sets without prior spectra, researchers usually follow Manolakis et al. [3] and average the target spectra indicated by the ground truth maps to obtain the prior spectra. Note that the spectra of target boundary pixels mix target and background signatures; they differ from pure target spectra and would be impractical to obtain in real application scenarios. Therefore, we did not average all the pixels of the ground truth map to generate the prior spectra.
Except for the Cuprite and Synthetic data sets, which provide pure endmember spectra as prior spectra, we applied a morphological erosion operation to the ground truth maps to obtain the prior spectra candidate maps, and the average spectrum of the candidate pixels serves as the prior spectrum. Erosion is one of the two basic operators in mathematical morphology. The erosion of a binary ground truth map $G$ by a structuring element $B$ is defined as:
$$G \ominus B = \{ z \in E \mid B_z \subseteq G \}, \tag{9}$$
where $E$ is a Euclidean space and $B_z$ is $B$ translated by $z$. This paper uses a 3 × 3 all-ones kernel as $B$. The morphological erosion is applied to the ground truth maps to discard edge pixels. The ground truth annotation maps $G$ and the maps after erosion, $G \ominus B$, are shown in the last two rows of Figure 7, respectively. The acquisition sites and image details are listed below:
• San Diego Airport: The San Diego Airport data set was captured at San Diego with a size of 200 × 200 pixels and contains six airplanes and several background types, such as buildings and the parking apron. We selected three planes of the same size as targets, using the same target annotation as previous work for a fair comparison; the number of target pixels is 134. Because of water-vapor absorption and atmospheric effects, we selected 189 of the 224 bands for the experiments.
• San Diego Beach: This hyperspectral data set was captured in San Diego with a spatial size of 100 × 100 pixels. The scene is a beach, and the number of target pixels is 202. The target annotation map follows [21,38,44].
• Texas Coast 1, Texas Coast 2: These two hyperspectral data sets were captured along the Texas coast with a spatial size of 100 × 100 pixels. Several storage tanks in the urban scene, consisting of 67 and 155 pixels, were selected as targets. The target annotation maps follow [21,38,44]. The two data sets have 204 and 205 bands, respectively.
• Synthetic: The spectra of the synthetic data set were generated from 15 endmember spectra in the USGS Digital Spectral Library. For comparison, we used the Labradorite HS17.3B endmember as the detection target, as in [15]. Since all compared methods achieved an AUC of 1 on the clean data set, Gaussian white noise with signal-to-noise ratios (SNRs) of 15 dB and 20 dB was added to the original images.
• Cuprite: The Cuprite data set was captured over the Cuprite mining district of Nevada in 1997; we used a subset with a spatial size of 250 × 191 pixels. For a fair comparison, we selected buddingtonite from 14 kinds of minerals as the target. We used 188 bands, and the buddingtonite spectrum from the United States Geological Survey (USGS) Digital Spectral Library was selected as the single prior spectrum.
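The erosion-based prior spectrum construction can be sketched in plain Python. The helpers `erode` and `prior_from_mask` are hypothetical names; treating pixels outside the image as background (so border pixels are always discarded) is an assumption that matches the default `border_value=0` of `scipy.ndimage.binary_erosion`, which with `structure=np.ones((3, 3))` would be a library equivalent.

```python
def erode(mask):
    """Binary erosion with a 3x3 all-ones structuring element B.

    A pixel survives only if its whole 3x3 neighbourhood lies inside the target
    mask; pixels on the image border are discarded (an assumption of this sketch).
    """
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            if all(mask[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)):
                out[r][c] = 1
    return out


def prior_from_mask(cube, mask):
    """Average the spectra of the pixels kept by the eroded mask."""
    kept = [cube[r][c] for r in range(len(mask)) for c in range(len(mask[0])) if mask[r][c]]
    return [sum(band) / len(kept) for band in zip(*kept)]


# A 3x3 target in a 5x5 scene: erosion keeps only its centre pixel.
ground_truth = [[1 if 1 <= r <= 3 and 1 <= c <= 3 else 0 for c in range(5)] for r in range(5)]
candidates = erode(ground_truth)
```

Small targets can shrink to very few candidate pixels after erosion, so the prior spectrum is then an average over only the innermost, least mixed pixels.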

Implementation Details
The experiments in this paper were run with PyTorch in Python on a computer with an Intel(R) Core(TM) i9-9900X CPU at 3.50 GHz, an NVIDIA Titan Xp GPU, and 32 GB of memory. We used the Adam optimizer for all experiments, with a learning rate of 0.0005 and a weight decay of 0.0005. Before training, the weights of the fully connected layers are initialized with a mean of 0 and a standard deviation of 0.001. We initialized the batch norm layers with β = 0 and γ = 1.

Results
In this section, we present the experimental results. First, we conduct hyperparameter sensitivity experiments and select suitable parameters. Then, we compare detection performance with seven state-of-the-art (SOTA) methods in terms of two-dimensional receiver operating characteristic (2-D ROC) curves and area under the curve (AUC) values. Since the 2-D ROC curve cannot reflect the background suppression capability of the detectors, we supplement it with box plots of the detection confidences of all compared detectors for quantitative comparison and with detection map visualizations for qualitative comparison. For consistency, we use an ensemble of four Siamese detectors in all comparison experiments. The compared methods include four traditional methods (SAM, MF, ACE, and CEM), two advanced CEM-based methods (HCEM and ECEM), and one deep learning detector (TSCNTD) similar to our method. Finally, we conduct an ablation study of each component of the proposed method.

Hyperparameter Sensitivity Experiments
Hyperparameters are parameters that are set before the networks are trained, and well-chosen hyperparameters can improve the performance of the trained networks. Before comparing our method with other detectors, we first conduct sensitivity experiments on the hyperparameters of the proposed method and select suitable values.
We vary the batch size, the number of training epochs, the background spectra abundance λ, the learning rate, and the ensemble number, and evaluate detector performance under each setting with AUC values on the five test data sets other than Cuprite. For each parameter setting, we repeat the experiment ten times and compute the mean and standard deviation of the AUC values. The hyperparameter candidates and experimental results are illustrated in Figure 8. The results show that the detector is not sensitive to the background spectra abundance λ, and we select λ = 0.1 for all data sets. The results for the epoch number and ensemble number reveal that the proposed Siamese detector performs better and more stably with longer training and more ensemble members. To trade off performance and efficiency, we optimize four Siamese detectors for ten epochs in parallel on the GPU. As for the learning rate, the detector's performance fluctuates much more with learning rates larger than 5 × 10^-3. Therefore, we set the learning rate to 5 × 10^-4 for all data sets. Since the proposed detector is trained for a fixed number of epochs on all data sets, a large batch size leaves few iterations, and the network may be undertrained. We set the batch size to 128 for the Cuprite data set because of its large image size and to 32 for all other data sets.
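The batch-size argument above is easiest to see as arithmetic: with a fixed number of epochs, the number of optimizer steps is the number of mini-batches per epoch times the epochs. The sketch below uses the image sizes given in the data set descriptions and ten epochs; rounding n/b up (keeping the last partial batch) is an assumption, and `iterations` is a hypothetical helper.

```python
import math


def iterations(num_pixels, batch_size, epochs=10):
    """Optimizer steps: ceil(n / b) mini-batches per epoch, times the number of epochs."""
    return math.ceil(num_pixels / batch_size) * epochs


airport = 200 * 200   # San Diego Airport: 40,000 pixels
cuprite = 250 * 191   # Cuprite subset: 47,750 pixels
steps = {
    "airport, b=32": iterations(airport, 32),
    "airport, b=128": iterations(airport, 128),
    "cuprite, b=128": iterations(cuprite, 128),
}
```

Under these assumptions, San Diego Airport gets 12,500 steps with b = 32 but only 3,130 with b = 128, which illustrates why a large batch size can leave the network undertrained in ten epochs.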

Background Suppression Comparison
In Figure 9, six detection maps are visualized for a qualitative comparison of background suppression. Higher visual contrast in a detection map indicates better background suppression. Among all the detectors, ACE, HCEM, ECEM, TSCNTD, and the proposed detector show higher visual contrast than SAM, MF, and CEM. However, our detection results preserve target integrity better than those of ACE, HCEM, and ECEM. For example, on the San Diego Airport data set, the left and top planes in the ACE detection map are somewhat blurred, and HCEM and ECEM fail to detect the margins of the targets. Furthermore, the Siamese detector with NFEMs produces fewer false alarms than TSCNTD, which detects many background pixels as targets.
Figure 10 shows a quantitative comparison of background suppression. The red and green boxes show the confidence distributions of target and background pixels, respectively. The wider a green box is, the larger the standard deviation of the background confidence distribution, meaning that background confidences span a wide range. The lower a green box is, the smaller the mean of the background confidence distribution. In general, methods with good background suppression have flat, low green boxes and red boxes whose lower quartiles are far above the green boxes, which means that most target pixels have higher confidence than background pixels. The box plots are consistent with the detection map visualizations. CEM, SAM, and MF have wider and higher green boxes than the other methods. For methods with low detection rates, such as HCEM, ECEM, and TSCNTD, the lower quartiles of the target boxes are close to zero on all data sets except Synthetic, because many target pixels are missed and receive low confidences.
Although their green boxes are flat and low, their lower quartiles are close to the green boxes. For our method, the background boxes are flat and close to zero, while the lower quartiles of the target boxes are far above the upper limits of the background boxes.
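The box-plot criterion above (target lower quartile far above the background upper limit) can be turned into a single number. This sketch uses Python's `statistics.quantiles` with the inclusive method and the common 1.5 × IQR whisker rule; both are assumptions, since the paper does not state its quartile or whisker conventions, and `box_summary`/`separation` are hypothetical helper names.

```python
import statistics


def box_summary(values):
    """Quartiles and upper whisker (largest value within Q3 + 1.5 * IQR)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    limit = q3 + 1.5 * (q3 - q1)
    upper_whisker = max(v for v in values if v <= limit)
    return {"q1": q1, "median": median, "q3": q3, "upper": upper_whisker}


def separation(target_conf, background_conf):
    """Gap between the target lower quartile and the background upper whisker."""
    return box_summary(target_conf)["q1"] - box_summary(background_conf)["upper"]


background = [0.0, 0.01, 0.02, 0.03, 0.04]   # toy background confidences
targets = [0.7, 0.8, 0.9, 1.0, 0.95]          # toy target confidences
gap = separation(targets, background)
```

A large positive gap corresponds to the flat, low green box with a well-separated red box described in the text; a gap near or below zero means many target pixels score no higher than typical background pixels.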
To sum up, our method achieves an outstanding balance between background suppression and target detection recall and outperforms the other comparison methods.

ROC Curve Comparison
The ROC curves are shown in Figure 11. An ROC curve reflects the detection results in terms of the detection rate (true-positive rate) versus the false-positive rate. The black line in each ROC plot represents the ensemble detection results of our method. The proposed Siamese detector shows a competitive detection rate at most false-positive rate thresholds on all data sets. On the San Diego Airport data set, our method achieves a competitive detection rate at low false-positive rates (0 to 10^-3) and the best detection rate as the false-positive rate grows. On all the other data sets, our method surpasses all compared methods at almost all false-positive rate thresholds. In particular, on the two Texas Coast data sets, our curves are more than 20% higher than the other curves at a false-positive rate of 10^-5. On the San Diego Beach and Cuprite data sets, our curves outperform the other methods by almost 25% at low false-positive rates between 10^-3 and 10^-1.
Table 3 lists the AUC values for all test hyperspectral data sets except the Synthetic data set. The proposed Siamese network achieves the best AUC values on all data sets except Texas Coast 1 and surpasses the other methods by large margins, especially on the San Diego Beach, Cuprite, and Texas Coast 2 data sets. Specifically, our method outperforms the second-best methods by 0.008, 0.162, and 0.061 on the San Diego Beach, Cuprite, and Texas Coast 2 data sets, respectively. Because the noise added to the Synthetic data set is random, we repeated the test 10 times and computed the mean and standard deviation of the AUC values, as shown in Table 4. The proposed Siamese detector surpasses all compared methods under both noise conditions. Its AUC values also have the lowest standard deviation, reflecting excellent detection stability under noise interference.
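The AUC values above summarize each ROC curve as one number. A minimal sketch of how AUC can be computed from pixel confidences uses its rank-based (Mann-Whitney) interpretation: the probability that a randomly chosen target pixel scores higher than a randomly chosen background pixel, with ties counted as one half. The helper `auc` is hypothetical, and its pairwise loop is quadratic; for full images a sort-based implementation such as scikit-learn's `sklearn.metrics.roc_auc_score` would be more practical.

```python
def auc(target_scores, background_scores):
    """Area under the ROC curve via the Mann-Whitney formulation.

    Equals the probability that a random target pixel scores higher than a random
    background pixel; ties count one half.
    """
    wins = 0.0
    for t in target_scores:
        for b in background_scores:
            if t > b:
                wins += 1.0
            elif t == b:
                wins += 0.5
    return wins / (len(target_scores) * len(background_scores))


perfect = auc([0.9, 0.8], [0.1, 0.2])   # every target outranks every background pixel
partial = auc([0.9, 0.3], [0.5, 0.1])   # one of four target-background pairs is misordered
```

Because AUC integrates over all thresholds, two detectors can have similar AUC values yet differ markedly at the very low false-positive rates discussed above, which is why the ROC curves and AUC values are reported together.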

Comparison with TSCNTD
To validate the superiority of the proposed Siamese detector over TSCNTD, we train TSCNTD and the Siamese detector on the same training data and compare their AUC values, test time, and stability. We repeat the training and testing ten times and compute the mean and standard deviation of the AUC values for both methods. Table 5 lists the results.
The experimental results in Section 3.2 and the AUC means in Table 5 show that the proposed Siamese detector outperforms TSCNTD in detection recall and precision. As the standard deviations in Table 5 show, the Siamese detector ensembles are also more stable than TSCNTD, thanks to the detection ensemble method. Moreover, because it uses a few fully connected layers rather than many convolutional layers, our method runs six times faster than TSCNTD at test time. Because the two methods use different batch sizes and numbers of training epochs, we compare only test times for fairness. In conclusion, the proposed Siamese detector ensembles are superior to TSCNTD in detection performance, stability, and efficiency.

Table 5. Detection performance and efficiency comparison of the two-stream convolutional network-based target detector (TSCNTD) and our method. The best results are reported in bold.

Ablation Study
In this section, we study the effectiveness of the batch normalization layers, the Sigmoid layers, the generated positive spectral pairs, and the detection ensemble method. We do not report results for a detector trained without negative spectral pairs, because the network could not be optimized with positive pairs alone. Figure 4 compares, on two selected data sets, the 2-D ROC curves of the full single Siamese detector with those of single detectors trained without batch normalization layers, without Sigmoid layers, or without positive spectral pairs. The ROC curves of the full single Siamese detector are much better than those of the detectors without Sigmoid layers or batch normalization layers. Table 1 lists the AUC values of the curves in Figure 4; the Siamese detector with all components has the largest AUC values.
To demonstrate the effectiveness of the proposed detection ensemble method, we optimize four randomly initialized Siamese detectors independently and compare their AUC values with those of their ensemble. We repeat the experiment 10 times and report the results in Table 2. The ensemble achieves a higher mean and a lower standard deviation of AUC values than every individual detector, which validates the stability improvement brought by the proposed detection ensemble method.
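The text does not state exactly how the four detectors' outputs are combined at inference time. As an assumption, the sketch below averages their per-pixel score maps, and it uses synthetic scores (not data from the paper) to show why averaging helps: when each detector deviates from a common underlying score by independent noise, standing in for random initialization, the averaged map has a smaller mean squared error than the individual maps do on average. `ensemble_scores` is a hypothetical name.

```python
import numpy as np

def ensemble_scores(score_maps):
    """Average the score maps of K independently trained detectors.

    score_maps: array-like of shape (K, ...), one detection map per detector.
    Returns the per-pixel mean (assumed aggregation rule, not confirmed by the text).
    """
    return np.asarray(score_maps, dtype=float).mean(axis=0)

# Toy illustration with synthetic data (not results from the paper)
rng = np.random.default_rng(0)
true_scores = rng.uniform(-1.0, 1.0, size=1000)
# Four hypothetical detectors: common score plus independent "initialization" noise
detectors = true_scores + rng.normal(0.0, 0.3, size=(4, 1000))
fused = ensemble_scores(detectors)

individual_mse = ((detectors - true_scores) ** 2).mean(axis=1)  # one value per detector
ensemble_mse = float(((fused - true_scores) ** 2).mean())
print(individual_mse.round(3), round(ensemble_mse, 3))
```

By convexity, the error of the averaged map can never exceed the average error of its members, and with independent deviations it shrinks roughly in proportion to 1/K, at the cost of running K detectors.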

Discussion
There are a few supervised learning-based detectors similar to our method: CNNTD [40], HTD-Net [34], and TSCNTD [39]. These three methods adopt convolutional networks for spectral feature extraction and use no nonlinear activation layers in their network structures. CNNTD and HTD-Net feed spectral differences into the network, which reduces feature discriminability. TSCNTD uses a two-stream network whose streams process the prior spectrum and the other spectra separately, which solves this problem and makes TSCNTD superior to CNNTD and HTD-Net [39]. However, TSCNTD uses two nine-layer convolutional networks to process the spectral pairs, which makes it slower than our method. Our proposed detector derives from the Siamese network, which is capable of recognition with little available data [42]. Similar to the network structures in [21,35,38], we build the Siamese detector from fully connected layers and nonlinear activation layers. Since Zhu et al. [39] have shown that TSCNTD is superior to CNNTD and HTD-Net in both performance and speed, we compare the proposed detector only with TSCNTD.
Although TSCNTD performs well, its convolutional networks impose a heavy computational burden. Moreover, its parameters are redundant and make optimization difficult. Specifically, the upper stream, which holds almost half of the parameters, is responsible only for extracting features from the prior spectrum, a constant vector. Hence, Zhu et al. [39] proposed a regularized cost to optimize these numerous parameters. Our method solves these problems with a Siamese detector composed of two fully connected networks that share parameters. Parameter sharing reduces the number of parameters and eases optimization. We also introduce nonlinear layers to improve the feature extraction capability. The experiments in Table 5 validate that the proposed detector is more effective than TSCNTD.
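Parameter sharing can be sketched as one feature extractor applied to both the prior spectrum and each test spectrum, followed by a cosine similarity. In this NumPy sketch the layer widths (128 and 64), the 200-band input, the weight initialization, and all function names are hypothetical, and batch normalization is omitted; only the Linear-plus-Sigmoid stages and the cosine classifier reflect the description in the text. Because both branches reuse the same `params`, the parameter count is that of a single stream, half of what two identical unshared streams would need.

```python
import numpy as np

def init_nfem(n_bands, hidden=(128, 64), seed=0):
    """Randomly initialize one hypothetical fully connected feature extractor."""
    rng = np.random.default_rng(seed)
    sizes = (n_bands, *hidden)
    return [(rng.normal(0.0, np.sqrt(1.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def nfem(params, x):
    """Linear layer + Sigmoid at every stage (batch normalization omitted here)."""
    h = np.atleast_2d(x)
    for W, b in params:
        h = 1.0 / (1.0 + np.exp(-(h @ W + b)))
    return h

def siamese_scores(params, prior, pixels):
    """Cosine similarity between the features of the prior and of each pixel.

    The SAME params feed both branches: this is the parameter sharing.
    """
    f_prior = nfem(params, prior)                # shape (1, d)
    f_pix = nfem(params, pixels)                 # shape (N, d)
    num = f_pix @ f_prior.T                      # shape (N, 1)
    den = np.linalg.norm(f_pix, axis=1, keepdims=True) * np.linalg.norm(f_prior)
    return (num / den).ravel()

def n_params(params):
    """Total number of weights and biases in one stream."""
    return sum(W.size + b.size for W, b in params)

rng = np.random.default_rng(1)
params = init_nfem(n_bands=200)                  # 200 bands: hypothetical
prior = rng.random(200)                          # stand-in prior spectrum
pixels = rng.random((10, 200))                   # stand-in test spectra
scores = siamese_scores(params, prior, pixels)
```

With these hypothetical sizes, `n_params(params)` is 33,984 (200·128 + 128 + 128·64 + 64), whereas two unshared streams of the same shape would need 67,968 parameters.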
Shi et al. [41] propose a semisupervised domain adaptive few-shot learning (SDAFL) model and report standard deviations to demonstrate its detection stability. Other deep learning-based methods [34,39,40] do not report standard deviations. To study the detection stability of TSCNTD, we repeated the TSCNTD experiments ten times and found its stability unsatisfactory, as shown in Table 5. This paper focuses on detection stability and improves it with a classical machine learning method, ensemble learning. The detection ensemble method improves both stability and performance but adds computational cost.
For a given HSI and prior spectrum, non-learning methods such as SAM, CEM, MF, ACE, HCEM, and ECEM always produce the same detection result. Our proposed Siamese detector outperforms them thanks to the strong feature extraction capability of neural networks. However, the random parameter initialization of neural networks causes performance fluctuations. Therefore, the repeatability of non-learning methods is better than that of TSCNTD and SFCTD.
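The determinism of the non-learning baselines follows from their closed forms. The sketch below implements the standard SAM and CEM formulations in NumPy (CEM filter w = R⁻¹d / (dᵀR⁻¹d), with R the sample autocorrelation matrix of the pixels and d the prior spectrum); the function names and the random data are illustrative only. Neither function draws random numbers, so the same HSI and prior spectrum always yield the same map, whereas each SFCTD or TSCNTD training run starts from a different random initialization.

```python
import numpy as np

def sam_scores(pixels, prior):
    """Spectral angle (radians) to the prior; smaller = more target-like."""
    cos = pixels @ prior / (np.linalg.norm(pixels, axis=1) * np.linalg.norm(prior))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def cem_filter(pixels, prior):
    """Constrained energy minimization filter w = R^-1 d / (d^T R^-1 d)."""
    R = pixels.T @ pixels / pixels.shape[0]      # sample autocorrelation matrix
    r_inv_d = np.linalg.solve(R, prior)          # R^-1 d without an explicit inverse
    return r_inv_d / (prior @ r_inv_d)

def cem_scores(pixels, prior):
    """CEM output for every pixel (pixels: N x bands, prior: bands)."""
    return pixels @ cem_filter(pixels, prior)

rng = np.random.default_rng(0)
X = rng.random((500, 30))                        # illustrative HSI: 500 pixels, 30 bands
d = rng.random(30)                               # illustrative prior spectrum
first, second = cem_scores(X, d), cem_scores(X, d)
print(np.array_equal(first, second))             # True: no random state involved
```

The constraint wᵀd = 1 means the prior spectrum itself always maps to an output of exactly one, while the filter minimizes the average output energy over the background-dominated pixels.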

Conclusions
This paper proposes a Siamese fully connected network-based hyperspectral target detector, denoted as SFCTD, consisting of two nonlinear feature extraction modules (NFEMs) and a cosine angle distance-based classifier. The two Siamese-structured NFEMs share parameters and extract discriminative features from the prior and test spectra, respectively. The cosine angle distance between each pair of spectral features measures the confidence that the test spectrum is a target. SFCTD is optimized effectively with the generated pseudo positive and negative spectral pairs. To mitigate the performance fluctuation caused by the random initialization of parameters, we optimize several SFCTDs in parallel, and the resulting network ensemble is more stable than a single SFCTD. Experimental results validate that SFCTD outperforms non-learning detectors such as SAM, MF, and CEM, and that it is superior to TSCNTD, a similar supervised learning method, in both speed and performance. SFCTD has a lightweight structure and achieves an excellent trade-off between detection accuracy and computational cost, which makes it suitable when little prior data is available. In the future, we will investigate SFCTD's capability to detect the same targets in similar HSIs acquired under different imaging conditions.

Author Contributions: X.Z. and J.W. conceived and designed the study. X.Z. constructed the model, implemented the experiments, and drafted the manuscript. Z.H. and H.W. contributed to improving the manuscript, and P.W. collected the hyperspectral data sets. K.G. provided the overall guidance to this work and reviewed the manuscript. All authors have read and agreed to the published version of the manuscript.