Camouflaged Target Detection Based on Snapshot Multispectral Imaging

The spectral information contained in hyperspectral images (HSI) distinguishes the intrinsic properties of a target from those of the background, which is why HSI is widely used in remote sensing. However, the low imaging speed and high data redundancy that result from the high spectral resolution of imaging spectrometers limit their use in scenarios with real-time requirements. In this work, we achieve precise detection of camouflaged targets in urban-related scenes using snapshot multispectral imaging and band selection methods. Specifically, the proposed camouflaged target detection algorithm combines the constrained energy minimization (CEM) algorithm with an improved maximum between-class variance (OTSU) algorithm (t-OTSU) to obtain initial detection results and adaptively segment the target region. Moreover, an object region extraction (ORE) algorithm is proposed to obtain a complete target contour, which improves the target detection capability of multispectral images (MSI). The experimental results show that the proposed algorithm can detect different camouflaged targets using only four bands, with a detection accuracy above 99% and a false alarm rate below 0.2%. This research achieves effective detection of camouflaged targets and could provide a new means for real-time multispectral sensing in complex scenes.


Introduction
Hyperspectral target detection is widely used in industry, agriculture, and urban remote sensing [1]. Hyperspectral images (HSI) are often presented as spectral data cubes in which each pixel is measured as a spectral vector. The elements of the spectral vector correspond to the reflectance or radiance in different spectral bands [2]. It is therefore possible to detect camouflaged targets based on the spectral characteristics of the target materials [3]. Manolakis, Shaw, and other researchers [4][5][6][7] at MIT Lincoln Laboratory summarized the detection algorithms that exploit spectral information. They argued that the "apparent" superiority of sophisticated algorithms on simulated data or under laboratory conditions does not necessarily translate into superiority in real-world applications. From the perspective of military defense and reconnaissance applications, they proposed improving detection performance by addressing the inherent variability of target and background spectra (i.e., the mismatch between library spectra and in-scene signatures) [5,7]. Many other researchers have also made relevant contributions. Yan [8] and Hua [9] et al. successfully used the constrained energy minimization (CEM) [10] algorithm and the adaptive coherence estimator (ACE) [6] algorithm to detect camouflaged targets in real-world scenarios. Kumar et al. [11] used unsupervised target detection algorithms, including K-means classification, the Reed-Xiaoli (RX) algorithm, and the Iterative Self-Organizing Data Analysis Technique Algorithm (ISODATA), to detect various camouflaged targets in mid-wave infrared HSIs.
However, the slow imaging speed of currently available spectral imagers and the "Hughes phenomenon" induced by the high dimensionality of the data limit the use of HSI in applications with strict real-time requirements [12]. To address data redundancy in HSI, many researchers have worked on dimensionality reduction (band selection) [13][14][15][16]. The main idea of band selection is to identify a representative subset of bands among highly correlated spectral components. The optimal clustering framework (OCF) [14], the multi-objective optimization band selection (MOBS) method [15], and the scalable one-pass self-representation learning (SOP-SRL) framework [16] are effective band selection methods.
Rather than preserving the spectral features of substances through band selection, some researchers [17][18][19][20] focused on incorporating spatial features into the image analysis. Such features can be used to model the objects in the scene and to increase the discriminability between thematic classes. Kwan and Ayhan et al. [21,22] demonstrated the effectiveness of Extended Multi-Attribute Profiles (EMAP) through extensive experiments. They also applied Convolutional Neural Network (CNN)-based deep learning algorithms to spectral images enhanced with EMAP. This achieved better land cover classification with only four bands than with all 144 hyperspectral bands [22]. However, this method generates additional augmented bands, which consumes large amounts of computing resources and increases the running time [21,22].
The data redundancy problem can also be solved by determining a few characteristic bands. Zhang [23], Liu [24], and Tian [25] et al. experimentally determined the characteristic bands of jungle camouflage materials and white snow camouflage materials to be 680-720 nm and 330-380 nm, respectively. Rapid identification of camouflaged targets was then achieved with a common camera and the corresponding spectral filters. This method offers good real-time performance and is simple to implement. However, it requires substantial prior knowledge to detect specific targets and therefore has narrow applicability.
To achieve real-time spectral imaging, filter arrays consisting of interferometric devices have been designed for multispectral imagers to obtain spectral images in the corresponding bands [26,27]. XIMEA, in conjunction with the Interuniversity Microelectronics Centre (IMEC), manufactured a snapshot multispectral camera that quickly acquires multispectral images (MSIs). It integrates a pixel-level Fabry-Pérot filter mosaic array on the CMOS chip of an existing industrial camera, which improves the imaging speed. Traditional multispectral imaging equipment obtains MSIs at discrete time points. In contrast, this snapshot multispectral camera extends spectral imaging to broader fields such as dynamic video [28]. However, accurately detecting targets with a limited number of multispectral bands remains challenging.
Balancing detection accuracy against the number of characteristic bands is therefore of great significance for target detection in urban scenarios. In this work, based on snapshot spectral imaging, we introduce the OCF algorithm to determine several general-purpose bands for camouflaged target detection, thereby reducing the consumption of computing resources. We also propose a camouflaged target detection method based on CEM and an improved maximum between-class variance (OTSU) algorithm (t-OTSU). The CEM algorithm requires only prior spectral information to detect different camouflaged targets efficiently at low computational cost, which suits our requirements well. The proposed t-OTSU algorithm overcomes a defect of the traditional OTSU algorithm, its sensitivity to target size. It is used to adaptively segment the target region from the output of the CEM detector in different scenarios. To eliminate misjudged regions contiguous to the target region, we also propose the object region extraction (ORE) algorithm to obtain a complete contour of the camouflaged target.

Methodology
In this section, a detailed introduction of the proposed algorithm is presented. Figure 1 shows the flow chart of the proposed algorithm.

Calibration
The snapshot multispectral camera (MQ022HG-IM-SM5X5-NIR, XIMEA), with a spectral range of 670-975 nm, is used to acquire the raw multispectral images ($I_{raw}$). The raw images are then corrected with white and dark references. This reduces the spatial variation in light intensity and the dark current effect of the CMOS camera. The corrected multispectral image $I_{MSI}$ is calculated as

$$ I_{MSI} = \frac{I_{raw} - I_{black\_ref}}{I_{white\_ref} - I_{black\_ref}} $$

where $I_{black\_ref}$ and $I_{white\_ref}$ denote the dark and white references, respectively.
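The white/dark correction above is an element-wise operation. The following is a minimal NumPy sketch of it. The function name `calibrate` and the `eps` guard are illustrative assumptions, and the sketch assumes the raw image and both references are arrays of the same shape.

```python
import numpy as np

def calibrate(raw, dark_ref, white_ref, eps=1e-6):
    """Flat-field correction: (I_raw - I_black_ref) / (I_white_ref - I_black_ref)."""
    raw = np.asarray(raw, dtype=float)
    dark = np.asarray(dark_ref, dtype=float)
    white = np.asarray(white_ref, dtype=float)
    # Assumption: the white reference is brighter than the dark reference everywhere;
    # eps only guards against division by zero on dead or saturated pixels.
    denom = np.maximum(white - dark, eps)
    return (raw - dark) / denom
```

A pixel equal to the dark reference maps to 0, and a pixel equal to the white reference maps to 1.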

CEM Detector
The CEM [10] detector acts as an adaptive filter. It passes the desired target with a specified gain while minimizing the filter output caused by unknown signal sources, which makes it suitable for camouflaged target reconnaissance. An MSI with N spectral vectors and D bands is expressed as a D × N matrix $X = [x_1, x_2, x_3, \ldots, x_N]$, and the target spectrum is represented as a D × 1 vector $d$. The CEM algorithm designs a finite impulse response (FIR) linear filter $W = [\omega_1, \omega_2, \ldots, \omega_D]^T$ from the MSI $X$ and the prior target spectrum $d$. The CEM output for pixel $x_i$ is the inner product of the spectral vector and the FIR filter $W$:

$$ y_i = W^T x_i = \sum_{l=1}^{D} \omega_l x_{il} $$

The average output energy is

$$ \frac{1}{N}\sum_{i=1}^{N} y_i^2 = \frac{1}{N}\sum_{i=1}^{N} \left(W^T x_i\right)^2 = W^T R W $$

where $R = (1/N) X X^T$ denotes the correlation matrix of the MSI and $y = [y_1, y_2, \ldots, y_N]$ represents the output of the CEM detector. CEM minimizes this energy subject to a unit-gain constraint on the target:

$$ \min_{W} \; W^T R W \quad \text{subject to} \quad d^T W = 1 $$

By highlighting the target pixels while suppressing the "energy" of the background pixels, the CEM detector separates the target from the background in the spectral space.
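Solving this constrained problem with a Lagrange multiplier gives the filter

$$ W^* = \frac{R^{-1} d}{d^T R^{-1} d} $$

The resulting target detector is

$$ y_{CEM}(x) = \left(W^*\right)^T x = \frac{d^T R^{-1} x}{d^T R^{-1} d} $$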
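The closed-form CEM filter translates almost line for line into NumPy. The sketch below uses `np.linalg.solve` instead of forming $R^{-1}$ explicitly, a standard numerical choice rather than something prescribed above. The function name `cem_detector` is hypothetical.

```python
import numpy as np

def cem_detector(X, d):
    """CEM detector output for every pixel.

    X : (D, N) array, one column per pixel spectrum (D bands, N pixels).
    d : (D,) prior target spectrum.
    Returns y of shape (N,), with y_i = W^T x_i and W = R^-1 d / (d^T R^-1 d).
    """
    X = np.asarray(X, dtype=float)
    d = np.asarray(d, dtype=float)
    N = X.shape[1]
    R = X @ X.T / N                  # sample correlation matrix (D x D)
    Rinv_d = np.linalg.solve(R, d)   # R^-1 d without computing the inverse
    W = Rinv_d / (d @ Rinv_d)        # filter satisfying d^T W = 1
    return W @ X
```

For an image cube of shape (H, W, D), one would reshape it with `cube.reshape(-1, D).T` before the call and reshape `y` back to (H, W) afterwards. A pixel whose spectrum equals `d` always yields 1, which is the unit-gain constraint.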
After CEM processing, the target is separated by threshold segmentation [8]. However, we observe that the results are significantly affected by the spectral resolution. It is difficult to obtain a complete camouflaged target when the number of spectral bands is greatly reduced. To address this problem, we propose an improved OTSU algorithm (t-OTSU).

Improved OTSU Algorithm (t-OTSU)
The traditional OTSU method is an adaptive threshold segmentation algorithm that is sensitive to object size [29,30]. In long-distance reconnaissance scenes, the target pixels make up a much smaller proportion of the image than the background. Consequently, many interfering regions are generated when OTSU is applied directly to the output of the CEM detector. The t-OTSU algorithm is proposed to eliminate the influence of object size. In addition, setting the parameter t avoids false detections in scenes that contain no camouflaged target, which signals that the search should move to another scene.
Based on the traditional OTSU algorithm, we introduce the parameter t to determine the final adaptive threshold. The output of the CEM detector, $I$, is represented in M gray levels. Its pixels are divided into two classes, $C_0$ and $C_1$, by a threshold T (1 ≤ T ≤ M). $C_0$ is the set of background pixels (gray levels 1, ..., T), and $C_1$ is the set of target pixels (gray levels T + 1, ..., M). The probabilities of the background and target classes, $P_1$ and $P_2$, are

$$ P_1 = \sum_{i=1}^{T} p_i, \qquad P_2 = \sum_{i=T+1}^{M} p_i = 1 - P_1 $$
where $p_i$ denotes the probability of the i-th gray level,

$$ p_i = \frac{n_i}{N} $$

Here, $n_i$ is the number of pixels at the i-th gray level, and $N = n_1 + n_2 + \ldots + n_M$ is the total number of pixels. Let $m_1$ and $m_2$ denote the mean gray levels of the two classes. The overall mean $m_G$ of the image is then

$$ m_G = P_1 m_1 + P_2 m_2 $$

Analogous to a variance, the between-class variance is

$$ \sigma^2 = P_1 \left(m_1 - m_G\right)^2 + P_2 \left(m_2 - m_G\right)^2 $$
Because $P_1 + P_2 = 1$, we have $m_1 - m_G = P_2 (m_1 - m_2)$ and $m_2 - m_G = -P_1 (m_1 - m_2)$. The between-class variance therefore simplifies to

$$ \sigma^2 = P_1 P_2 \left(m_1 - m_2\right)^2 $$

The optimal threshold $T^*$ is defined as

$$ T^* = \arg\max_{1 \le T \le M} \sigma^2(T) $$

The parameter t is set to limit the value of T, which avoids the interference caused by the target size. Thresholding with this limited value gives the final binary result $I_{t\text{-}OTSU}$. The t-OTSU algorithm segments the target region while minimizing misclassified background pixels. However, some misclassified regions remain around the target region. To solve this problem, we propose the following object region extraction (ORE) algorithm.
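The OTSU search above can be sketched in NumPy using the $\sigma^2 = P_1 P_2 (m_1 - m_2)^2$ form. The exact rule by which t limits T is not written out here. This sketch therefore **assumes** that t acts as a lower bound on the threshold, applied to the CEM output clipped to [0, 1] (target pixels have gain 1 under $d^T W = 1$). Under that assumption a scene with no target, whose CEM output stays below t, yields an empty mask. The function names, the clipping, and `M = 256` are hypothetical choices.

```python
import numpy as np

def otsu_level(gray, M):
    """Return the 0-based gray level T* maximising sigma^2 = P1 * P2 * (m1 - m2)^2."""
    hist = np.bincount(gray.ravel(), minlength=M).astype(float)
    p = hist / hist.sum()                   # p_i = n_i / N
    levels = np.arange(M)
    P1 = np.cumsum(p)                       # class C0: levels <= T
    P2 = 1.0 - P1                           # class C1: levels > T
    mu = np.cumsum(levels * p)              # first moment of C0
    mG = mu[-1]                             # overall mean m_G
    sigma2 = np.zeros(M)
    ok = (P1 > 1e-12) & (P2 > 1e-12)        # skip thresholds that leave a class empty
    m1 = mu[ok] / P1[ok]
    m2 = (mG - mu[ok]) / P2[ok]
    sigma2[ok] = P1[ok] * P2[ok] * (m1 - m2) ** 2
    return int(np.argmax(sigma2))

def t_otsu_segment(cem_output, t=0.3, M=256):
    """Hypothetical t-OTSU: OTSU threshold on the clipped CEM output, floored at t."""
    y = np.clip(np.asarray(cem_output, dtype=float), 0.0, 1.0)
    gray = np.round(y * (M - 1)).astype(int)
    T_star = otsu_level(gray, M) / (M - 1)  # OTSU threshold back on the [0, 1] scale
    T_final = max(T_star, t)                # assumed rule: t is a lower bound on T
    return y > T_final, T_final
```

With a small bright target on a dim background, plain OTSU can place the threshold very low. The floor t keeps it at or above 0.3, the value of t used in the experiments.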

Object Region Extraction (ORE)
The opening operator can separate a regular target region from connected cluttered background areas without affecting the contour integrity of the target [31]. We therefore propose the ORE algorithm, based on morphological operations [32], to eliminate the interfering regions. In our work, a disk-shaped structural element of radius 1 is used for this separation.
Morphological opening is erosion followed by dilation. It smooths the contours of large object regions while eliminating small isolated regions.
Erosion is similar to a convolution operation. The structural element (SE) slides over the binary image $I_{t\text{-}OTSU}$. At each position, the minimum pixel value in the overlap of the SE and $I_{t\text{-}OTSU}$ is assigned to the anchor point as the output value:

$$ \left(I_{t\text{-}OTSU} \ominus SE\right)(x, y) = \min_{(u, v) \in SE} I_{t\text{-}OTSU}(x + u, y + v) $$

where (x, y) is the position of the anchor point, i.e., the center of the SE. Conversely, dilation outputs the maximum pixel value in the overlapping area. The disk SE is symmetric, so no reflection of the SE is needed:

$$ \left(I_{t\text{-}OTSU} \oplus SE\right)(x, y) = \max_{(u, v) \in SE} I_{t\text{-}OTSU}(x + u, y + v) $$

The morphological opening applies erosion followed by dilation:

$$ I_o = I_{t\text{-}OTSU} \circ SE = \left(I_{t\text{-}OTSU} \ominus SE\right) \oplus SE $$

The total number of independent (connected) regions in the binary image $I_o$, denoted A, is determined by n-neighborhood connectivity. The regions are then labeled $R_i$ (i = 1, 2, 3, ..., A). We count the salient (foreground) points in each region and select the region $R_J$ with the most pixels. The salient points in $R_J$ are retained and all other regions are discarded, which yields a complete object region without interfering areas.

Remote Sens. 2021, 13, 3949
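The ORE steps (opening, then keeping the connected region with the most pixels) can be sketched with plain NumPy. The sketch makes three assumptions: the radius-1 disk is discretized as a 3 × 3 cross, the n-neighborhood is 8-connectivity (n is not specified above), and the image border is padded so that it does not erode the mask. All function names are hypothetical.

```python
from collections import deque
import numpy as np

# Assumption: the radius-1 disk structural element is discretized as a 3x3 cross.
DISK_R1 = np.array([[0, 1, 0],
                    [1, 1, 1],
                    [0, 1, 0]], dtype=bool)

def _neighbour_stack(img, se, pad_value):
    """Stack, for every pixel, the neighbours selected by the structuring element."""
    r = se.shape[0] // 2
    padded = np.pad(img, r, mode="constant", constant_values=pad_value)
    h, w = img.shape
    return np.stack([padded[dy:dy + h, dx:dx + w]
                     for dy in range(se.shape[0])
                     for dx in range(se.shape[1]) if se[dy, dx]])

def erode(img, se=DISK_R1):
    # Minimum over the SE; pad with True so the image border does not erode the mask.
    return _neighbour_stack(img, se, True).min(axis=0)

def dilate(img, se=DISK_R1):
    # Maximum over the SE (the SE is symmetric, so no reflection is needed).
    return _neighbour_stack(img, se, False).max(axis=0)

def largest_region(binary, connectivity=8):
    """Label connected regions by breadth-first search and keep the largest one."""
    h, w = binary.shape
    labels = np.zeros((h, w), dtype=int)
    if connectivity == 8:
        offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]
    else:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    sizes = [0]  # sizes[k] = pixel count of region k; label 0 is background
    for y in range(h):
        for x in range(w):
            if binary[y, x] and labels[y, x] == 0:
                k = len(sizes)
                labels[y, x] = k
                queue, count = deque([(y, x)]), 0
                while queue:
                    cy, cx = queue.popleft()
                    count += 1
                    for dy, dx in offsets:
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = k
                            queue.append((ny, nx))
                sizes.append(count)
    if len(sizes) == 1:
        return np.zeros((h, w), dtype=bool)
    best = int(np.argmax(sizes[1:])) + 1
    return labels == best

def ore(binary_mask, se=DISK_R1):
    """Opening (erosion then dilation), then keep the region R_J with the most pixels."""
    opened = dilate(erode(np.asarray(binary_mask, dtype=bool), se), se)
    return largest_region(opened)
```

The opening removes isolated specks and rounds sharp corners. The region-size selection then discards any larger clutter blobs that survive the opening.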

Evaluation Metrics
We use the false alarm rate ($F_a$), accuracy ($A_C$), and F1 measure ($F_1$) as metrics to quantitatively evaluate the performance of the algorithms [33,34]:

$$ F_a = \frac{N_{FP}}{N_{FP} + N_{TN}}, \qquad A_C = \frac{N_{TP} + N_{TN}}{N_{TP} + N_{TN} + N_{FP} + N_{FN}} $$

where $N_{TN}$ and $N_{FP}$ denote the numbers of background pixels that are correctly and incorrectly classified, respectively. $N_{TP}$ denotes the number of correctly classified target pixels, and $N_{FN}$ denotes the number of misclassified target pixels. $F_1$ is expressed as

$$ F_1 = \frac{2 P_{re} R_{call}}{P_{re} + R_{call}} $$

where $P_{re}$ and $R_{call}$ are the precision and recall, respectively:

$$ P_{re} = \frac{N_{TP}}{N_{TP} + N_{FP}}, \qquad R_{call} = \frac{N_{TP}}{N_{TP} + N_{FN}} $$

Substituting these gives

$$ F_1 = \frac{2 N_{TP}}{2 N_{TP} + N_{FP} + N_{FN}} $$

The ideal value of $A_C$, the proportion of correctly classified pixels, is 1. Conversely, $F_a$ is a negative index that should approach 0. Like $A_C$, $F_1$ has an ideal value of 1. It is considered more accurate because it does not count the correctly classified background pixels ($N_{TN}$), which dominate when the target occupies a small fraction of the image.
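The three metrics can be computed directly from a predicted mask and a reference mask. This minimal sketch uses the simplified $F_1$ form. The function name is hypothetical, and returning NaN when a ratio is undefined is an assumption.

```python
import numpy as np

def detection_metrics(pred, ref):
    """Return (F_a, A_C, F_1) for boolean prediction and reference masks."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    tp = int(np.sum(pred & ref))     # target pixels detected
    tn = int(np.sum(~pred & ~ref))   # background pixels correctly rejected
    fp = int(np.sum(pred & ~ref))    # background pixels flagged as target
    fn = int(np.sum(~pred & ref))    # target pixels missed
    fa = fp / (fp + tn) if fp + tn else float("nan")
    ac = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else float("nan")
    return fa, ac, f1
```

As an illustrative example, take 1000 pixels with a 100-pixel target. Detecting 90 target pixels plus 5 background pixels gives $F_a = 5/900$, $A_C = 0.985$, and $F_1 = 180/195 \approx 0.92$. The lower $F_1$ reflects the missed target pixels, which barely move $A_C$.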

Experimental Scenarios
We obtain the original images with a snapshot multispectral imager (MQ022HG-IM-SM5X5-NIR, XIMEA), as depicted in Figure. We select two targets: the anti-spectral reconnaissance camouflaged net (ASRC-net), with a small mosaic pattern, and the anti-infrared reconnaissance camouflaged net (AIRC-net), with a deformed camouflage pattern. They are shown in Figure 2c,d, respectively. We used a hyperspectral camera (GaiaField-V10E, Dualix) with a spectral response range of 400-1000 nm and a spectral resolution of 2.8 nm to obtain the spectra shown in Figure 3. Note that the target spectrum is affected by many factors, such as season, vegetation growth status, and lighting conditions.
Figure 3 shows that the camouflage characteristics of the two nets differ significantly even under the same conditions. In the range of 660-975 nm (the spectral range of the snapshot multispectral camera), the spectrum of the ASRC-net is close to that of vegetation, which makes camouflaged target detection more difficult. The AIRC-net effectively reduces the radiation of the target in the near-infrared band, which gives it anti-infrared reconnaissance capability. We therefore use these two camouflaged nets to fully verify the feasibility of the proposed algorithm.
To verify the feasibility of the proposed algorithms, we focus on several urban-related scenarios in which camouflaged nets appear frequently but are difficult to detect: (i) Lawn, (ii) Bush and Tree (BT), (iii) Bush and Fountain (BF), and (iv) Unmanned Aerial Vehicle (UAV) scenes, as shown in Table 1. The experiments were carried out under different light intensities (1370-19,648 lux) to study how spectral changes caused by light intensity affect algorithm performance. The detection distance is approximately 20-50 m. In addition, focal lengths of 35 and 16 mm are used in the BF scene to produce different fields of view (FOV) at the same imaging distance. The integration time varies from 0.1540 to 5 ms.

Results of the Band Selection
Data redundancy is a critical issue in MSI. To address this problem, we introduce a band selection algorithm named OCF. OCF [14] constructs an optimal cluster structure on the HSI. It then evaluates the discriminative information between bands with a cluster ranking strategy applied to this structure, which determines the optimal band combination. Figure 4 shows the flow chart of the OCF algorithm. As shown in Table 2, we determine a series of band subsets in which the number of bands ranges from 4 to 25. In each subset, the specific bands are selected by ranking the occurrence frequency of the characteristic bands. These characteristic bands are obtained by applying the OCF algorithm to the data of the first three urban scenarios.
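The frequency-ranking step (not OCF itself) can be sketched with `collections.Counter`. The input format, one list of band indices per OCF run (e.g., per scene), the function name, and the tie-breaking rule are all hypothetical. The band indices in the usage line are made up for illustration and are not the bands reported in Table 2.

```python
from collections import Counter

def rank_bands(band_lists, k):
    """Pick k bands by how often they appear across several band-selection runs.

    band_lists: iterable of band-index lists, e.g. one OCF result per scene
    (hypothetical input format). Returns the chosen indices in ascending order.
    """
    # Count each band at most once per run.
    counts = Counter(b for bands in band_lists for b in set(bands))
    # Higher frequency first; ties broken by the lower band index (an assumption).
    ranked = sorted(counts, key=lambda b: (-counts[b], b))
    return sorted(ranked[:k])
```

For example, `rank_bands([[1, 5, 9, 12], [1, 5, 10, 12], [1, 6, 9, 12]], 3)` keeps bands 1 and 12, which appear in all three runs, plus band 5, which wins the tie with band 9.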

Compared Methods
The effectiveness of the proposed method is evaluated by comparing it with six commonly used algorithms: ACE-T, CEM-T, HCEM-T, ACE-OTSU, CEM-OTSU, and HCEM-OTSU. ACE-T, CEM-T, and HCEM-T combine the ACE algorithm [6], the CEM algorithm [10], and the HCEM algorithm [3], respectively, with a fixed threshold T. Similarly, ACE-OTSU, CEM-OTSU, and HCEM-OTSU combine OTSU [29] with the same three algorithms. We also include CEM combined with t-OTSU (CEM-t-OTSU) as an additional comparison. This shows whether the performance improvement comes from the t-OTSU algorithm or from the ORE algorithm. In the HCEM detector, the CEM detector is used as the basic detector in each layer, and the target spectrum is iteratively updated based on the output of the previous layer. The parameter t of the t-OTSU algorithm is set to 0.3, and the fixed threshold T is set to 0.5. The remaining parameters of HCEM are kept at their defaults. Note that the input spectra used in the algorithms are obtained with the snapshot multispectral camera. Because both camouflaged nets contain multiple colors, we use the average spectra of multiple regions on their surfaces as the input spectra.
The data were processed with MATLAB 2018 on Windows 10, using a desktop computer with a six-core i5 CPU and 16 GB of memory.
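The input spectra are averages over several regions of each net. The following is a minimal sketch of that averaging. It assumes the regions are given as boolean masks over an (H, W, D) cube and that each region's mean spectrum is weighted equally, rather than weighting every pixel equally. The function name and both choices are assumptions.

```python
import numpy as np

def prior_spectrum(cube, region_masks):
    """Average spectrum over several regions of a target.

    cube: (H, W, D) multispectral image; region_masks: list of (H, W) boolean masks.
    Each region's mean spectrum is weighted equally (an assumption).
    """
    cube = np.asarray(cube, dtype=float)
    region_means = [cube[np.asarray(m, dtype=bool)].mean(axis=0) for m in region_masks]
    return np.mean(region_means, axis=0)
```

The result can serve as the target vector `d` for a CEM detector, such as a CEM implementation like the one formulated in the CEM Detector section.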

Experimental Results with the Lawn Scene
As shown in Figure 5a, the ASRC-net and the AIRC-net are spread on a lawn under sufficient light. The false-color image (Figure 5b) shows that the AIRC-net is easily distinguished from the lawn background, whereas the special design of the ASRC-net makes it barely distinguishable from the lawn. The reference images of the ASRC-net and AIRC-net are shown in Figure 5c and Figure 5d, respectively. The detection results of the ASRC-net and AIRC-net are shown in Figures 6 and 7, respectively. Notably, as the number of bands decreases, the misjudged areas increase for all five comparison algorithms except HCEM-T. With too few bands, however, HCEM-T produces an unclear target contour, which hampers judgment of the target area. As discussed earlier, the performance of the OTSU algorithm is influenced by the target size [30]. When the target is relatively small, many background pixels are misclassified as target. This explains the large misjudged areas in the detection results of the HCEM-OTSU, CEM-OTSU, and ACE-OTSU algorithms. The improved t-OTSU algorithm avoids this problem, as the ASRC-net detection results of CEM-t-OTSU show (Figure 6). Notably, the recently proposed HCEM performs even worse than the original CEM algorithm. The main reason is that, by the design of HCEM, the reweighted target spectrum in each layer gradually deviates from the real target spectrum. HCEM may perform better when target pixels occupy a larger proportion of the data [1].
Wang et al. [10] discussed how the loss of spectral resolution caused by band reduction seriously affects the performance of the comparison methods. Figure 8 (ASRC-net) and Figure 9 (AIRC-net) compare $A_C$, $F_a$, and $F_1$ for the detection results of the seven algorithms. Compared with the other algorithms, the proposed algorithm maintains relatively high target contour integrity owing to the combination of the improved t-OTSU algorithm and the ORE algorithm. As the number of bands decreases from 25 to 4, the $A_C$ values for the ASRC-net and AIRC-net remain above 0.996 and 0.995, respectively, which is higher than those of the compared algorithms. The $F_a$ values of the proposed algorithm are below 0.0015, slightly higher than those of the HCEM-T algorithm. The $F_1$ value considers both precision and recall and therefore reflects the performance of a camouflaged target detection algorithm more comprehensively [33]. When the number of bands is above 5, the $F_1$ value for the ASRC-net obtained by the proposed algorithm stays above 0.9. By contrast, the highest value among the other algorithms is only 0.81 (CEM-T), as presented in Figure 8c. When the proposed algorithm is applied to detect the AIRC-net in MSIs with four bands, $F_1$ is around 0.91. The detection results and quantitative metrics show that the proposed method detects different camouflaged targets with good performance and robustness in a simple urban background (Lawn) with sufficient light (light intensity: 19,618 lux).

Experimental Results in the BT Scene
Compared with the Lawn scene, the background of the BT scene is more complicated, as it includes bushes (both brown and green), trees, lawn, and buildings. The positions of the two camouflaged nets are marked in Figure 10a. A corresponding false-color image is provided in Figure 10b to help distinguish the targets from the background. The reference images of the camouflaged targets are shown in Figure 10c,d, respectively. Figure 11 (ASRC-net) and Figure 12 (AIRC-net) show that the performance of our algorithm is barely affected in the BT scene under insufficient light. Complete target contours are obtained in MSIs with different numbers of bands. However, when the number of bands is reduced to four, the detection results for the AIRC-net are greatly affected. A possible reason is that imaging noise caused by poor lighting aggravates the mismatch between the prior spectrum and the in-scene signature, which degrades the performance of the algorithms [5,7]. In this scenario, the improvement brought by the t-OTSU algorithm is not obvious. This is because the targets occupy a large proportion of the image, so the thresholds obtained by t-OTSU and OTSU differ only slightly.
The evaluation metrics in Figure 13 (ASRC-net) and Figure 14 (AIRC-net) show that several compared algorithms (CEM-T, HCEM-T, CEM-OTSU, and HCEM-OTSU) perform relatively stably when the number of bands ranges from 7 to 25. The CEM-t-OTSU and CEM-OTSU algorithms also perform very similarly because their thresholds are similar. However, the ACE algorithm is more sensitive to the number of bands. In contrast, our algorithm maintains stable performance: $A_C$ and $F_1$ remain above 0.99 and 0.85, respectively, and $F_a$ stays at around 0.0015.
The results in the BT scene show that our algorithm largely overcomes the impact of imaging noise caused by insufficient light intensity (1370 lux).

Experimental Results in the BT Scene
As compared to the Lawn scene, the background of the BT scene in the city is more complicated, as it includes bushes (with both brown and green color), trees, lawn, and building. The positions of the two camouflaged nets are marked in Figure 10a. In order to distinguish the targets from the background, a corresponding false-color image is provided in Figure 10b. The reference images of camouflaged targets are shown in Figure  10c,d, respectively.   Figure 12 (AIRC-net) show that the performance of our algorithm is barely affected in the BT scenes in insufficient light conditions, and complete target contours are obtained in the MSI of different bands. However, we notice that when the number of bands is reduced to four, the detection results of the AIRC-net are greatly affected. A possible reason is that imaging noise caused by poor lighting conditions aggravated the mismatch between the prior spectrum and the in-scene signature, causing a degradation in the performance of the algorithms [5,7]. In this experimental scenario, we find out that the optimization of the t-OTSU algorithm is not obvious. This is because the targets account for a large proportion and the difference between the thresholds obtained by the t-OTSU algorithm and those obtained by the OTSU algorithm is small.

Experimental Results in the BT Scene
As compared to the Lawn scene, the background of the BT scene in the city is more complicated, as it includes bushes (with both brown and green color), trees, lawn, and building. The positions of the two camouflaged nets are marked in Figure 10a. In order to distinguish the targets from the background, a corresponding false-color image is provided in Figure 10b. The reference images of camouflaged targets are shown in Figure  10c,d, respectively.   Figure 12 (AIRC-net) show that the performance of our algorithm is barely affected in the BT scenes in insufficient light conditions, and complete target contours are obtained in the MSI of different bands. However, we notice that when the number of bands is reduced to four, the detection results of the AIRC-net are greatly affected. A possible reason is that imaging noise caused by poor lighting conditions aggravated the mismatch between the prior spectrum and the in-scene signature, causing a degradation in the performance of the algorithms [5,7]. In this experimental scenario, we find out that the optimization of the t-OTSU algorithm is not obvious. This is because the targets account for a large proportion and the difference between the thresholds obtained by the t-OTSU algorithm and those obtained by the OTSU algorithm is small.
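The metrics $A_C$, $F_a$, and $F_1$ are not defined in this part of the paper. The sketch below assumes their conventional pixel-wise definitions, computed from a binary detection mask and a reference mask such as those in Figure 10c,d: overall accuracy $A_C = (TP+TN)/N$, false alarm rate $F_a = FP/(FP+TN)$ over background pixels, and $F_1 = 2TP/(2TP+FP+FN)$. The function name `detection_metrics` is hypothetical.

```python
import numpy as np

def detection_metrics(pred, ref):
    """Return (A_C, F_a, F_1) for boolean masks pred and ref of the same shape.

    Assumed conventional definitions; the paper's exact formulas may differ.
    """
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    tp = np.sum(pred & ref)    # target pixels correctly detected
    fp = np.sum(pred & ~ref)   # background pixels flagged as target
    fn = np.sum(~pred & ref)   # target pixels missed
    tn = np.sum(~pred & ~ref)  # background pixels correctly rejected
    a_c = (tp + tn) / pred.size                         # overall accuracy
    f_a = fp / (fp + tn) if fp + tn else 0.0            # false alarm rate over background
    f_1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 1.0
    return float(a_c), float(f_a), float(f_1)
```

For example, a 10 × 10 reference mask with a 2 × 2 target and a prediction that adds one false pixel gives $A_C = 0.99$, $F_a = 1/96 \approx 0.0104$, and $F_1 = 8/9 \approx 0.889$.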

Experimental Results in the BF Scene
To explore the effectiveness of the proposed algorithm in detecting the camouflaged targets at different distances, we use lenses of 16 mm and 35 mm focal lengths to simulate distance variations in the urban BF scene containing people, roads, fountains, and green and brown bushes, in which the ASRC-net is set as a target. As compared with the image acquired by a 16 mm focal length lens shown in Figure 15a, the target occupies a much larger proportion in the image acquired by a 35 mm focal length lens shown in Figure 15c. In addition, the reference images of the target for different focal lengths are shown in Figure 15b,d.
The detection results of the different methods are shown in Figure 16 (35 mm) and Figure 17 (16 mm), respectively. The proposed method detects complete target contours in the MSIs at both focal lengths. However, since the OTSU algorithm is very sensitive to target size [23], CEM-OTSU, ACE-OTSU, and HCEM-OTSU produce a large number of misjudged areas, and the situation is worse at the 16 mm focal length than at 35 mm. Accordingly, the improvement brought by the t-OTSU algorithm is more prominent in the 16 mm images than in the 35 mm images.
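The size sensitivity of OTSU follows from its criterion. For a candidate threshold $T$ on the detector output, let $\omega_0(T)$ and $\omega_1(T)$ be the fractions of pixels below and above $T$, and $\mu_0(T)$, $\mu_1(T)$ their mean outputs. The worked numbers below use a hypothetical three-level output map (700 background pixels at 0.05, 290 background pixels at 0.25, and 10 target pixels at 0.9), not data from this study.

```latex
\sigma_B^2(T) = \omega_0(T)\,\omega_1(T)\,\bigl[\mu_0(T) - \mu_1(T)\bigr]^2,
\qquad T_{\mathrm{OTSU}} = \arg\max_{T} \sigma_B^2(T)

\text{cut between 0.05 and 0.25:}\quad \sigma_B^2 = 0.7 \times 0.3 \times (0.2717 - 0.05)^2 \approx 0.0103
\text{cut between 0.25 and 0.9:}\quad \sigma_B^2 = 0.99 \times 0.01 \times (0.9 - 0.1086)^2 \approx 0.0062
```

Because the weight product $\omega_0\omega_1$ is tiny when the target covers only 1% of the pixels, the cut that splits the background wins, and the 290 brighter background pixels are labeled as target. A minimum threshold above 0.25, such as the floor that t-OTSU adds, would suppress these false alarms in this toy case.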
The performance of the eight methods at the two focal lengths is shown in Figure 18 (35 mm) and Figure 19 (16 mm), respectively. Comparing Figures 18 and 19 shows the robustness of the proposed method: $A_C$ is stable and above 0.995, $F_a$ is around 0.001, and $F_1$ stays above 0.93. In the MSIs acquired at the 35 mm focal length, the overall performance of several comparison methods fluctuates greatly as the number of bands decreases. In the 16 mm images, by contrast, their overall performance decreases monotonically without fluctuations, which indicates that these methods are more strongly influenced by the imaging distance. Notably, the performance of HCEM-T improves greatly when the number of bands is reduced from 25 to 12. Similar phenomena have been reported by other researchers. For instance, Christian [35] pointed out that the principal component analysis (PCA) method used fewer bands but still obtained better performance than when all bands were used. The literature [36] also notes that the improved sparse subspace clustering (ISSC) and linear prediction (LP) methods used fewer bands yet achieved higher accuracy than with all bands. However, further theoretical study may be needed to fully explain these observations.
In the BF scene, the proposed method detects the complete target contour at different distances, which demonstrates its robustness to variations in distance.

Experimental Results in the UAV Scene
To verify the versatility of the proposed method, we used a UAV-mounted snapshot multispectral camera to image two camouflaged targets in a complex scene from an altitude of 50 m. As shown in Figure 20a, the scene contains roads, trees, grass, a pool, and other objects. A camouflaged tent (CT) simulates an illegal building, and a vehicle covered with a camouflaged net simulates a camouflaged vehicle (CV); both targets are covered by the AIRC-net. The reference images of the two targets are shown in Figure 20c,d. Since the previous experiments show that the CEM algorithm performs better than the other two comparison algorithms (ACE and HCEM), only the CEM algorithm is used for comparison in this scene. Note that in this scene, the CEM algorithm is combined with a fixed threshold of 0.7, and the parameter t of the t-OTSU algorithm is set to 0.65.
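For reference, the CEM detector [10] designs a filter $\mathbf{w}$ that minimizes the average output energy $\mathbf{w}^T\mathbf{R}\mathbf{w}$ subject to $\mathbf{d}^T\mathbf{w} = 1$, where $\mathbf{R}$ is the sample autocorrelation matrix of the pixel spectra and $\mathbf{d}$ is the prior target spectrum; the solution is $\mathbf{w} = \mathbf{R}^{-1}\mathbf{d}/(\mathbf{d}^T\mathbf{R}^{-1}\mathbf{d})$. The sketch below applies this to an (H, W, B) cube and then thresholds the output at 0.7, in the spirit of the CEM-T baseline used in this scene. The min-max normalization before thresholding and the function names are assumptions; the paper's exact normalization is not given here.

```python
import numpy as np

def cem(cube, d):
    """Constrained energy minimization (CEM) detector.

    cube: (H, W, B) multispectral image; d: length-B prior target spectrum.
    Returns an (H, W) map of filter outputs w^T x.
    """
    rows, cols, bands = cube.shape
    d = np.asarray(d, dtype=float)
    x = cube.reshape(-1, bands).astype(float)  # N x B matrix, one pixel spectrum per row
    r = x.T @ x / x.shape[0]                   # B x B sample autocorrelation matrix R
    r_inv_d = np.linalg.solve(r, d)            # R^{-1} d without forming the inverse
    weights = r_inv_d / (d @ r_inv_d)          # w = R^{-1} d / (d^T R^{-1} d)
    return (x @ weights).reshape(rows, cols)

def cem_fixed_threshold(cube, d, threshold=0.7):
    """Hypothetical CEM-T baseline: min-max normalize the CEM output, then threshold it."""
    y = cem(cube, d)
    y = (y - y.min()) / (y.max() - y.min())    # assumed normalization to [0, 1]
    return y >= threshold
```

Usage: `mask = cem_fixed_threshold(cube, d, threshold=0.7)` returns a boolean target mask. Solving the linear system rather than explicitly inverting R is a common numerical choice.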

The detection results for the two targets are shown in Figure 21a,b, respectively. Compared with the previous scenes, such as the Lawn and BT scenes, the performance of the CEM algorithm declines significantly in this scene, and the positions of the camouflaged targets are difficult to identify. With our method, the completeness of the contour decreases as the number of bands decreases, but the positions of the camouflaged targets can still be easily distinguished. When the number of bands is reduced to four, however, the performance of the proposed algorithm degrades greatly.

Figure 22 shows that the performance of the proposed method is significantly better than that of the CEM algorithm. In addition, the advantages of the t-OTSU algorithm become increasingly apparent as the number of bands decreases. Interestingly, the proposed method detects the CV target better than the CT target. A possible reason is that the CT target is propped up, with a less flat surface and more folds, which increases spectral variability and reduces target detection performance [7]. The CV target surface is flatter and more regular in shape, which is more favorable for our algorithm.
The significant decrease in the detection accuracy of the proposed algorithm with four bands may be caused by the unsuitability of the selected bands. The four common bands were selected on the basis of the previous experimental scenes; since the UAV scene differs considerably from those scenes, other, more representative band subsets may exist for it. In general, the proposed algorithm is more robust than the other algorithms and shows good detection performance in this complex UAV scene. In the Supplementary Materials, we also use the proposed algorithm to detect the targets one by one in the same scene; the comparison with other algorithms further illustrates the effectiveness and robustness of our algorithm.

Parameter Analysis
Regarding the choice of threshold, a relatively large threshold is usually set to improve the accuracy of target detection in HSI applications [3]. However, as the experiments demonstrate, setting a large fixed threshold in MSI leads to incomplete target contours, and the OTSU algorithm is strongly affected by the size of the target: the smaller the target, the worse its segmentation accuracy. To address this problem, we proposed the t-OTSU algorithm and the ORE algorithm based on an analysis of the contours of camouflaged targets. As an important parameter of the t-OTSU algorithm, t greatly affects the performance and robustness of the proposed method. To determine an appropriate value of t, we therefore ran experiments with different values of t and computed the mean values of the evaluation metrics over several scenes.
As shown in Table 3, the values of $A_C$, $F_a$, and $F_1$ of the proposed method show a decreasing trend, indicating that its performance decreases as t increases from 0.3 to 0.6. Between t = 0.2 and t = 0.3, the performance changes only slightly. To improve detection accuracy in most scenes, t is therefore set to 0.3.
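The t-OTSU algorithm is described here only as an OTSU variant controlled by t, and the Conclusions state that it "adds a minimum threshold t". The sketch below reads that as $T = \max(T_{\mathrm{OTSU}}, t)$ applied to detector outputs normalized to [0, 1]; this reading, the 256-bin histogram, and the function names are assumptions rather than the authors' implementation.

```python
import numpy as np

def otsu_threshold(values, bins=256):
    """Otsu's threshold for values in [0, 1]: maximize the between-class variance."""
    hist, edges = np.histogram(np.ravel(values), bins=bins, range=(0.0, 1.0))
    centers = (edges[:-1] + edges[1:]) / 2
    total = hist.sum()
    n0 = np.cumsum(hist)                        # pixels at or below each candidate cut
    n1 = total - n0                             # pixels above each candidate cut
    omega0 = n0 / total                         # class-0 weight
    mu_cum = np.cumsum(hist * centers) / total  # cumulative first moment
    mu_total = mu_cum[-1]
    valid = (n0 > 0) & (n1 > 0)                 # both classes must be non-empty
    sigma_b = np.zeros(bins)
    sigma_b[valid] = (mu_total * omega0[valid] - mu_cum[valid]) ** 2 / (
        omega0[valid] * (1.0 - omega0[valid])
    )
    k = int(np.argmax(sigma_b))                 # first bin with the largest variance
    return float(edges[k + 1])                  # cut between bin k and bin k + 1

def t_otsu_threshold(values, t=0.3):
    """Assumed t-OTSU: the Otsu threshold, floored at a minimum threshold t."""
    return max(otsu_threshold(values), t)
```

When the target covers only a small fraction of pixels, `otsu_threshold` can land inside the background distribution, and the floor t keeps the cut above it. When the target is large, the floor typically leaves the segmentation unchanged, which matches the small t-OTSU gains described for the BT scene.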

Discussion
In our experiments, a band selection method was used to obtain MSIs containing only a few bands, thereby reducing the computational effort. The results show that the performance of the comparison algorithms generally declines as the number of bands decreases [37]. Interestingly, different methods show different sensitivities to the number of bands: the ACE algorithm is the most sensitive, with the largest performance decline as bands are removed. In some cases, reducing the number of bands even improves the performance of specific algorithms (e.g., HCEM-T), which suggests that more but redundant spectral bands may be more harmful than fewer but non-redundant ones [21].
Lighting conditions are another important factor in practical applications of spectral image target detection. The experiments show that the combination of few bands and poor illumination severely degrades target detection in MSI. The recently proposed HCEM algorithm performs even worse than the original CEM algorithm, possibly because the reweighted target spectrum in each layer gradually deviates from the true target spectrum owing to background noise caused by the poor lighting. This also suggests that algorithms that perform well in the laboratory are not necessarily superior in real-world applications [5].
The UAV experiment further shows that the proposed algorithm has a better detection capability for camouflaged targets with regular shapes and flat surfaces. For irregularly shaped targets, the spectral inconsistency introduced by surface shadows or folds weakens the performance of the algorithms.
Comparing the UAV experiment with the previous three experiments, we also find that the performance of the algorithm declines slightly. The reason is that the UAV scene is more complex than the other three scenes, and the original band subset is not fully suitable for it. This also shows that achieving excellent target detection with fewer bands requires selecting different band subsets for different scenes [13,14].
The flow chart of the proposed algorithm (Figure 1) shows that the CEM detector plays a key role in distinguishing the camouflaged nets from vegetation, and the ORE algorithm plays a critical role in obtaining an accurate and complete target contour. Although CEM-T performs better than CEM-t-OTSU in simple scenes, the t-OTSU algorithm shows its advantages as the scene becomes more complicated, as in the UAV scene. In summary, the t-OTSU algorithm adaptively segments the complete target contour, contributing to the robustness of the proposed algorithm across scenes, while the ORE algorithm accurately extracts the target region and substantially improves the performance of our algorithm. For camouflaged target detection, both algorithms are indispensable in our study. Overall, the algorithm can overcome influencing factors such as low spectral resolution and poor lighting conditions, and it is better suited to fast, high-precision detection of unoccluded targets within 100 m. However, when the light intensity in the scene is below 800 lux, its detection performance drops significantly. It should also be noted that the algorithm is not suitable for detecting unknown camouflaged targets, a limitation shared by algorithms such as CEM and HCEM.

Conclusions
In this work, we introduce a rapid and accurate camouflaged target detection method using a snapshot multispectral camera. First, we screen several general-purpose bands for camouflaged target detection based on snapshot multispectral imaging technology and band selection methods. Second, a method based on the constrained energy minimization (CEM) algorithm and an improved OTSU algorithm is proposed to adaptively segment the camouflaged target region in MSIs: the CEM detector obtains the initial detection results, and the improved OTSU algorithm adds a minimum threshold t to segment the target region. To eliminate the interference of misjudged areas and obtain a complete camouflaged target contour, the object region extraction (ORE) algorithm is proposed. The experimental results for two different targets in four typical urban scenes show that the proposed algorithm, using only four bands, achieves better camouflaged target detection performance than other algorithms using all 25 multispectral bands, which is of great importance in practical applications. The proposed method achieves $A_C$ above 0.99, $F_a$ below 0.002, and $F_1$ above 0.9. In addition, it exhibits the best robustness in experiments using lenses of different focal lengths to simulate variations in imaging distance.
Snapshot multispectral cameras are a relatively new type of imaging device, and the available choices of spatial resolution and band range for them are still very limited. In the future, a snapshot camera could be customized to better suit target detection with the proposed algorithm. We will also build a dataset of camouflaged targets acquired with the snapshot multispectral camera and use deep learning-based algorithms to detect unknown camouflaged targets in complex scenes at long distances and under poor lighting conditions.